
106 Commits

Author SHA1 Message Date
Wouter Dullaert
d65c85c19f cli: Removed obsolete no-watch-config flag
Having it around creates the expectation that by default mgmt will put a watch
on the config.
2019-04-22 13:42:27 +02:00
James Shubin
eba45e6207 lib, gapi: Display deploy ID to add some clarity
This should make it easier to understand exactly when a new deploy
starts.
2019-04-16 18:11:32 -04:00
James Shubin
a5842a41b2 etcd: Rewrite embed etcd implementation
This is a giant cleanup of the etcd code. The earlier version was
written when I was less experienced with golang.

This is still not perfect, and does contain some races, but at least
it's a decent base to start from. The automatic elastic clustering
should be considered an experimental feature. If you need a more
battle-tested cluster, then you should manage etcd manually and point
mgmt at your existing cluster.
2019-04-11 21:43:48 -04:00
James Shubin
07f542b4d7 legal: Happy 2019 everyone...
Done with:

ack '2018+' -l | xargs sed -i -e 's/2018+/2019+/g'

Checked manually with:

git add -p

Hello to future James from 2020, and Happy Hacking!
2019-03-24 15:08:50 -04:00
James Shubin
753d1104ef util: Port all multierr code to new errwrap package
This cleans things up and simplifies a lot of the code. Also it's easier
to just import one error package when needed.
2019-03-12 16:51:37 -04:00
James Shubin
880652f5d4 util: Port all code to new errwrap package
This should keep things more uniform.
2019-03-12 16:49:01 -04:00
James Shubin
253ed78cc6 engine: Rewrite the core algorithm
The engine core had some unfortunate bugs that were the result of some
early design errors when I wasn't as familiar with channels. I've
finally rewritten most of the bad parts, and I think it's much more
logical and stable now.

This also simplifies the resource API, since more of the work is done
completely in the engine, and hidden from view.

Lastly, this adds a few new metaparameters and associated code.

There are still some open problems left to solve, but hopefully this
brings us one step closer.
2019-02-24 12:28:59 -05:00
James Shubin
4860d833c7 converger: Rewrite the converger module
I found a deadlock in the converger code, and I realized the code was
sufficiently bad that it needed a good clean up.
2019-02-24 12:28:59 -05:00
Johan Bloemberg
f7a06c1da9 etcd: Connection options (socket file, ipv6)
- Allow unix domain socket to be used as client url
- Using ::1 as clienturl should not create default local ipv4 listener
- Add shell tests
2019-02-13 18:55:20 +01:00
James Shubin
96dccca475 lang: Add module imports and more
This enables imports in mcl code, and is one of the last remaining blockers
to using mgmt. Now we can start writing standalone modules, and adding
standard library functions as needed. There's still lots to do, but this
was a big missing piece. It was much harder to get right than I had
expected, but I think it's solid!

This unfortunately large commit is the result of some wild hacking I've
been doing for the past little while. It's the result of a rebase that
broke many "wip" commits that tracked my private progress, into
something that's not gratuitously messy for our git logs. Since this was
a learning and discovery process for me, I've "erased" the confusing git
history that wouldn't have helped. I'm happy to discuss the dead-ends,
and a small portion of that code was even left in for possible future
use.

This patch includes:

* A change to the cli interface:
You now specify the front-end explicitly, instead of leaving it up to
the front-end to decide when to "activate". For example, instead of:

mgmt run --lang code.mcl

we now do:

mgmt run lang --lang code.mcl

We might rename the --lang flag in the future to avoid the awkward word
repetition. Suggestions welcome, but I'm considering "input". One
side-effect of this change is that flags which are "engine" specific
now must be specified with "run" before the front-end name. E.g.:

mgmt run --tmp-prefix lang --lang code.mcl

instead of putting --tmp-prefix at the end. We also changed the GAPI
slightly, but I've patched all code that used it. This also makes things
consistent with the "deploy" command.

* The deploys are more robust and let you deploy after a run
This has been vastly improved and lets mgmt really run as a smart
engine that can handle different workloads. If you've started with `run`
and don't want incoming deploys to be accepted, you can use the
--no-watch-deploy option to block them.

* The import statement exists and works!
We now have a working `import` statement. Read the docs, and try it out.
I think it's quite elegant how it fits in with `SetScope`. Have a look.
As a result, we now have some built-in functions available in modules.
This also adds the metadata.yaml entry-point for all modules. Have a
look at the examples or the tests. The bulk of the patch is to support
this.

* Improved lang input parsing code:
I rewrote the parsing that determined what ran when we passed different
things to --lang. Deciding between running an mcl file or raw code is
now handled in a more intelligent, and re-usable way. See the inputs.go
file if you want to have a look. One casualty is that you can't stream
code from stdin *directly* to the front-end, it's encapsulated into a
deploy first. You can still use stdin though! I doubt anyone will notice
this change.

* The scope was extended to include functions and classes:
Go forth and import lovely code. All these exist in scopes now, and can
be re-used!

* Function calls actually use the scope now. Glad I got this sorted out.

* There is import cycle detection for modules!
Yes, this is another dag. I think that's #4. I guess they're useful.

* A ton of tests and new test infra was added!
This should make it much easier to add new tests that run mcl code. Have
a look at TestAstFunc1 to see how to add more of these.
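Detecting an import cycle amounts to finding a cycle in a directed graph of modules. The Go sketch below is a generic depth-first search with three visit states, not mgmt's implementation. The `imports` map (module name to the modules it imports) and the `findImportCycle` name are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// findImportCycle runs a depth-first search over a module import graph and
// returns the first cycle found (first module repeated at the end), or nil
// if the graph is a DAG. imports maps each module to the modules it imports.
func findImportCycle(imports map[string][]string) []string {
	const (
		unvisited = iota // zero value, so unseen modules start here
		inProgress       // module is on the current DFS stack
		done             // module and all its imports are cycle-free
	)
	state := make(map[string]int)
	var stack []string

	var visit func(mod string) []string
	visit = func(mod string) []string {
		state[mod] = inProgress
		stack = append(stack, mod)
		for _, dep := range imports[mod] {
			switch state[dep] {
			case inProgress:
				// dep is already on the stack: slice out the cycle.
				for i, m := range stack {
					if m == dep {
						cycle := append([]string{}, stack[i:]...)
						return append(cycle, dep)
					}
				}
			case unvisited:
				if c := visit(dep); c != nil {
					return c
				}
			}
		}
		stack = stack[:len(stack)-1]
		state[mod] = done
		return nil
	}

	// Visit modules in sorted order so the result is deterministic.
	names := make([]string, 0, len(imports))
	for name := range imports {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if state[name] == unvisited {
			if c := visit(name); c != nil {
				return c
			}
		}
	}
	return nil
}

func main() {
	imports := map[string][]string{
		"main": {"util", "net"},
		"util": {"strings"},
		"net":  {"util", "main"}, // closes a cycle back to main
	}
	fmt.Println(findImportCycle(imports)) // [main net main]
}
```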

As usual, I'll try to keep these commits smaller in the future!
2018-12-21 06:22:12 -05:00
James Shubin
cd7711bdfe gapi: Add a prefix variable in case we want to namespace on disk
This could get passed through to use as a module download path.
2018-12-20 21:21:30 -05:00
James Shubin
9969286224 engine: Resources package rewrite
This giant patch makes some much needed improvements to the code base.

* The engine has been rewritten and lives within engine/graph/
* All of the common interfaces and code now live in engine/
* All of the resources are in one package called engine/resources/
* The Res API can use different "traits" from engine/traits/
* The Res API has been simplified to hide many of the old internals
* The Watch & Process loops were previously inverted, but this is now fixed
* The likelihood of package cycles has been reduced drastically
* And much, much more...

Unfortunately, some code had to be temporarily removed. The remote code
had to be taken out, as did the prometheus code. We hope to have these
back in new forms as soon as possible.
2018-04-19 01:10:58 -04:00
James Shubin
f3b99b3940 test, integration: Add an integration test framework
This adds an initial implementation of an integration test framework for
writing more complicated tests. In particular this also makes some small
additions to the mgmt core so that testing is easier.
2018-03-13 06:38:21 -04:00
James Shubin
ea52eb78d9 lib: Remove remote execution from core
I have an improved design for remote execution as a resource. Since I
need to get rid of some technical debt to clean up the resource API, and
this main loop, a good first step is to remove its invocation. It will
be coming back as a resource as soon as possible!
2018-03-09 17:07:58 -05:00
Johan Bloemberg
ffcc2aa2af lib: Provide detailed feedback about invalid URLs
2018-02-20 10:29:19 -05:00
James Shubin
b19583e7d3 lang: Initial implementation of the mgmt language
This is an initial implementation of the mgmt language. It is a
declarative (immutable), functional, reactive, domain-specific
programming language. It is intended to be a language that is:

* safe
* powerful
* easy to reason about

With these properties, we hope this language, and the mgmt engine will
allow you to model the real-time systems that you'd like to automate.

This also includes a number of other associated changes. Sorry for the
large size of this patch.
2018-01-20 08:09:29 -05:00
James Shubin
12fce52cd7 legal: Happy 2018 everyone...
Done with:

ack '2017+' -l | xargs sed -i -e 's/2017+/2018+/g'

Checked manually with:

git add -p

Hello to future James from 2019, and Happy Hacking!
2018-01-03 21:22:07 -05:00
Julien Pivotto
fdce9d6a6a prometheus: Initialize mgmt_checkapply_total metrics
Prometheus recommends initializing metrics:

https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics

This commit initializes the mgmt_checkapply_total metric
for each registered resource.
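Initializing a metric means every labelled series already exists with value 0 before the first CheckApply runs. A query then sees 0 instead of no data. With the official Go client, one common way to do this is to call `WithLabelValues(...).Add(0)` on a `CounterVec`. The standard-library-only sketch below shows the idea. The `counterVec` type and the single `kind` label are hypothetical. This is neither the Prometheus client API nor mgmt's code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// counterVec is a hypothetical stand-in for a labelled counter such as
// mgmt_checkapply_total, keyed by a single "kind" label.
type counterVec struct {
	mu     sync.Mutex
	values map[string]float64
}

func newCounterVec() *counterVec {
	return &counterVec{values: make(map[string]float64)}
}

// add increments the series for kind, creating it if it does not exist.
func (c *counterVec) add(kind string, delta float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[kind] += delta
}

// expose returns every existing series in sorted order, roughly as a
// scrape would see them.
func (c *counterVec) expose() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]string, 0, len(c.values))
	for kind, v := range c.values {
		out = append(out, fmt.Sprintf(`mgmt_checkapply_total{kind=%q} %g`, kind, v))
	}
	sort.Strings(out)
	return out
}

func main() {
	checkApply := newCounterVec()
	// Create a zero-valued series for each registered resource kind, so
	// that "never ran" shows up as 0 rather than as a missing series.
	for _, kind := range []string{"file", "pkg", "svc"} {
		checkApply.add(kind, 0)
	}
	checkApply.add("file", 1)
	for _, line := range checkApply.expose() {
		fmt.Println(line)
	}
}
```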

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-11-23 15:23:41 +01:00
Jonathan Gold
0f70c31a30 etcd: Add advertise urls to cli
This patch adds the option to specify URLs to advertise for clients and peers.
This will facilitate etcd communication through NAT, where we want to listen
on a local IP, but expose a public IP to clients/peers.
2017-10-28 22:42:27 -04:00
James Shubin
46be83f8f7 legal: Re-license to GPLv3
2017-09-11 18:07:47 -04:00
James Shubin
6b489f71a1 remote: Add a Ready method to know when startup is finished
Previously, there was an extremely rare race where we would start up,
kick off the Run method in a goroutine, and then run Exit before Run got
very far in its execution. If Run ran some early sections of its code
_after_ we had Exited, we would trigger a panic due to the converger UID
being unregistered.

This patch blocks Exit from progressing until Run has started and
finished running. It also adds a Ready method so that you can monitor
this signal yourself if you'd like to add the necessary wait to your
code.
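Here is a minimal sketch of this startup handshake, assuming a hypothetical `remote` type. Run closes a `ready` channel once its startup work is done. Ready exposes that channel, and Exit waits on it before asking Run to stop. The sketch only models the ordering described above, not the real remote code. As written, Exit would block forever if Run is never started.

```go
package main

import (
	"fmt"
	"sync"
)

// remote is a hypothetical type modelling only the Run/Ready/Exit ordering.
type remote struct {
	ready chan struct{} // closed by Run once startup has finished
	exit  chan struct{} // closed by Exit to ask Run to stop
	done  chan struct{} // closed by Run when it returns
	once  sync.Once     // makes Exit safe to call more than once
}

func newRemote() *remote {
	return &remote{
		ready: make(chan struct{}),
		exit:  make(chan struct{}),
		done:  make(chan struct{}),
	}
}

// Run does its startup work, signals readiness, then blocks until Exit.
func (r *remote) Run() {
	defer close(r.done)
	// ... early startup work (e.g. registering with the converger) ...
	close(r.ready)
	<-r.exit
	// ... shutdown work ...
}

// Ready returns a channel that is closed once Run has finished starting up.
func (r *remote) Ready() <-chan struct{} { return r.ready }

// Exit waits for startup to finish, so it can never overtake the early
// sections of Run, then asks Run to stop and waits for it to return.
func (r *remote) Exit() {
	<-r.ready
	r.once.Do(func() { close(r.exit) })
	<-r.done
}

func main() {
	r := newRemote()
	go r.Run()
	r.Exit() // safe even when called immediately after starting Run
	fmt.Println("exited cleanly")
}
```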
2017-06-08 03:55:03 -04:00
James Shubin
9f5057eac7 resources: Do not panic on autogrouped graph switches
Graph changes from autogrouped -> not autogrouped or vice versa cause a
panic (or I assume a leak) because we compared the auto grouped graph to
the ungrouped one, which would cause an Exit on an unstarted Vertex.
This includes a test that seems to reliably reproduce the issue.
2017-06-08 01:05:58 -04:00
James Shubin
4f420dde05 etcd: Wait for server to start before continuing
I think there was a rare race where we would make use of the etcd server
before it had fully started up. I only ever saw this occur on Travis,
and with this fix hopefully we'll never see it again.

It is worth mentioning that much of my etcd code and the lib Run()
function could use a solid cleaning.
2017-06-03 01:00:35 -04:00
James Shubin
4d9d0d4548 resources: Improve AutoEdge API and pkg breakage
I previously broke the pkg auto edges because the package list wasn't
available by the time it was called. This fixes the pkg resource so that
it gets the necessary list of packages when needed. Since this means
that a possible failure could happen, we also update the AutoEdges API
to support errors. Errors can only be generated at AutoEdge struct
creation; once the struct has been returned (right before the graph
structure is modified), there is no way to return any errors.

It's important to remember that the AutoEdges stuff gets called before
the Init of each resource, so make sure it doesn't depend on anything
that happens there or that gets cached as a result of Init.

This is all much nicer now and has a test too :)
2017-06-02 22:15:28 -04:00
James Shubin
9cbaa892d3 gapi: Allow the GAPI implementer to specify fast and exit
This allows the implementer of the GAPI to specify three parameters for
every Next message sent on the channel. The Fast parameter tells the
agent if it should do the pause quickly or if it should finish the
sequence. A quick pause means that it will cause a pause immediately
after the currently running resources finish, whereas a slow (default)
pause will allow the wave of execution to finish. The slow pause is
usually preferred with complex graphs, where we want each step to
complete. The Exit parameter tells the engine to exit, and the
Err parameter tells the engine that an error occurred.
2017-06-02 04:03:10 -04:00
James Shubin
c35916fad1 resources: Rename the Data struct to ResData to avoid ambiguity
There's a similarly named gapi.Data struct which we could also rename.
2017-06-02 02:53:53 -04:00
James Shubin
14c2fd1edd resources: Add proper edge compare method
Might as well do this cleanly in one place.
2017-05-31 17:27:34 -04:00
James Shubin
4150ae7307 pgraph: Replace edge struct with interface
This further cleans up the pgraph lib to be more generic.
2017-05-31 15:36:15 -04:00
James Shubin
a87288d519 pgraph, resources: Major refactoring continued
There was simply some technical debt I needed to kill off. Sorry for not
splitting this up into more patches.
2017-05-31 15:36:14 -04:00
James Shubin
3cf9639e99 pgraph, resources: Major refactor to remove pgraph to resource dep
This is the mechanical port of the remaining bits. The next step is to
clean it up a bit.
2017-05-29 15:43:50 -04:00
James Shubin
11c3a26c23 pgraph: Move the AutoEdges mechanism into the resource package
Remove the pgraph->resource dependency.
2017-05-29 15:43:50 -04:00
James Shubin
1c59712cbf pgraph: Move AssociateData function out of the package
This removes another dependency on the resource package.
2017-05-15 10:19:46 -04:00
James Shubin
c2cb1c9168 pgraph: Move GraphMetas function out of package
This removes a dependency on the resources package which wasn't
necessary.
2017-05-15 10:06:31 -04:00
James Shubin
70e7ee2d46 pgraph: Remove use of Flags struct in favour of Value API
One small step to completely cleaning up the pgraph package so that we
can eventually fix the code that would otherwise create a cycle!
2017-05-13 13:28:41 -04:00
James Shubin
a4858be967 lib, gapi: Next method of GAPI should generate first event
This puts the generation of the initial event into the Next method of
the GAPI. If it does not happen, then we will never get a graph. This is
important because this notifies the GAPI when we're actually ready to
try to generate a graph, rather than blocking on the Graph method if we
have a long compile for example.

This is also required for the etcd watch cleanup.
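Here is a rough sketch of this contract, using a hypothetical `fakeGAPI`. Its Next method sends an initial event once a simulated slow setup has finished. It then forwards changes from a channel standing in for the etcd watch. The names and event strings are illustrative only.

```go
package main

import (
	"fmt"
	"time"
)

// fakeGAPI is a hypothetical front-end; changes stands in for a watch.
type fakeGAPI struct {
	changes chan string
}

// Next returns a channel of "build a graph now" events. The first event is
// sent only once setup is finished, so the engine waits here rather than
// blocking inside a Graph call during a long compile.
func (g *fakeGAPI) Next() <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		time.Sleep(10 * time.Millisecond) // simulate slow setup
		ch <- "initial"                   // without this, no graph is ever generated
		for c := range g.changes {
			ch <- "changed: " + c
		}
	}()
	return ch
}

func main() {
	g := &fakeGAPI{changes: make(chan string, 1)}
	g.changes <- "deploy 2"
	close(g.changes)
	for ev := range g.Next() {
		fmt.Println(ev)
	}
}
```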
2017-04-10 03:20:58 -04:00
James Shubin
6fd5623b1f gapi: Move separate etcd Watch method into GAPI
This cleans up the API to not have a special case for etcd anymore. In
particular, this also adds the requirement that the GAPI must generate
an event on startup as soon as it is ready to generate a graph.
2017-04-10 03:20:58 -04:00
James Shubin
3e001f9a1c main: Update log messages for consistency
2017-03-16 13:14:50 -04:00
James Shubin
cd5e2e1148 pgraph: Add fast pausing and exiting of graphs
This causes a graph to actually stop processing part way through, even
if there are pokes that want to continue on. This is so that the user
experience of pressing ^C actually causes a shutdown without finishing
the graph execution. It might be preferred to have this be a user
defined setting at some point in the future, such as if the user presses
^C twice. As well, we might want to implement an interrupt API so that
individual resource execution can be asked to bail out early if
requested. This could happen on a third ^C press.
2017-03-13 07:54:03 -04:00
James Shubin
a0686b7d2b pgraph: graphviz: Update Graphviz lib to quote names properly
This also moves the library to after the graph starts so that the kind
fields will be visible.
2017-03-08 19:23:33 -05:00
James Shubin
44771a0049 gapi: Move the World interface into resources
This was necessary to fix some "import cycle" errors I was having when
adding the World API to the resource Data struct.

I think this is a good hint that I need to start splitting up existing
packages into sub files, and cleaning up any inter-package problems too.
2017-03-08 19:23:33 -05:00
James Shubin
32aae8f57a lib, pgraph, resources: Refactor data association API
This should make things cleaner and help avoid so much churn every time
we change a property.
2017-03-07 22:51:11 -05:00
James Shubin
8207e23cd9 lib: Refactor instantiation of world API
2017-03-07 22:51:11 -05:00
James Shubin
d8e19cd79a semaphore: Create a semaphore metaparam
This adds a P/V-style semaphore mechanism to the resource graph. This
enables the user to specify a number of "id:count" tags associated with
each resource which will reduce the parallelism of the CheckApply
operation to that maximum count.

This is particularly interesting because (assuming I'm not mistaken) the
implementation is deadlock-free, provided that no individual resource
ever blocks permanently during execution! I don't have a formal proof of
this, but I was able to convince myself on paper that it was the case.

An actual proof that N P/V counting semaphores in a DAG won't ever
deadlock would be particularly welcome! Hint: the trick is to acquire
them in alphabetical order while respecting the DAG flow. Disclaimer:
this assumes that the lock count is always > 0 of course.
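Here is a sketch of the acquisition rule from the hint, using buffered channels as P/V counting semaphores. It assumes each "id:count" tag names its own semaphore; keying on the whole tag is this sketch's own assumption. Tags are sorted so every caller acquires in the same global order, which prevents circular waits. The DAG ordering itself is not modelled, and none of the names come from mgmt.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"sync"
	"time"
)

// semaphores lazily creates one counting semaphore per "id:count" tag.
// Each semaphore is a buffered channel: sending is P, receiving is V.
type semaphores struct {
	mu   sync.Mutex
	sems map[string]chan struct{}
}

func newSemaphores() *semaphores {
	return &semaphores{sems: make(map[string]chan struct{})}
}

// parseCount extracts the count from an "id:count" tag; it must be > 0.
func parseCount(tag string) (int, error) {
	i := strings.LastIndex(tag, ":")
	if i < 0 {
		return 0, fmt.Errorf("tag %q: expected id:count", tag)
	}
	n, err := strconv.Atoi(tag[i+1:])
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("tag %q: count must be a positive integer", tag)
	}
	return n, nil
}

// get returns the semaphore for tag, creating it with capacity n if needed.
func (s *semaphores) get(tag string, n int) chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	sem, ok := s.sems[tag]
	if !ok {
		sem = make(chan struct{}, n)
		s.sems[tag] = sem
	}
	return sem
}

// acquire performs P on every tagged semaphore in alphabetical order and
// returns a func that performs V on each of them in reverse order.
func (s *semaphores) acquire(tags []string) (func(), error) {
	sorted := append([]string(nil), tags...)
	sort.Strings(sorted)
	var sems []chan struct{}
	for i, tag := range sorted {
		if i > 0 && tag == sorted[i-1] {
			continue // a duplicate tag must not be acquired twice
		}
		n, err := parseCount(tag)
		if err != nil {
			return nil, err
		}
		sems = append(sems, s.get(tag, n))
	}
	for _, sem := range sems {
		sem <- struct{}{} // P: blocks while all n slots are taken
	}
	return func() {
		for i := len(sems) - 1; i >= 0; i-- {
			<-sems[i] // V
		}
	}, nil
}

func main() {
	s := newSemaphores()
	var mu sync.Mutex
	var wg sync.WaitGroup
	running, peak := 0, 0
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			release, err := s.acquire([]string{"net:3", "db:2"})
			if err != nil {
				panic(err)
			}
			defer release()
			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()
			time.Sleep(5 * time.Millisecond) // stands in for CheckApply
			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println("peak parallelism:", peak) // never more than 2, due to db:2
}
```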
2017-02-27 02:57:06 -05:00
Julien Pivotto
7d92ab335a prometheus: Add mgmt_pgraph_start_time_seconds metric
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-02-26 15:28:43 +01:00
James Shubin
98bc96c911 golint: Fixup issues found in the report
This also increases the max allowed to 5% -- I'm happy to make this
lower if someone asks.
2017-02-22 22:18:55 -05:00
James Shubin
e070a85ae0 lib: Misc cleanups and new log message
2017-02-22 17:45:16 -05:00
James Shubin
2da21f90f4 pgraph, resources: Improve Init/Close and Worker status
This should do some rough cleanups around the Init/Close of resources,
and tracking of Worker function status.
2017-02-21 18:42:07 -05:00
James Shubin
a981cfa053 legal: Oh yeah, it is 2017
2017-02-16 01:34:32 -05:00
James Shubin
35d3328e3e etcd: Remove stuttering in package
This is a good first step to cleaning up the package.
2017-02-12 22:51:46 -05:00
Julien Pivotto
e8855f7621 prometheus: Implement mgmt_checkapply_total metric
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-02-12 23:45:47 +01:00