The old system with vendor/ and git submodules worked great.
Unfortunately, FUD around git submodules seemed to scare people away,
and golang moved to a go.mod system that adds a new lock file format
instead of using the versioning that's built into git. It's now almost
impossible to use modern golang without it, so we've switched.
So much for the golang compatibility promise; it turns out it doesn't
apply to the useful parts that I actually care about, like this.
Thanks to frebib for his incredibly valuable contributions to this
patch. This snide commit message is mine alone.
This patch also mixes in some changes related to legacy golang, since
we've also bumped the minimum version to 1.16 in the docs and tests.
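For reference, the new setup boils down to a go.mod file at the root of
the repo (plus the go.sum lock file). A minimal sketch of it (the real
file also pins all of our dependencies with require directives) looks
something like:

    module github.com/purpleidea/mgmt

    go 1.16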
Lastly, we had to disable some tests and fix up a few other misc things
to get this passing. We've definitely hit bugs in the go.mod system, and
our Makefile tries to work around those.
This moves to the newest etcd release, and also updates the imports to
the new go.etcd.io path. I think this is a bit of a pain, but might as
well get it done.
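Concretely, the import path change looks something like this (one
representative example, not an exhaustive list):

    // old import path:
    import "github.com/coreos/etcd/clientv3"

    // new import path:
    import "go.etcd.io/etcd/clientv3"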
This adds the first reversible resource (file) and the necessary engine
API hooks to make it all work. This allows a special "reversed" resource
to be added to the subsequent graph in the stream when an earlier
version "disappears". This disappearance can happen if it was previously
in an if statement that then becomes false.
It might be wise to combine this meta parameter with the `realize` meta
parameter to ensure that your reversed resource actually runs at least
once, if there's a chance that it might be gone for a while.
This patch also adds a new test harness for testing resources. It
doesn't test the "live" aspect of resources, as it doesn't run Watch,
but it was designed to ensure CheckApply works as intended, and it runs
very quickly with a simplified timeline of happenings.
When running mgmt from a systemd unit, this enables the
STATE_DIRECTORY environment variable (set from StateDirectory= in the
unit file) to be used for creating the cache directory. It also enables
the XDG_CACHE_HOME environment variable to be used.
If the user isn't root and that variable isn't set, the default
XDG_CACHE_HOME directory is used.
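Roughly speaking, the lookup order is now something like the following
sketch (simplified; the real code handles the root user and errors
differently):

    // simplified sketch of the cache dir lookup, not the actual code
    package example

    import (
        "os"
        "path/filepath"
    )

    func cacheDir() string {
        if d := os.Getenv("STATE_DIRECTORY"); d != "" {
            return d // set by systemd from StateDirectory= in the unit file
        }
        if d := os.Getenv("XDG_CACHE_HOME"); d != "" {
            return filepath.Join(d, "mgmt")
        }
        home, _ := os.UserHomeDir()
        return filepath.Join(home, ".cache", "mgmt") // the XDG default
    }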
This is a giant cleanup of the etcd code. The earlier version was
written when I was less experienced with golang.
This is still not perfect, and does contain some races, but at least
it's a decent base to start from. The automatic elastic clustering
should be considered an experimental feature. If you need a more
battle-tested cluster, then you should manage etcd manually and point
mgmt at your existing cluster.
The engine core had some unfortunate bugs that were the result of some
early design errors when I wasn't as familiar with channels. I've
finally rewritten most of the bad parts, and I think it's much more
logical and stable now.
This also simplifies the resource API, since more of the work is done
completely in the engine, and hidden from view.
Lastly, this adds a few new metaparameters and associated code.
There are still some open problems left to solve, but hopefully this
brings us one step closer.
This enables imports in mcl code, and is one of the last remaining
blockers to using mgmt. Now we can start writing standalone modules, and
adding
standard library functions as needed. There's still lots to do, but this
was a big missing piece. It was much harder to get right than I had
expected, but I think it's solid!
This unfortunately large commit is the result of some wild hacking I've
been doing for the past little while. It comes from a rebase that
squashed the many "wip" commits which tracked my private progress into
something that's not gratuitously messy for our git logs. Since this was
a learning and discovery process for me, I've "erased" the confusing git
history that wouldn't have helped. I'm happy to discuss the dead-ends,
and a small portion of that code was even left in for possible future
use.
This patch includes:
* A change to the cli interface:
You now specify the front-end explicitly, instead of leaving it up to
the front-end to decide when to "activate". For example, instead of:
mgmt run --lang code.mcl
we now do:
mgmt run lang --lang code.mcl
We might rename the --lang flag in the future to avoid the awkward word
repetition. Suggestions welcome, but I'm considering "input". One
side-effect of this change is that flags which are "engine" specific
must now be specified with "run", before the front-end name. Eg:
mgmt run --tmp-prefix lang --lang code.mcl
instead of putting --tmp-prefix at the end. We also changed the GAPI
slightly, but I've patched all code that used it. This also makes things
consistent with the "deploy" command.
* The deploys are more robust and let you deploy after a run
This has been vastly improved and lets mgmt really run as a smart
engine that can handle different workloads. If you've started with
`run` and you don't want a new deploy to take over when one comes in,
you can use the --no-watch-deploy option to block new deploys.
* The import statement exists and works!
We now have a working `import` statement. Read the docs, and try it out.
I think it's quite elegant how it fits in with `SetScope`. Have a look.
As a result, we now have some built-in functions available in modules.
This also adds the metadata.yaml entry-point for all modules. Have a
look at the examples or the tests. The bulk of the patch is to support
this.
* Improved lang input parsing code:
I re-wrote the parsing that determined what ran when we passed different
things to --lang. Deciding between running an mcl file or raw code is
now handled in a more intelligent, and re-usable way. See the inputs.go
file if you want to have a look. One casualty is that you can't stream
code from stdin *directly* to the front-end; it's encapsulated into a
deploy first. You can still use stdin though! I doubt anyone will notice
this change.
* The scope was extended to include functions and classes:
Go forth and import lovely code. All these exist in scopes now, and can
be re-used!
* Function calls actually use the scope now. Glad I got this sorted out.
* There is import cycle detection for modules!
Yes, this is another dag. I think that's #4. I guess they're useful.
* A ton of tests and new test infra was added!
This should make it much easier to add new tests that run mcl code. Have
a look at TestAstFunc1 to see how to add more of these.
As usual, I'll try to keep these commits smaller in the future!
This giant patch makes some much needed improvements to the code base.
* The engine has been rewritten and lives within engine/graph/
* All of the common interfaces and code now live in engine/
* All of the resources are in one package called engine/resources/
* The Res API can use different "traits" from engine/traits/
* The Res API has been simplified to hide many of the old internals
* The Watch & Process loops were previously inverted; this is now fixed
* The likelihood of package cycles has been reduced drastically
* And much, much more...
Unfortunately, some code had to be temporarily removed. The remote code
had to be taken out, as did the prometheus code. We hope to have these
back in new forms as soon as possible.
This adds an initial implementation of an integration test framework for
writing more complicated tests. In particular this also makes some small
additions to the mgmt core so that testing is easier.
I have an improved design for remote execution as a resource. Since I
need to get rid of some technical debt to clean up the resource API and
this main loop, a good first step is to remove its invocation. It will
be coming back as a resource as soon as possible!
This is an initial implementation of the mgmt language. It is a
declarative (immutable), functional, reactive, domain-specific
programming language. It is intended to be a language that is:
* safe
* powerful
* easy to reason about
With these properties, we hope this language, and the mgmt engine will
allow you to model the real-time systems that you'd like to automate.
This also includes a number of other associated changes. Sorry for the
large size of this patch.
This patch adds the option to specify URLs to advertise for clients and peers.
This will facilitate etcd communication through NAT, where we want to listen
on a local IP, but expose a public IP to clients/peers.
Previously, there was an extremely rare race where we would start up,
kick off the Run method in a goroutine, and then run Exit before Run got
very far in its execution. If Run ran some early sections of its code
_after_ we had Exited, we would trigger a panic due to the converger UID
being unregistered.
This patch blocks Exit from progressing until Run has started and
finished running. It also adds a Ready method so that you can monitor
this signal yourself if you'd like to add the necessary wait to your
code.
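The pattern is roughly the following (a simplified sketch of the idea,
not the exact code):

    // simplified sketch of the Run/Exit/Ready synchronization
    package example

    type Main struct {
        readyChan chan struct{} // closed once Run has gotten going
        doneChan  chan struct{} // closed once Run has finished
    }

    // Ready returns a channel that closes once Run has started, so that
    // callers can add their own wait before calling Exit if they want.
    func (m *Main) Ready() <-chan struct{} { return m.readyChan }

    func (m *Main) Run() error {
        close(m.readyChan) // we're now past the early setup
        defer close(m.doneChan)
        // ... the main loop runs here ...
        return nil
    }

    func (m *Main) Exit(err error) {
        <-m.readyChan // block until Run has actually started
        // ... ask the main loop to shut down ...
        <-m.doneChan // and wait for it to finish running
    }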
Graph changes from autogrouped -> not autogrouped or vice versa caused a
panic (or, I assume, a leak) because we compared the auto grouped graph
to the ungrouped one, which would cause an Exit on an unstarted Vertex.
This includes a test that seems to reliably reproduce the issue.
I think there was a rare race where we would make use of the etcd server
before it had fully started up. I only ever saw this occur on travis,
and with this fix hopefully we'll never see it again.
It is worth mentioning that much of my etcd code and the lib Run()
function could use a solid cleaning.
I previously broke the pkg auto edges because the package list wasn't
available by the time it was called. This fixes the pkg resource so that
it gets the necessary list of packages when needed. Since this means
that a possible failure could happen, we also update the AutoEdges API
to support errors. Errors can only be generated at AutoEdge struct
creation; once the struct has been returned (right before modification
of the graph structure) there is no way to return any errors.
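Concretely, the signature change amounts to something like this (an
approximate sketch; I'm omitting the surrounding interface):

    // before: no way to report a failure while building the auto edge logic
    AutoEdges() AutoEdge

    // after: creating the AutoEdge struct can fail; once it has been
    // returned (right before the graph structure gets modified), it can't
    AutoEdges() (AutoEdge, error)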
It's important to remember that the AutoEdges stuff gets called before
the Init of each resource, so make sure it doesn't depend on anything
that happens there or that gets cached as a result of Init.
This is all much nicer now and has a test too :)
This allows the implementer of the GAPI to specify three parameters for
every Next message sent on the channel. The Fast parameter tells the
agent if it should do the pause quickly or if it should finish the
sequence. A quick pause means that it will cause a pause immediately
after the currently running resources finish, whereas a slow (default)
pause will allow the wave of execution to finish. The latter is usually
preferred in scenarios where complex graphs are used and we want each
step to complete. The Exit parameter tells the engine to exit, and the
Err parameter tells the engine that an error occurred.
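Put differently, each message on the Next channel now carries roughly
this shape (an approximate sketch; the field names may differ slightly
from the actual code):

    type Next struct {
        Fast bool  // pause right after the currently running resources finish
        Exit bool  // tell the engine to exit
        Err  error // tell the engine that an error occurred
    }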
This puts the generation of the initial event into the Next method of
the GAPI. If it does not happen, then we will never get a graph. This is
important because this lets the GAPI notify us when it's actually ready
to try and generate a graph, rather than having us block on the Graph
method if we have a long compile, for example.
This is also required for the etcd watch cleanup.
This cleans up the API to not have a special case for etcd anymore. In
particular, this also adds the requirement that the GAPI must generate
an event on startup as soon as it is ready to generate a graph.
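A rough sketch of what that contract looks like for a GAPI
implementation, reusing the Next message shape sketched above (the names
here are approximate, and MyGAPI is a made-up example type):

    // MyGAPI is a made-up example type for this sketch
    type MyGAPI struct{}

    // Next must emit one event as soon as a graph can be generated
    func (obj *MyGAPI) Next() chan Next {
        ch := make(chan Next)
        go func() {
            defer close(ch)
            ch <- Next{} // startup event: we're ready, please call Graph()
            // ... send further events whenever the graph should change ...
        }()
        return ch
    }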
This causes a graph to actually stop processing part way through, even
if there are pokes that want to continue on. This is so that the user
experience of pressing ^C actually causes a shutdown without finishing
the graph execution. It might be preferred to have this be a user
defined setting at some point in the future, such as if the user presses
^C twice. As well, we might want to implement an interrupt API so that
individual resource execution can be asked to bail out early if
requested. This could happen on a third ^C press.
This was necessary to fix some "import cycle" errors I was having when
adding the World API to the resource Data struct.
I think this is a good hint that I need to start splitting up existing
packages into sub files, and cleaning up any inter-package problems too.
This adds a P/V style semaphore mechanism to the resource graph. This
enables the user to specify a number of "id:count" tags associated with
each resource which will reduce the parallelism of the CheckApply
operation to that maximum count.
This is particularly interesting because (assuming I'm not mistaken) the
implementation is dead-lock free, assuming that no individual resource
ever blocks permanently during execution! I don't have a formal proof of
this, but I was able to convince myself on paper that it was the case.
An actual proof that N P/V counting semaphores in a DAG won't ever
dead-lock would be particularly welcome! Hint: the trick is to acquire
them in alphabetical order while respecting the DAG flow. Disclaimer:
this assumes that the lock count is always > 0, of course.
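For the curious, the acquisition trick looks roughly like this toy
sketch (buffered channels standing in for counting semaphores; this is
not the actual mgmt implementation):

    package example

    import "sort"

    // each semaphore id maps to a buffered channel whose capacity is the
    // "count" part of the "id:count" tag

    // p acquires every semaphore that a resource asked for, always in
    // sorted (alphabetical) order, which is the key to avoiding deadlock
    func p(semas map[string]chan struct{}, ids []string) {
        sorted := append([]string{}, ids...)
        sort.Strings(sorted)
        for _, id := range sorted {
            semas[id] <- struct{}{} // P: take one slot
        }
    }

    // v releases them all again once CheckApply is done
    func v(semas map[string]chan struct{}, ids []string) {
        for _, id := range ids {
            <-semas[id] // V: give the slot back
        }
    }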