This is an initial implementation of the mgmt language. It is a
declarative (immutable), functional, reactive, domain-specific
programming language. It is intended to be a language that is:
* safe
* powerful
* easy to reason about
With these properties, we hope that this language and the mgmt engine
will allow you to model the real-time systems that you'd like to automate.
This also includes a number of other associated changes. Sorry for the
large size of this patch.
I previously broke the pkg auto edges because the package list wasn't
available by the time it was called. This fixes the pkg resource so that
it gets the necessary list of packages when needed. Since this means
that a possible failure could happen, we also update the AutoEdges API
to support errors. Errors can only be generated at AutoEdge struct
creation. Once the struct has been returned (right before modification
of the graph structure) there is no possibility to return any errors.
It's important to remember that the AutoEdges stuff gets called before
the Init of each resource, so make sure it doesn't depend on anything
that happens there or that gets cached as a result of Init.
This is all much nicer now and has a test too :)
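As a rough illustration of the error handling described above, here is
a minimal Go sketch. The AutoEdge interface, the pkgEdges type, the list
callback and errLookup are hypothetical stand-ins and not the actual
mgmt API. The only point being shown is that creation may fail, while
the returned struct has no error path.

```go
package main

import (
	"errors"
	"fmt"
)

// AutoEdge is a hypothetical iterator over candidate edges. Once a value of
// this type has been returned, it has no way to report an error.
type AutoEdge interface {
	Next() []string // next batch of candidate names, or nil when done
}

// pkgEdges is a hypothetical AutoEdge built from a package's file list.
type pkgEdges struct {
	files []string
	pos   int
}

// Next hands out one file per call and returns nil once the list is used up.
func (e *pkgEdges) Next() []string {
	if e.pos >= len(e.files) {
		return nil
	}
	f := e.files[e.pos]
	e.pos++
	return []string{f}
}

// errLookup is a made-up failure used for the demonstration below.
var errLookup = errors.New("package lookup failed")

// AutoEdges fetches the file list on demand instead of relying on anything
// cached by Init. Creation is the only point where an error can surface.
func AutoEdges(pkg string, list func(string) ([]string, error)) (AutoEdge, error) {
	files, err := list(pkg)
	if err != nil {
		return nil, fmt.Errorf("can't list files of %s: %w", pkg, err)
	}
	return &pkgEdges{files: files}, nil
}

func main() {
	good := func(string) ([]string, error) { return []string{"/usr/bin/cowsay"}, nil }
	bad := func(string) ([]string, error) { return nil, errLookup }

	if ae, err := AutoEdges("cowsay", good); err == nil {
		fmt.Println("first batch:", ae.Next())
	}
	if _, err := AutoEdges("cowsay", bad); err != nil {
		fmt.Println("error at creation:", err)
	}
}
```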
Now that we're using our meta wrapper graph struct instead of the
pgraph, we can re-implement our SetValue hacks in terms of struct fields
and the implementation is now cleaner.
It's up to the end user to decide who is writing and/or overwriting
them.
It could also be useful to reimplement (refactor) some of the existing
World APIs to be implemented in terms of these primitives.
This is required if we're going to have out of package resources, in
particular for third party packages, and also if we decide to split
out each resource into a separate sub package.
This cleans up the API to not have a special case for etcd anymore. In
particular, this also adds the requirement that the GAPI must generate
an event on startup as soon as it is ready to generate a graph.
Avoid use of the reflect package, and use an extensible list of registered
resource kinds. This also has the benefit of removing the empty VirtRes and
AugeasRes struct types when compiling without libvirt and libaugeas.
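To make the idea concrete, here is a minimal sketch of a registry of
resource kinds. The names Res, Register, NewRes, Kinds and FileRes are
assumptions made for this example and may not match the real code. A
resource that needs libvirt or libaugeas would call Register from a
file guarded by a build constraint, so no empty placeholder type is
needed when it is compiled out.

```go
package main

import (
	"fmt"
	"sort"
)

// Res is a stand-in for the resource interface.
type Res interface {
	Kind() string
}

// registry maps a kind name to a constructor for an empty resource.
var registry = map[string]func() Res{}

// Register adds a constructor for a kind. It is meant to be called from an
// init function in the file that defines the resource.
func Register(kind string, fn func() Res) {
	if _, exists := registry[kind]; exists {
		panic(fmt.Sprintf("kind %q is already registered", kind))
	}
	registry[kind] = fn
}

// NewRes builds an empty resource of the given kind without using reflect.
func NewRes(kind string) (Res, error) {
	fn, ok := registry[kind]
	if !ok {
		return nil, fmt.Errorf("no resource kind %q is registered", kind)
	}
	return fn(), nil
}

// Kinds returns the sorted names of everything that was compiled in.
func Kinds() []string {
	kinds := make([]string, 0, len(registry))
	for kind := range registry {
		kinds = append(kinds, kind)
	}
	sort.Strings(kinds)
	return kinds
}

// FileRes is a hypothetical resource used only for this example.
type FileRes struct{}

// Kind returns the name under which FileRes is registered.
func (r *FileRes) Kind() string { return "file" }

func init() {
	Register("file", func() Res { return &FileRes{} })
}

func main() {
	fmt.Println("registered kinds:", Kinds())
	if _, err := NewRes("virt"); err != nil {
		fmt.Println(err) // virt was never registered in this build
	}
}
```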
This prevents some nasty races where a BackPoke could arrive on a paused
vertex either during a resume or pause operation. Previously we might
also have poked an excessive number of resources on resume.
The solution was to discard BackPokes during pause or resume. On pause,
they can be discarded because we've asked the graph to quiesce, and any
further work can be done on resume. On resume, we ignore them because
this should only happen during the unrolling (reverse topological resume
of the graph). At the end of this, the indegree == 0 vertices will
initiate a series of pokes which should deal with any BackPoke that was
possibly discarded.
One other aspect of this which is important: if an indegree == 0 vertex
is poked (Process runs) but it's already in the correct state, it should
still transmit the Poke through itself so that subsequent vertices know
to run. Currently this is done correctly in Process().
I'm a bit ashamed that this wasn't done properly in the engine earlier,
but I suppose that's what comes out of running fancier graphs and really
thinking in detail about what's truly correct. Hopefully I got it right
this time!
This prevents a nasty race that can happen in a graph with more than one
resource. Suppose a resource has someone that it can BackPoke, and an
event comes in. It runs the obj.Event() method (from inside
its Watch loop) and then *before* the resulting Process method can run
it receives a pause event and pauses. Then the parent resource pauses as
well. Finally (it's a race) the Process gets around to running, and
decides it needs to BackPoke. At this point since the parent resource is
paused, it receives the BackPoke at a time when it can't handle
receiving one, and it panics!
As a result, we now track the number of running Process possibilities
via a WaitGroup which gets incremented from the obj.Event() and we don't
finish our pause or exit operations until it has quiesced and our
WaitGroup lets us know via Wait(). Lastly, in order to prevent repeated
replays, we detect when we're quiescing and suspend replaying until post
pause. We don't need to save the replay (playback variable) explicitly
because its state remains during pause, and on exit it would get
re-checked anyways.
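The WaitGroup part of the fix can be sketched roughly as follows. The
vertex type and its fields are hypothetical, and the replay handling is
left out. The point is only that the count is incremented in Event,
before Process is scheduled, so that a pause cannot complete while a
Process run is still outstanding.

```go
package main

import (
	"fmt"
	"sync"
)

// vertex is a hypothetical stand-in for a resource in the graph.
type vertex struct {
	wg   sync.WaitGroup // counts Process runs that are pending or running
	mu   sync.Mutex
	runs int // number of completed Process runs
}

// Event is called from the Watch loop. It counts the upcoming Process run
// before starting it, so that a later Pause cannot miss it.
func (v *vertex) Event() {
	v.wg.Add(1)
	go func() {
		defer v.wg.Done()
		v.process()
	}()
}

// process stands in for the real work, which might BackPoke a parent.
func (v *vertex) process() {
	v.mu.Lock()
	v.runs++
	v.mu.Unlock()
}

// Pause returns only once every counted Process run has finished, and
// reports how many have completed. In this sketch Event and Pause are
// called from the same goroutine, so Add never races with Wait.
func (v *vertex) Pause() int {
	v.wg.Wait()
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.runs
}

func main() {
	v := &vertex{}
	for i := 0; i < 5; i++ {
		v.Event()
	}
	fmt.Println("runs finished before the pause completed:", v.Pause())
}
```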
This was necessary to fix some "import cycle" errors I was having when
adding the World api to the resource Data struct.
I think this is a good hint that I need to start splitting up existing
packages into sub files, and cleaning up any inter-package problems too.
If two resources are grouped, then the result should contain the
semaphores of both resources. This is because the user is expecting
(independently) resource A and resource B to have a limiting choke
point. If when combined those choke points aren't preserved, then we
have broken an important promise to the user.
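As a small sketch of that rule, grouping could take the union of both
semaphore lists. The function name mergeSema and the representation of
semaphores as a list of tag strings are assumptions made for this
example.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeSema returns the sorted union of the semaphore tags of two resources,
// so that the grouped resource keeps every choke point of both.
func mergeSema(a, b []string) []string {
	seen := make(map[string]struct{})
	out := []string{}
	for _, list := range [][]string{a, b} {
		for _, tag := range list {
			if _, ok := seen[tag]; ok {
				continue // both resources share this semaphore
			}
			seen[tag] = struct{}{}
			out = append(out, tag)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	resA := []string{"net:2", "apt:1"}
	resB := []string{"apt:1", "disk:3"}
	fmt.Println(mergeSema(resA, resB))
}
```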
This adds a P/V style semaphore mechanism to the resource graph. This
enables the user to specify a number of "id:count" tags associated with
each resource which will reduce the parallelism of the CheckApply
operation to that maximum count.
This is particularly interesting because (assuming I'm not mistaken) the
implementation is dead-lock free, assuming that no individual resource
ever blocks permanently during execution! I don't have a formal proof of
this, but I was able to convince myself on paper that it was the case.
An actual proof that N P/V counting semaphores in a DAG won't ever
dead-lock would be particularly welcome! Hint: the trick is to acquire
them in alphabetical order while respecting the DAG flow. Disclaimer:
this assumes that the lock count is always > 0, of course.
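Here is a minimal sketch of the ordered acquisition, using buffered
channels as counting semaphores. The names parseTag, pool, run and
maxParallel are hypothetical, treating a missing count as 1 is an
assumption, and the DAG ordering is not modelled here. It only shows
the sorted P and reverse V around a single CheckApply-like call.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"sync"
)

// parseTag splits an "id:count" tag. Treating a missing count as 1 is an
// assumption of this sketch. The count must be > 0.
func parseTag(tag string) (string, int, error) {
	i := strings.LastIndex(tag, ":")
	if i < 0 {
		return tag, 1, nil
	}
	n, err := strconv.Atoi(tag[i+1:])
	if err != nil || n < 1 {
		return "", 0, fmt.Errorf("invalid count in tag %q", tag)
	}
	return tag[:i], n, nil
}

// pool holds one counting semaphore (a buffered channel) per id.
type pool struct {
	mu   sync.Mutex
	sems map[string]chan struct{}
}

// get returns the semaphore for id, creating it on first use.
func (p *pool) get(id string, count int) chan struct{} {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.sems[id]; !ok {
		p.sems[id] = make(chan struct{}, count) // first count seen wins
	}
	return p.sems[id]
}

// run acquires every semaphore in sorted id order (P), calls fn, and then
// releases them in reverse order (V).
func (p *pool) run(tags []string, fn func()) error {
	counts := make(map[string]int)
	for _, tag := range tags {
		id, n, err := parseTag(tag)
		if err != nil {
			return err
		}
		counts[id] = n
	}
	ids := make([]string, 0, len(counts))
	for id := range counts {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		p.get(id, counts[id]) <- struct{}{} // P: blocks while the semaphore is full
	}
	fn()
	for i := len(ids) - 1; i >= 0; i-- {
		<-p.get(ids[i], counts[ids[i]]) // V
	}
	return nil
}

// maxParallel runs workers goroutines under the given tags and reports the
// peak number of them that were inside the guarded section at once.
func maxParallel(tags []string, workers int) int {
	p := &pool{sems: make(map[string]chan struct{})}
	var wg sync.WaitGroup
	var mu sync.Mutex
	cur, peak := 0, 0
	step := func(d int) {
		mu.Lock()
		defer mu.Unlock()
		cur += d
		if cur > peak {
			peak = cur
		}
	}
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = p.run(tags, func() { step(1); step(-1) })
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak parallelism:", maxParallel([]string{"net:2", "apt:1"}, 8))
}
```

Since every caller takes the semaphores in the same sorted order, two
callers can never each hold one semaphore while waiting on the other's.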
This cleans up some of the resource events and also reorganizes the
struct for simplicity. This should hopefully kill off at least one race
which would cause unnecessary blocking!
Yes this patch is a bit yucky, but so was the bug I was fighting with!
I'm still working on reducing the size of the monster patches that I
land, but I'm exercising the privilege as the initial author. In any
case, this refactors worker into two, and cleans up the passing around
of the processChan. This puts common code into Init and Close.
When creating new resources, we didn't specify the defaults, which for
the limit metaparam caused invalid resources by default. It would be
nice to change the limit param to have the 1/X (reciprocal) as the
default, although the problem with that is that (1) it is illogical, and
(2) it's not clear if the precision for the common cases is enough.
If someone wants to investigate this further, please do! Zero value
structs are definitely more useful! In any case, we can now specify the
default. It's not entirely obvious to me if this is the best way to do
it, or if there is a superior method.
The mgmt graph depends on state tracking to eliminate redundant pokes.
With the Watch loop now able to produce events quickly, it should no
longer play a part in determining the vertex state. This simplifies the
resource API as well!
The default UnmarshalYAML on *BaseRes doesn't work properly at the
moment, so hack in a default so that we don't need to specify one if the
MetaParams struct isn't specified. The problem is that if there isn't a
meta value added, its UnmarshalYAML doesn't get a chance to run.
This adds rate limiting with the limit and burst meta parameters. The
limits apply to how often the Process check is called. As a result, it
might get called more often than there are Watch events due to possible
Poke/BackPoke events.
This system might need to get rethought in the future depending on its
usefulness.
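For readers unfamiliar with how limit and burst interact, here is a
minimal token bucket sketch. It is not necessarily how the engine
implements the meta parameters. The bucket type and its method names
are made up for this example, and time is passed in as plain seconds to
keep it deterministic. Every Process call would ask the bucket first,
regardless of whether a Watch event, a Poke or a BackPoke triggered it.

```go
package main

import "fmt"

// bucket is a minimal token bucket. Times are plain seconds so that the
// example stays deterministic; real code would read a clock instead.
type bucket struct {
	limit  float64 // tokens added per second
	burst  float64 // maximum number of stored tokens
	tokens float64 // tokens currently available
	last   float64 // time of the last refill, in seconds
}

// newBucket starts with a full bucket, so an initial burst is allowed.
func newBucket(limit float64, burst int) *bucket {
	return &bucket{limit: limit, burst: float64(burst), tokens: float64(burst)}
}

// allow reports whether one Process call may run at time now (in seconds).
func (b *bucket) allow(now float64) bool {
	if now > b.last {
		b.tokens += (now - b.last) * b.limit
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	b := newBucket(1, 2) // one call per second, bursts of up to two
	for _, now := range []float64{0, 0, 0, 0.5, 1, 1} {
		fmt.Printf("t=%.1fs allowed=%v\n", now, b.allow(now))
	}
}
```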