This patch adds the option to specify URLs to advertise for clients and
peers. This facilitates etcd communication through NAT, where we want to
listen on a local IP, but expose a public IP to clients and peers.
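A minimal sketch of the distinction, with illustrative field names
(these are not necessarily the actual option names):

    // Hypothetical config shape: listen URLs are what we bind to
    // locally, advertise URLs are what we publish to clients and peers.
    type etcdURLs struct {
        ListenClientURLs    []string // e.g. "http://127.0.0.1:2379" (local)
        AdvertiseClientURLs []string // e.g. "http://203.0.113.7:2379" (public)
        ListenPeerURLs      []string // e.g. "http://127.0.0.1:2380"
        AdvertisePeerURLs   []string // e.g. "http://203.0.113.7:2380"
    }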
Previously, there was an extremely rare race where we would start up,
kick off the Run method in a goroutine, and then run Exit before Run got
very far in its execution. If Run ran some early sections of its code
_after_ we had Exited, we would trigger a panic due to the converger UID
being unregistered.
This patch blocks Exit from progressing until Run has started and
finished running. It also adds a Ready method so that you can monitor
this signal yourself if you'd like to add the necessary wait to your
code.
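A minimal sketch of the synchronization pattern, using hypothetical
names (the real converger code differs):

    package run

    // runner sketches the Run/Exit/Ready relationship described above.
    type runner struct {
        startedChan chan struct{} // closed once Run is underway
    }

    func newRunner() *runner {
        return &runner{startedChan: make(chan struct{})}
    }

    func (r *runner) Run() {
        // ... early setup that previously raced with Exit ...
        close(r.startedChan) // signal that Run has started
        // ... main loop ...
    }

    // Ready returns a channel that closes once Run has started, so
    // callers can add the necessary wait to their own code.
    func (r *runner) Ready() <-chan struct{} { return r.startedChan }

    func (r *runner) Exit() {
        <-r.startedChan // block until Run has started
        // ... now safe to unregister the converger UID ...
    }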
Graph changes from autogrouped -> not autogrouped (or vice versa) caused
a panic (or, I assume, a leak) because we compared the autogrouped graph
to the ungrouped one, which would cause an Exit on an unstarted Vertex.
This includes a test that seems to reliably reproduce the issue.
I think there was a rare race where we would make use of the etcd server
before it had fully started up. I only ever saw this occur on Travis,
and with this fix hopefully we'll never see it again.
It is worth mentioning that much of my etcd code and the lib Run()
function could use a solid cleaning.
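For reference, the embedded etcd server exposes a readiness channel;
a sketch of waiting for it (assuming the upstream embed package) looks
roughly like:

    package etcdutil

    import (
        "fmt"
        "time"

        "github.com/coreos/etcd/embed" // import path at the time
    )

    func startEtcd(cfg *embed.Config) (*embed.Etcd, error) {
        e, err := embed.StartEtcd(cfg)
        if err != nil {
            return nil, err
        }
        select {
        case <-e.Server.ReadyNotify(): // closed once the server is ready
            return e, nil
        case <-time.After(60 * time.Second):
            e.Server.Stop() // we timed out; shut it down
            return nil, fmt.Errorf("etcd server took too long to start")
        }
    }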
I previously broke the pkg auto edges because the package list wasn't
available by the time it was called. This fixes the pkg resource so that
it gets the necessary list of packages when needed. Since this
introduces a possible failure, we also update the AutoEdges API to
support errors. Errors can only be generated at AutoEdge struct
creation; once the struct has been returned (right before modification
of the graph structure), there is no way to return an error.
It's important to remember that the AutoEdges stuff gets called before
the Init of each resource, so make sure it doesn't depend on anything
that happens there or that gets cached as a result of Init.
This is all much nicer now and has a test too :)
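A hedged sketch of the new shape (the types and the helper here are
illustrative stubs, not the real ones):

    package resources

    // AutoEdge is a stub standing in for the real interface.
    type AutoEdge interface{}

    type PkgRes struct{}

    type PkgResAutoEdges struct {
        packages []string
    }

    // listPackages is a hypothetical helper that can fail.
    func (obj *PkgRes) listPackages() ([]string, error) {
        // ... query the package manager ...
        return nil, nil
    }

    // AutoEdges now returns an error, since gathering the edge data
    // (the package list) can fail. Errors are only possible here at
    // creation; once the struct is returned, graph modification
    // proceeds without any error path.
    func (obj *PkgRes) AutoEdges() (AutoEdge, error) {
        packages, err := obj.listPackages()
        if err != nil {
            return nil, err
        }
        return &PkgResAutoEdges{packages: packages}, nil
    }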
This allows the implementer of the GAPI to specify three parameters for
every Next message sent on the channel. The Fast parameter tells the
agent whether it should pause quickly or finish the sequence. A fast
pause causes a pause immediately after the currently running resources
finish, whereas a slow (default) pause allows the current wave of
execution to finish. The slow pause is usually preferred for complex
graphs, where we want each step to complete. The Exit parameter tells
the engine to exit, and the Err parameter tells the engine that an error
occurred.
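A sketch of the message shape, with the fields as described (the real
struct may differ slightly):

    // Next is the value sent on the GAPI's Next channel.
    type Next struct {
        Fast bool  // pause quickly instead of finishing the sequence
        Exit bool  // tell the engine to exit
        Err  error // tell the engine that an error occurred
    }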
This puts the generation of the initial event into the Next method of
the GAPI. If it does not happen, then we will never get a graph. This is
important because it lets the GAPI notify us when it's actually ready to
try to generate a graph, rather than us blocking in the Graph method if,
for example, we have a long compile.
This is also required for the etcd watch cleanup.
This cleans up the API to not have a special case for etcd anymore. In
particular, this also adds the requirement that the GAPI must generate
an event on startup as soon as it is ready to generate a graph.
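A hedged sketch of a Next implementation honouring that requirement,
reusing the Next struct sketched above (the other names here are
hypothetical):

    type MyGAPI struct {
        changed   chan struct{} // hypothetical change notifications
        closeChan chan struct{} // hypothetical shutdown signal
    }

    // Next emits the mandatory startup event as soon as the GAPI is
    // ready to generate a graph, then one event per subsequent change.
    func (obj *MyGAPI) Next() chan Next {
        ch := make(chan Next)
        go func() {
            defer close(ch)
            ch <- Next{} // initial event: we can now generate a graph
            for {
                select {
                case <-obj.changed:
                    ch <- Next{}
                case <-obj.closeChan:
                    return
                }
            }
        }()
        return ch
    }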
This causes a graph to actually stop processing partway through, even
if there are pokes that want to continue on. This is so that pressing
^C actually causes a shutdown without finishing the graph execution. It
might be preferable to make this a user-defined setting at some point
in the future, such as only triggering it if the user presses ^C twice.
As well, we might want to implement an interrupt API so that individual
resource execution can be asked to bail out early if requested. This
could happen on a third ^C press.
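A minimal sketch of what that escalation could look like (entirely
hypothetical; the engine methods here are stand-ins, not an existing
API):

    package sigutil

    import (
        "os"
        "os/signal"
    )

    // Engine is a stand-in with hypothetical methods.
    type Engine struct{}

    func (e *Engine) FastPause() {} // pause now, ignoring pending pokes
    func (e *Engine) Exit()      {} // shut down without finishing the graph
    func (e *Engine) Interrupt() {} // ask running resources to bail out

    func handleSignals(engine *Engine) {
        sigChan := make(chan os.Signal, 1)
        signal.Notify(sigChan, os.Interrupt)
        count := 0
        for range sigChan {
            count++
            switch count {
            case 1:
                engine.FastPause() // first ^C: fast pause and shutdown
            case 2:
                engine.Exit() // second ^C: exit without finishing
            default:
                engine.Interrupt() // third ^C: interrupt execution
            }
        }
    }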
This was necessary to fix some "import cycle" errors I was having when
adding the World API to the resource Data struct.
I think this is a good hint that I need to start splitting up existing
packages into sub-files, and cleaning up any inter-package problems too.
This adds a P/V-style semaphore mechanism to the resource graph. This
enables the user to specify a number of "id:count" tags associated with
each resource, which will limit the parallelism of the CheckApply
operation to that maximum count.
This is particularly interesting because (assuming I'm not mistaken) the
implementation is deadlock-free, provided that no individual resource
ever blocks permanently during execution! I don't have a formal proof of
this, but I was able to convince myself on paper that it was the case.
An actual proof that N P/V counting semaphores in a DAG won't ever
deadlock would be particularly welcome! Hint: the trick is to acquire
them in alphabetical order while respecting the DAG flow. Disclaimer:
this assumes that the lock count is always > 0, of course.
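A minimal sketch of the ordered-acquisition trick, assuming simple
channel-based counting semaphores:

    package semaphore

    import "sort"

    // semaphore is a counting semaphore with capacity n (n must be > 0).
    type semaphore chan struct{}

    func newSemaphore(n int) semaphore { return make(semaphore, n) }

    func (s semaphore) P() { s <- struct{}{} } // acquire
    func (s semaphore) V() { <-s }             // release

    // acquire grabs every semaphore a resource needs in sorted id
    // order; this global total order, respected along the DAG flow, is
    // what avoids deadlock. It returns a function to release them all.
    func acquire(sems map[string]semaphore) (release func()) {
        ids := make([]string, 0, len(sems))
        for id := range sems {
            ids = append(ids, id)
        }
        sort.Strings(ids) // alphabetical order is the trick
        for _, id := range ids {
            sems[id].P()
        }
        return func() {
            for i := len(ids) - 1; i >= 0; i-- { // release in reverse
                sems[ids[i]].V()
            }
        }
    }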