Commit Graph

122 Commits

Author SHA1 Message Date
James Shubin
e67d97d9da pgraph: Replace CompareMatch with VertexMatchFn
This removes a reference to the resources package in pgraph.
2017-05-13 13:55:42 -04:00
James Shubin
d74c2115fd pgraph: Untangle the semaphore code from the pgraph implementation
This re-implements the semaphore code on top of the graph kv store.
2017-05-13 13:28:41 -04:00
James Shubin
70e7ee2d46 pgraph: Remove use of Flags struct in favour of Value API
One small step to completely cleaning up the pgraph package so that we
can eventually fix the code that would otherwise create a cycle!
2017-05-13 13:28:41 -04:00
James Shubin
d11854f4e8 pgraph: Clean up pgraph module to get ready for clean lib status
The graph of dependencies in golang is a DAG, and as such doesn't allow
cycles. Clean up this lib so that it eventually doesn't import our
resources module or anything else which might want to import it.

This patch makes adjacency private, and adds a generalized key store to
the graph struct.
2017-05-13 13:28:41 -04:00
James Shubin
4bb553e015 pgraph: Use the correct vertex handle to prevent a race
Small typo made that is now fixed! These need to get caught with golint!
2017-05-13 10:08:38 -04:00
James Shubin
9b9ff2622d resources: Make resource kind and baseuid fields public
This is required if we're going to have out of package resources. In
particular for third party packages, and also for if we decide to split
out each resource into a separate sub package.
2017-04-11 01:52:21 -04:00
James Shubin
028ef14cc0 misc: Replace sloppy use of %v with %s 2017-03-16 13:18:36 -04:00
Julien Pivotto
33d20ac6d8 prometheus: Add detailed metrics
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-03-16 14:18:46 +01:00
James Shubin
cd5e2e1148 pgraph: Add fast pausing and exiting of graphs
This causes a graph to actually stop processing part way through, even
if there are poke's that want to continue on. This is so that the user
experience of pressing ^C actually causes a shutdown without finishing
the graph execution. It might be preferred to have this be a user
defined setting at some point in the future, such as if the user presses
^C twice. As well, we might want to implement an interrupt API so that
individual resource execution can be asked to bail out early if
requested. This could happen on a third ^C press.
2017-03-13 07:54:03 -04:00
James Shubin
074da4da19 pgraph, resources: Run the resource Setup in parallel
This is a reasonable thing to do at this time.
2017-03-13 07:54:03 -04:00
James Shubin
e4e39d820c pgraph: semaphore: Refactor semaphore size function and test 2017-03-13 07:49:29 -04:00
James Shubin
e5dbb214a2 pgraph: Move the BackPoke to before the semaphores
I can't think of a reason we should grab a semaphore before backpoking.
The semaphore is intended to block around the actual work in CheckApply,
not the dependency resolution of the correct vertex.
2017-03-13 07:49:29 -04:00
James Shubin
91af528ff8 pgraph: Move the quiesce done indicator to avoid deadlock
This avoids a deadlock on resource failure when retry==0. Without this
we would never exit. This adds a test in too!
2017-03-12 13:52:35 -04:00
James Shubin
6d9be15035 pgraph: semaphore: Add lock around semaphore map
I forgot about the `concurrent map write` race, but now it's fixed. I
suppose we could probably pre-create all semaphores in the graph at once
before Start, and remove this lock, but that's an optimization for a
later day.
2017-03-11 09:06:18 -05:00
James Shubin
95a1c6e7fb pgraph, resources: Discard BackPokes during pause and resume
This prevents some nasty races where a BackPoke could arrive on a paused
vertex either during a resume or pause operation. Previously we might
also have poked an excessive number of resources on resume.

The solution was to discard BackPokes during pause or resume. On pause,
they can be discarded because we've asked the graph to quiesce, and any
further work can be done on resume, and on resume we ignore them because
this should only happen during the unrolling (reverse topological resume
of the graph) and at the end of this the indegree == 0 vertices will
initiate a series of pokes which should deal with any BackPoke that was
possibly discarded.

One other aspect of this which is important: if an indegree == 0 vertex
is poked (Process runs) but it's already in the correct state, it should
still transmit the Poke through itself so that subsequent vertices know
to run. Currently this is done correctly in Process().

I'm a bit ashamed that this wasn't done properly in the engine earlier,
but I suppose that's what comes out of running fancier graphs and really
thinking in detail about what's truly correct. Hopefully I got it right
this time!
2017-03-09 06:35:15 -05:00
James Shubin
0b1a4a0f30 pgraph, resources: Quiesce when pausing or exiting the resource
This prevents a nasty race that can happen in a graph with more than one
resource. If a resource has someone that it can BackPoke, and then
suppose an event comes in. It runs the obj.Event() method (from inside
its Watch loop) and then *before* the resulting Process method can run
it receives a pause event and pauses. Then the parent resource pauses as
well. Finally (it's a race) the Process gets around to running, and
decides it needs to BackPoke. At this point since the parent resource is
paused, it receives the BackPoke at a time when it can't handle
receiving one, and it panics!

As a result, we now track the number of running Process possibilities
via a WaitGroup which gets incremented from the obj.Event() and we don't
finish our pause or exit operations until it has quiesced and our
WaitGroup lets us know via Wait(). Lastly in order to prevent repeated
replays, we detect when we're quiescing and suspend replaying until post
pause. We don't need to save the replay (playback variable) explicitly
because its state remains during pause, and on exit it would get
re-checked anyways.
2017-03-09 02:50:55 -05:00
James Shubin
a0686b7d2b pgraph: graphviz: Update Graphviz lib to quote names properly
This also moves the library to after the graph starts so that the kind
fields will be visible.
2017-03-08 19:23:33 -05:00
James Shubin
32aae8f57a lib, pgraph, resources: Refactor data association API
This should make things cleaner and help avoid as much churn every time
we change a property.
2017-03-07 22:51:11 -05:00
James Shubin
4a62a290d8 pgraph: Clean up tests
This splits the tests into multiple files.
2017-02-28 16:47:16 -05:00
James Shubin
018399cb1f semaphore, pgraph: Add semaphore grouping and tests
If two resources are grouped, then the result should contain the
semaphores of both resources. This is because the user is expecting
(independently) resource A and resource B to have a limiting choke
point. If when combined those choke points aren't preserved, then we
have broken an important promise to the user.
2017-02-28 16:40:53 -05:00
James Shubin
d8e19cd79a semaphore: Create a semaphore metaparam
This adds a P/V style semaphore mechanism to the resource graph. This
enables the user to specify a number of "id:count" tags associated with
each resource which will reduce the parallelism of the CheckApply
operation to that maximum count.

This is particularly interesting because (assuming I'm not mistaken) the
implementation is dead-lock free assuming that no individual resource
permanently ever blocks during execution! I don't have a formal proof of
this, but I was able to convince myself on paper that it was the case.

An actual proof that N P/V counting semaphores in a DAG won't ever
dead-lock would be particularly welcome! Hint: the trick is to acquire
them in alphabetical order while respecting the DAG flow. Disclaimer,
this assumes that the lock count is always > 0 of course.
2017-02-27 02:57:06 -05:00
Julien Pivotto
7d92ab335a prometheus: Add mgmt_pgraph_start_time_seconds metric
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-02-26 15:28:43 +01:00
Julien Pivotto
46260749c1 prometheus: Move the prometheus nil check inside the prometheus function
That pattern will be reused in future metrics.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-02-26 09:33:34 +01:00
James Shubin
12160ab539 pgraph: Wait for Process routine to exit
Wait for the innerWorker's Process routine to exit, before we exit too.
2017-02-25 21:01:02 -05:00
James Shubin
2462ea0892 pgraph, resources: Wait for innerWorker to exit cleanly
Don't run the Close() method until the innerWorker has exited cleanly.
This is a guarantee which we make to the resources.
2017-02-25 21:00:38 -05:00
James Shubin
98bc96c911 golint: Fixup issues found in the report
This also increases the max allowed to 5% -- I'm happy to make this
lower if someone asks.
2017-02-22 22:18:55 -05:00
James Shubin
49594b8435 pgraph, resources: Clean up the event system around the resources
This cleans up some of the resource events and also reorganizes the
struct for simplicity. This should hopefully kill off at least one race
which would cause unnecessary blocking!

Yes this patch is a bit yucky, but so was the bug I was fighting with!
2017-02-22 17:45:16 -05:00
James Shubin
18ea05c837 pgraph, resources: Add proper start/stop signals
We need to perform some operations in lock step between graph
transitions. This should help with that!
2017-02-21 18:48:27 -05:00
James Shubin
86c3072515 resources: The user should not call Init
Even in tests...
2017-02-21 18:42:07 -05:00
James Shubin
fccf508dde resources, pgraph: Refactor Worker and simplify API
I'm still working on reducing the size of the monster patches that I
land, but I'm exercising the priviledge as the initial author. In any
case, this refactors worker into two, and cleans up the passing around
of the processChan. This puts common code into Init and Close.
2017-02-21 18:42:07 -05:00
James Shubin
2da21f90f4 pgraph, resources: Improve Init/Close and Worker status
This should do some rough cleanups around the Init/Close of resources,
and tracking of Worker function status.
2017-02-21 18:42:07 -05:00
James Shubin
69b0913315 test: Fix tests by hooking up go test properly
The internal golang tests broke when we turned everything into packages.
This resurrects them with the hopes that we'll add more!
2017-02-20 16:40:40 -05:00
James Shubin
a981cfa053 legal: Oh yeah, it is 2017 2017-02-16 01:34:32 -05:00
Julien Pivotto
e8855f7621 prometheus: Implement mgmt_checkapply_total metric
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-02-12 23:45:47 +01:00
James Shubin
ed268ad683 pgraph, yamlgraph: Allow specifying notify value from YAML 2017-02-06 16:29:02 -05:00
James Shubin
dd8454161f pgraph: Set the false starter value too
We might have left re-used nodes as true, even if they no longer were
anymore due to graph changes, which would have caused additional pokes.
2017-01-27 17:43:24 -05:00
James Shubin
9421f2cddd resources: Rename GetUIDs to UIDs
This is more in line with the style guide for golang.
2017-01-25 14:51:23 -05:00
James Shubin
357102fdb5 converger: Block converging in the engine
This more appropriately blocks converging in the engine, since we are
now 1-1 decoupled from the Watch resource. This simplifies resource
writing, and should be more accurate around small converged timeouts.

We don't block in the Worker routine when we are polling, because we
expect to get constant poll events, and we can instead be more careful
about these by looking at CheckApply results.

If we can do this for all resources in the future, it would be
excellent!
2017-01-25 11:06:02 -05:00
James Shubin
7e15a9e181 pgraph: Add debug messages
These two messages in particular make graph analysis easier.
2017-01-25 09:52:34 -05:00
James Shubin
12e0b2d6f7 pgraph: Parallelize the BackPoke facility
I can't guarantee this has a significant effect, but it's likely to add
some efficiency when sending multiple BackPoke's at the same time, so
that erroneous ones can be cancelled out easier.
2017-01-25 09:13:59 -05:00
James Shubin
11b40bf32f resources: Update state checks
The mgmt graph depends on state tracking to eliminate redundant pokes.
With the Watch loop now able to produce events quickly, it should no
longer play a part in determining the vertex state. This simplifies the
resource API as well!
2017-01-25 09:13:59 -05:00
James Shubin
8d2b53373f pgraph: Remove unnecessary indentation in Process code path
We can indent the simple BackPoke code path instead!
2017-01-25 09:13:59 -05:00
James Shubin
4f34f7083b resources: rate limiting: Implement resource rate limiting
This adds rate limiting with the limit and burst meta parameters. The
limits apply to how often the Process check is called. As a result, it
might get called more often than there are Watch events due to possible
Poke/BackPoke events.

This system might need to get rethought in the future depending on its
usefulness.
2017-01-25 09:13:59 -05:00
James Shubin
51c83116a2 resources: Overhaul legacy code around the resource API
This patch makes a number of changes in the engine surrounding the
resource API. In particular:

* Cleanup of send/read event.
* Cleanup of DoSend (now Event) in the Watch method.
* Events are now more consistently pointers.
* Exiting within Watch is now done in a single place.
* Multiple incoming events will be combined into a single action.
* Events in flight during an action are played back after CheckApply.
* Addition of Close method to API

This gets things ready for rate limiting and semaphore metaparams!
2017-01-22 05:59:15 -05:00
James Shubin
56efef69ba resources: Officially add Validate method
This officially adds the Validate method to the resource API, and also
cleans up the ordering in existing resources.
2017-01-09 05:10:26 -05:00
James Shubin
b921aabbed resources: Add poll metaparameter
This allows a resource to use polling instead of the event based
mechanism. This isn't recommended, but it could be useful, and it was
certainly fun to code!
2016-12-24 00:51:39 -05:00
James Shubin
19760be0bc golint: Fix some golint issues 2016-12-21 03:10:25 -05:00
James Shubin
5b3425a689 pgraph: Remember to unpause the vertices!
Forgot this part earlier, sorry! Should work correctly now :)
2016-12-21 02:39:54 -05:00
James Shubin
a3d157bde6 pgraph: Mutex must be a pointer
This should be done the same way as the WaitGroup so that we don't
panic!
2016-12-20 05:49:17 -05:00
James Shubin
2c8c9264a4 pgraph: Simplify graph exit waiting
I think the vertex resource exiting can be done in a single stage
instead of the previous two stage exit.
2016-12-20 05:49:17 -05:00