Commit Graph

531 Commits

Author SHA1 Message Date
James Shubin
a358135e41 resources: exec: Remove the pollint parameter
Since we now have a poll metaparameter, we don't need the resource
specific code.
2017-03-12 10:49:26 -04:00
James Shubin
6d9be15035 pgraph: semaphore: Add lock around semaphore map
I forgot about the `concurrent map write` race, but now it's fixed. I
suppose we could probably pre-create all semaphores in the graph at once
before Start, and remove this lock, but that's an optimization for a
later day.
2017-03-11 09:06:18 -05:00
James Shubin
b740e0b78a git: Add more features to tag.sh script
This helps me make releases and probably won't help you, but why not be
transparent about things and tools!
2017-03-11 08:41:56 -05:00
James Shubin
9546949945 git: Improve tagging script 0.0.10 2017-03-11 08:29:53 -05:00
James Shubin
8ff048d055 test: Disable prometheus-3.sh test temporarily
It seems to be failing, and I'm not sure where the regression is, or if
there is a race. Sorry roidelapluie.
2017-03-09 11:46:21 -05:00
James Shubin
95a1c6e7fb pgraph, resources: Discard BackPokes during pause and resume
This prevents some nasty races where a BackPoke could arrive on a paused
vertex either during a resume or pause operation. Previously we might
also have poked an excessive number of resources on resume.

The solution was to discard BackPokes during pause or resume. On pause,
they can be discarded because we've asked the graph to quiesce, and any
further work can be done on resume, and on resume we ignore them because
this should only happen during the unrolling (reverse topological resume
of the graph) and at the end of this the indegree == 0 vertices will
initiate a series of pokes which should deal with any BackPoke that was
possibly discarded.

One other aspect of this which is important: if an indegree == 0 vertex
is poked (Process runs) but it's already in the correct state, it should
still transmit the Poke through itself so that subsequent vertices know
to run. Currently this is done correctly in Process().

I'm a bit ashamed that this wasn't done properly in the engine earlier,
but I suppose that's what comes out of running fancier graphs and really
thinking in detail about what's truly correct. Hopefully I got it right
this time!
2017-03-09 06:35:15 -05:00
James Shubin
0b1a4a0f30 pgraph, resources: Quiesce when pausing or exiting the resource
This prevents a nasty race that can happen in a graph with more than one
resource. If a resource has someone that it can BackPoke, and then
suppose an event comes in. It runs the obj.Event() method (from inside
its Watch loop) and then *before* the resulting Process method can run
it receives a pause event and pauses. Then the parent resource pauses as
well. Finally (it's a race) the Process gets around to running, and
decides it needs to BackPoke. At this point since the parent resource is
paused, it receives the BackPoke at a time when it can't handle
receiving one, and it panics!

As a result, we now track the number of running Process possibilities
via a WaitGroup which gets incremented from the obj.Event() and we don't
finish our pause or exit operations until it has quiesced and our
WaitGroup lets us know via Wait(). Lastly in order to prevent repeated
replays, we detect when we're quiescing and suspend replaying until post
pause. We don't need to save the replay (playback variable) explicitly
because its state remains during pause, and on exit it would get
re-checked anyways.
2017-03-09 02:50:55 -05:00
James Shubin
22b48e296a resources, yamlgraph: Drop the kind capitalization
This stopped making sense now that we have a resource with two primary
capitals. It was just a silly formatting hack anyways. Welcome kv!
2017-03-09 02:50:55 -05:00
James Shubin
c696ebf53c resources: svc: Add failed state
Services can be in a failed state too.
2017-03-08 19:23:33 -05:00
James Shubin
a0686b7d2b pgraph: graphviz: Update Graphviz lib to quote names properly
This also moves the library to after the graph starts so that the kind
fields will be visible.
2017-03-08 19:23:33 -05:00
James Shubin
8d94be8924 resources: kv: Add new KV resource which sets key value pairs
This is a new resource for setting key value pairs in our global world
database. Currently only etcd is supported. Some of the implications and
possibilities of this resource will become more obvious with future
commits!

You can bother/test this resource with these commands:

ETCDCTL_API=3 etcdctl get "/_mgmt/strings/" --prefix=true
ETCDCTL_API=3 etcdctl put "/_mgmt/strings/KEY/HOSTNAME" 42

Replace the KEY and HOSTNAME variables with the actual values you'd like
to use. The 42 is the value that is set.
2017-03-08 19:23:33 -05:00
James Shubin
e97ac5033f resources: Split util functions into separate file
This also adds errwrap to their implementation.
2017-03-08 19:23:33 -05:00
James Shubin
44771a0049 gapi: Move the World interface into resources
This was necessary to fix some "import cycle" errors I was having when
adding the World api to the resource Data struct.

I think this is a good hint that I need to start splitting up existing
packages into sub files, and cleaning up and inter-package problems too.
2017-03-08 19:23:33 -05:00
James Shubin
32aae8f57a lib, pgraph, resources: Refactor data association API
This should make things cleaner and help avoid as much churn every time
we change a property.
2017-03-07 22:51:11 -05:00
James Shubin
8207e23cd9 lib: Refactor instantiation of world API 2017-03-07 22:51:11 -05:00
James Shubin
a469029698 etcd: Switch to buffered channel to remove duplicates
Since we don't return the actual values and instead only tell about
events (which leaves the `Get` of the value as a second operation) then
we don't have to use a channel with backpressure since all the events
are identical.
2017-03-07 22:51:11 -05:00
James Shubin
203d866643 gapi, etcd: Define and implement a string sharing API for the World
This adds a new set of methods to the World API (for sharing data
throughout the cluster) and adds an etcd backed implementation.
2017-03-07 22:51:11 -05:00
Michael Borden
1488e5ec4d travis: Add go_import_path
This should fix an issue with turning on travis for any mgmt fork.
2017-03-07 14:17:53 -08:00
Michael Borden
af66138a17 misc: Add pacman support 2017-03-03 21:04:29 -08:00
James Shubin
5f060d60a7 test: Avoid matching three X's
This helps my "WIP" detector script avoid false positives. It is a
simple script which helps me find release critical problems.
2017-03-01 22:37:08 -05:00
James Shubin
73ccbb69ea travis: Ensure recent golang versions in test
This ensures travis picks the latest versions. Apparently writing 1.x is
also a valid choice.
2017-03-01 21:45:02 -05:00
James Shubin
be60440b20 readme: Add new blog post about metaparameters
This also documents the recent semaphore work.
2017-03-01 16:18:14 -05:00
James Shubin
837efb78e6 spelling: Fix typos as found by goreportcard 2017-02-28 23:48:34 -05:00
James Shubin
4a62a290d8 pgraph: Clean up tests
This splits the tests into multiple files.
2017-02-28 16:47:16 -05:00
James Shubin
018399cb1f semaphore, pgraph: Add semaphore grouping and tests
If two resources are grouped, then the result should contain the
semaphores of both resources. This is because the user is expecting
(independently) resource A and resource B to have a limiting choke
point. If when combined those choke points aren't preserved, then we
have broken an important promise to the user.
2017-02-28 16:40:53 -05:00
James Shubin
646a576358 remote: Replace builtin semaphore type with common util lib
Refactor code to use the new fancy lib!
2017-02-27 02:57:06 -05:00
James Shubin
d8e19cd79a semaphore: Create a semaphore metaparam
This adds a P/V style semaphore mechanism to the resource graph. This
enables the user to specify a number of "id:count" tags associated with
each resource which will reduce the parallelism of the CheckApply
operation to that maximum count.

This is particularly interesting because (assuming I'm not mistaken) the
implementation is dead-lock free assuming that no individual resource
permanently ever blocks during execution! I don't have a formal proof of
this, but I was able to convince myself on paper that it was the case.

An actual proof that N P/V counting semaphores in a DAG won't ever
dead-lock would be particularly welcome! Hint: the trick is to acquire
them in alphabetical order while respecting the DAG flow. Disclaimer,
this assumes that the lock count is always > 0 of course.
2017-02-27 02:57:06 -05:00
James Shubin
757cb0cf23 test: Small fixups to t4 and a rename 2017-02-26 20:54:22 -05:00
Julien Pivotto
7d92ab335a prometheus: Add mgmt_pgraph_start_time_seconds metric
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-02-26 15:28:43 +01:00
James Shubin
46c6d6f656 compiling: Add -i flag to go build to speed up builds
So builds take about 30s on my shitty machine which is pretty long.
Turns out golang caches things in $GOPATH/pkg/ but is not clever enough
to erase things from there that are out of date. As a result, it was
rebuilding everything (including the unchanged dependencies) every
build!

By completely wiping out $GOPATH/pkg/ and then running `go build -i`,
this now takes builds down to about 8 seconds. (After one full build is
finished.)

This is basically the same as running `go install`, but without copying
junk to $GOPATH/bin.

Hopefully the tooling will be smart enough to know when to throw out
stuff in $GOPATH/pkg automatically and avoid this problem entirely.

Is it wrong to send Google a bill for all the extra cpu cycles I've
used? ;)
2017-02-26 04:06:36 -05:00
Julien Pivotto
46260749c1 prometheus: Move the prometheus nil check inside the prometheus function
That pattern will be reused in future metrics.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-02-26 09:33:34 +01:00
Julien Pivotto
50664fe115 compilation: Make build the default target
It is really strange that whe I run make, it does not build mgmt. This
commit makes build the default target, without moving the target,
therefore we keep as much as we can the order of the file.

This also removes the confusion for designers that would run "make"
instead of "make art", whose work would be disrupted when we add a --
let's say -- make alpharelese command.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2017-02-26 09:04:20 +01:00
James Shubin
c480bd94db resources: virt: Remove unnecessary early exit from CheckApply
I don't think this early exit is necessary any more, since the main
CheckApply function really just spawns out to the different sub workers
which all individually check the apply variable.

If I'm wrong, we can revert this. It was @roidelapluie that noticed the
check here to begin with.
0.0.9
2017-02-25 21:29:08 -05:00
James Shubin
79923a939b resources: virt: Catch bad calls to CheckApply
If the engine cheats, we'll know!
2017-02-25 21:28:35 -05:00
James Shubin
327b22113a resources: virt: Don't block exit in callbacks
This prevents us blocking an exit if we close when a callback was about
to run. This is because the callbacks are called from the
EventRunDefaultImpl method, which waits for their return to exit and
release the WaitGroup.

I think we should probably get rid of the obj.wg since the engine is
supposed to guarantee that Close doesn't happen before Watch finishes.
2017-02-25 21:04:36 -05:00
James Shubin
12160ab539 pgraph: Wait for Process routine to exit
Wait for the innerWorker's Process routine to exit, before we exit too.
2017-02-25 21:01:02 -05:00
James Shubin
2462ea0892 pgraph, resources: Wait for innerWorker to exit cleanly
Don't run the Close() method until the innerWorker has exited cleanly.
This is a guarantee which we make to the resources.
2017-02-25 21:00:38 -05:00
James Shubin
8be09eadd4 test: Fix up probable timeout failures due to slow ci
We had *occasional* failures most likely due to slow ci and compounded
by low entropy. We also weren't pointing at the right test!
2017-02-22 23:08:09 -05:00
James Shubin
98bc96c911 golint: Fixup issues found in the report
This also increases the max allowed to 5% -- I'm happy to make this
lower if someone asks.
2017-02-22 22:18:55 -05:00
James Shubin
b0fce6a80d travis: Start building golang 1.8 as well
Require success from 1.7 since we'll probably move to it as the default
within six months.
2017-02-22 22:18:55 -05:00
James Shubin
53b8a21d1e resources: virt: Cleanup cleanly on Close
Don't block accidentally on error!
2017-02-22 22:18:55 -05:00
James Shubin
1346492d72 yamlgraph: Close the recwatcher properly after use
Don't leave it running unnecessarily! This might have contributed to a
block, but it was hard to isolate if this was the cause or if this was
one of many causes.
2017-02-22 18:37:43 -05:00
James Shubin
e5bb8d7992 recwatch: Close the ConfigWatch properly
This improves the shutdown process so that there is no change of
blocking if the sender runs close without having emptied the channel.
2017-02-22 17:45:16 -05:00
James Shubin
49594b8435 pgraph, resources: Clean up the event system around the resources
This cleans up some of the resource events and also reorganizes the
struct for simplicity. This should hopefully kill off at least one race
which would cause unnecessary blocking!

Yes this patch is a bit yucky, but so was the bug I was fighting with!
2017-02-22 17:45:16 -05:00
James Shubin
3bd37a7906 test: Don't block on graph transitions
Improvements in the engine have uncovered some annoying race conditions
which would cause the engine to block between transitions. This is a
test which catches the most obvious file based ones.

This requires inotify to work in the test environment.
2017-02-22 17:45:16 -05:00
James Shubin
e070a85ae0 lib: Misc cleanups and new log message 2017-02-22 17:45:16 -05:00
James Shubin
c189278e24 recwatch: Unblock from sending on exit
If we receive an exit signal while we are waiting to send a message, we
should still be able to shut down.
2017-02-22 16:41:47 -05:00
James Shubin
2a8606bd98 recwatch: Close cleanly after Watcher finishes
This cleans up the recwatch code to avoid some possible races. While you
were previously able to call Close more than once, this was really
something that would mask bugs, rather than a useful feature.
2017-02-22 16:41:26 -05:00
James Shubin
18ea05c837 pgraph, resources: Add proper start/stop signals
We need to perform some operations in lock step between graph
transitions. This should help with that!
2017-02-21 18:48:27 -05:00
James Shubin
86c3072515 resources: The user should not call Init
Even in tests...
2017-02-21 18:42:07 -05:00