A back poke is the deferral or delay of a Process/CheckApply. It happens
when we notice that we're not truly ready to CheckApply due to a
timestamp ordering issue. When Process errors, we should accept the
error, but not treat the run as a success.
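To make the timestamp issue concrete, here is a minimal sketch. None of these names are the real engine API; it assumes a simplified model in which a vertex may only CheckApply once every upstream dependency is at least as new as it is, and stale parents get back poked so they run first:

```go
package main

import "fmt"

// vertex is a hypothetical stand-in for a resource in the graph.
type vertex struct {
	name      string
	timestamp int64     // logical clock of the last successful Process
	parents   []*vertex // upstream dependencies
}

// okTimestamp reports whether we're truly ready to CheckApply: every
// upstream dependency must be at least as new as we are.
func (v *vertex) okTimestamp() bool {
	for _, p := range v.parents {
		if v.timestamp > p.timestamp { // a stale parent exists
			return false
		}
	}
	return true
}

// process runs CheckApply if we're ready, and otherwise defers by back
// poking the stale parents so that they run first.
func (v *vertex) process() {
	if !v.okTimestamp() {
		for _, p := range v.parents {
			if v.timestamp > p.timestamp {
				fmt.Printf("%s: back poke %s\n", v.name, p.name)
				p.process()
			}
		}
		return // we'll get poked again once the parents catch up
	}
	fmt.Printf("%s: CheckApply\n", v.name)
	v.timestamp++ // bump our clock on success
}

func main() {
	parent := &vertex{name: "parent", timestamp: 1}
	child := &vertex{name: "child", timestamp: 2, parents: []*vertex{parent}}
	child.process() // defers: the stale parent gets back poked first
}
```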
The retry and limit "satellite" event loops didn't allow pausing or
resuming; instead, you had to wait until either loop was done before you
could pause.
The downside of this patch is that for very fast graph transitions, we
wouldn't really be obeying the limits anymore. However, now that we have
a per-resource kind+name uid, we can persist the limits across graph
swaps if we want to.
Most importantly, this allows us to exit entirely when we're stuck in
one of these satellite loops.
This adds a meta state store that is preserved between graph switches if
the kind and name match. This is useful so that rapid graph changes
don't reset a resource's retry count when only one of its fields has
changed.
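As a rough illustration (the names here are hypothetical, not the real store API), preserving per-resource state keyed by kind+name might look like:

```go
package main

import (
	"fmt"
	"sync"
)

// metaState is a hypothetical per-resource store; the real engine
// would keep things like the remaining retry count in here.
type metaState struct {
	retries int
}

// metaStore survives graph switches; entries are keyed by the resource
// kind+name, so changing an unrelated field doesn't reset them.
type metaStore struct {
	mutex sync.Mutex
	data  map[string]*metaState
}

func newMetaStore() *metaStore {
	return &metaStore{data: make(map[string]*metaState)}
}

// Get returns the existing state for kind+name, or creates a fresh one
// with the given initial retry count.
func (s *metaStore) Get(kind, name string, retries int) *metaState {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	key := kind + ":" + name
	if st, exists := s.data[key]; exists {
		return st // preserved from the previous graph
	}
	st := &metaState{retries: retries}
	s.data[key] = st
	return st
}

func main() {
	store := newMetaStore()
	st := store.Get("file", "file1", 3)
	st.retries-- // one failed attempt before the graph swap
	// after the swap, the same kind+name maps to the same state...
	fmt.Println(store.Get("file", "file1", 3).retries) // 2, not 3
}
```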
This simplifies the pause mechanism and also avoids a deadlock on error.
If the Worker shuts down completely, but before we've been removed from
the graph, then an attempted pause would deadlock if we didn't have an
escape hatch here.
This removes the unnecessary ack mechanism now that we have a
synchronous channel send to represent the pausing, rather than an
asynchronous channel closing.
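Here is a small sketch of the pattern with hypothetical names: the pause is a plain synchronous send (its receipt is the ack), and a separate exit channel is the escape hatch that prevents the deadlock:

```go
package main

import (
	"fmt"
	"time"
)

// worker is a hypothetical resource main loop.
type worker struct {
	pauseChan chan struct{}
	exitChan  chan struct{}
}

func (w *worker) run() {
	defer close(w.exitChan) // signal that we're gone
	for i := 0; i < 3; i++ {
		select {
		case <-w.pauseChan: // accept a pending pause request
			fmt.Println("paused")
		default:
		}
		time.Sleep(10 * time.Millisecond) // pretend to do some work
	}
}

// Pause blocks until the worker accepts the pause, but returns an
// error instead of deadlocking if the worker already shut down.
func (w *worker) Pause() error {
	select {
	case w.pauseChan <- struct{}{}: // synchronous send: receipt is the ack
		return nil
	case <-w.exitChan: // escape hatch: the worker is already gone
		return fmt.Errorf("worker already shut down")
	}
}

func main() {
	w := &worker{
		pauseChan: make(chan struct{}),
		exitChan:  make(chan struct{}),
	}
	go w.run()
	fmt.Println(w.Pause()) // <nil>: the worker accepted the pause
	time.Sleep(100 * time.Millisecond)
	fmt.Println(w.Pause()) // after shutdown: an error, not a deadlock
}
```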
There's always the fear that there is either a panic or a deadlock in
the highly concurrent engine resource code. I have not seen one recently
and I've been running some pretty concurrent tests. In the meantime, and
with my hopefully improved knowledge of concurrency, I decided to
rewrite some of the "uglier" parts of the engine. I think it is a lot
clearer now, and much less likely that there is a concurrency issue.
This has been tested by running the examples/lang/fastcount.mcl example.
There were a bunch of packages that weren't well documented. With the
recent splitting up of the lang package, I figured documenting them
would be helpful for new contributors who want to learn the structure of
the project.
Since we'll want to use them elsewhere, we should turn these into helper
functions. This also makes the code look a lot neater. Unfortunately, it
adds a bit more indirection, but that isn't a critical flaw here.
This ensures that docstring comments are wrapped to 80 chars. ffrank
seemed to be making this mistake far too often, and it's a silly thing
to look for manually. As it turns out, I've made it too, as have many
others. Now we have a test that checks for most cases. There are still a
few stray cases that aren't checked automatically, but this can be
improved upon if someone is motivated to do so.
Before anyone complains about the 80 character limit: this only checks
docstring comments, not source code length or inline source code
comments. There's no excuse for having docstrings that are badly
reflowed or over 80 chars, particularly if you have an automated test.
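For illustration only, a simplified checker in this spirit could look like the following sketch. It is not the real test: it naively flags any // comment line over 80 characters, and doesn't separate docstrings from inline comments:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Walk a source tree and report `//` comment lines over 80 characters.
func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || !strings.HasSuffix(path, ".go") {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		scanner := bufio.NewScanner(f)
		for i := 1; scanner.Scan(); i++ {
			line := scanner.Text()
			if strings.HasPrefix(strings.TrimSpace(line), "//") && len(line) > 80 {
				fmt.Printf("%s:%d: comment longer than 80 chars\n", path, i)
			}
		}
		return scanner.Err()
	})
}
```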
For some reason we get errors when we try to remove a non-existent state
file. There's a slight possibility that we're working around a bug, but
it's not clear that this is the case, and a state file could plausibly
have been nuked by the user somehow. In any case, this was occurring
"naturally" when running reverse1.mcl, so let's keep that working for
now.
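The workaround amounts to tolerating a file that's already gone, roughly like this sketch (the helper name is made up):

```go
package main

import (
	"fmt"
	"os"
)

// removeStateFile removes the state file, but tolerates it already
// being absent, eg: if a user nuked it by hand.
func removeStateFile(path string) error {
	if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
		return err // a real error, eg: permissions
	}
	return nil // removed, or it was never there
}

func main() {
	// removing a non-existent file is not an error anymore...
	fmt.Println(removeStateFile("/tmp/mgmt-state-does-not-exist"))
}
```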
This adds the first reversible resource (file) and the necessary engine
API hooks to make it all work. This allows a special "reversed" resource
to be added to the subsequent graph in the stream when an earlier
version "disappears". This disappearance can happen if the resource was
previously inside an if statement whose condition has since become
false.
It might be wise to combine the use of this meta parameter with the use
of the `realize` meta parameter to ensure that your reversed resource
actually runs at least once, if there's a chance that it might be gone
for a while.
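To sketch the general shape of the hook (these names are illustrative, not the actual engine API), a reversible resource knows how to produce the resource that undoes it:

```go
package main

import "fmt"

// res is a tiny stand-in for the engine resource interface.
type res interface {
	CheckApply(apply bool) (bool, error)
}

// reversibleRes is a hypothetical version of the engine hook: a
// resource that can build its own "reversed" resource, which the
// engine would add to the next graph if this one disappears.
type reversibleRes interface {
	res
	Reversed() (res, error)
}

// fileRes creates a file; its reversal removes it again.
type fileRes struct {
	path   string
	remove bool // if true, CheckApply removes instead of creates
}

func (f *fileRes) CheckApply(apply bool) (bool, error) {
	if f.remove {
		fmt.Printf("rm %s\n", f.path) // the reversed operation
	} else {
		fmt.Printf("write %s\n", f.path)
	}
	return false, nil // false: we made a change
}

// Reversed returns the resource that undoes this one.
func (f *fileRes) Reversed() (res, error) {
	return &fileRes{path: f.path, remove: true}, nil
}

func main() {
	var r reversibleRes = &fileRes{path: "/tmp/foo"}
	rev, _ := r.Reversed()
	rev.CheckApply(true) // what would run after the disappearance
}
```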
This patch also adds a new test harness for testing resources. It
doesn't test the "live" aspect of resources, as it doesn't run Watch,
but it was designed to ensure CheckApply works as intended, and it runs
very quickly with a simplified timeline of happenings.
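A toy version of that idea, with made-up names: feed a resource a timeline of CheckApply calls and verify the results, with no Watch involved:

```go
package main

import "fmt"

// checkApplier is the slice of the resource API the harness exercises.
type checkApplier interface {
	CheckApply(apply bool) (checkOK bool, err error)
}

// timeline runs a sequence of CheckApply calls and verifies each
// checkOK result against what the test expects.
func timeline(r checkApplier, expect []bool) error {
	for i, want := range expect {
		got, err := r.CheckApply(true)
		if err != nil {
			return fmt.Errorf("step %d: %v", i, err)
		}
		if got != want {
			return fmt.Errorf("step %d: checkOK %t, want %t", i, got, want)
		}
	}
	return nil
}

// boolRes converges after the first apply.
type boolRes struct{ done bool }

func (b *boolRes) CheckApply(apply bool) (bool, error) {
	if b.done {
		return true, nil // already in the right state
	}
	if apply {
		b.done = true
	}
	return false, nil // we had to make a change
}

func main() {
	// the first call applies a change, subsequent calls converge...
	fmt.Println(timeline(&boolRes{}, []bool{false, true, true}))
}
```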
If you run some extremely absurd code, it turns out you can cause a
race. This was found by roidelapluie experimenting! In this case, it
would panic with: fatal error: concurrent map read and map write. This
patch adds a mutex to avoid this rare race.
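The fix is the standard pattern: guard the map with a mutex. A self-contained sketch of the idea (not the actual affected code):

```go
package main

import (
	"fmt"
	"sync"
)

// store guards its map with a mutex; without the locking, concurrent
// access panics with: fatal error: concurrent map read and map write.
type store struct {
	mutex sync.Mutex
	data  map[string]int
}

func (s *store) set(k string, v int) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	s.data[k] = v
}

func (s *store) get(k string) int {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	return s.data[k]
}

func main() {
	s := &store{data: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func(i int) { defer wg.Done(); s.set("key", i) }(i)
		go func() { defer wg.Done(); _ = s.get("key") }()
	}
	wg.Wait()
	fmt.Println("no race, final value:", s.get("key"))
}
```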
This is a giant cleanup of the etcd code. The earlier version was
written when I was less experienced with golang.
This is still not perfect, and does contain some races, but at least
it's a decent base to start from. The automatic elastic clustering
should be considered an experimental feature. If you need a more
battle-tested cluster, then you should manage etcd manually and point
mgmt at your existing cluster.
Part of this code had rotted and was not fully functional. This fixes
the rot, adds some tests, and improves the type checking that occurs
when sending and receiving values. In addition, a significant portion of
that checking now happens at compile time.
There is still more work to be done here, but this should get us a good
chunk of the way for now.
This adds back the retry loop around Process. This is done as a
separate commit so you can more easily see the logic of the retry magic.
This commit is similar to, but different from, the earlier commit that
added retry around Watch.
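The gist of that logic, as a standalone sketch with hypothetical names (assuming a negative count means retry forever):

```go
package main

import (
	"fmt"
	"time"
)

// retryProcess wraps a process function in a retry loop: on error it
// counts down and waits before trying again; a negative count retries
// forever; at zero the error becomes permanent.
func retryProcess(process func() error, retry int, delay time.Duration) error {
	for {
		err := process()
		if err == nil {
			return nil // success
		}
		if retry == 0 {
			return fmt.Errorf("permanent process error: %v", err)
		}
		if retry > 0 {
			retry--
		}
		time.Sleep(delay)
	}
}

func main() {
	attempts := 0
	err := retryProcess(func() error {
		attempts++
		if attempts < 3 {
			return fmt.Errorf("transient error")
		}
		return nil
	}, 5, 10*time.Millisecond)
	fmt.Println(err, "after", attempts, "attempts") // <nil> after 3 attempts
}
```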
The engine core had some unfortunate bugs that were the result of some
early design errors when I wasn't as familiar with channels. I've
finally rewritten most of the bad parts, and I think it's much more
logical and stable now.
This also simplifies the resource API, since more of the work is done
completely in the engine, and hidden from view.
Lastly, this adds a few new metaparameters and associated code.
There are still some open problems left to solve, but hopefully this
brings us one step closer.
It's plausible that we could send on a closed channel if we're running a
back poke and it tries to send a poke on something that has already
closed. If we detect this condition, we now exit.
Unfortunately, it's not clear whether the wait group will protect this
case, but hopefully this will hold us over until we can rewrite the
needed parts of the engine.
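One common way to detect that condition, sketched here with hypothetical names (not necessarily what the engine does): select the poke send against a done channel, and bail out if the receiver is already gone:

```go
package main

import "fmt"

// safeSend tries to deliver a poke, but gives up cleanly if the
// receiver has already shut down, instead of risking a send that can
// never complete (or a panic on a closed channel).
func safeSend(pokeChan chan<- struct{}, doneChan <-chan struct{}) bool {
	select {
	case pokeChan <- struct{}{}:
		return true // the poke was delivered
	case <-doneChan:
		return false // receiver is gone; exit instead of panicking
	}
}

func main() {
	pokeChan := make(chan struct{}, 1)
	doneChan := make(chan struct{})
	fmt.Println(safeSend(pokeChan, doneChan)) // true: delivered

	close(doneChan)                // the receiver shut down...
	pokeChan = make(chan struct{}) // unbuffered, and nobody is receiving
	fmt.Println(safeSend(pokeChan, doneChan)) // false: we detect it and exit
}
```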