This is a giant cleanup of the etcd code. The earlier version was
written when I was less experienced with golang.
This is still not perfect, and does contain some races, but at least
it's a decent base to start from. The automatic elastic clustering
should be considered an experimental feature. If you need a more
battle-tested cluster, then you should manage etcd manually and point
mgmt at your existing cluster.
Named return params aren't a favourite feature of mine, and they're
rarely used when writing resources. They keep popping up because I once
used them and we've been copying and pasting that code ever since. Get
rid of them all to help prevent the unnecessary spread.
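To illustrate, here is a hypothetical resource method in both styles
(OldRes and NewRes are made up for this example):

type OldRes struct{}
type NewRes struct{}

// The pattern being removed: named return params and a naked return.
func (obj *OldRes) CheckApply(apply bool) (checkOK bool, err error) {
    checkOK = true
    return // naked return: easy to misread which values go back
}

// The preferred style: explicit return values.
func (obj *NewRes) CheckApply(apply bool) (bool, error) {
    return true, nil
}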
Part of this was rotten, and not fully functional. This fixes the rot,
adds some tests, and improves the type checking that occurs when sending
and receiving values. In addition, a significant portion of this
checking now happens at compile time.
There is still more work to be done here, but this should get us a good
chunk of the way for now.
This new utility function makes verifying send/recv struct comparisons
consistent. Unfortunately it doesn't yet support coercing from *string
to string or from string to *string.
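A minimal sketch of the idea using reflect; the name and signature here
are illustrative, not the actual API:

import (
    "fmt"
    "reflect"
)

// fieldCompat is an illustrative helper: it errors unless the named
// fields of two struct types have identical types. Note that it does
// not coerce between *string and string.
func fieldCompat(t1 reflect.Type, f1 string, t2 reflect.Type, f2 string) error {
    sf1, ok := t1.FieldByName(f1)
    if !ok {
        return fmt.Errorf("field %s does not exist", f1)
    }
    sf2, ok := t2.FieldByName(f2)
    if !ok {
        return fmt.Errorf("field %s does not exist", f2)
    }
    if sf1.Type != sf2.Type { // *string vs string fails here
        return fmt.Errorf("type mismatch: %v != %v", sf1.Type, sf2.Type)
    }
    return nil
}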
This replaces the static obj.path and obj.isDir with live variants. I
don't know why I ever cared about caching these before, and if we ever
care we can memoize properly in the future.
This actually masked a small bug in the gob code. It is now fixed in
the previous commit.
The fields that we might want to store with encoding/gob or any other
encoding package need to be public. We don't use any of these at the
moment, but we might in the future.
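For example, encoding/gob refuses to encode a struct whose fields are
all unexported:

package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type Public struct {
    Name string // exported: gob can see and store this
}

type private struct {
    name string // unexported: gob cannot access this
}

func main() {
    var b bytes.Buffer
    fmt.Println(gob.NewEncoder(&b).Encode(&Public{Name: "x"}))  // <nil>
    fmt.Println(gob.NewEncoder(&b).Encode(&private{name: "x"})) // error: no exported fields
}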
Not sure how I let this in, but we should never do this. Hopefully
Validate will catch this issue in advance, and if not, at least we'll
only error.
If the file res was defined with state => "exists" but no content
specified, it was not created. This patch fixes this bug and adds a test
and an example.
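A rough sketch of the fixed behaviour (not the literal patch): with
state => "exists" and no content, an empty file should still get
created. The helper name is made up:

import "os"

// ensureExists creates an empty file if one doesn't already exist. It
// returns checkOK=false if it had to change anything.
func ensureExists(path string) (bool, error) {
    if _, err := os.Stat(path); err == nil {
        return true, nil // already exists, nothing to do
    } else if !os.IsNotExist(err) {
        return false, err // unexpected stat error
    }
    f, err := os.Create(path) // no content specified, so create it empty
    if err != nil {
        return false, err
    }
    return false, f.Close()
}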
The exec resource was an early addition to the project, and it was due
for some fixes and integration into our automated tests. This patch
fixes a number of issues, and makes it ready for more general use.
Add a RefreshOnly option (defaulting to false) on the print resource,
to print only when notified by another resource. When a print is
RefreshOnly, it can't be grouped anymore.
This adds the ability to wait with a timeout for CheckApply happenings
in a resource. This helps avoid unnecessarily long sleeps and timing
guesses. This also adds a cleanup function to run at the end.
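The shape of the waiting helper, as an illustrative sketch (the real
API and channel plumbing may differ):

import (
    "fmt"
    "time"
)

// waitForCheckApply blocks until a CheckApply event arrives on the
// channel, or errors after the timeout. The channel name is made up.
func waitForCheckApply(events <-chan struct{}, timeout time.Duration) error {
    select {
    case <-events:
        return nil // a CheckApply happened
    case <-time.After(timeout):
        return fmt.Errorf("timed out after %s", timeout)
    }
}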
Some of the early code I wrote probably wouldn't pass my own reviews
today. Here's one example of a rare deadlock that could sometimes occur
when a Process/CheckApply caused a shutdown, but the bufio tried to send
on a channel that nobody was going to read any more. Now we properly
unblock that send with a context.
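A minimal sketch of the fix, assuming a context is threaded through;
the names here are illustrative:

import "context"

// sendOrBail replaces a bare blocking send: if the reader has gone
// away, the context cancellation unblocks us instead of deadlocking.
func sendOrBail(ctx context.Context, out chan<- string, line string) error {
    select {
    case out <- line: // someone is still reading
        return nil
    case <-ctx.Done(): // shutdown: nobody will read any more
        return ctx.Err()
    }
}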
This adds back the retry loop around Process. This is done as a
separate commit so you can more easily see the logic of the retry
magic. This commit is similar to, but different from, the earlier
commit adding retry around Watch.
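The rough shape of the retry logic, as a sketch (assuming retry/delay
knobs like the Watch version; this is not the engine's exact code):

import "time"

// retryProcess: retry < 0 retries forever, otherwise it counts down;
// delay is the pause between attempts.
func retryProcess(process func() error, retry int, delay time.Duration) error {
    for {
        err := process()
        if err == nil {
            return nil // success
        }
        if retry == 0 {
            return err // no more retries left
        }
        if retry > 0 {
            retry--
        }
        time.Sleep(delay)
    }
}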
The engine core had some unfortunate bugs that were the result of some
early design errors when I wasn't as familiar with channels. I've
finally rewritten most of the bad parts, and I think it's much more
logical and stable now.
This also simplifies the resource API, since more of the work is done
completely in the engine, and hidden from view.
Lastly, this adds a few new metaparameters and associated code.
There are still some open problems left to solve, but hopefully this
brings us one step closer.
This adds a simulated engine that can run and test single resources. It
can't test all aspects and features that the engine supports, but is
probably pretty decent for testing the actual CheckApply and Watch
semantics. Be warned that it actually applies changes on your machine,
so please don't write tests that make undesirable changes.
After some investigation, it appears that SocketSet.Shutdown() and
SocketSet.Close() are not synchronous operations. The sendto system
call used in SocketSet.Shutdown() is not a blocking send. That means there
is a race in which SocketSet.Shutdown() sends a message to a file
descriptor to unblock select, while SocketSet.Close() will close the
file descriptor that the message is being sent to. If SocketSet.Close()
wins the race, select is listening on a dead file descriptor and will
hang indefinitely.
This is fixed in the current master by putting SocketSet.Close() inside
of the goroutine in which data from the socket is being received. It
relies on SocketSet.Shutdown() being called to terminate the goroutine.
While this works most of the time, there is a race here. All the
goroutines can also be terminated by a closeChan. If the goroutine
receives an event (thus unblocking select) and then closeChan is
triggered, both SocketSet.Shutdown() and SocketSet.Close() race, leading
to undefined behavior.
This patch ensures the ordering of the two function calls by pulling
them both out of the goroutine and separating them with a WaitGroup.
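In condensed form, the new ordering looks roughly like this (the
SocketSet details are elided into an interface, the receive loop is
simplified, and the return types are assumptions):

import "sync"

// socketSet captures just the methods this sketch needs.
type socketSet interface {
    ReceiveBytes() ([]byte, error) // blocks in select until data or shutdown
    Shutdown() error               // unblocks the select
    Close() error                  // closes the file descriptor
}

func runAndClose(ss socketSet, done <-chan struct{}) error {
    wg := &sync.WaitGroup{}
    wg.Add(1)
    go func() {
        defer wg.Done()
        for {
            if _, err := ss.ReceiveBytes(); err != nil {
                return // Shutdown (or an error) unblocked us
            }
        }
    }()
    <-done                                // some external shutdown trigger
    if err := ss.Shutdown(); err != nil { // unblock the select...
        return err
    }
    wg.Wait()         // ...and wait for the goroutine to finish receiving,
    return ss.Close() // so that closing the fd can't race
}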
Co-authored-by: James Shubin <james@shubin.ca>
Add the ReceiveBytes, ReceiveNetlinkMessage, and ReceiveUEvent methods.
This is because not everything passed through a netlink socket can
reliably be parsed using the ParseNetLinkMessage function.
With the ReceiveUEvent method, we add support for "uevent" kernel
events, which update us about the state of devices currently on the
system. To make using this method easier, we add a UEvent struct that
has the Action (what event), Devpath (where the device lives in /proc or
/sysfs), and Subsystem (what subsystem this event belongs to).
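As described, the struct looks roughly like this (the field types are
assumed to be strings):

// UEvent describes a single "uevent" kernel event.
type UEvent struct {
    Action    string // what happened, eg: "add" or "remove"
    Devpath   string // where the device lives
    Subsystem string // which subsystem the event belongs to
}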
It's plausible that we send on a closed channel if we're running a back
poke and it tries to send a poke on something that has already closed.
If it detects this condition, it will exit.
Unfortunately, it's not clear if the wait group will protect against
this case, but hopefully this will hold us over until we can rewrite
the needed parts of the engine.
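Since a send on a closed channel panics in Go, "detecting this
condition" more or less means recovering from that panic. A hedged
sketch of one way to do it (ironically, recover forces a named return):

// pokeSafely returns false instead of panicking if ch is already closed.
func pokeSafely(ch chan<- struct{}) (ok bool) {
    defer func() {
        if recover() != nil {
            ok = false // the channel was closed; the caller should exit
        }
    }()
    ch <- struct{}{}
    return true
}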
Occasionally, when a back poke happened downstream of an upstream
vertex which had already exited, that exited vertex could get back
poked, which would cause a panic. This moves the deletion of the state
struct until after the entire graph has completed, so that it won't
panic. It doesn't matter if a back poke is lost; we're shutting down or
pausing, and in this scenario it can be lost.
Somewhere after the engine rewrite we seem to have regressed, and we
converge early even if some resource is dirty. This adds an additional
timer so that we don't start the individual resource converged countdown
until our state is okay.
This signals to an interested consumer that two or more compatible
resources can be merged safely. This is so that we can avoid the
"duplicate resource" design problem that puppet had.
To test this, you can run:
./mgmt run --tmp-prefix lang --lang 'pkg "cowsay" { state => "installed", } pkg "cowsay" { state => "newest", }'
which should work.