This makes this logically more separate! :) As an aside...
I really hate the way golang does dependencies and packages. Yes, some
people insist on nesting their code deep into a $GOPATH, which is fine
if you're a google dev and are forced to work this way, but annoying for
the rest of the world. Your code shouldn't need a git commit to switch
to a a different vcs host! Gah I hate this so much.
All resources can now set a retry limit (-1 for infinite) and a delay
between retries. This applies to both the CheckApply methods, and the
Watch methods as well. They each have their own separate counts, but use
the same input meta param, since I decided it wouldn't be useful to have
a separate watchRetry and watchDelay set of meta parameters.
In the process, we got rid of about 15 error cases which would normally
panic.
This patch required a slight overhaul of the Event system.
The previous commit is an earlier version of this patch which I decided
to leave in to "show my work" as I used to have to do in math class.
It's slightly more correct with the current event system, and this
version is less correct and has a few bugs, but that is because the
event system needs a massive overhaul, and once that's done this should
all work properly for the corner cases.
This was the initial cut of the retry and delay meta parameters.
Instead, I decided to move the delay action into the common space
outside of the Watch resource. This is more complicated in the short
term, but will be more beneficial in the long run as each resource won't
have to implement this part itself (even if it uses boiler plate).
This is the first version of this patch without this fix. I decided to
include it because I think it has more correct event processing.
On Nixos and GNUIX-SD, bash is chroot in package store path.
In my case : /nix/store/qvccmr6fsis4kqlvlk8pb1c8c0r0cwai-system-path/bin/bash
In any case, using `/usr/bin/env bash` is the recommended way to get bash
portable across UNIX-like systems.
The file resource contained some of the early golang code that I wrote
for this project. Needless to say, some of it was quite yucky, and it
was also lacking a number of important features. This patch builds upon
it so that it starts being usable for directories of files too.
Many thanks to Sam Gélineau for helping with the recursive watching. My
brain officially didn't want to look at that code anymore.
This allows the system to converge during corner cases where there is an
error, or when there are no remotes being used, but we are using the
--no-watch variable.
I deliberately left this in as a separate commit instead of rebasing
into the remote execution development branch because the placement of
the Unregister() and semaphore.V(1) were quite subtle and easy to forget
about.
This patch extends the --converged-timeout argument so that when used
with --remote it waits for the entire set of remote mgmt agents to
converge simultaneously before exiting.
purpleidea says: This particular part of the patch probably took as much
work as all of the work required for the initial remote patches alone!
This implements the new extensions to the converged UUID API so that we
can keep a consistent timer running and reset it when needed. This is
useful because we now allow certain Watcher callbacks and other
operations to explicitly _not_ cause a convergerUUID to un-converge.
This is a necessary dependency for the distributed remote
converged-timeout work.
This adds a new method of marking whether a particular UUID has
converged or not. You can now Start, Stop, or Reset a convergence timer
on the individual UUID's. This wraps the existing SetConverged calls
with a hidden go routine. It is not recommended to use the SetConverged
calls and the Timer calls on the same UUID.
This adds new helper methods to the API such as the ability to query the
set timeout, and to set the state change function after initialization.
The first makes it easier to pass in the timeout value to functions and
structs, because you don't need to pass it separately from the converger
object. The second makes it easy to specify the state change functon
when you don't know what it is at creation time. In addition, it is more
powerful now, and tells you when we converge or un-converge in case we
want to take different actions on each.
We updated the API and behaviour to make two changes:
1) We remove the log.* stuff, and replace the Fatal calls with straight
calls to panic(). These are meant to be like an assert, and shouldn't
happen unless there is a user error or a bug.
2) We made the !uuid.IsValid() checks return an error instead of causing
a panic. These returns errors instead, and makes the process safer for
users who are okay calling a function that won't have an effect.
If you don't give your two host cluster enough time to "feel healthy",
it will generate an error if you do operations within five seconds. This
is a regression and the five seconds is also quite arbitrary. This is
detailed at: https://github.com/coreos/etcd/issues/6305
This seems to be a bit of a race condition, even with a 10s timer, so
this also disables the StrictReconfigCheck. Re-enable this as soon as
possible.