Commit Graph

257 Commits

Author SHA1 Message Date
James Shubin
8003202beb resources: golint fixes 2016-09-02 02:03:26 -04:00
James Shubin
b46432b5b6 misc: Golint fixes 2016-09-02 01:46:45 -04:00
James Shubin
5e3f03df06 etcd: Catch possible raft grpc error 2016-09-02 00:35:05 -04:00
James Shubin
8ab8e6679a test: provider usage text for shell test runner 2016-09-01 22:52:32 -04:00
James Shubin
786b896018 remote: small cleanups to update misc notes 2016-08-31 22:56:00 -04:00
James Shubin
40723f8705 docs: Add additional documentation about remote execution 2016-08-31 22:42:09 -04:00
James Shubin
2a0721bddf remote: allow converge during corner cases
This allows the system to converge during corner cases where there is an
error, or when there are no remotes being used, but we are using the
--no-watch variable.

I deliberately left this in as a separate commit instead of rebasing
into the remote execution development branch because the placement of
the Unregister() and semaphore.V(1) were quite subtle and easy to forget
about.
2016-08-31 22:42:09 -04:00
James Shubin
ff01e4a5e7 remote: Add distributed converged timeout
This patch extends the --converged-timeout argument so that when used
with --remote it waits for the entire set of remote mgmt agents to
converge simultaneously before exiting.

purpleidea says: This particular part of the patch probably took as much
work as all of the work required for the initial remote patches alone!
2016-08-31 21:55:19 -04:00
James Shubin
6794aff77c miscellaneous cleanups and fixes 2016-08-31 21:55:19 -04:00
James Shubin
636f2a36b1 etcd: Use new converged timers and allow skipping them
This implements the new extensions to the converged UUID API so that we
can keep a consistent timer running and reset it when needed. This is
useful because we now allow certain Watcher callbacks and other
operations to explicitly _not_ cause a convergerUUID to un-converge.
This is a necessary dependency for the distributed remote
converged-timeout work.
2016-08-31 21:55:19 -04:00
James Shubin
eee652cefe converger: Add new timer system for determining convergence
This adds a new method of marking whether a particular UUID has
converged or not. You can now Start, Stop, or Reset a convergence timer
on the individual UUID's. This wraps the existing SetConverged calls
with a hidden go routine. It is not recommended to use the SetConverged
calls and the Timer calls on the same UUID.
2016-08-31 21:55:19 -04:00
James Shubin
6d45cd45d1 converger: Add new methods to the API
This adds new helper methods to the API such as the ability to query the
set timeout, and to set the state change function after initialization.
The first makes it easier to pass in the timeout value to functions and
structs, because you don't need to pass it separately from the converger
object. The second makes it easy to specify the state change functon
when you don't know what it is at creation time. In addition, it is more
powerful now, and tells you when we converge or un-converge in case we
want to take different actions on each.
2016-08-31 21:55:19 -04:00
James Shubin
f5fb135793 converger: Update the API for errors and naming
We updated the API and behaviour to make two changes:

1) We remove the log.* stuff, and replace the Fatal calls with straight
calls to panic(). These are meant to be like an assert, and shouldn't
happen unless there is a user error or a bug.

2) We made the !uuid.IsValid() checks return an error instead of causing
a panic. These returns errors instead, and makes the process safer for
users who are okay calling a function that won't have an effect.
2016-08-31 21:55:19 -04:00
James Shubin
6bf32c978a etcd: Rename loop to be more consistent in messages
Small nitpick fixups
2016-08-31 21:55:19 -04:00
James Shubin
8d3011fb9c etcd: Add a timeout for etcd server to start correctly
This also updates etcd to a newer version with a fix that allows this
detection and timeout operation to be possible.
2016-08-31 21:55:19 -04:00
James Shubin
9260066fa3 tests: Workaround regression in two host etcd clusters
If you don't give your two host cluster enough time to "feel healthy",
it will generate an error if you do operations within five seconds. This
is a regression and the five seconds is also quite arbitrary. This is
detailed at: https://github.com/coreos/etcd/issues/6305

This seems to be a bit of a race condition, even with a 10s timer, so
this also disables the StrictReconfigCheck. Re-enable this as soon as
possible.
2016-08-31 21:55:19 -04:00
James Shubin
5e45c5805b Improve internal etcd error handling 2016-08-31 21:55:19 -04:00
James Shubin
db4de12767 Add more flexibility around the prefixes available
This allows you to specify a custom prefix, or a tmp prefix which is
chosen automatically.
2016-08-31 21:55:19 -04:00
James Shubin
d429795737 Improve the configWatcher array to allow N files
This simplifies the code in main and hides it in the watcher!
2016-08-31 21:55:19 -04:00
James Shubin
276219a691 Run tests with a tmp prefix
This will avoid failures if /var/lib/mgmt/ isn't writable.
2016-08-31 21:55:19 -04:00
James Shubin
03c1df98f4 Improve prefix creation and feedback
This makes the default /var/lib/mgmt/ directory, but also allows you to
use a prefix in /tmp/ automatically if you can't write anywhere else.
2016-08-31 21:55:19 -04:00
James Shubin
79ba750dd5 Automatically update remote files on change
This extends the automatic watching of graph definition files across the
remote SSH boundary.
2016-08-31 21:55:19 -04:00
James Shubin
1d0e187838 Etcd: switch to using a directory prefix 2016-08-31 21:55:19 -04:00
James Shubin
ad1e48aa2d Add caching for remote execution
This speeds up copying of the binary for slow connections. It also
finally adds a universal directory prefix for mgmt!
2016-08-31 21:55:19 -04:00
James Shubin
7032eea045 Remote "agent-less" mode
This is a new mode to be used for bootstrapping mgmt clusters or in
situations with tight operational restrictions.

This includes the basics, additional functionality will follow!
2016-08-31 21:55:19 -04:00
James Shubin
bdb970203c Start converger even if graph is empty 2016-08-30 17:47:25 -04:00
James Shubin
fa4f5abc78 Fix up tests, and one small bug 2016-08-30 17:47:25 -04:00
James Shubin
0c7b05b233 Check for exit status of tests 2016-08-30 17:47:25 -04:00
Joe Julian
4ca98b5f17 Allow make overrides of version if git fails
If the $(shell ...) command fails, the ':=' operator fails as well
preventing variable overrides from functioning.

Wrap the assignment in an $(or ...) to prevent the shell script from
running if the variable is set from the command line or the environment.

Fixes issue #58
2016-08-30 14:44:03 -07:00
Joe Julian
4e00c78410 Change "uname -i" to "uname -m" to be portable
According to the documentation for uname:

       -m, --machine
              print the machine hardware name

       -i, --hardware-platform
              print the hardware platform (non-portable)

So use the portable -m version.
2016-08-30 11:55:45 -07:00
James Shubin
17adb19c0d Fix typo 2016-08-04 00:44:50 -04:00
James Shubin
1db936e253 Update docs because they were out of date 2016-08-03 05:28:18 -04:00
James Shubin
7194ba7e0e Update docs to add automatic clustering 2016-08-03 05:16:47 -04:00
James Shubin
59b9b6f091 Docs: Add FAQ entry about similarly named band 2016-08-02 04:29:56 -04:00
James Shubin
c1ec8d15f3 Improve README for first time users
We need --recursive because we have a dependency vendored this way.
2016-08-02 04:25:35 -04:00
James Shubin
24ba6abc6b Don't block on member exits
The MemberList() function which apparently looks at all of the endpoints
was blocking right after a member exited, because we still had a stale
list of endpoints, and while a member that exited safely would update
the endpoints lists, the other member would (unavoidably) race to get
that message, and so it might call the MemberList() with the now stale
endpoint list. This way we invalidate an endpoint we know to be gone
immediately.

This also adds a simple test case to catch this scenario.
2016-07-26 04:23:46 -04:00
James Shubin
f6c1bba3b6 Avoid a rare panic if DestroyServer is called early
I never actually hit this bug, but I noticed it was possible when
examining the WaitGroup code that gets .Done() by DestroyServer().
2016-07-26 01:58:13 -04:00
James Shubin
a606961a22 Be safe when closing in destroy in case client is nil 2016-07-25 21:46:08 -04:00
James Shubin
cafe0e4ec2 Round of golint fixes to improve documentation.
This is really boring :(
2016-07-25 21:36:09 -04:00
James Shubin
e28c1266cf Do some gofmt simplifications 2016-07-25 20:56:33 -04:00
James Shubin
c1605a4f22 Add test case for urfave regression
Credit to jerith for helping me hack this together :)
2016-07-25 20:25:08 -04:00
James Shubin
7aeb55de70 Port embedded etcd to embed API
This also updates the vendored version of etcd to current git master,
which is the only place this is supported at the moment.
2016-07-25 19:10:09 -04:00
James Shubin
8ca65f9fda Revert "Copy in out of tree patches"
This reverts commit d26b503dca.

Use new etcd "embed" API.
2016-07-24 00:08:58 -04:00
James Shubin
94524d1156 Revert "Revert "Allow 1.6 to fail for now""
This reverts commit 78d769797f.
2016-07-20 03:09:17 -04:00
Sharad Ganapathy
a1ed03478b Adding timer resource and usage examples 2016-07-17 14:01:36 -04:00
James Shubin
402a6379b9 Add exec3 example
This is meant to be easier to understand than just sleep's.
2016-07-14 17:37:36 -04:00
Jeremy Thurgood
5d45bcd552 Check for old golang versions while installing dependencies. 2016-07-07 10:14:56 +02:00
Jack Henschel
f1fa64c170 Add recording and slides from DebConf16 presentation by James Shubin 2016-07-06 08:40:30 +02:00
Jack Henschel
50fc78564c Remove noop functionality from TODO
noop functionality (Bug #21) has been implemented:
6bbce039aa
9f56e4a582
2016-07-05 21:33:59 +02:00
James Shubin
3e5863dc8a Get travis results faster!
Thanks to jerith for the tip
2016-07-05 07:07:44 -04:00