The retry and limit "satellite" event loops didn't allow pausing or
resuming, and instead you needed to wait until either was done before
you could pause.
The downside of this patch is that for very fast graph transitions, we
wouldn't be really obeying the limits anymore, however now that we have
per resource kind+name uid, we can persist the limits across graph swaps
if we want to.
Most importantly, this allows us to exit entirely when we're stuck in
one of these satellite loops.
This adds a meta state store that is preserved between graph switches if
the kind and name match. This is useful so that rapid graph changes
don't necessarily reset their retry count if they've only changed one
resource field.
This simplifies the pause mechanism and also avoids a deadlock on error.
If the Worker shuts down completely, but before we've been removed from
the graph, then an attempted pause would deadlock if we didn't have an
escape hatch here.
This removes the unnecessary ack mechanism now that we have a
synchronous channel send to represent the pausing, rather than an
asynchronous channel closing.
There's always the fear that there is either a panic or a deadlock in
the highly concurrent engine resource code. I have not seen one recently
and I've been running some pretty concurrent tests. In the meantime, and
with my hopefully improved knowledge of concurrency, I decided to
rewrite some of the "uglier" parts of the engine. I think it is a lot
clearer now, and much less likely that there is a concurrency issue.
This has been tested by running the examples/lang/fastcount.mcl example.
This adds a new test functions package and also a new "fastcount"
function which counts up from zero as fast as is possible. You probably
don't want to use this in production, but it is useful for performance
and deadlock testing the resource and function engines.
This ports the integration tests to use a standalone etcd server instead
of depending on the flaky elastic etcd clustering. Hopefully we will
polish and/or reimplement that at some point in the future, but at least
for now let's make things reliable.
Standalone etcd is useful for when we don't want to use the embedded
version to make it easier to deploy somewhere or for testing.
This pulls in about the same amount of code since we already embedded
etcd previously. Since the embedded etcd feature of mgmt is not very
stable, we'll add this for now.
When mgmt is in etcd-client-only mode and using an external etcd server,
we don't want to unset our only known endpoint since this would deadlock
our etcd client since it can't connect to anyone. This could have
happened because a plain etcd server didn't set any endpoints to follow,
and as a result we noticed it was empty and decided to use that instead.
To workaround this issue on an earlier version of mgmt, you would have
had to run:
etcdctl put /_mgmt/endpoints/etcd http://localhost:2379
to set this magic key on the initial etcd server.
There are many reasonable cases where we might want to allow a dynamic
format string. Support that situation by adding the new invariants that
are needed for those cases.
We were skipping over being fully consistent with all of the generator
invariants when running the solver. This allowed us to miss some of the
conditions that a generator might impose. Usually this caused us to be
"solved" when in fact we had an invalid program.
As per https://go.dev/blog/rebuild if we include -trimpath then we'll
not store full build paths in the build artifacts.
We still have some CGO dependencies, but we'll look into those
separately.
There were some bugs about setting resource fields that were structs
with various fields. This makes things more strict and correct. Now we
check for duplicate field names earlier (duplicates due to identical
aliases) and we also don't try and set private fields, or incorrectly
set partial structs.
Most interestingly, this also cleans up all of the resources and ensures
that each one has nicer docs and a clear struct tag for fields that we
want to use in mcl. These are mandatory now, and if you're missing the
tag, then we will ignore the field.
We also add a backup fix to avoid a panic in case we ever hit a new
unification bug that lets something through, we can at least turn it
into a runtime issue. This adds a test as well.
We want to add the equality to the main list of equalities in case it is
needed somewhere else. This doesn't have any effect on any of our test
cases, but it doesn't seem to be harmful, and it could conceivably be
useful in the future. This is a separate commit in case we want to check
if this behaviour still holds true well into the future.
These two vars are used pretty deeply through this code, so rename them
to prevent any confusion.
We also switch from a range loop to a counter so that the loop list can
be changed while it's looping.
The set of initial invariants that we see might include:
?8 = func(inputs []int, function ?4) ?3
?8 = func(inputs []int, function ?10) ?9
?8 = func arg0 []int, arg1 ?6) ?7
From this we can infer that since they are all equal, that we also know
that ?4, ?10 and ?6 must also be equal. The same is true of ?3, ?9 and
?10. Those new equalities are sometimes necessary in order to complete
the full unification.
The second interesting aspect is when we have dissimilar equalities:
?2 = func(x ?1) str
?4 = func(a int) ?5
?10 = func(a int) ?11
In this example we also have an additional equality:
?6 = ?2
From this and the above we can determine that ?2, ?4, ?6 and ?10 are all
equal. We only know about ?4, ?6, and ?10 from the direct relationship,
and we add in ?2 from the indirect (graph) relationship. These
relationships let us determine new information that ?5 and ?11 are both
str and that ?1 is an int.
Two important reminders:
1) Arg names don't have to match. It would impossible to build such a
system where this was both possible, but also let us name our
functions sanely.
2) None of this guarantees we won't find an inconsistency in our
solution. If this is found, it simply means that someone wrote code
which does not type check.
When we were running our partial unification to compare similar looking
function invariants, we would sometimes learn new information, even if
it didn't entirely solve that invariant. When this new information was
learned, it's important that we globally mark it as solved so that
others can use it to complete their invariants. When it was a critical
piece of information, this would result in new information, which would
eventually lead to that original partial invariant that we learned that
information from to becoming solved.
This presents things in a more formal way for those who are more
familiar with standard type unification syntax.
Patch created by Sam, and cleaned up by James.
This removes the `Close() error` and replaces it with a more modern
Stream API that takes a context. This removes boilerplate and makes
integration with concurrent code easier. The only downside is that there
isn't an explicit cleanup step, but only one function was even using
that and it was possible to switch it to a defer in Stream.
This also renames the functions from polyfunc to just func which we
determine by API not naming.
This adds the BoundedReadSemaphore mutex that I invented. It's not that
is necessarily particularly revolutionary, but it is useful. I wish I
had a better name for it, but I couldn't think of anything. It's fairly
obvious what it does, so if you have a suggestion of how to name it,
please do so!
Not sure how this snuck in!
I've now ran a quick grep across the code base, and I can't find any
similar mistakes.
ack '.Done()' | grep -v defer | grep -iv ctx # then check these