engine: graph: Clean up pause/resume code

There's always the fear of a panic or a deadlock lurking in the highly
concurrent engine resource code. I have not seen one recently, even
though I've been running some pretty concurrent tests. In the meantime,
and with my hopefully improved knowledge of concurrency, I decided to
rewrite some of the "uglier" parts of the engine. I think it is a lot
clearer now, and a concurrency issue is much less likely.

This has been tested by running the examples/lang/fastcount.mcl example.
James Shubin
2023-08-30 21:17:05 -04:00
parent 2edae22a65
commit 2773a621a2
3 changed files with 82 additions and 55 deletions

@@ -255,9 +255,9 @@ func (obj *Engine) Commit() error {
 	free := []func() error{} // functions to run after graphsync to reset...
 	vertexRemoveFn := func(vertex pgraph.Vertex) error {
 		// wait for exit before starting new graph!
-		close(obj.state[vertex].removeDone) // causes doneCtx to cancel
-		obj.state[vertex].Resume()          // unblock from resume
-		obj.waits[vertex].Wait()            // sync
+		close(obj.state[vertex].removeDone)   // causes doneCtx to cancel
+		close(obj.state[vertex].resumeSignal) // unblock (it only closes here)
+		obj.waits[vertex].Wait()              // sync
 		// close the state and resource
 		// FIXME: will this mess up the sync and block the engine?
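
The hunk above swaps a Resume() call for a close of resumeSignal when a vertex is removed. As a rough sketch of why closing can work as an unblock, the standalone Go program below shows a receiver that wakes on either a send or a close, and then checks a second channel to tell the two cases apart. The waiter type and the waitForResume, runRemoval and runResume helpers are hypothetical and exist only for illustration; only the channel names removeDone and resumeSignal are taken from the diff, and the real engine code may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// waiter is a hypothetical stand-in for a paused resource. It is NOT the
// engine's real state struct; only the two channel names come from the diff.
type waiter struct {
	resumeSignal chan struct{} // a send means resume, a close means removal
	removeDone   chan struct{} // closed when the vertex is being removed
}

func newWaiter() *waiter {
	return &waiter{
		resumeSignal: make(chan struct{}),
		removeDone:   make(chan struct{}),
	}
}

// waitForResume blocks until resumeSignal receives a value or is closed.
// It then reports whether the wakeup was a removal or a normal resume.
func (w *waiter) waitForResume() string {
	<-w.resumeSignal // returns on a send or on a close
	select {
	case <-w.removeDone:
		return "removed"
	default:
		return "resumed"
	}
}

// run starts a waiter in a goroutine, applies wake to it, and waits for the
// goroutine to finish, mirroring the close, close, Wait order in the hunk.
func run(wake func(w *waiter)) string {
	w := newWaiter()
	var wg sync.WaitGroup
	var result string
	wg.Add(1)
	go func() {
		defer wg.Done()
		result = w.waitForResume()
	}()
	wake(w)
	wg.Wait() // sync: result is safe to read after this returns
	return result
}

// runRemoval closes removeDone first, then closes resumeSignal to unblock.
func runRemoval() string {
	return run(func(w *waiter) {
		close(w.removeDone)
		close(w.resumeSignal)
	})
}

// runResume sends a single value, which is the ordinary resume path.
func runResume() string {
	return run(func(w *waiter) {
		w.resumeSignal <- struct{}{}
	})
}

func main() {
	fmt.Println(runRemoval())
	fmt.Println(runResume())
}
```

Closing removeDone before resumeSignal matters in this sketch: by the time the receiver wakes from the close, the removal is already visible, so it never mistakes a removal for a resume. A close also never blocks the caller, whereas a send on an unbuffered channel needs a receiver to be waiting.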
@@ -372,8 +372,22 @@ func (obj *Engine) Resume() error {
 	reversed := pgraph.Reverse(topoSort)
 	for _, vertex := range reversed {
+		// The very first resume is skipped as those resources are
+		// already running! We could do that by checking here, but it is
+		// more convenient to just have a state struct field (paused) to
+		// track things for this instead. As a bonus, it helps us know
+		// if a resource is paused or not if we print for debugging.
+		//if !obj.state[vertex].initialStartupDone {
+		//	obj.state[vertex].initialStartupDone = true
+		//	continue
+		//}
+		//obj.state[vertex].starter = (indegree[vertex] == 0)
 		obj.state[vertex].Resume() // doesn't error
+		// This always works because if a resource errored while it was
+		// paused, then we're in the paused state and we can still exit
+		// from there. If a resource errors when we're trying to Pause
+		// then it will only succeed without error if the resource ACKs.
 	}
 	// we wait for everyone to start before exiting!
 	obj.paused = false
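
The comment in this hunk says the first resume is skipped by tracking a paused field on the state struct rather than by checking in the loop. A minimal sketch of that idea, assuming Resume simply returns early when the resource is not paused, might look like the following. The resource type, its Pause method and the signals counter are all hypothetical and are not taken from the engine; the real Resume presumably signals a running goroutine instead of bumping a counter.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for the per-vertex state struct.
type resource struct {
	paused  bool // true only between a Pause and the next Resume
	signals int  // counts resume signals that would have been delivered
}

// Pause marks the resource as paused. A real implementation would also
// wait for the running resource to acknowledge the pause.
func (r *resource) Pause() {
	r.paused = true
}

// Resume is a no-op for a resource that was never paused, which is how the
// very first resume of a freshly started resource gets skipped here.
func (r *resource) Resume() {
	if !r.paused {
		return // already running, nothing to unblock
	}
	r.paused = false
	r.signals++
}

func main() {
	r := &resource{}
	r.Resume() // initial startup: skipped, the resource is already running
	fmt.Println(r.signals, r.paused)

	r.Pause()
	r.Resume() // a real resume after a pause
	fmt.Println(r.signals, r.paused)
}
```

Keeping the flag on the state itself means the caller can invoke Resume on every vertex unconditionally, and the same field can be printed when debugging to see which resources are paused.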