engine: lang: util: Kill race in socketset
After some investigation, it appears that SocketSet.Shutdown() and SocketSet.Close() are not synchronous operations. The sendto system call in SocketSet.Shutdown() is not a blocking send, so SocketSet.Shutdown() returns without waiting for select to wake up. That leaves a race: SocketSet.Shutdown() sends a message to a file descriptor to unblock select, while SocketSet.Close() closes that same file descriptor. If SocketSet.Close() wins the race, select is left waiting on a dead file descriptor and hangs indefinitely.

Current master works around this by calling SocketSet.Close() inside the goroutine that receives data from the socket, relying on SocketSet.Shutdown() to terminate that goroutine. This works most of the time, but it still races. The goroutines can also be terminated through a closeChan: if the goroutine receives an event (which unblocks select) and closeChan is then closed, SocketSet.Shutdown() and SocketSet.Close() race again, leading to undefined behavior.

This patch enforces the ordering of the two calls by pulling both of them out of the goroutine and separating them with a WaitGroup: SocketSet.Shutdown() runs first, the WaitGroup waits for the receiving goroutine to return from select, and only then does SocketSet.Close() run.

Co-authored-by: James Shubin <james@shubin.ca>
```diff
@@ -108,18 +108,22 @@ func TestNfd(t *testing.T) {
 
 // test SocketSet.Shutdown()
 func TestShutdown(t *testing.T) {
-	wg := &sync.WaitGroup{}
-	defer wg.Wait()
-
 	// pass 0 so we create a socket that doesn't receive any events
 	ss, err := NewSocketSet(0, "pipe.sock", 0)
 	if err != nil {
 		t.Errorf("could not create SocketSet: %+v", err)
 	}
+	// waitgroup for netlink receive goroutine
+	wg := &sync.WaitGroup{}
+	defer ss.Close()
+	// We must wait for the Shutdown() AND the select inside of SocketSet to
+	// complete before we Close, since the unblocking in SocketSet is not a
+	// synchronous operation.
+	defer wg.Wait()
+	defer ss.Shutdown() // close the netlink socket and unblock conn.receive()
+
 	closeChan := make(chan struct{})
 	defer close(closeChan)
-	defer ss.Close()
-	defer ss.Shutdown()
 
 	// create a listener that never receives any data
 	wg.Add(1) // add a waitgroup to ensure this will block if we don't properly unblock Select
```