engine: lang: util: Kill race in socketset

After some investigation, it appears that SocketSet.Shutdown() and
SocketSet.Close() are not synchronous operations. The sendto system call
called in SocketSet.Shutdown() is not a blocking send. That means there
is a race in which SocketSet.Shutdown() sends a message to a file
descriptor to unblock select, while SocketSet.Close() will close the
file descriptor that the message is being sent to. If SocketSet.Close()
wins the race, select is listening on a dead file descriptor and will
hang indefinitely.

This is fixed in the current master by putting SocketSet.Close() inside
of the goroutine in which data from the socket is being received. It
relies on SocketSet.Shutdown() being called to terminate the goroutine.
While this works most of the time, there is a race here. All the
goroutines can also be terminated by a closeChan. If the goroutine
receives an event (thus unblocking select) and then closeChan is
triggered, both SocketSet.Shutdown() and SocketSet.Close() race, leading
to undefined behavior.

This patch ensures the ordering of the two function calls by pulling
them both out of the goroutine and separating them with a WaitGroup.

Co-authored-by: James Shubin <james@shubin.ca>
This commit is contained in:
Kevin Kuehler
2019-01-21 14:08:13 -08:00
parent 0c17a0b4f2
commit 5d0efce278
3 changed files with 22 additions and 12 deletions

View File

@@ -191,15 +191,19 @@ func (obj *NetRes) Close() error {
// TODO: currently gets events from ALL interfaces, would be nice to reject
// events from other interfaces.
func (obj *NetRes) Watch() error {
// waitgroup for netlink receive goroutine
wg := &sync.WaitGroup{}
defer wg.Wait()
// create a netlink socket for receiving network interface events
conn, err := socketset.NewSocketSet(rtmGrps, obj.socketFile, unix.NETLINK_ROUTE)
if err != nil {
return errwrap.Wrapf(err, "error creating socket set")
}
// waitgroup for netlink receive goroutine
wg := &sync.WaitGroup{}
defer conn.Close()
// We must wait for the Shutdown() AND the select inside of SocketSet to
// complete before we Close, since the unblocking in SocketSet is not a
// synchronous operation.
defer wg.Wait()
defer conn.Shutdown() // close the netlink socket and unblock conn.receive()
// watch the systemd-networkd configuration file
@@ -220,7 +224,6 @@ func (obj *NetRes) Watch() error {
wg.Add(1)
go func() {
defer wg.Done()
defer conn.Close() // close the pipe when we're done with it
defer close(nlChan)
for {
// receive messages from the socket set