Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: race between Wait and ctx.Done #16028

Closed
ianlancetaylor opened this issue Jun 10, 2016 · 8 comments
Closed

os/exec: race between Wait and ctx.Done #16028

ianlancetaylor opened this issue Jun 10, 2016 · 8 comments
Milestone

Comments

@ianlancetaylor
Copy link
Contributor

There is an unlikely race condition when using os/exec.CommandContext. In Cmd.Wait, one goroutine sleeps in a select on ctx.Done and waitDone, while another goroutine sleeps in c.Process.Wait. If the process terminates at the same time as the context, then it is possible for the first goroutine to wake up and receive from ctx.Done and call c.Process.Kill just as the code in os.Process.Wait returns from syscall.Wait4. At that point the process ID is available for reuse, and a new process may be started with the same process ID. The call to c.Process.Kill will call c.Process.Signal. Because Process.wait has not yet had a chance to call Process.setDone, c.Process.Signal will call syscall.Kill with the old process ID. This could potentially kill an unrelated process.

This is similar to the problem described in #13987; the race described there has now been brought into the standard library.

Marking as 1.8 because I don't see a simple way to fix this.

@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Jun 10, 2016
@cespare
Copy link
Contributor

cespare commented Jun 10, 2016

Would it be better to remove CommandContext for this release if we cannot ship a fixed version?

@ianlancetaylor
Copy link
Contributor Author

Here is a complex way to fix the problem. Add the following internal API to the runtime package.

// sigchldCount returns the number of SIGCHLD signals received since the process started.
runtime.sigchldCount() int
// waitForSIGCHLD sleeps until more than c SIGCHLD signals have been received.
runtime.waitForSIGCHLD(c int)

With those functions, the code in os.Process.wait can be written as

for {
    c := runtime.sigchldCount()
    pid1, e := syscall.Wait4(p.Pid, &status, syscall.WNOHANG, &rusage)
    if e != nil {
        return nil, NewSyscallError("wait", e)
    }
    if pid1 != 0 {
        break
    }
    runtime.waitForSIGCHLD(c)
}
p.setDone()

@ianlancetaylor
Copy link
Contributor Author

@cespare I'm inclined to leave it in. It's a very unlikely race. Not only do the process and the context have to end at exactly the same time quantum, but the process ID has to be reused within that same time quantum as well.

Happy to hear other opinions, though.

@ianlancetaylor
Copy link
Contributor Author

I think I see how to do this on systems that support the POSIX.1-2008 waitid function.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/23967 mentions this issue.

gopherbot pushed a commit that referenced this issue Jun 10, 2016
On systems that support the POSIX.1-2008 waitid function, we can use it
to block until a wait will succeed. This avoids a possible race
condition: if a program calls p.Kill/p.Signal and p.Wait from two
different goroutines, then it is possible for the wait to complete just
before the signal is sent. In that case, it is possible that the system
will start a new process using the same PID between the wait and the
signal, causing the signal to be sent to the wrong process. The
Process.isdone field attempts to avoid that race, but there is a small
gap of time between when wait returns and isdone is set when the race
can occur.

This CL avoids that race by using waitid to wait until the process has
exited without actually collecting the PID. Then it sets isdone, then
waits for any active signals to complete, and only then collects the PID.

No test because any plausible test would require starting enough
processes to recycle all the process IDs.

Update #13987.
Update #16028.

Change-Id: Id2939431991d3b355dfb22f08793585fc0568ce8
Reviewed-on: https://go-review.googlesource.com/23967
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@ianlancetaylor
Copy link
Contributor Author

This is now fixed on GNU/Linux.

Since the fix was to the os package, not the os/exec package, I'm now declaring this issue a dup of #13987.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/24021 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/24020 mentions this issue.

@mikioh mikioh modified the milestones: Go1.7, Go1.8 Jun 13, 2016
gopherbot pushed a commit that referenced this issue Jun 13, 2016
This change is a followup to https://go-review.googlesource.com/23967
for Darwin.

Updates #13987.
Updates #16028.

Change-Id: Ib1fb9f957fafd0f91da6fceea56620e29ad82b00
Reviewed-on: https://go-review.googlesource.com/24020
Reviewed-by: Ian Lance Taylor <iant@golang.org>
gopherbot pushed a commit that referenced this issue Jun 13, 2016
This change is a followup to https://go-review.googlesource.com/23967
for FreeBSD.

Updates #13987.
Updates #16028.

Change-Id: I0f0737372fce6df89d090fe9847305749b79eb4c
Reviewed-on: https://go-review.googlesource.com/24021
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Jun 13, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants