-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os/exec: tests break testing package if testlog file is not file descriptor 3 #24285
Comments
/cc @ianlancetaylor |
This is a strange error. The file is an ordinary file, not a pipe. Why would a write fail with Can you run running |
I don't know if I'm reproducing it, but I'm having something unusual happen:
Here is the strace output: strace-out.txt. |
Okay, if I generate and run |
Thanks. Judging by the strace output, file descriptor 4 is already open in your environment when the process starts. The test does not expect that. Are you aware of anything that would cause the process to be started with file descriptor already open? |
I'm not aware of anything in my environment that would cause file descriptor 4 to already be open, but the next time I have this test failure I'll see what lsof says about my shell's file descriptors and so on. (However, looking at the strace output appears to show some use of file descriptor 4 when opening things, which seems odd if it's in use when the test run starts. There are quite a lot of uses of |
Those cases where |
(I've been focusing on file descriptor 4 for no good reason, but it seems that file descriptor 3 is also unexpectedly open.) |
I've now discovered what I do that causes the unexpectedly opened file descriptors, including the pipe. My sequence is:
It doesn't appear to matter if the pipe has been closed by This doesn't happen consistently because I don't always start a new xterm from inside the less. Sometimes I read everything before starting the build, and sometimes I start a new xterm from my window manager (doing that is slightly less convenient because I then have to |
I don't see a clean way to avoid this problem. Fundamentally we want to check that the file descriptor of the testlog file is |
…node software' Some context: I'm trying to build Go 1.10 under slapos and get failures like: ok os 0.782s PASS testing: can't write /tmp/go-build511454695/b682/testlog.txt: write /tmp/go-build511454695/b682/testlog.txt: broken pipe FAIL os/exec 0.947s ... ok cmd/vet/internal/cfg 0.002s 2018/04/03 18:13:29 Failed: exit status 1 golang1.10: Command failed with exit code 1: cd src && ls -l /proc/self/fd && ./all.bash && cp -alf .. /srv/slapgrid/slappart5/srv/runner/software/099a7368550e95d120639cb03b6d47c8/parts/golang1.10 golang1.10: Compilation error. The package is left as is at /srv/slapgrid/slappart5/srv/runner/software/099a7368550e95d120639cb03b6d47c8/parts/golang1.10__compile__/go where you can inspect what went wrong While: Installing golang1.10. Error: System error This was traced to golang/go#24285, which in turn was traced by me to the fact that golang os/exec tests close all file descriptors > 4 assuming that only stdin, stdout, stderr, epoll and testlog file descriptors are there: https://github.com/golang/go/blob/26e0e8a840/src/os/exec/exec_test.go#L402 https://github.com/golang/go/blob/26e0e8a840/src/os/exec/exec_test.go#L415 However when go build is run under slapos it starts with the following descriptors inherited in the environment: lr-x------ 1 slapuser5 slapuser5 64 Apr 3 17:57 0 -> pipe:[2969903608] l-wx------ 1 slapuser5 slapuser5 64 Apr 3 17:57 1 -> pipe:[2969903609] l-wx------ 1 slapuser5 slapuser5 64 Apr 3 17:57 2 -> pipe:[2969903609] l-wx------ 1 slapuser5 slapuser5 64 Apr 3 17:57 3 -> /srv/slapgrid/slappart5/srv/runner/software.log i.e. the file descriptor for software.log (under `slapos node software`) is passed opened to spawned buildout commands -> and oops - go build failure. I'm not saying go behaviour is correct. However for slapos it is also not good to leak e.g. log file descriptors to spawned processes (e.g. what if spawned child writes there by mistake?). Since fixing on Go side is not very easy, because it will probably require non-small testing infrastructure refactoring, I decided to fix on SlapOS side, because it is a) easier, and b) correct & good to do in any way. ---- To fix we just pass close_fds=True to subprocess.Popen run from under SlapPopen, which means that popen will close all file descriptors besides stdin, stdout & stderr for child. close_fds default is already True for python3 https://www.python.org/dev/peps/pep-0433/#subprocess-close for the reason that it is frequent mistake for programmers to not leak file descriptors to spawned processes. However on python2, which we are still using, close_fds default is False. So let's fix the default on our side. /reviewed-by @jerome, @rafael /reviewed-on https://lab.nexedi.com/nexedi/slapos.core/merge_requests/42
For the reference I'm also having this problem when building Go1.10 under SlapOS: the builder was (wrongly) leaking opened descriptor to log file and this is causing the same
failure. Things are now fixed on SlapOS side, but it will take time for the fix to propagate everywhere. On the other hand for the Go side maybe it makes more sense to not assume fixed fd number for the testlog.txt file and do things similarly to how os/exec test is already checking for file descriptor used by runtime for polling: func closeUnexpectedFds(t *testing.T, m string) {
for fd := basefds(); fd <= 101; fd++ {
if fd == poll.PollDescriptor() { // <- NOTE here
continue
}
err := os.NewFile(fd, "").Close()
if err == nil {
t.Logf("%s: Something already leaked - closed fd %d", m, fd)
}
}
} (https://github.com/golang/go/blob/26e0e8a840/src/os/exec/exec_test.go#L415) because after all it is a regression compared to Go1.9 which did not had this problem. Thanks beforehand, |
Unfortunately I don't think that approach will work, as the test will still fail. It will fail because the child process will see the testlog argument but will not have interpreted it so the actual testlog file will not be known. And it won't work because it will still see an unexpectedly open file descriptor, although at least in that case the error message will be more clear. |
This same issue happen to me and I found that its related with the terminal, for example I was using hyper terminal and once I tried in iterm2 and the regular mac terminal worked. I hope this helps, cheers. |
Seeing this issue on an Ubuntu 18.04 machine (using Konsole, FWIW) when running the 1.11 |
This is breaking our internal builders for our host toolchain for Fuchsia as well. Our CI/CD hooks up an extra pipe into the subprocess it calls, and there's not a lovely workaround for this. @ianlancetaylor What if this test has an init function that runs on systems that provide /proc/self/fd and use the largest fd number returned from there? And then basefds returns that value instead of its current dance if that value is nonzero? |
(Or alternatively just closed fd 3 in init.) |
Oh, right, none of that works, ignore me. I'll see what I can do about closing the descriptor before running run.bash. |
I'm getting this same error when running
|
@coderste See previous discussion; is there an open file descriptor 3 or greater while running all.bash? |
Is this still an issue? |
I can't reproduce this with my original test (and I don't think I've seen this in general for some time). I suspect that it's gone, but maybe other people are still seeing this in their environments. |
The aforementioned CL was merged 2019-11-13, so that's consistent with the theory that it fixed the failure. 🙂 |
I think this is basically the same as #35469 which we fixed. So let's call this one fixed as well. Thanks. |
What version of Go are you using (
go version
)?go version devel +2c0c68d621 Tue Mar 6 23:33:28 2018 +0000 linux/amd64
(This is the current git tip as I write this.)
What operating system and processor architecture are you using (
go env
)?What did you do?
On one of my slower machines I'm getting frequent but not completely consistent test failures in the test phase of
all.bash
during testing ofos.exec
. This reproduces when testing just os/exec on its own:This failure happens on an i5-2500 (running Fedora 27). A Fedora 27 Ryzen 1800X machine consistently passes. This only started happening on recent versions of master (initially I wrote it off as flakiness on my machine).
It's possible that this is related to #24050.
The text was updated successfully, but these errors were encountered: