Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: tests break testing package if testlog file is not file descriptor 3 #24285

Closed
siebenmann opened this issue Mar 7, 2018 · 23 comments
Closed
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@siebenmann
Copy link

What version of Go are you using (go version)?

go version devel +2c0c68d621 Tue Mar 6 23:33:28 2018 +0000 linux/amd64

(This is the current git tip as I write this.)

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="linux"

What did you do?

On one of my slower machines I'm getting frequent but not completely consistent test failures in the test phase of all.bash during testing of os.exec. This reproduces when testing just os/exec on its own:

$ go test os/exec
PASS
testing: can't write /tmp/go-build685356926/b001/testlog.txt: write /tmp/go-build685356926/b001/testlog.txt: broken pipe
FAIL    os/exec 0.400s

This failure happens on an i5-2500 (running Fedora 27). A Fedora 27 Ryzen 1800X machine consistently passes. This only started happening on recent versions of master (initially I wrote it off as flakiness on my machine).

It's possible that this is related to #24050.

@andybons andybons added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 7, 2018
@andybons
Copy link
Member

andybons commented Mar 7, 2018

/cc @ianlancetaylor

@andybons andybons added this to the Go1.11 milestone Mar 7, 2018
@ianlancetaylor
Copy link
Contributor

This is a strange error. The file is an ordinary file, not a pipe. Why would a write fail with EPIPE? I don't see any pipe here.

Can you run running go test -c os/exec and then ./exec.test -test.short -test.testlogfile=/tmp/x? If you can recreate the problem that way, please attach the output of strace -f ./exec.test -test.short -test.testlogfile=/tmp/x. Thanks.

@siebenmann
Copy link
Author

I don't know if I'm reproducing it, but I'm having something unusual happen:

$ ./exec.test -test.short -test.testlogfile=/tmp/x
--- FAIL: TestCatGoodAndBadFile (0.00s)
        exec_test.go:141: expected test code; got "Error: open exec_test.go: no such file or directory\n" (len 52)
FAIL
testing: can't write /tmp/x: write /tmp/x: bad file descriptor

Here is the strace output: strace-out.txt.

@siebenmann
Copy link
Author

Okay, if I generate and run exec.test from src/os/exec, it reports that it passes and then has the testing: error. Here's the strace output for that all-passing case, in case it makes a difference:
strace-out.txt

@ianlancetaylor
Copy link
Contributor

Thanks. Judging by the strace output, file descriptor 4 is already open in your environment when the process starts. The test does not expect that. Are you aware of anything that would cause the process to be started with file descriptor already open?

@siebenmann
Copy link
Author

I'm not aware of anything in my environment that would cause file descriptor 4 to already be open, but the next time I have this test failure I'll see what lsof says about my shell's file descriptors and so on.

(However, looking at the strace output appears to show some use of file descriptor 4 when opening things, which seems odd if it's in use when the test run starts. There are quite a lot of uses of openat that return fd 4, along with eg socket and fcntl being used to duplicate fds.)

@ianlancetaylor
Copy link
Contributor

Those cases where openat or socket return descriptor 4 are after a long string of close calls that close all open file descriptors starting at 4. That string of close calls is from closeUnexpectedFds in os/exec/exec_test.go. Before that string of close calls, calls to openat, etc., are returning 5, not 4.

@ianlancetaylor
Copy link
Contributor

(I've been focusing on file descriptor 4 for no good reason, but it seems that file descriptor 3 is also unexpectedly open.)

@siebenmann
Copy link
Author

I've now discovered what I do that causes the unexpectedly opened file descriptors, including the pipe. My sequence is:

  1. git pull to update to the latest master
  2. git log <a>...<b> to read the changes, which starts less as my pager.
  3. inside less, use !xterm & to start a new xterm where I will run a build while I'm reading the changes.
  4. this creates a shell inside the new xterm that has inherited already opened file descriptors from less; it appears that fd 3 is /dev/tty and fd 4 is the pipe from git.
  5. start building with all.bash.

It doesn't appear to matter if the pipe has been closed by git by the time the Go build reaches testing os/exec; either way things fail.

This doesn't happen consistently because I don't always start a new xterm from inside the less. Sometimes I read everything before starting the build, and sometimes I start a new xterm from my window manager (doing that is slightly less convenient because I then have to cd to the Go source directory, which is why I often start it from inside less).

@ianlancetaylor
Copy link
Contributor

I don't see a clean way to avoid this problem. Fundamentally we want to check that the file descriptor of the testlog file is 3, but I don't see a good way to find that out.

@ianlancetaylor ianlancetaylor added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 7, 2018
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 7, 2018
@ianlancetaylor ianlancetaylor added help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 7, 2018
@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Unplanned Mar 7, 2018
@gopherbot gopherbot removed the NeedsFix The path to resolution is known, but the work has not been done. label Mar 7, 2018
@ianlancetaylor ianlancetaylor changed the title testing: overall test failure in os/exec with a broken pipe error os/exec: tests break testing package if testlog file is not file descriptor 3 Mar 7, 2018
@ianlancetaylor ianlancetaylor added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 7, 2018
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 7, 2018
NexediGitlab pushed a commit to SlapOS/slapos.core that referenced this issue Apr 4, 2018
…node software'


Some context: I'm trying to build Go 1.10 under slapos and get failures
like:

	ok   os      0.782s
	PASS
	testing: can't write /tmp/go-build511454695/b682/testlog.txt: write /tmp/go-build511454695/b682/testlog.txt: broken pipe
	FAIL os/exec 0.947s
	...
	ok   cmd/vet/internal/cfg    0.002s
	2018/04/03 18:13:29 Failed: exit status 1
	golang1.10: Command failed with exit code 1:   cd src && ls -l /proc/self/fd && ./all.bash && cp -alf .. /srv/slapgrid/slappart5/srv/runner/software/099a7368550e95d120639cb03b6d47c8/parts/golang1.10
	golang1.10: Compilation error. The package is left as is at /srv/slapgrid/slappart5/srv/runner/software/099a7368550e95d120639cb03b6d47c8/parts/golang1.10__compile__/go where you can inspect what went wrong
	While:
	  Installing golang1.10.
	Error: System error

This was traced to golang/go#24285, which in turn was
traced by me to the fact that golang os/exec tests close all file descriptors >
4 assuming that only stdin, stdout, stderr, epoll and testlog file descriptors
are there:

	https://github.com/golang/go/blob/26e0e8a840/src/os/exec/exec_test.go#L402
	https://github.com/golang/go/blob/26e0e8a840/src/os/exec/exec_test.go#L415

However when go build is run under slapos it starts with the following
descriptors inherited in the environment:

	lr-x------ 1 slapuser5 slapuser5 64 Apr  3 17:57 0 -> pipe:[2969903608]
	l-wx------ 1 slapuser5 slapuser5 64 Apr  3 17:57 1 -> pipe:[2969903609]
	l-wx------ 1 slapuser5 slapuser5 64 Apr  3 17:57 2 -> pipe:[2969903609]
	l-wx------ 1 slapuser5 slapuser5 64 Apr  3 17:57 3 -> /srv/slapgrid/slappart5/srv/runner/software.log

i.e. the file descriptor for software.log (under `slapos node software`) is
passed opened to spawned buildout commands -> and oops - go build failure.

I'm not saying go behaviour is correct. However for slapos it is also not good
to leak e.g. log file descriptors to spawned processes (e.g. what if spawned
child writes there by mistake?).

Since fixing on Go side is not very easy, because it will probably require
non-small testing infrastructure refactoring, I decided to fix on SlapOS side,
because it is a) easier, and b) correct & good to do in any way.

----

To fix we just pass close_fds=True to subprocess.Popen run from under
SlapPopen, which means that popen will close all file descriptors besides
stdin, stdout & stderr for child. close_fds default is already True for python3

	https://www.python.org/dev/peps/pep-0433/#subprocess-close

for the reason that it is frequent mistake for programmers to not leak file
descriptors to spawned processes.

However on python2, which we are still using, close_fds default is False.

So let's fix the default on our side.

/reviewed-by @jerome, @rafael
/reviewed-on https://lab.nexedi.com/nexedi/slapos.core/merge_requests/42
@navytux
Copy link
Contributor

navytux commented Apr 4, 2018

For the reference I'm also having this problem when building Go1.10 under SlapOS: the builder was (wrongly) leaking opened descriptor to log file and this is causing the same

testing: can't write /tmp/go-build685356926/b001/testlog.txt: write /tmp/go-build685356926/b001/testlog.txt: broken pipe

failure.

Things are now fixed on SlapOS side, but it will take time for the fix to propagate everywhere.

On the other hand for the Go side maybe it makes more sense to not assume fixed fd number for the testlog.txt file and do things similarly to how os/exec test is already checking for file descriptor used by runtime for polling:

func closeUnexpectedFds(t *testing.T, m string) {
        for fd := basefds(); fd <= 101; fd++ {
                if fd == poll.PollDescriptor() {          // <- NOTE here
                        continue
                }
                err := os.NewFile(fd, "").Close()
                if err == nil {
                        t.Logf("%s: Something already leaked - closed fd %d", m, fd)
                }
        }
}

(https://github.com/golang/go/blob/26e0e8a840/src/os/exec/exec_test.go#L415)

because after all it is a regression compared to Go1.9 which did not had this problem.

Thanks beforehand,
Kirill

@ianlancetaylor
Copy link
Contributor

Unfortunately I don't think that approach will work, as the test will still fail. It will fail because the child process will see the testlog argument but will not have interpreted it so the actual testlog file will not be known. And it won't work because it will still see an unexpectedly open file descriptor, although at least in that case the error message will be more clear.

@ar4mirez
Copy link

This same issue happen to me and I found that its related with the terminal, for example I was using hyper terminal and once I tried in iterm2 and the regular mac terminal worked.

I hope this helps, cheers.

@jefferai
Copy link

Seeing this issue on an Ubuntu 18.04 machine (using Konsole, FWIW) when running the 1.11 all.bash file.

@dhobsd
Copy link
Contributor

dhobsd commented Feb 8, 2019

This is breaking our internal builders for our host toolchain for Fuchsia as well. Our CI/CD hooks up an extra pipe into the subprocess it calls, and there's not a lovely workaround for this.

@ianlancetaylor What if this test has an init function that runs on systems that provide /proc/self/fd and use the largest fd number returned from there? And then basefds returns that value instead of its current dance if that value is nonzero?

@dhobsd
Copy link
Contributor

dhobsd commented Feb 8, 2019

(Or alternatively just closed fd 3 in init.)

@dhobsd
Copy link
Contributor

dhobsd commented Feb 8, 2019

Oh, right, none of that works, ignore me. I'll see what I can do about closing the descriptor before running run.bash.

@coderste
Copy link

coderste commented Aug 16, 2019

I'm getting this same error when running ./all.bash after downloading the source code following the contribution guidelines :/

Go version

go version go1.12.9 darwin/amd64

Error I'm getting is

testing: can't write /var/folders/dc/pxx2c5t503jdfy7c29qrlmp9nkvqm9/T/go-build291055610/b822/testlog.txt: write /var/folders/dc/pxx2c5t503jdfy7c29qrlmp9nkvqm9/T/go-build291055610/b822/testlog.txt: socket is not connected
FAIL    os/exec 2.471s

@ianlancetaylor
Copy link
Contributor

@coderste See previous discussion; is there an open file descriptor 3 or greater while running all.bash?

@bcmills
Copy link
Contributor

bcmills commented Dec 14, 2022

Is this still an issue? closeUnexpectedFds was removed in CL 206939.

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Dec 14, 2022
@siebenmann
Copy link
Author

I can't reproduce this with my original test (and I don't think I've seen this in general for some time). I suspect that it's gone, but maybe other people are still seeing this in their environments.

@bcmills
Copy link
Contributor

bcmills commented Dec 14, 2022

and I don't think I've seen this in general for some time

The aforementioned CL was merged 2019-11-13, so that's consistent with the theory that it fixed the failure. 🙂

@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed help wanted WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Dec 14, 2022
@gopherbot gopherbot removed the NeedsFix The path to resolution is known, but the work has not been done. label Dec 14, 2022
@ianlancetaylor
Copy link
Contributor

I think this is basically the same as #35469 which we fixed. So let's call this one fixed as well. Thanks.

@golang golang locked and limited conversation to collaborators Dec 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

10 participants