Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: TestExtraFilesRace flake on plan9 since 2022-11-16 #57180

Closed
millerresearch opened this issue Dec 8, 2022 · 12 comments
Closed

os/exec: TestExtraFilesRace flake on plan9 since 2022-11-16 #57180

millerresearch opened this issue Dec 8, 2022 · 12 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Plan9
Milestone

Comments

@millerresearch
Copy link
Contributor

#!watchflakes
post <- pkg == "os/exec" && test == "TestExtraFilesRace" && goos == "plan9"

On the plan9-arm builder the os/exec test sometimes fails like this:

--- FAIL: TestExtraFilesRace (0.50s)
    exec_test.go:850: iteration 2, process B got:
        
        want:
        fd3: listener 127.0.0.1:40881
        
FAIL
FAIL	os/exec	1.262s

This has been seen since 2022-11-16 (earliest log here).

The last change to TextExtraFilesRace before that date was the addition of t.Parallel() in CL 439196. The comment for that change says

Tests that use Setenv or Chdir or check for FDs opened during the test
still cannot be parallelized,

@bcmills, I haven't looked deeply into this but it does seem a FD is being opened during the test - is this a potential trouble spot?

I have managed to replicate the error by running go tool dist test go_test:net/url go_test:os go_test:os/exec enough times. If I remove the t.Parallel(), the error seems not to occur.

@bcmills bcmills added this to the Backlog milestone Dec 8, 2022
@bcmills bcmills added OS-Plan9 NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 8, 2022
@bcmills
Copy link
Contributor

bcmills commented Dec 8, 2022

I haven't looked deeply into this but it does seem a FD is being opened during the test - is this a potential trouble spot?

If I'm reading correctly, it's the opposite: we expect the subprocess to print an address for FD 3 (for the listener opened with lb := listen()), but for some reason cmdDescribeFiles in the subprocess fails to wrap it in a net.FileListener.

@bcmills
Copy link
Contributor

bcmills commented Dec 8, 2022

The one on 2022-09-13 has a funky failure mode, which suggests the regression happened on or about 2022-11-16.

@bcmills
Copy link
Contributor

bcmills commented Dec 8, 2022

(Note that I added the call to t.Parallel() in that test in CL 439196, merged 2022-10-10.)

@millerresearch
Copy link
Contributor Author

The one on 2022-09-13 has a funky failure mode

Errors like "/boot/workdir/... does not exist' are evidence of a flake in the underlying Plan 9 filesystem, not specifically relevant to this issue.

@millerresearch
Copy link
Contributor Author

for some reason cmdDescribeFiles in the subprocess fails to wrap it in a net.FileListener.

The error returned from the attempt to wrap it is file+net fd3: fd2path: fd out of range or not open

@bcmills
Copy link
Contributor

bcmills commented Dec 9, 2022

That suggests perhaps a problem in either syscall.StartProcess (failing to actually forward the extra FD) or (*net.TCPListener).File (silently failing to produce a valid FD, or producing a duplicate?).

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/457115 mentions this issue: syscall: fix closing of reordered FDs in plan9 ForkExec

@millerresearch
Copy link
Contributor Author

That suggests perhaps a problem in either syscall.StartProcess (failing to actually forward the extra FD)

You're right, a logic error was sometimes causing the newly dup'ed FD to be closed.

@bcmills
Copy link
Contributor

bcmills commented Dec 13, 2022

suggests the regression happened on or about 2022-11-16.

That turned out to be off by 10½ years. 😅

(Based on the fix CL, it appears that FD forwarding on plan9 has been subtly broken ~forever, but the specific bug here was part of an attempted fix for a related bug in CL 6009046.)

@gopherbot gopherbot moved this to Done in Test Flakes Dec 13, 2022
@gopherbot
Copy link
Contributor

Found new dashboard test flakes for:

    #!watchflakes
    post <- pkg == "os/exec" && test == "TestExtraFilesRace" && goos == "plan9"
2022-12-01 21:00 plan9-arm go@cd133abc os/exec.TestExtraFilesRace (log)
--- FAIL: TestExtraFilesRace (0.48s)
    exec_test.go:847: iteration 2, process A got:

        want:
        fd3: listener 127.0.0.1:49573
2022-12-02 23:40 plan9-arm go@c0497d1a os/exec.TestExtraFilesRace (log)
--- FAIL: TestExtraFilesRace (0.49s)
    exec_test.go:850: iteration 2, process B got:

        want:
        fd3: listener 127.0.0.1:36814
2022-12-05 16:41 plan9-arm go@ad55b878 os/exec.TestExtraFilesRace (log)
--- FAIL: TestExtraFilesRace (0.50s)
    exec_test.go:847: iteration 0, process A got:

        want:
        fd3: listener 127.0.0.1:48597
2022-12-05 22:01 plan9-arm go@185e1a7b os/exec.TestExtraFilesRace (log)
--- FAIL: TestExtraFilesRace (0.50s)
    exec_test.go:850: iteration 2, process B got:

        want:
        fd3: listener 127.0.0.1:40881
2022-12-09 04:05 plan9-arm go@80f7484a os/exec.TestExtraFilesRace (log)
--- FAIL: TestExtraFilesRace (0.52s)
    exec_test.go:850: iteration 2, process B got:

        want:
        fd3: listener 127.0.0.1:55404

watchflakes

@gopherbot gopherbot reopened this Jan 16, 2023
@bcmills
Copy link
Contributor

bcmills commented Jan 17, 2023

The above failures predate the fix (see #57632).

@bcmills bcmills closed this as completed Jan 17, 2023
@golang golang locked and limited conversation to collaborators Jan 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Plan9
Projects
Archived in project
Development

No branches or pull requests

3 participants