Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky test: test-child-process-flush-stdio #4125

Closed
jbergstroem opened this issue Dec 3, 2015 · 10 comments
Closed

flaky test: test-child-process-flush-stdio #4125

jbergstroem opened this issue Dec 3, 2015 · 10 comments
Labels
child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests.

Comments

@jbergstroem
Copy link
Member

@jbergstroem jbergstroem added the test Issues and PRs related to the tests. label Dec 3, 2015
@cjihrig
Copy link
Contributor

cjihrig commented Dec 3, 2015

This test was just changed in 34b535f. Either the commit will have to be reverted, or the test is flakey.

EDIT: Obviously the test is flakey. I meant it might be possible that only the test needs to be updated.

@cjihrig
Copy link
Contributor

cjihrig commented Dec 3, 2015

Also, cc: @davidvgalbraith

@davidvgalbraith
Copy link

Shucks! I'll see what I can do for this tonight.

@mscdex mscdex added the child_process Issues and PRs related to the child_process subsystem. label Dec 3, 2015
@davidvgalbraith
Copy link

I still can't see the CI output, is it just the close event handler never getting called?

@cjihrig
Copy link
Contributor

cjihrig commented Dec 3, 2015

Yes, here is the output:

not ok 36 test-child-process-flush-stdio.js
# Mismatched <anonymous> function calls. Expected 1, actual 0.
# at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos5-64/test/parallel/test-child-process-flush-stdio.js:8:22)
# at Module._compile (module.js:399:26)
# at Object.Module._extensions..js (module.js:406:10)
# at Module.load (module.js:345:32)
# at Function.Module._load (module.js:302:12)
# at Function.Module.runMain (module.js:431:10)
# at startup (node.js:138:18)
# at node.js:976:3

@davidvgalbraith
Copy link

I've been running while :; do ./node test/parallel/test-child-process-flush-stdio.js ; done in three parallel terminals on my Mac for the last 2 hours, which adds up to tens of thousands of runs, and every one has passed. In another terminal, I've got while :; do /usr/bin/python tools/test.py --mode=release parallel -J; done; running to see if there's some interplay between tests that breaks it, and that's gone 30 runs without the test in question failing.

I notice that the failed tests happened on centos5, so I'm thinking there's some platform compatibility issue at play here. Currently building Node on a centos5 VM, when that finishes up I'll run similar testing and report back.

@jbergstroem
Copy link
Member Author

Looks pretty consistent on both centos5-64 and -32.

@davidvgalbraith
Copy link

Yep, I have it failing 100% of the time on my centos5 VM. Let's see if I can fix it...

@davidvgalbraith
Copy link

Ok, so here's the order of operations on my Mac:

  1. child_process.createSocket creates the this.stdout socket, which kicks off a call to Socket.Readable.read.
  2. echo completes, calling this._handle.onexit from internal/child_process.js.
  3. The child process calls flushStdio, which calls flush() on this.stdout, setting its state.flowing to true.
  4. The read from step 1 finishes at this point, sees that state.flowing is true in readableAddChunk, emits the data and kicks off another read which hits EOF. In net.js's onread, the condition if (self._readableState.length === 0) prevails and the socket is destroyed, triggering the close event and a successful test.

And here's the order of operations on centos5:

  1. child_process.createSocket creates the this.stdout socket, which kicks off a call to Socket.Readable.read.
  2. The read from step 1 finishes at this point and sees that state.flowing is null in readableAddChunk, so it buffers the data and kicks off another read.
  3. The read from step 2 finishes and gets EOF. In ChildProcess's onEofChunk, thanks to streams: update .readable/.writable to false #4083, stdout's .readable is set to false. Since we have buffered data, the stream does not close at this point.
  4. echo completes, calling this._handle.onexit from internal/child_process.js.
  5. The child process calls flushStdio, but since stdout.readable is false, it doesn't call resume on it.
  6. That's all! stdout sits never closes, so the ChildProcess never closes and the test fails.

My 100% Mac pass rates were on a branch that didn't have #4083; pulling in that commit, I get a ~1% fail rate. On centos5 it fails every time.

So my view is that #4083 was a little hasty and didn't account for ChildProcess's expectation that stream.readable remains true after onEofChunk is called. If I take out the stream.readable = false; line from onEofChunk, the test passes on centos5 every time. Does that make sense?

CC @mscdex

mscdex added a commit to mscdex/io.js that referenced this issue Dec 3, 2015
As of 34b535f, test-child-process-flush-stdio was failing
on CentOS 5 systems in CI due to the change in stream state
checking in `child_process`. This commit fixes those failures
by making readable streams less eager in setting their readable
flag on EOF.

Fixes: nodejs#4125
@cjihrig
Copy link
Contributor

cjihrig commented Dec 4, 2015

@davidvgalbraith thank you for digging into this

@mscdex mscdex closed this as completed in 2922188 Dec 4, 2015
mscdex added a commit that referenced this issue Dec 5, 2015
As of 34b535f, test-child-process-flush-stdio was failing
on CentOS 5 systems in CI due to the change in stream state
checking in `child_process`. This commit fixes those failures
by making readable streams less eager in setting their readable
flag on EOF.

Fixes: #4125
PR-URL: #4141
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
scovetta pushed a commit to scovetta/node that referenced this issue Apr 2, 2016
As of 34b535f, test-child-process-flush-stdio was failing
on CentOS 5 systems in CI due to the change in stream state
checking in `child_process`. This commit fixes those failures
by making readable streams less eager in setting their readable
flag on EOF.

Fixes: nodejs#4125
PR-URL: nodejs#4141
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants