Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix child-process-fork-regr-gh-2847 #5121

Closed

Conversation

santigimeno
Copy link
Member

The test would sometimes time out on some platforms. Take the initial
version of the test and instead of sending 100 handles, just send 2.
It still fails when run with the code before the change was introduced
and passes afterwards.

@santigimeno
Copy link
Member Author

I've checked that it passes when run on 36b969f (where the test was initially introduced) and fails with the previous commit 6ac79bc in OS X and Debian Jessie 64.

@mscdex mscdex added child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests. labels Feb 6, 2016
@Trott
Copy link
Member

Trott commented Feb 6, 2016

This one has been failing a lot on pi2-raspbian-wheezy in CI a lot lately, so here's a stress test with this fix:

And here's a stress test using current master:

(CI is closed off from public view for another couple of days until the security release of node is out, but I'll summarize the results here.)

@Trott
Copy link
Member

Trott commented Feb 6, 2016

A change that just landed on master today was to mark this test as flaky in test/parallel/parallel.status so this PR should probably update that file too.

@Trott
Copy link
Member

Trott commented Feb 6, 2016

I stopped the master branch stress test because the test failed 5 times out of 9 runs. So there's the unsurprising confirmation that the test is highly flaky in current master.

With this fix in place, the stress test has run 250 times without a single failure. So there's confirmation that this fix improves the situation.

@Trott
Copy link
Member

Trott commented Feb 6, 2016

@santigimeno Once you push the change to the parallel.status file, I (or someone else) will run this through a full CI. (Nice work, as always!)

The test would sometimes time out on some platforms. Take the initial
version of the test and instead of sending 100 handles, just send 2.
It still fails when run with the code before the change was introduced
and passes afterwards.
Remove test from parallel.status.
@santigimeno santigimeno force-pushed the child-process-fork-regr-gh-2847 branch from cf59b37 to 65de3e4 Compare February 6, 2016 21:53
@santigimeno
Copy link
Member Author

@Trott PR updated. Thanks!

@Trott
Copy link
Member

Trott commented Feb 6, 2016

@Trott
Copy link
Member

Trott commented Feb 7, 2016

LGTM, but given the extent of the rewrite, I'd like to get a second from at least one of:

  • @indutny (who wrote the test)
  • @mhdawson (who made the test AIX-friendly)
  • @mscdex (who made the test not-flaky on Windows)

@mscdex
Copy link
Contributor

mscdex commented Feb 7, 2016

Has this been stress tested on Windows? Also, I know this commit didn't change this part, but I'd be a bit concerned about the low timeout value (especially on pi1).

@Trott
Copy link
Member

Trott commented Feb 7, 2016

Stress tests on:

console.log(err);
sendcount++;
});
// Send 2 handles to make `process.disconnect()` wait
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. Where does the test call process.disconnect()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnoordhuis The original version of that comment landed in 36b969f and was authored by @indutny. I think it might be referring to target.disconnect() being called in lib/internal/child_process.js in channel.onread() but I'm not actually sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what @Trott says. It was in the original version of the test. I can try to find out the reason

@mscdex
Copy link
Contributor

mscdex commented Feb 7, 2016

It looks like the pi1 stress test ran 0 times?

@Trott
Copy link
Member

Trott commented Feb 7, 2016

@mscdex That's what happens when I tell the test runner to run the test 000 times when I mean to type 999. Guess I better try that again. Here's hoping no typos this time: https://ci.nodejs.org/job/node-stress-single-test/442/nodes=pi1-raspbian-wheezy/console

@mhdawson
Copy link
Member

mhdawson commented Feb 9, 2016

I'm ok with the change provided it still catches the original problem and it passes on AIX. CI run on AIX here: https://ci.nodejs.org/job/node-test-commit-aix/37/ (unfortunately is does take a while to run)

@mscdex
Copy link
Contributor

mscdex commented Feb 9, 2016

LGTM

@Trott
Copy link
Member

Trott commented Feb 9, 2016

I confirmed that the test still fails (as expected) in Node.js v3.3.1 (with ES6-isms removed from common.js) and passes in Node.js v4.0.0. (This is a bit redundant because @santigimeno already confirmed that it fails with the fix commit backed out.)

LGTM but awaiting AIX results...

@mhdawson
Copy link
Member

mhdawson commented Feb 9, 2016

lgtm tests still passs on AIX.

Trott pushed a commit to Trott/io.js that referenced this pull request Feb 10, 2016
The test would sometimes time out on some platforms. Take the initial
version of the test and instead of sending 100 handles, just send 2.
It still fails when run with the code before the change was introduced
and passes afterwards.

Remove test from parallel.status.

PR-URL: nodejs#5121
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Fixes: nodejs#3635
@Trott
Copy link
Member

Trott commented Feb 10, 2016

Landed in 4e4b260

Feels good to get that one off the flaky list...Thanks, @santigimeno !

@Trott Trott closed this Feb 10, 2016
rvagg pushed a commit that referenced this pull request Feb 15, 2016
The test would sometimes time out on some platforms. Take the initial
version of the test and instead of sending 100 handles, just send 2.
It still fails when run with the code before the change was introduced
and passes afterwards.

Remove test from parallel.status.

PR-URL: #5121
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Fixes: #3635
MylesBorins pushed a commit that referenced this pull request Feb 18, 2016
The test would sometimes time out on some platforms. Take the initial
version of the test and instead of sending 100 handles, just send 2.
It still fails when run with the code before the change was introduced
and passes afterwards.

Remove test from parallel.status.

PR-URL: #5121
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Fixes: #3635
MylesBorins pushed a commit that referenced this pull request Feb 18, 2016
The test would sometimes time out on some platforms. Take the initial
version of the test and instead of sending 100 handles, just send 2.
It still fails when run with the code before the change was introduced
and passes afterwards.

Remove test from parallel.status.

PR-URL: #5121
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Fixes: #3635
@MylesBorins MylesBorins mentioned this pull request Feb 18, 2016
MylesBorins pushed a commit that referenced this pull request Mar 2, 2016
The test would sometimes time out on some platforms. Take the initial
version of the test and instead of sending 100 handles, just send 2.
It still fails when run with the code before the change was introduced
and passes afterwards.

Remove test from parallel.status.

PR-URL: #5121
Reviewed-By: Brian White <mscdex@mscdex.net>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Fixes: #3635
@refack refack mentioned this pull request Apr 27, 2017
3 tasks
refack added a commit to refack/node that referenced this pull request May 19, 2017
According to the explanation in nodejs#3635#issuecomment-157714683
And as a continuation to nodejs#5422 we also ignore EMFILE
"No more file descriptors are available,so no more files can be opened"

PR-URL: nodejs#12698
Fixes: nodejs#10286
Refs: nodejs#3635 (comment)
Refs: nodejs#5178
Refs: nodejs#5179
Refs: nodejs#4005
Refs: nodejs#5121
Refs: nodejs#5422
Refs: nodejs#12621 (comment)
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jul 17, 2017
According to the explanation in #3635#issuecomment-157714683
And as a continuation to #5422 we also ignore EMFILE
"No more file descriptors are available,so no more files can be opened"

PR-URL: #12698
Fixes: #10286
Refs: #3635 (comment)
Refs: #5178
Refs: #5179
Refs: #4005
Refs: #5121
Refs: #5422
Refs: #12621 (comment)
Reviewed-By: Gibson Fahnestock <gibfahn@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants