test: fix test-cluster-dgram-1 flakiness #8383

Closed

santigimeno
Member

Checklist
  • make -j4 test (UNIX), or vcbuild test nosign (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

test

Description of change

Check for the number of messages received in the exit event listener
instead of the disconnect listener.

Fixes: #8380

@nodejs-github-bot nodejs-github-bot added the test Issues and PRs related to the tests. label Sep 2, 2016
@santigimeno santigimeno added cluster Issues and PRs related to the cluster subsystem. dgram Issues and PRs related to the dgram subsystem / UDP. labels Sep 2, 2016
@cjihrig
Contributor

cjihrig commented Sep 2, 2016

LGTM

@mhdawson
Member

mhdawson commented Sep 2, 2016

Test results on AIX

For the original before the refactor I got 0 failures out of 200 runs.

After the refactor I get 46/150 failures.

With this fix the failure rate goes down to about 3/150, but the remaining failures consistently show the following (note that it says parallel/test-cluster-dgram-3 rather than parallel/test-cluster-dgram-1 simply because I copied the new version into a different file for testing):

Mismatched function calls. Expected 10, actual 0.
at worker (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-cluster-dgram-3.js:82:31)
at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-cluster-dgram-3.js:20:3)
at Module._compile (module.js:409:26)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
at Function.Module._load (module.js:300:12)
at Function.Module.runMain (module.js:441:10)
at startup (node.js:139:18)
at node.js:974:3
assert.js:89
throw new assert.AssertionError({
^
AssertionError: 0 === 10
at Worker.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-cluster-dgram-3.js:70:14)
at Worker.<anonymous> (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/common.js:401:15)
at emitTwo (events.js:87:13)
at Worker.emit (events.js:172:7)
at ChildProcess.<anonymous> (cluster.js:364:14)
at ChildProcess.g (events.js:260:16)
at emitTwo (events.js:87:13)
at ChildProcess.emit (events.js:172:7)
at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)

So the net is that it makes things better but does not completely resolve the flakiness, at least on AIX.
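
Failure counts like the ones above can be gathered with a small repeat-run loop. The following is only a sketch under assumptions, not the script used for these numbers: `countFailures` is a hypothetical helper, and the `-e` one-liners are trivial stand-ins for the real test file so that the snippet runs anywhere.

```javascript
'use strict';
const { spawnSync } = require('child_process');

// Hypothetical helper: runs `node <args>` `runs` times and returns how many
// invocations ended with a non-zero exit status or were killed by a signal
// (spawnSync reports a null status in the signal case).
function countFailures(args, runs) {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    const { status } = spawnSync(process.execPath, args, { stdio: 'ignore' });
    if (status !== 0) failures++;
  }
  return failures;
}

// Stand-in scripts; a real check would pass the path to the test file instead.
const alwaysPass = countFailures(['-e', 'process.exit(0)'], 3);
const alwaysFail = countFailures(['-e', 'process.exit(1)'], 3);
console.log(`${alwaysPass}/3 and ${alwaysFail}/3 failures`);
```

Running the processes one after another keeps the runs from competing for the same port, which matters for a test that binds a UDP socket.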

@santigimeno
Member Author

@mhdawson I have pushed a fix. Can you try again? Thanks!

@mhdawson
Member

mhdawson commented Sep 2, 2016

@santigimeno that seems to do the trick: 0 failures out of 450, so LGTM.

@mhdawson
Member

mhdawson commented Sep 2, 2016

@jasnell
Member

jasnell commented Sep 2, 2016

FWIW, given how this is being fixed, I could be wrong but it looks like the refactor didn't actually break the test as much as highlight a failure that had already been happening but hadn't been caught.

@jasnell
Member

jasnell commented Sep 2, 2016

LGTM

@jasnell
Member

jasnell commented Sep 2, 2016

I'd say given the breakage that the changes in the test are causing in CI, if this is non-controversial we shouldn't need to wait the 48 hours to land. /cc @Trott

mhdawson pushed a commit that referenced this pull request Sep 2, 2016
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: #8380
PR-URL: #8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>
@Trott
Member

Trott commented Sep 2, 2016

Agreed on the "landing sooner than 48 hours" suggestion.

@mhdawson
Member

mhdawson commented Sep 2, 2016

landed as 2d2a2d7

@mhdawson mhdawson closed this Sep 2, 2016
@santigimeno
Member Author

I understand the hurry, but I was counting on amending the commit message before merging because, after the fixup commit, the fix explanation was different.

@Fishrock123 Fishrock123 mentioned this pull request Sep 6, 2016
Fishrock123 pushed a commit to Fishrock123/node that referenced this pull request Sep 8, 2016
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: nodejs#8380
PR-URL: nodejs#8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>
Fishrock123 pushed a commit that referenced this pull request Sep 9, 2016
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: #8380
PR-URL: #8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>
@MylesBorins
Contributor

This does not land cleanly in LTS. Added the dont-land label. Please feel free to backport manually.

santigimeno added a commit to santigimeno/node that referenced this pull request Oct 15, 2016
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: nodejs#8380
Ref: nodejs#8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>
@santigimeno
Member Author

@thealphanerd backport to 4.x here: #9109

MylesBorins pushed a commit that referenced this pull request Oct 24, 2016
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: #8380
Ref: #8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit that referenced this pull request Oct 26, 2016
Check for the number of messages received in the `exit` event listener
instead of the `disconnect` listener.

Fixes: #8380
Ref: #8383
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed By: James M Snell <jasnell@gmail.com>
Labels
cluster Issues and PRs related to the cluster subsystem. dgram Issues and PRs related to the dgram subsystem / UDP. test Issues and PRs related to the tests.
Development

Successfully merging this pull request may close these issues.

AIX: parallel/test-cluster-dgram-1 failures
7 participants