Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate flaky test-performance-eventlooputil #35309

Closed
Trott opened this issue Sep 23, 2020 · 3 comments
Closed

investigate flaky test-performance-eventlooputil #35309

Trott opened this issue Sep 23, 2020 · 3 comments
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI.

Comments

@Trott
Copy link
Member

Trott commented Sep 23, 2020

  • Test: parllel/test-performance-eventlooputil
  • Platform: centos7-64-gcc8
  • Console Output:
09:46:55 not ok 1670 parallel/test-performance-eventlooputil
09:46:55   ---
09:46:55   duration_ms: 0.336
09:46:55   severity: fail
09:46:55   exitcode: 1
09:46:55   stack: |-
09:46:55     assert.js:385
09:46:55         throw err;
09:46:55         ^
09:46:55     
09:46:55     AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
09:46:55     
09:46:55       assert.ok(elu2.active > elu3.active)
09:46:55     
09:46:55         at Timeout._onTimeout (/home/iojs/build/workspace/node-test-commit-linux/nodes/centos7-64-gcc8/test/parallel/test-performance-eventlooputil.js:41:10)
09:46:55         at listOnTimeout (internal/timers.js:555:17)
09:46:55         at processTimers (internal/timers.js:498:7) {
09:46:55       generatedMessage: true,
09:46:55       code: 'ERR_ASSERTION',
09:46:55       actual: false,
09:46:55       expected: true,
09:46:55       operator: '=='
09:46:55     }
09:46:55   ...
@Trott Trott added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Sep 23, 2020
@Trott
Copy link
Member Author

Trott commented Oct 7, 2020

  • Platform: ubuntu1804_sharedlibs_debug_x64
  • Console Output:
00:14:39 not ok 1623 parallel/test-performance-eventlooputil
00:14:39   ---
00:14:39   duration_ms: 0.270
00:14:39   severity: fail
00:14:39   exitcode: 1
00:14:39   stack: |-
00:14:39     assert.js:386
00:14:39         throw err;
00:14:39         ^
00:14:39     
00:14:39     AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
00:14:39     
00:14:39       assert.ok(elu2.active > elu3.active)
00:14:39     
00:14:39         at Timeout._onTimeout (/home/iojs/build/workspace/node-test-commit-linux-containered/test/parallel/test-performance-eventlooputil.js:41:10)
00:14:39         at listOnTimeout (internal/timers.js:555:17)
00:14:39         at processTimers (internal/timers.js:498:7) {
00:14:39       generatedMessage: true,
00:14:39       code: 'ERR_ASSERTION',
00:14:39       actual: false,
00:14:39       expected: true,
00:14:39       operator: '=='
00:14:39     }
00:14:39   ...

@Trott
Copy link
Member Author

Trott commented Oct 7, 2020

The test seems inherently flaky, likely on all or nearly all platforms. I don't even need to run it concurrently locally to get it to fail. Here's a stress test to see if we can get a handle on how often it fails. https://ci.nodejs.org/job/node-stress-single-test/185/

(EDIT: 1 failure in 856 runs so far on debian9-64. Might be interesting to see if it fails more or less often on slow machines vs. fast machines and if common.platformTimeout() helps. Also might be interesting to experiment with parallel test jobs to see if it is affected by being in parallel or not.)

@Flarna
Copy link
Member

Flarna commented Nov 1, 2020

One more which has some more info in assert:

AssertionError [ERR_ASSERTION]: 50.29698935913086 <= 50.68267250061035
    at Timeout.r (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-64/test/parallel/test-performance-eventlooputil.js:46:10)
    at Timeout._onTimeout (/home/iojs/build/workspace/node-test-commit-linux/nodes/ubuntu1604-64/test/common/index.js:377:15)
    at listOnTimeout (node:internal/timers:555:17)
    at processTimers (node:internal/timers:498:7) {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

https://ci.nodejs.org/job/node-test-commit-linux/37969/nodes=ubuntu1604-64/testReport/junit/(root)/test/parallel_test_performance_eventlooputil/

@Trott Trott closed this as completed in 135dd88 Nov 10, 2020
danielleadams pushed a commit that referenced this issue Nov 10, 2020
Fix two races in test-performance-eventlooputil resulting in a flaky
test.

elu1 was capture after start time t from spin look. If OS descides to
reschedule the process after capturing t but before getting elu for
>=50ms the spin loop is actually a nop. elu1 doesn't show this and as
a result elut3 = eventLoopUtilization(elu1) results in
elu3.active === 0.
Moving capturing of t after capturing t, just before the spin look
avoids this.

Similar if OS decides to shedule a different process between getting
the total elu from start and the diff elu showing the spin loop the
check to verify that total active time is long then the spin loop
fails.
Exchanging these statements avoids this race.

PR-URL: #36028
Fixes: #35309
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
richardlau pushed a commit that referenced this issue Mar 29, 2021
Fix two races in test-performance-eventlooputil resulting in a flaky
test.

elu1 was capture after start time t from spin look. If OS descides to
reschedule the process after capturing t but before getting elu for
>=50ms the spin loop is actually a nop. elu1 doesn't show this and as
a result elut3 = eventLoopUtilization(elu1) results in
elu3.active === 0.
Moving capturing of t after capturing t, just before the spin look
avoids this.

Similar if OS decides to shedule a different process between getting
the total elu from start and the diff elu showing the spin loop the
check to verify that total active time is long then the spin loop
fails.
Exchanging these statements avoids this race.

PR-URL: #36028
Fixes: #35309
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
targos pushed a commit that referenced this issue May 1, 2021
Fix two races in test-performance-eventlooputil resulting in a flaky
test.

elu1 was capture after start time t from spin look. If OS descides to
reschedule the process after capturing t but before getting elu for
>=50ms the spin loop is actually a nop. elu1 doesn't show this and as
a result elut3 = eventLoopUtilization(elu1) results in
elu3.active === 0.
Moving capturing of t after capturing t, just before the spin look
avoids this.

Similar if OS decides to shedule a different process between getting
the total elu from start and the diff elu showing the spin loop the
check to verify that total active time is long then the spin loop
fails.
Exchanging these statements avoids this race.

PR-URL: #36028
Fixes: #35309
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants