-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: fix races in test-performance-eventlooputil #36028
Conversation
Fix two races in test-performance-eventlooputil resulting in a flaky test. elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in elu3.active === 0. Moving capturing of t after capturing t, just before the spin look avoids this. Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop fails. Exchanging these statements avoids this race.
Trying to start a stress test for this but CI stress test is not starting for some reason. Anyway, a similar stress test a couple days ago against master showed 12 failures on osx1015 in 1000 runs. So hopefully we can run that on this PR, see that it fixes things, and I can approve this and close my PR to move this test to sequential. |
@Trott Please note that you refer to a different test (test-worker-eventlooputil) - this PR modifies test-performance-eventlooputil |
Stress tests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on getting the test as it is on my machine to fail from a stress test, but doesn't want to.
Mind checking if moving setting elu1
, but not changing anything else, fixes the issue?
@@ -23,28 +23,28 @@ if (nodeTiming.loopStart === -1) { | |||
} | |||
|
|||
setTimeout(mustCall(function r() { | |||
const t = Date.now(); | |||
const elu1 = eventLoopUtilization(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I messed up here. elu1
also shouldn't be captured until after some idleTime
has accumulated. This function call was only meant supposed to check if .active
was being set correctly.
The reason for forcing idleTime
to accumulate before the test ran was so it would execute the same running it directly or running with ./tools/test.py --worker
.
As far as I remember I tried this but then the second race mentioned in description jumped in and resulted in a sporadic fail in the next assert. |
@Flarna Thanks. Was able to reproduce. See what you're saying. I'm testing out some variations w/ you PR. Will get back in a bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Flarna Thanks for the fix. Looks good.
Landed in 135dd88. Thanks! |
Fix two races in test-performance-eventlooputil resulting in a flaky test. elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in elu3.active === 0. Moving capturing of t after capturing t, just before the spin look avoids this. Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop fails. Exchanging these statements avoids this race. PR-URL: nodejs#36028 Fixes: nodejs#35309 Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>
Fix two races in test-performance-eventlooputil resulting in a flaky test. elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in elu3.active === 0. Moving capturing of t after capturing t, just before the spin look avoids this. Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop fails. Exchanging these statements avoids this race. PR-URL: #36028 Fixes: #35309 Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>
Fix two races in test-performance-eventlooputil resulting in a flaky test. elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in elu3.active === 0. Moving capturing of t after capturing t, just before the spin look avoids this. Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop fails. Exchanging these statements avoids this race. PR-URL: #36028 Fixes: #35309 Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>
Fix two races in test-performance-eventlooputil resulting in a flaky test. elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in elu3.active === 0. Moving capturing of t after capturing t, just before the spin look avoids this. Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop fails. Exchanging these statements avoids this race. PR-URL: #36028 Fixes: #35309 Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Rich Trott <rtrott@gmail.com>
Fix two races in test-performance-eventlooputil resulting in a flaky test.
elu1 was capture after start time t from spin look. If OS descides to reschedule the process after capturing t but before getting elu for >=50ms the spin loop is actually a nop. elu1 doesn't show this and as a result elut3 = eventLoopUtilization(elu1) results in
elu3.active is lower then the expected spin time.
Moving capturing of t after capturing t, just before the spin look
avoids this.
Similar if OS decides to shedule a different process between getting the total elu from start and the diff elu showing the spin loop the check to verify that total active time is long then the spin loop
fails.
Exchanging these statements avoids this race.
Fixes: #35309
fyi @trevnorris