-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: improve performance of test-crypto-timing-safe-equal-benchmarks #26237
Conversation
@bnoordhuis @not-an-aardvark @nodejs/crypto |
(If this is a reasonable approach, this test can probably be moved out of pummel and start running again more regularly in CI.) |
Whoops, I used the wrong repo in the Parameters. Let's try again. Pummel CI: https://ci.nodejs.org/job/node-test-commit-custom-suites/886/ ✅ |
Before this change: $ time ./node test/pummel/test-crypto-timing-safe-equal-benchmarks.js
real 0m56.422s
user 0m54.419s
sys 0m8.399s
$ With this change: $ time ./node test/pummel/test-crypto-timing-safe-equal-benchmarks.js
real 0m4.885s
user 0m4.689s
sys 0m1.418s
$ |
(The speed increase is also seen on CI. Before this change, test was timing out taking more than 2 minutes. With this change, the test takes 11 seconds. I agree with @BridgeAR that this result is surprising.) |
|
@bmeurer is there a possibility to improve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a nit
Post-fixup Lite CI for linting: https://ci.nodejs.org/job/node-test-pull-request-lite-pipeline/2688/ Post-fixup Pummel CI to actually run the test: https://ci.nodejs.org/job/node-test-commit-custom-suites/887/ |
BTW, this "fixes" the test by adding an optimization that's not really related to |
The reason for the original bad performance was actually a memory leak. Even though we delete the module keyed by the file name from the cache, it is added to This is the (partial) content of
This is after 10:
By resetting |
@hashseed thanks for looking into it again. |
I'd like to fast-track this to un-break the nightly CI job that runs pummel tests. Please 👍 here if you support fast-tracking, or leave a comment if you think that's unwarranted in this case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change LGTM, although it might be worth running the test a bunch of times to make sure it's not more flaky as a result of this change.
Running the test as it appears in this PR 1000 times: https://ci.nodejs.org/job/node-stress-single-test/2160/ EDIT: Canceled after 521 runs, 458 success, 63 failure. Oddly, I can't get a failure on my local machine at all. Let's see what happens with master branch on CI stress test. |
Because we're already seeing a small-but-larger-than-expected number of failures on the stress test for this PR, here's a stress test against master but with the timeout bumped to 10 minutes. The idea is to see how often the test currently fails if it's given enough time. We know it will probably fail every time with a timeout, but if the timeout were larger, would the test be as flaky as the version in this PR? Let's find out. https://ci.nodejs.org/job/node-stress-single-test/2161/ |
@not-an-aardvark Any chance at all the test isn't flaky and it really isn't timing-safe? (But, oddly, only on the CI machine but not on my local laptop. ¯\(ツ)/¯) |
I suppose it's possible, but it seems unlikely that this would have resulted from a V8/module-loading change given that the comparison function doesn't do much with V8/module-loading internals. Based on looking at the test failures, it seems like errors are happening in both directions (sometimes it takes longer when the two buffers are equal, and sometimes it takes longer when the two buffers are unequal) which doesn't seem like it would typically occur if it's not timing-safe. I remember when I was originally trying to debug this, we were also encountering flakiness on some architectures but not others. |
Is it possible that the timing difference this test is intended to measure has been overshadowed and therefore masked by the GC time caused by the memory leak? |
I'm not very familiar with If the original code had a memory leak, another possibility could be that some calls are getting interrupted by GC once in awhile and throwing off the timing. |
Note that previously, the code was not being recompiled on every run because it hit V8's internal compilation cache, even though it was deleted from the require cache. |
@hashseed @not-an-aardvark On the GC hypothesis, sorry to ask a potentially-ignorant or naive question, but is this a correct understanding of what you're saying?: The memory leak means that GC happens from time to time, which will be unpredictable, thus causing timing to appear safe (because it is unpredictable) when it may not actually be? So it would cause the test to pass when perhaps it shouldn't? Or am I misunderstanding? |
I don't know what exactly the test asserts. Is it predictability or unpredictability? The memory leak will likely contribute to the latter. |
The test fails if it can find a statistically significant difference between comparison times for equal buffers and comparison times for unequal buffers, using The test also asserts that there is a statistically significant difference between timings for equal/unequal buffers when using the timing-unsafe The failures here seem to indicate that there is a timing anomaly being introduced somewhere, although it seems likely that the anomaly is an artifact of the test setup rather than timing unsafety in |
Resetting require.cache() to `Object.create(null)` each time rather than deleting the specific key results in a 10x improvement in running time. Fixes: nodejs#25984 Refs: nodejs#26229
@joyeecheung Any reason to avoid |
…nchmarks Use eval() instead....
5e1d897
to
69655d7
Compare
Changed it up to Stress test: https://ci.nodejs.org/job/node-stress-single-test/2162/ |
The eval version appears to run as quickly as the |
My original suggestion about using vm.runInThisContext() was based on that the previous implementation of the CJS loader used that method to perform the compilation. If eval() hits the compilation cache in similar ways without introducing unpredictable optimization that interferes with the timing being tested (@hashseed ?) then LGTM |
Stress test results: 1000 out of 1000 runs passed. 🎉 |
Because this is a fair bit different from what was originally opened in this PR, a couple re-reviews or new reviews would be welcome. |
And...now that we have green stress test and some re-reviews... would like to fast-track to unbreak CI, so I'm re-adding the fast-track label. Please 👍 here to fast-track. |
Landed in eb9e6c0 Thanks, everyone! |
Using `eval()` rather than `require()`'ing a fixture and deleting it from the cache results in a roughtly 10x improvement in running time. Fixes: nodejs#25984 Refs: nodejs#26229 PR-URL: nodejs#26237 Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Yang Guo <yangguo@chromium.org> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Teddy Katz <teddy.katz@gmail.com> Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Using `eval()` rather than `require()`'ing a fixture and deleting it from the cache results in a roughtly 10x improvement in running time. Fixes: #25984 Refs: #26229 PR-URL: #26237 Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Yang Guo <yangguo@chromium.org> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Teddy Katz <teddy.katz@gmail.com> Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Using `eval()` rather than `require()`'ing a fixture and deleting it from the cache results in a roughtly 10x improvement in running time. Fixes: #25984 Refs: #26229 PR-URL: #26237 Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Yang Guo <yangguo@chromium.org> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Teddy Katz <teddy.katz@gmail.com> Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Resetting require.cache() to
Object.create(null)
each time rather thandeleting the specific key results in a 10x improvement in running time.
Fixes: #25984
Refs: #26229
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes