Perf Tests: Wild attempt at improving test determinism #47400
Size Change: 0 B, Total Size: 1.31 MB
Force-pushed from 22b558d to 479eba1.
Flaky tests detected in 479eba14f9855c0cd7e3387aa5174bb725a18009. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4108976751
Force-pushed from 025b93e to e5c613d.
Force-pushed from e5c613d to 9c5b722.
Closing since the introduction of #47889 greatly improved test determinism.
Status
Please ignore this PR. It's for testing and exploration.
- --js-flags="--predictable --predictable_gc_schedule --single_threaded" works but didn't noticeably impact the test results.
- --deterministic-mode leads to test failures after delays longer than 30 seconds.
- --deterministic-mode also gets cut off after 6 hours by GitHub Actions' time limits.
What?
Attempt to add Chrome flags to our test suites that might improve test reliability.
Why?
Because our tests are reporting performance metrics that are wrong.
How?
In this patch we're tossing in Chrome CLI args in an attempt to run things in a more deterministic manner, as well as trying to improve the timer precision, which is normally reduced as a mitigation against speculative execution attacks.
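As a rough illustration of what "tossing in Chrome CLI args" can look like, here is a minimal sketch that assembles the switches mentioned in this PR into an argument array. The helper name `buildChromeArgs` and its option names are hypothetical and not part of the Gutenberg repository; only the flag strings come from this PR. One detail worth noting: the quotes in `--js-flags="..."` are shell quoting, so when arguments are passed as an array (no shell involved), the value is a single argv entry containing space-separated V8 flags, with no quotes.

```typescript
// Hypothetical options for a perf-test run; names are illustrative only.
interface DeterminismOptions {
  predictableV8: boolean; // pass V8's --predictable flags through --js-flags
  deterministicMode: boolean; // pass Chrome's --deterministic-mode switch
}

// Builds extra Chrome switches. The flag strings are the ones named in this PR.
function buildChromeArgs(opts: DeterminismOptions): string[] {
  const args: string[] = [];
  if (opts.predictableV8) {
    // One argv entry; V8 receives a space-separated list of its own flags.
    const v8Flags = ['--predictable', '--predictable_gc_schedule', '--single_threaded'];
    args.push('--js-flags=' + v8Flags.join(' '));
  }
  if (opts.deterministicMode) {
    args.push('--deterministic-mode');
  }
  return args;
}

// --deterministic-mode is left off here, since it caused timeouts in this PR's runs.
const chromeArgs = buildChromeArgs({ predictableV8: true, deterministicMode: false });
console.log(chromeArgs);
```

With Puppeteer, an array like this could be handed to `puppeteer.launch({ args: chromeArgs })`; how the Gutenberg performance harness actually launches Chrome is not shown here.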
The goal is to be able to run the performance tests CI workflow on a branch against itself or against another branch with a no-code change (such as a Markdown doc update) and end up with performance test results that are close enough to each other to be effectively equal.
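Running a branch against itself (or against a docs-only change) is essentially an A/A comparison. One simple way to decide whether two such runs are "close enough to be effectively equal" is to compare their medians against a relative tolerance. This is a minimal sketch under assumptions: the 5% tolerance and the sample timings are hypothetical placeholders, not values from this PR or from the Gutenberg perf workflow.

```typescript
// Median of a sample; copies before sorting so the input is not mutated.
function median(values: number[]): number {
  if (values.length === 0) throw new Error('median of empty sample');
  const sorted = [...values].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// True if run B's median is within `tolerance` (relative) of run A's median.
// The default 0.05 (5%) is a hypothetical threshold chosen for illustration.
function effectivelyEqual(runA: number[], runB: number[], tolerance = 0.05): boolean {
  const a = median(runA);
  const b = median(runB);
  if (a <= 0) throw new Error('baseline median must be positive');
  return Math.abs(b - a) / a <= tolerance;
}

// Hypothetical timings in ms: a branch measured against a docs-only branch.
const baselineRun = [41.2, 39.8, 40.5, 42.1, 40.0];
const docsOnlyRun = [40.9, 41.5, 39.9, 40.7, 41.0];
console.log(effectivelyEqual(baselineRun, docsOnlyRun));
```

Medians are used rather than means because a single slow outlier run shifts a median far less; the catch is that a fixed tolerance says nothing about how noisy each run was.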
Currently, run-to-run variation in the results exceeds any actual difference between the branches, whereas we want statistical confidence that random noise is unlikely to account for the differences we measure.
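This PR does not say which statistical test the workflow uses, so as one common choice, here is a sketch of Welch's t statistic for two independent samples of timings (it does not assume equal variances). The function names and sample data are hypothetical. Roughly, |t| grows as the gap between the means grows relative to run-to-run noise; values well above about 2 suggest noise alone is unlikely to explain the gap, though the exact cutoff depends on the sample sizes via the Welch-Satterthwaite degrees of freedom, which this sketch does not compute.

```typescript
// Arithmetic mean of a sample.
function meanOf(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Bessel-corrected (n - 1) sample variance.
function sampleVariance(xs: number[]): number {
  const m = meanOf(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch's t statistic: difference in means divided by its standard error.
function welchT(a: number[], b: number[]): number {
  if (a.length < 2 || b.length < 2) throw new Error('need at least two samples per group');
  const diff = meanOf(a) - meanOf(b);
  const se = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
  // Zero variance in both groups: any nonzero difference is "infinitely" significant.
  if (se === 0) return diff === 0 ? 0 : Math.sign(diff) * Infinity;
  return diff / se;
}

// Hypothetical timings in ms from two branches.
const branchA = [10, 11, 12, 13];
const branchB = [20, 21, 22, 23];
console.log(welchT(branchA, branchB).toFixed(2));
```

Run against a branch compared with itself, a noisy test setup shows up as large |t| values on no-code changes, which is the symptom this PR is trying to eliminate.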
Hopefully, by adjusting command-line arguments, we can find some that improve test reliability and add those to the repository.
https://peter.sh/experiments/chromium-command-line-switches/ https://chromium.googlesource.com/v8/v8/+/master/src/flags/flag-definitions.h#188
Testing Instructions
This is the test. Please ignore the PR for now.