Switch performance tests to Playwright #52022
Conversation
Size Change: +236 B (0%) Total Size: 1.51 MB
Flaky tests detected in 1cf298a. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/5683296273
The first character is excluded from the results later on, so let's not double-exclude.
- Define a fixed time to wait until the browser becomes idle instead of random values
- Make each test record 10 samples + 1 throwaway to address initial metric spikes
- Improve overall code consistency
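A minimal sketch of the sampling approach described above (not the actual test code; measureOnce is a hypothetical helper returning a single metric reading):

async function collectSamples( measureOnce ) {
	const THROWAWAY = 1;
	const SAMPLES = 10;
	const results = [];

	for ( let i = 0; i < SAMPLES + THROWAWAY; i++ ) {
		const value = await measureOnce();
		// Discard the throwaway iteration so the initial metric spike
		// doesn't skew the recorded samples.
		if ( i >= THROWAWAY ) {
			results.push( value );
		}
	}

	return results;
}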
Just a couple of quick questions, but it looks like you've been thorough here. I'm not sure there's much else I can review without looking at it in production; my remaining questions are about integrating with existing services like the codehealth app, and I think you've already taken care of that.
So, well done.
await firstParagraph.click();
await pageUtils.pressKeys( 'primary+a' );
await page.keyboard.press( 'ArrowRight' );
await page.keyboard.press( 'Enter' );
I assume there's a reason these are all gone here, but are we certain they didn't help reset some state which was helpful for the tests?
The reason for those changes is to bring consistency to the methodology. To make them easier to follow, I've made some before-and-after recordings and put them together in this comment. Hope it helps!
BTW, this test actually needed a small tweak that I pushed here. The change was intended but still not working as expected, so nice catch! 😄
Added a comment in 894896b
For further inspection, here's a comparison of before and after the migration + proposed fixes for metric consistency. The most important changes are:
Nice! Left some suggestions but all minor.
I think this looks good, but given that it might have a bigger impact, I don't object to having more eyes on this before merging 💯.
@@ -66,5 +66,5 @@ export async function visitSiteEditor(
 	// TODO: Ideally the content underneath the spinner should be marked inert until it's ready.
 	await this.page
 		.locator( '.edit-site-canvas-spinner' )
-		.waitFor( { state: 'hidden' } );
+		.waitFor( { state: 'hidden', timeout: 60_000 } );
Why do we need this here? If it applies to all the tests then maybe we should add a comment above explaining why 😅 .
This was needed because, for performance tests, we load an enormous post fixture to test against, which often takes more than the default 10 seconds to load.
I'm not sure about the rationale for using a post big enough to freeze the browser for a bit, but that's probably something to discuss in a separate PR? /cc @youknowriad
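For context, here's a sketch of the kind of inline explanation being discussed; the wording is illustrative and not the actual comment added in 894896b:

await this.page
	.locator( '.edit-site-canvas-spinner' )
	// The performance fixtures are enormous, so the canvas can take well over
	// the default 10-second timeout to finish loading.
	.waitFor( { state: 'hidden', timeout: 60_000 } );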
When I've tested with the large post in my browser, it opens reasonably fast, but if I open the profiling tools, I've seen it freeze for more than a minute.
I'm inferring that Chrome is spending a lot of time instrumenting the thousands of listeners and hooks when we turn on tracing, which probably happens here as well.
So this may not be realistic from a user perspective, but the wait may still occur when testing and profiling.
Yeah, profiling adds a lot of overhead. We could assume that the added overhead is spread evenly across all code and can be treated as a fair emulated slowdown of the CPU, but I don't know if that's true 😅.
It's weird. I don't understand it, but it doesn't seem to appear in the metrics. I suspect that the profiling mechanisms don't start until after this initialization. The page isn't interactive while it's starting up, but once it finishes, things are suddenly fast and fine.
So it may not be the case that Chrome reports this big delay, but it still delays the tests. Based on some recent profiling work, I'm wondering if we might have code paths that are to blame. There are so many listeners involved, and it doesn't seem reasonable that we should have so many. Maybe Chrome's profiler simply chokes on tens of thousands of listeners (or maybe this is a normal number 🤷♂️ I really don't know).
	getTraceFilePath,
	getTypingEventDurations,
	getLoadingDurations,
	loadBlocksFromHtml,
} = require( '../utils' );

const BROWSER_IDLE_WAIT = 1000;
Let's move this to an inline comment?
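Something along these lines, for example (wording illustrative, not taken from the PR):

// How long to pause before taking measurements, giving the browser time to
// settle into an idle state.
const BROWSER_IDLE_WAIT = 1000;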
import { existsSync, readFileSync, unlinkSync } from 'fs';

export function sum( array ) {
	if ( ! array || ! array.length ) return undefined;
Nit: For testing I think it might be better to just throw an error if we're passing unexpected arguments.
I think the error throwing should happen one step earlier (where we collect results data, for example, via expect()). Before that, though, I wanted to migrate the tests to TypeScript (in a separate PR) to get a better feel for how to handle exceptions.
For these calc functions (sum, median, average, etc.), I think it's OK not to throw, but it's still important to stay consistent in how we handle invalid arguments (return undefined). Having said that, I'm aware this implementation is not super-defensive, but I don't think that's necessary at this point.
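To illustrate the convention (a sketch only; median is illustrative here, and only sum appears in the diff above), each calc helper returns undefined for invalid input instead of throwing:

export function sum( array ) {
	if ( ! array || ! array.length ) return undefined;
	return array.reduce( ( a, b ) => a + b, 0 );
}

export function median( array ) {
	if ( ! array || ! array.length ) return undefined;
	const sorted = [ ...array ].sort( ( a, b ) => a - b );
	const middle = Math.floor( sorted.length / 2 );
	return sorted.length % 2 === 0
		? ( sorted[ middle - 1 ] + sorted[ middle ] ) / 2
		: sorted[ middle ];
}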
If we were using TypeScript, we wouldn't have to guard the inputs at runtime since the error would be caught statically. 👍
Each test is run with a fresh context by default, so we can just set the storageState via test.use instead.
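A rough sketch of that pattern, assuming a plain @playwright/test setup (the actual suite may import its test object from the project's own utilities, and the state file path here is illustrative):

const { test } = require( '@playwright/test' );

// Every test gets a fresh browser context, so instead of logging in per test we
// point Playwright at a previously saved authentication state.
test.use( {
	storageState: 'artifacts/storage-state.json',
} );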
This reverts commit 86ed1e0.
This reverts commit 5014a65.
AKA Revert "Revert "Switch performance tests to Playwright (#52022)""
What?
A follow-up to #51084
Closes #51383
Closes #51379
This PR effectively switches our performance tests to be handled by Playwright. Apart from that, the following improvements have been made:
ℹ️ The codevitals app requires no update as the test results will be produced and uploaded as expected.
I ran a couple of comparisons of this branch vs. the base branch (round 1, 2 and 3). The results seem stable and should not disrupt the current codevitals logs:
Why?
As part of the Playwright migration effort, but also to improve the overall experience and the quality of our performance metrics.
Testing Instructions
To run performance tests locally, the command remains unchanged, e.g.:
You should see different reporter output though, as it was updated to be consistent with the format we use in CI. Also, missing front-end perf results have been added:
To run branch performance comparison tests locally (via the CLI tool), the command and output also remain unchanged, e.g.:
You can check the performance GH action output for this PR to verify the above as well.