Very large performance regression from 0.22 to 0.24 #414
Can you share an example document that shows a measurable performance difference when parsing it?
My only guess as to what could have changed is 81ea581; none of the rest of the commits look like they should have an impact on parsing performance.
I can try, it might take me a while.
It can't have been 81ea581, because that's only part of 0.25, not 0.24 (according to GitHub's tag parsing).
Ok I managed to reproduce with the docs.rs code, although the slowdown doesn't seem to be enough to explain the CPU spike above. With 0.22:
With 0.24:
The code under test: https://github.com/rust-lang/docs.rs/blob/5c389cabe8629483f00261dfb455dd842dc05940/src/utils/html.rs
Benchmark code:
HTML file:
Let me know if you want me to try and reproduce with html5ever directly.
See rust-lang#671, servo/html5ever#414 for rationale. This makes the `extract_head_and_body` function public since I couldn't find a cfg flag that would do so only during tests.
Benchmarks with 0.25 and kuchiki still show a big slowdown, but not as bad as 0.24: rust-lang/docs.rs#671 (comment)
We probably won't be able to investigate this until sometime next week due to deadlines, unfortunately.
Just checking on the status of this, you mentioned you might have time to look at it this week? No problem if not, this isn't urgent for us.
We have not had any time due to the previous deadlines being extended to accommodate delays.
Checking back in on this, let me know if there's any more I can do to identify the source of the regressions.
How am I supposed to run the benchmark code? The cratesfyi crate does not appear to exist on crates.io.
The most useful thing for me would be a standalone cargo project that runs a bunch of html5ever code infinitely in a loop that I can run in a profiler. Even better if it's very easy to swap out the version of html5ever that's in use, so I can compare profiles and identify what changed to make it slower.
Here is a standalone project that runs the html5ever code indefinitely in a loop: https://github.com/jyn514/html5ever-example To swap out the html5ever, just change the version specified in |
Thanks for the standalone project! I applied the following change to it:
diff --git a/src/main.rs b/src/main.rs
index 49c3787..4eda507 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -8,9 +8,12 @@ type Result<T> = std::result::Result<T, failure::Error>;
 fn main() {
     let regex = include_str!("struct.CaptureMatches.html");
-    loop {
+    let now = std::time::Instant::now();
+    let iters = 200;
+    for _ in 1..iters {
         let (_head, _body, _classes) = extract_head_and_body(regex).unwrap();
     }
+    println!("{}us/iter", (now.elapsed() / iters).as_micros());
 }
 /// Extracts the contents of the `<head>` and `<body>` tags from an HTML document, as well as the
And ran
The spread definitely suggests that versions since 0.22 take longer.
That being said, we're talking about a difference between 6.7ms and 7.7ms, but I'll see if a profile shows anything interesting.
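For anyone repeating this measurement, here is a minimal sketch of the same timing loop as a reusable helper. The names `mean_time_per_iter` and `count_tags` are hypothetical and do not come from the thread; `count_tags` is only a stand-in for `extract_head_and_body` so the example compiles without html5ever. One detail that may matter: `for _ in 1..iters` executes `iters - 1` iterations, so the sketch uses `0..iters` to keep the divisor equal to the number of runs.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `work` exactly `iters` times and returns the mean wall-clock
/// time per iteration.
fn mean_time_per_iter<F: FnMut()>(iters: u32, mut work: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    // `0..iters` yields exactly `iters` values; `1..iters` would yield one fewer.
    for _ in 0..iters {
        work();
    }
    start.elapsed() / iters
}

/// Hypothetical stand-in workload (NOT the docs.rs function): counts the
/// `<` bytes in the input.
fn count_tags(html: &str) -> usize {
    html.bytes().filter(|&b| b == b'<').count()
}

fn main() {
    // Assumed sample input; the thread used a rustdoc-generated HTML file.
    let html = "<html><head></head><body><p>hi</p></body></html>";
    let mut calls = 0u32;
    let per_iter = mean_time_per_iter(200, || {
        calls += 1;
        // black_box keeps the optimizer from removing the unused result.
        black_box(count_tags(black_box(html)));
    });
    println!("{} calls, {}us/iter", calls, per_iter.as_micros());
}
```

To compare html5ever versions, the closure body would call the real parsing function instead of `count_tags`, with the dependency version switched between runs.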
Awesome, thanks for the response! I agree that doesn't seem to explain the spike in usage we saw, but I'm at a loss as to what else it could be ... the only changes we made were those in https://github.com/rust-lang/docs.rs/pull/668/files (html5ever plus some transitive dependencies). I'll see whether we can try 0.25 instead and check if that has less of a usage spike than 0.24. I remember I benched it in rust-lang/docs.rs#671 (comment) and it was slightly better.
It seems that the spike was not nearly as bad in 0.25, so this can probably be closed. Performance is still slightly worse than 0.22, so if you see something interesting I'd love to have it improved, but it's not a high priority.
(I'm aware 0.25.1 is the latest version; if this is fixed on master, let me know and I can close this. 0.25 removed RcDom, so it would take some setup on my end to switch.)
Docs.rs uses html5ever to parse HTML files generated by rustdoc. We recently updated from 0.22 to 0.24 and docs.rs started using so much CPU that all requests turned into 500 errors. The 0.24 version was deployed from 20:04 to 20:20 in the following diagram:
See also rust-lang/docs.rs#669 (and many other PRs linked from there, we had to bisect in production).
Do you know what could have caused the slowdown? I saw that no new dependencies were introduced so it must have been something internal to html5ever. I can try to reproduce this on a single file if you're not sure, but right now it's preventing us from upgrading.