Skip to content

Very large performance regression from 0.22 to 0.24 #414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jyn514 opened this issue Mar 31, 2020 · 18 comments
Closed

Very large performance regression from 0.22 to 0.24 #414

jyn514 opened this issue Mar 31, 2020 · 18 comments

Comments

@jyn514
Copy link

jyn514 commented Mar 31, 2020

(I'm aware 0.25.1 is the latest version, if this is fixed on master let me know and I can close this. 0.25 removed RcDom so it would take some set up on my end to switch)

Docs.rs uses html5ever to parse HTML files generated by rustdoc. We recently updated from 0.22 to 0.24 and docs.rs started using so much CPU that all requests turned into 500 errors. The 0.24 version was deployed from 20:04 to 20:20 in the following diagram:

image

See also rust-lang/docs.rs#669 (and many other PRs linked from there, we had to bisect in production).

Do you know what could have caused the slowdown? I saw that no new dependencies were introduced so it must have been something internal to html5ever. I can try to reproduce this on a single file if you're not sure, but right now it's preventing us from upgrading.

@jdm
Copy link
Member

jdm commented Mar 31, 2020

Can you share an example document that shows a measurable performance difference when parsing it?

@jdm
Copy link
Member

jdm commented Mar 31, 2020

My only guess as to what could have changed is 81ea581; none of the rest of the commits look like they should have an impact on parsing performance.

@jyn514
Copy link
Author

jyn514 commented Mar 31, 2020

I can try, it might take me a while.

@jyn514
Copy link
Author

jyn514 commented Mar 31, 2020

It can't have been 81ea581 because that's only part of 0.25, not 0.24 (according to github's tag parsing)

@jyn514
Copy link
Author

jyn514 commented Mar 31, 2020

Ok I managed to reproduce with the docs.rs code, although the slowdown doesn't seem to be enough to explain the CPU spike above.

With 0.22:

Benchmarking parse regex html: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 51.2s or reduce sample count to 20.
parse regex html        time:   [10.066 ms 10.095 ms 10.123 ms]                              
Found 19 outliers among 100 measurements (19.00%)
  19 (19.00%) high mild

With 0.24:

parse regex html        time:   [13.598 ms 13.630 ms 13.665 ms]                              
                        change: [+35.060% +35.431% +35.778%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

The code under test: https://github.com/rust-lang/docs.rs/blob/5c389cabe8629483f00261dfb455dd842dc05940/src/utils/html.rs

Benchmark code:

extern crate criterion;
extern crate cratesfyi;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use cratesfyi::utils::extract_head_and_body;

pub fn criterion_benchmark(c: &mut Criterion) {
    let html = std::fs::read_to_string("benches/struct.CaptureMatches.html").unwrap();
    c.bench_function("parse regex html", |b| b.iter(|| extract_head_and_body(black_box(&html))));
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

HTML file:
capture_matches.zip

Let me know if you want me to try and reproduce with html5ever directly.

jyn514 added a commit to jyn514/docs.rs that referenced this issue Apr 1, 2020
See rust-lang#671,
servo/html5ever#414 for rationale.

This makes the `extract_head_and_body` function public since I couldn't
find a cfg flag that would do so only during tests.
jyn514 added a commit to rust-lang/docs.rs that referenced this issue Apr 1, 2020
See #671,
servo/html5ever#414 for rationale.

This makes the `extract_head_and_body` function public since I couldn't
find a cfg flag that would do so only during tests.
@jyn514
Copy link
Author

jyn514 commented Apr 1, 2020

Benchmarks with 0.25 and kuchiki still show a big slowdown, but not as bad as 0.24: rust-lang/docs.rs#671 (comment)

@jdm
Copy link
Member

jdm commented Apr 2, 2020

We probably won't be able to investigate this until sometime next week due to deadlines, unfortunately.

@jyn514
Copy link
Author

jyn514 commented Apr 9, 2020

Just checking on the status of this, you mentioned you might have time to look at it this week? No problem if not, this isn't urgent for us.

@jdm
Copy link
Member

jdm commented Apr 9, 2020

We have not had any time due to the previous deadlines being extended to accommodate delays.

@jyn514
Copy link
Author

jyn514 commented May 27, 2020

Checking back in on this, let me know if there's any more I can do to identify the source of the regressions.

@jdm
Copy link
Member

jdm commented Jun 1, 2020

How am I supposed to run the benchmark code? The cratesfyi crate does not appear to exist on crates.io.

@jdm
Copy link
Member

jdm commented Jun 1, 2020

The most useful thing for me would be a standalone cargo project that runs a bunch of html5ever ever code infinitely in a loop that I can run in a profiler. Even better if it's very easy to swap out the version of html5ever that's in use in order to compare profiles and identify what changed to make it slower.

@jyn514
Copy link
Author

jyn514 commented Jun 2, 2020

Here is a standalone project that runs the html5ever code indefinitely in a loop: https://github.com/jyn514/html5ever-example

To swap out the html5ever, just change the version specified in Cargo.toml. Unfortunately this will only work for versions up to 0.24 (since rcdom was removed in 0.25). You can use 0.25 through kuchiki by checking out the kuchiki branch.

@jdm

This comment has been minimized.

@jdm
Copy link
Member

jdm commented Jun 6, 2020

Thanks for the standalone project! I applied the following change to it:

diff --git a/src/main.rs b/src/main.rs
index 49c3787..4eda507 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -8,9 +8,12 @@ type Result<T> = std::result::Result<T, failure::Error>;

 fn main() {
     let regex = include_str!("struct.CaptureMatches.html");
-    loop {
+    let now = std::time::Instant::now();
+    let iters = 200;
+    for _ in 1..iters {
         let (_head, _body, _classes) = extract_head_and_body(regex).unwrap();
     }
+    println!("{}us/iter", (now.elapsed() / iters).as_micros());
 }

 /// Extracts the contents of the `<head>` and `<body>` tags from an HTML document, as well as the

And ran cargo run --release 8 times for each of 0.22, 0.23, and 0.24. I got these results:

0.22:
6799, 6952, 7042, 7056, 7078, 7096, 7135, 7272,

0.23:
7175, 7155, 7160, 7276, 7280, 7305, 7376, 7387, 

0.24:
7226, 7240, 7249, 7331, 7342, 7373, 7390, 7705, 

The spread definitely looks like versions since 0.22 take longer.

@jdm
Copy link
Member

jdm commented Jun 6, 2020

That being said, we're taking about a difference between 6.7ms and 7.7ms, but I'll see if a profile shows anything interesting.

@jyn514
Copy link
Author

jyn514 commented Jun 6, 2020

Awesome, thanks for the response! I agree that doesn't seem to explain the spike in usage we saw, but I'm at a loss as to what else it could be ... the only changes we made were those in https://github.com/rust-lang/docs.rs/pull/668/files (html5ever plus some transitive dependencies). I'll see whether we can try 0.25 instead and see if that has less of a usage spike than 0.24, I remember I benched it in rust-lang/docs.rs#671 (comment) and it was slightly better.

@jyn514
Copy link
Author

jyn514 commented Jun 16, 2020

It seems that this was not nearly as bad a spike in 0.25, so this can probably be closed. It's still slightly worse performance than 0.22 so if you see something interesting I'd love to have it improved, but it's not a high priority.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants