Better benchmarks for match checking? #792
It might work well to add a test that (for example) has 1,000 small matches in separate functions. Ultimately, though, I suspect the current stress test looks the way it does because, knowing essentially nothing about the match code, I would expect its runtime to be some function of the number of patterns: the more patterns, the hotter the interesting bits get, and the easier it is to detect an accidental exponential 2^n loop being introduced, or something like that. That said, I'm quite open to modifying or adding new benchmarks, though I don't really know what those would necessarily look like :)
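To make that shape concrete, here is a hypothetical miniature of the flat stress-test style (names invented, not the actual benchmark): a single match whose checking cost should scale with the number of arms.

```rust
// Hypothetical miniature of the flat stress-test shape: many arms, each a
// single fieldless (non-nested) pattern, so checking cost tracks arm count.
enum Stress {
    V0,
    V1,
    V2,
    V3,
}

fn classify(s: Stress) -> u32 {
    // The real stress test repeats this shape for a huge number of variants;
    // an accidental 2^n loop in the checker would show up here quickly.
    match s {
        Stress::V0 => 0,
        Stress::V1 => 1,
        Stress::V2 => 2,
        Stress::V3 => 3,
    }
}

fn main() {
    assert_eq!(classify(Stress::V0), 0);
    assert_eq!(classify(Stress::V1), 1);
    assert_eq!(classify(Stress::V2), 2);
    assert_eq!(classify(Stress::V3), 3);
    println!("ok");
}
```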
I suggest basing benchmarks on real-world code wherever possible.
As an interesting data point, rust-lang/rust#79394 enables a feature that slows down match exhaustiveness noticeably. The resulting perf run therefore gives a good idea of which benchmarks use the algorithm the most. Knowing the algorithm, I'd say this slowdown is representative of how many times it actually goes through the "main loop", with type/pattern depth having a quadratic impact. That's good because I wanted to benchmark deeper patterns.
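As a rough illustration of what "deeper patterns" means here (a contrived example, not taken from any benchmark): each extra level of nesting gives the exhaustiveness check more pattern structure to peel apart and recurse into.

```rust
// Contrived example of pattern depth: three levels of Option nesting, each
// of which the exhaustiveness check has to unwrap and recurse into.
fn depth3(x: Option<Option<Option<u8>>>) -> u8 {
    match x {
        Some(Some(Some(n))) => n,
        Some(Some(None)) => 2,
        Some(None) => 1,
        None => 0,
    }
}

fn main() {
    assert_eq!(depth3(Some(Some(Some(7)))), 7);
    assert_eq!(depth3(Some(None)), 1);
    assert_eq!(depth3(None), 0);
    println!("ok");
}
```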
HTML predefined entity resolution is a decent representative benchmark for this, I think. With this one function enabled, compilation takes 30 seconds, compared to 3 seconds without it.
The match in question is here: https://github.com/tafia/quick-xml/blob/5629ac75164793aeb8ab3bf8bb15d13ea9b29e00/src/escape.rs#L335. It's certainly a real-world example of the "huge match statement takes forever to compile" class of problems, but I'm not sure if it's exactly what this issue is looking for. It's huge, but exhaustiveness is
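For readers who don't want to follow the link: the match there is a large table of byte-string slice patterns. A tiny analogue (this sketch is not the real quick-xml code) looks like:

```rust
// Tiny illustrative analogue of the quick-xml entity table (not the real
// code): byte-string slice patterns like these are what the checker has to
// reason about, and with thousands of arms that work adds up.
fn resolve_entity(name: &[u8]) -> Option<char> {
    match name {
        b"amp" => Some('&'),
        b"lt" => Some('<'),
        b"gt" => Some('>'),
        b"quot" => Some('"'),
        _ => None,
    }
}

fn main() {
    assert_eq!(resolve_entity(b"lt"), Some('<'));
    assert_eq!(resolve_entity(b"amp"), Some('&'));
    assert_eq!(resolve_entity(b"nosuch"), None);
    println!("ok");
}
```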
@Nadrieril What do you think? Maybe we could add more cases to |
Actually yeah, the two stress tests we have are on patterns with zero depth (fieldless enum variants and chars, respectively), whereas this one is on slices, so it exercises the actual insides of the algorithm. Not what I was asking for, but interesting to have; for match lowering too, I'd guess.
If we change match-stress, I'd request we also reduce the size of the enum match. It's ridiculously big and hits weird edge cases that aren't reproducible from machine to machine. A few hundred cases should be more than enough.
We will probably do a large benchmark update in the upcoming months, since it has been almost three years since the last update, and the current benchmarks generate a lot of linting errors. We can do that as part of that update.
The benchmark suite includes two benchmarks that stress match checking (and I assume match codegen too): `unicode_normalization` and `match-stress-enum`. The problem with these two is that they really are not representative of matches in the wild: they both have a billion branches and a single non-nested pattern per branch, whereas real-life matches have only a couple of branches and nested patterns.

When I'm working on the exhaustiveness algorithm I would love to have a benchmark with more representative examples, but I wouldn't know how to gather that. Like maybe every `?` generates a match and that's the most common form? Or maybe `derive`s or other macros?

I was wondering if anyone here would know how to collect those kinds of tests, or would have some time to look into that.
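For contrast with the flat stress tests, here is a contrived example (not from the benchmark suite) of the "real-life" shape described above: only a couple of arms, but each pattern nests several layers deep.

```rust
// Contrived example of a typical real-world match shape: just three arms,
// but each pattern nests Result, Option, and tuple structure, which is the
// part the flat stress tests never exercise.
fn render(v: Result<Option<(u8, u8)>, String>) -> String {
    match v {
        Ok(Some((x, y))) => format!("point ({x}, {y})"),
        Ok(None) => String::from("empty"),
        Err(e) => format!("error: {e}"),
    }
}

fn main() {
    assert_eq!(render(Ok(Some((1, 2)))), "point (1, 2)");
    assert_eq!(render(Ok(None)), "empty");
    assert_eq!(render(Err(String::from("oops"))), "error: oops");
    println!("ok");
}
```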