Add diesel to the rustc perf test suite #807
Conversation
Force-pushed from 886a95e to 31fd220
@jyn514 I'm not sure if #802 is really comparable. Diesel stresses not only rustdoc, but all of rustc's trait-resolution related code. We've hit quite a few performance issues there (see some of the issues opened by me in the rustc repo). The submitted test configuration really stresses rustc/rustdoc/… as it implements a lot of traits for tuples of up to 128 values (that happens via this macro). As already mentioned here, that really takes quite a lot of time (on my laptop it takes ~17 minutes to run).

Edit: I should probably add that I do not expect this to get much better any time soon. I assume that significant improvements would need substantial changes to diesel + maybe something like variadic generics.
As far as I know diesel is a rather strange workload for rustc. According to some short measurements, most of the time compiling diesel is spent type checking the crate and resolving trait bounds. I see multiple reasons for this:

* Diesel builds a complex abstract query DSL for SQL based on Rust generics. All of this needs to be type checked.
* Diesel generates a ton of trait impls for tuples of various sizes. There are features to set the supported max size to 16, 32, 64 or 128 tuple elements. As this is a benchmark, I've chosen to set it to 128 to maximize the number of impls.

As a consequence, diesel's compile times are quite sensitive to changes touching the type system in general and trait resolution in particular. Any change that introduces behaviour which does not scale well with the number of available trait impls will likely result in a huge increase for this benchmark.
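To illustrate the second point, here is a minimal sketch (not diesel's actual macro, traits, or identifiers, which are all hypothetical here) of how a declarative macro can mass-produce one trait impl per tuple arity; diesel does something similar up to the configured maximum, so the number of impls the trait solver has to consider grows with that feature setting:

```rust
// Hypothetical example trait; diesel's real traits are different.
trait Columns {
    fn count() -> usize;
}

// Emit one impl per invocation, parameterized over the tuple arity.
macro_rules! impl_columns_for_tuples {
    ($($name:ident),+) => {
        impl<$($name),+> Columns for ($($name,)+) {
            fn count() -> usize {
                // Count the type parameters of this tuple impl.
                [$(stringify!($name)),+].len()
            }
        }
    };
}

// One invocation per supported tuple size; diesel goes up to 16/32/64/128.
impl_columns_for_tuples!(A);
impl_columns_for_tuples!(A, B);
impl_columns_for_tuples!(A, B, C);

fn main() {
    assert_eq!(<(u8,)>::count(), 1);
    assert_eq!(<(u8, u16, u32)>::count(), 3);
}
```

Each extra supported arity adds another impl for every such trait, which is why the candidate set seen by trait resolution balloons at the 128-element setting.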
Force-pushed from 31fd220 to 5d6ade5
It looks like this adds roughly 1.5 hours of CI time to some of our builders, and while the perf machine is more powerful, we cannot afford that big a time sink for just one benchmark. I imagine lowering from 128 to e.g. 32 might help there, but as-is I cannot merge this.
@Mark-Simulacrum I can just lower the supported tuple size to 32 if that helps, but I should probably add that in my experience the compile times do not scale linearly here: going from supporting tuples of up to 32 elements to tuples of up to 64 elements more than doubles the compile time. It will definitely speed up the benchmark by a lot, but I'm not sure what exactly that means for compiler internals.
I've pushed the 32-columns variant; if that still takes too much time, we can go down another step and use the implicit default.
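For reference, selecting the arity from a dependent crate looks roughly like this (the version number is a placeholder; diesel's cargo features follow the `N-column-tables` naming pattern):

```toml
[dependencies]
# Raises the maximum supported tuple/column count from the default 16 to 32.
diesel = { version = "2", features = ["32-column-tables"] }
```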
@weiznich a way to check that it's stressing the same things is to run the benchmark locally with --self-profile; if roughly the same queries are taking most of the time, that means it's still a good benchmark.
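A rough sketch of that workflow (requires a nightly toolchain; the `summarize` tool lives in the rust-lang/measureme repository, and `<crate>-<pid>` is a placeholder for the file rustc actually emits):

```
# Compile with the self-profiler enabled; rustc writes a
# <crate>-<pid>.mm_profdata file into the working directory.
cargo +nightly rustc -- -Z self-profile

# Install and run the summarize tool from rust-lang/measureme.
cargo install --git https://github.com/rust-lang/measureme summarize
summarize summarize <crate>-<pid>.mm_profdata
```

The output lists queries sorted by self-time, which can then be compared across the 16/32/64/128-column configurations.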
The following contains the output for each feature configuration:

| Feature configuration | Total cpu time |
| --- | --- |
| `default = []` (so only generate impls for up to 16 tuple elements) | 6.478858514 s |
| `default = ["32-column-tables"]` | 22.865035499 s |
| `default = ["64-column-tables"]` | 106.57413693 s |
| `default = ["128-column-tables"]` | 753.685691273 s |

As the raw numbers are quite hard to compare, here is a comparison for all passes that use more than 5% of the compilation time in at least one of the runs:
I would have expected that, if the number of tuple elements only influenced the amount of work per query, those relative numbers would stay at the same level, but it seems like there are a few passes that are much more important for large tuple sizes. Namely those are
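The non-linearity is already visible in the totals quoted above. The short program below (just arithmetic on the measured CPU times, not part of the benchmark) computes the step factor for each doubling of the maximum tuple size; under linear scaling in the impl count each doubling would cost a roughly constant factor, but here the factor itself keeps growing:

```rust
fn main() {
    // CPU times (seconds) from the comment above, keyed by the
    // maximum supported tuple/column count.
    let times = [
        (16u32, 6.478858514f64),
        (32, 22.865035499),
        (64, 106.57413693),
        (128, 753.685691273),
    ];
    // Print the slowdown factor for each doubling of the maximum size:
    // roughly 3.5x, then 4.7x, then 7.1x.
    for pair in times.windows(2) {
        let (small, t_small) = pair[0];
        let (large, t_large) = pair[1];
        println!("{small} -> {large} columns: {:.1}x slower", t_large / t_small);
    }
}
```

A growing step factor is consistent with at least one pass doing work that is super-linear in the number of generated impls.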
Yeah, I'm unsure too. I am still not comfortable merging this PR with the current variant; it just adds too much time, and your results seem to indicate scaling that is unexpectedly non-linear in terms of time allocated to various passes. We might just not be able to pull off diesel without first speeding up the compiler or doing more work on the collection infra.
I totally understand that this comes with a large time cost, but on the other hand it seems like those two passes only matter above a certain threshold of generated code size. As far as I can see, that is something that cannot be reproduced in a smaller test case.
@Mark-Simulacrum Any new insights or suggestions on how to continue with this problem?
I've recently added metrics tracking for the queue length on perf.rust-lang.org, and I hope that in 1-2 weeks, once we have some data there, we can merge this and see if it has any significant impact (and then make a more informed decision).
Ok, we have some initial data from the queue length metrics, and it looks like we have, relatively speaking, some downtime and aren't constantly fighting to keep up. I'm going to go ahead and merge this PR, but I may back it out if the queue times end up being too long or similar.
```diff
@@ -41,6 +41,10 @@ These are real programs that are important in some way, and worth tracking.
 These are real programs that are known to stress the compiler in interesting
 ways.

+- **diesel**: A type save SQL query builder. Utilizes the type system to
```
should say, "A type safe SQL query builder."
As suggested by @jyn514 in rust-lang/rust#79599