
Add diesel to the rustc perf test suite #807


Merged

Conversation

weiznich
Contributor

@weiznich weiznich commented Dec 3, 2020

As far as I know diesel is a rather strange workload for rustc.
According to some short measurements, most of the time compiling diesel is
spent type checking the crate and resolving trait bounds. I see multiple
reasons for this:

  • Diesel builds a complex abstract query DSL for SQL based on Rust
    generics. All of this needs to be type checked.
  • Diesel generates a ton of trait impls for tuples of various sizes.
    There are features to set the supported maximum size to 16, 32, 64 or
    128 tuple elements. As this is a benchmark, I've chosen to set it to
    128 to maximize the number of impls.

As a consequence, diesel's compile times are quite sensitive to
changes touching the type system in general and trait resolution in
particular. Any change that introduces behavior which does not scale
well with the number of available trait impls will likely result in a
huge increase for this benchmark.
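The second point can be sketched with a toy example. This is a hypothetical illustration, not diesel's actual code: the trait name `Columns` and the macro are invented here, but the mechanism — one macro invocation per tuple arity, each producing another trait impl the solver must consider — mirrors what diesel does at much larger scale.

```rust
// Hypothetical sketch of macro-generated tuple impls. Raising the
// maximum arity multiplies the number of impls the trait solver
// has to consider during trait resolution.
trait Columns {
    const ARITY: usize;
}

macro_rules! impl_columns_for_tuple {
    ($($t:ident)+) => {
        impl<$($t),+> Columns for ($($t,)+) {
            // Count the type parameters via a const array length.
            const ARITY: usize = [$(stringify!($t)),+].len();
        }
    };
}

// One invocation per supported tuple size; diesel generates these
// up to the configured maximum (16, 32, 64 or 128 elements).
impl_columns_for_tuple!(A);
impl_columns_for_tuple!(A B);
impl_columns_for_tuple!(A B C);

fn main() {
    assert_eq!(<(i32,) as Columns>::ARITY, 1);
    assert_eq!(<(i32, f64, u8) as Columns>::ARITY, 3);
    println!("ok");
}
```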

As suggested by @jyn514 in rust-lang/rust#79599

@weiznich weiznich force-pushed the add_diesel_to_rustc_bench branch 2 times, most recently from 886a95e to 31fd220 on December 3, 2020 12:45
@jyn514
Member

jyn514 commented Dec 3, 2020

Hmm, I just realized this has a lot of overlap with #802. Maybe we should only add one or the other? I like #802 just because I know it stresses rustdoc - @weiznich how long does rustdoc take to run on diesel?

@weiznich
Contributor Author

weiznich commented Dec 3, 2020

@jyn514 I'm not sure #802 is really comparable. Diesel stresses not only rustdoc, but all of rustc's trait resolution related code; we've hit quite a few performance issues there (see some of the issues I opened on the rustc repo). The submitted test configuration really stresses rustc/rustdoc/… as it implements a lot of traits for tuples of up to 128 elements. (That happens via this macro.) As already mentioned, that takes quite a lot of time: on my laptop `cargo doc --no-deps` takes ~17 minutes and requires >10 GB of RAM.

Edit: I should probably add that I do not expect this to get much better any time soon. I assume that significant improvements would need substantial changes to diesel, plus maybe something like variadic generics.
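For reference, a crate depending on diesel opts into the larger tuple impls via a Cargo feature. A minimal sketch (the version requirement is a placeholder; the feature name matches the flags discussed later in this thread):

```toml
# Hypothetical Cargo.toml excerpt: enabling diesel's largest tuple size.
[dependencies]
diesel = { version = "*", features = ["128-column-tables"] }
```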

@weiznich weiznich force-pushed the add_diesel_to_rustc_bench branch from 31fd220 to 5d6ade5 on December 3, 2020 13:32
@Mark-Simulacrum
Member

It looks like this adds roughly 1.5 hours of CI time to some of our builders, and while the perf machine is more powerful, we cannot afford that big a time sink for just one benchmark. I imagine lowering from 128 to e.g. 32 might help, but as-is I cannot merge this.

@weiznich
Contributor Author

@Mark-Simulacrum I can just lower the supported tuple size to 32 if that helps, but I should add that in my experience the compile times do not scale linearly here: going from tuples of up to 32 elements to tuples of up to 64 elements more than doubles the compile time. Lowering the size will definitely speed up the benchmark by a lot, but I'm not sure what exactly that means for compiler internals.

@weiznich
Contributor Author

I've pushed the 32-column variant; if that still takes too much time we can go down another step and use the implicit default 16-column feature.

@jyn514
Member

jyn514 commented Dec 15, 2020

@weiznich a way to check that it's stressing the same things is to run the benchmark locally with --self-profile; if roughly the same queries take most of the time, it's still a good benchmark.

@weiznich
Contributor Author

The following contains the output of `summarize summarize -p 1 profile` after running `cargo build` with the different feature flags on diesel:

`default = []` (so only generate impls for up to 16 tuple elements)
| Item | Self time | % of total time | Time | Item count |
| --- | --- | --- | --- | --- |
| typeck | 1.23s | 18.973 | 1.36s | 1817 |
| expand_crate | 729.72ms | 11.263 | 749.28ms | 1 |
| mir_borrowck | 578.55ms | 8.930 | 1.41s | 1817 |
| check_item_well_formed | 328.82ms | 5.075 | 412.05ms | 5520 |
| evaluate_obligation | 211.08ms | 3.258 | 216.04ms | 21291 |
| mir_built | 206.45ms | 3.187 | 308.50ms | 1817 |
| LLVM_module_codegen_emit_obj | 171.76ms | 2.651 | 171.76ms | 66 |
| optimized_mir | 171.70ms | 2.650 | 499.90ms | 1988 |
| type_op_prove_predicate | 162.80ms | 2.513 | 163.16ms | 13902 |
| mir_drops_elaborated_and_const_checked | 160.75ms | 2.481 | 262.09ms | 1817 |
| check_impl_item_well_formed | 157.86ms | 2.436 | 239.52ms | 2486 |
| check_mod_item_types | 152.42ms | 2.353 | 163.29ms | 135 |
| normalize_projection_ty | 126.15ms | 1.947 | 126.24ms | 4664 |
| type_op_ascribe_user_type | 111.36ms | 1.719 | 111.41ms | 1806 |
| resolve_crate | 109.78ms | 1.694 | 109.78ms | 1 |
| generate_crate_metadata | 97.10ms | 1.499 | 646.62ms | 1 |
| param_env | 89.74ms | 1.385 | 115.86ms | 8690 |
| hir_lowering | 88.33ms | 1.363 | 88.33ms | 1 |
| check_mod_privacy | 84.71ms | 1.308 | 85.61ms | 135 |
| LLVM_passes | 83.95ms | 1.296 | 83.95ms | 1 |
| specialization_graph_of | 66.81ms | 1.031 | 101.17ms | 145 |

Total cpu time: 6.478858514s
Filtered results account for 79.012% of total time.

`default = ["32-column-tables"]`
| Item | Self time | % of total time | Time | Item count |
| --- | --- | --- | --- | --- |
| typeck | 5.37s | 23.476 | 5.75s | 2249 |
| expand_crate | 3.18s | 13.897 | 3.19s | 1 |
| mir_borrowck | 2.12s | 9.262 | 4.86s | 2249 |
| check_item_well_formed | 1.59s | 6.954 | 1.80s | 6224 |
| evaluate_obligation | 667.61ms | 2.920 | 675.63ms | 47627 |
| mir_built | 666.40ms | 2.915 | 969.31ms | 2249 |
| check_mod_item_types | 569.95ms | 2.493 | 593.27ms | 135 |
| check_impl_item_well_formed | 560.54ms | 2.451 | 792.59ms | 3174 |
| type_op_prove_predicate | 553.23ms | 2.420 | 554.12ms | 32054 |
| mir_drops_elaborated_and_const_checked | 538.08ms | 2.353 | 851.83ms | 2249 |
| check_mod_privacy | 523.02ms | 2.287 | 524.57ms | 135 |
| type_op_ascribe_user_type | 501.22ms | 2.192 | 501.32ms | 5446 |
| optimized_mir | 500.68ms | 2.190 | 1.59s | 2436 |
| normalize_projection_ty | 420.51ms | 1.839 | 420.59ms | 10000 |
| resolve_crate | 401.20ms | 1.755 | 401.20ms | 1 |
| specialization_graph_of | 324.67ms | 1.420 | 361.09ms | 145 |
| hir_lowering | 318.75ms | 1.394 | 318.75ms | 1 |
| param_env | 238.53ms | 1.043 | 275.20ms | 10098 |

Total cpu time: 22.865035499s
Filtered results account for 83.261% of total time.

`default = ["64-column-tables"]`
| Item | Self time | % of total time | Time | Item count |
| --- | --- | --- | --- | --- |
| expand_crate | 26.75s | 25.100 | 26.90s | 1 |
| typeck | 22.37s | 20.987 | 23.25s | 3113 |
| mir_borrowck | 9.09s | 8.528 | 20.22s | 3113 |
| check_item_well_formed | 6.22s | 5.835 | 6.79s | 7632 |
| check_mod_privacy | 5.16s | 4.840 | 5.16s | 135 |
| type_op_ascribe_user_type | 2.59s | 2.427 | 2.59s | 19638 |
| specialization_graph_of | 2.44s | 2.289 | 2.51s | 145 |
| type_op_prove_predicate | 2.43s | 2.277 | 2.43s | 98310 |
| evaluate_obligation | 2.32s | 2.181 | 2.33s | 144843 |
| mir_drops_elaborated_and_const_checked | 2.22s | 2.080 | 3.59s | 3113 |
| mir_built | 2.22s | 2.080 | 3.33s | 3113 |
| check_mod_item_types | 1.98s | 1.860 | 2.01s | 135 |
| check_impl_item_well_formed | 1.97s | 1.846 | 2.47s | 4550 |
| optimized_mir | 1.89s | 1.773 | 6.55s | 3332 |
| resolve_crate | 1.87s | 1.759 | 1.87s | 1 |
| normalize_projection_ty | 1.79s | 1.682 | 1.79s | 29120 |
| hir_lowering | 1.28s | 1.203 | 1.28s | 1 |

Total cpu time: 106.57413693s
Filtered results account for 88.749% of total time.

`default = ["128-column-tables"]`
| Item | Self time | % of total time | Time | Item count |
| --- | --- | --- | --- | --- |
| expand_crate | 262.36s | 34.810 | 262.37s | 1 |
| typeck | 120.41s | 15.976 | 123.73s | 4841 |
| check_mod_privacy | 74.11s | 9.833 | 74.12s | 135 |
| mir_borrowck | 50.40s | 6.687 | 112.08s | 4841 |
| check_item_well_formed | 44.92s | 5.960 | 49.00s | 10448 |
| specialization_graph_of | 21.17s | 2.809 | 21.21s | 145 |
| type_op_ascribe_user_type | 16.66s | 2.210 | 16.66s | 75670 |
| type_op_prove_predicate | 14.85s | 1.970 | 14.85s | 350630 |
| check_mod_item_types | 14.15s | 1.878 | 14.20s | 135 |
| check_impl_item_well_formed | 12.44s | 1.650 | 14.38s | 7302 |
| evaluate_obligation | 11.53s | 1.530 | 11.54s | 517451 |
| normalize_projection_ty | 10.78s | 1.431 | 10.78s | 101152 |
| mir_drops_elaborated_and_const_checked | 10.41s | 1.382 | 18.18s | 4841 |
| mir_built | 9.04s | 1.199 | 14.42s | 4841 |
| resolve_crate | 8.13s | 1.079 | 8.13s | 1 |
| optimized_mir | 7.97s | 1.058 | 32.16s | 5124 |

Total cpu time: 753.685691273s
Filtered results account for 91.461% of total time.

As the raw numbers are quite hard to compare, here is a comparison of all passes that use more than 5% of the compilation time in at least one of the runs:

| Pass | 16 | 32 | 64 | 128 |
| --- | --- | --- | --- | --- |
| typeck | 18.973% | 23.476% | 20.987% | 15.976% |
| expand_crate | 11.263% | 13.897% | 25.1% | 34.81% |
| mir_borrowck | 8.93% | 9.262% | 8.528% | 6.687% |
| check_item_well_formed | 5.075% | 6.954% | 5.835% | 5.96% |
| check_mod_privacy | 1.308% | 2.287% | 4.84% | 9.833% |

I would have expected that, if the number of tuple elements only influenced the amount of work done by each query, these relative numbers would stay at the same level. Instead, a few passes become much more important for large tuple sizes: expand_crate (expanding the large `__diesel_for_each_tuple!(tuple_impls);` macro call?) and check_mod_privacy (I'm unsure why this happens). Given those points, I'm not sure how useful it would be to just use the faster, smaller variant as a benchmark.
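The non-linearity is also visible in the total CPU times reported above: each doubling of the maximum tuple size multiplies the total by a growing factor. A quick sketch computing those ratios from the reported totals:

```rust
fn main() {
    // Total CPU times reported by summarize for each maximum tuple
    // size (in seconds), taken from the profiles above.
    let totals: [(u32, f64); 4] = [
        (16, 6.478858514),
        (32, 22.865035499),
        (64, 106.57413693),
        (128, 753.685691273),
    ];
    for pair in totals.windows(2) {
        let (a, ta) = pair[0];
        let (b, tb) = pair[1];
        // Each doubling costs a bigger multiple than the previous one
        // (roughly 3.5x, then 4.7x, then 7.1x).
        println!("{} -> {}: {:.1}x", a, b, tb / ta);
    }
}
```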

@Mark-Simulacrum
Member

Yeah, I'm unsure too. I am still not comfortable merging this PR with the current variant; it just adds too much time, and your results seem to indicate unexpectedly non-linear scaling in the time allocated to various passes. We might just not be able to pull off diesel without first speeding up the compiler or doing more work on the collection infrastructure.

@weiznich
Contributor Author

I totally understand that this comes with a large time cost, but on the other hand it seems like those two passes only matter above a certain threshold of generated code size. As far as I can see, that is something that cannot be reproduced in a smaller test case.

@weiznich
Contributor Author

weiznich commented Jan 7, 2021

@Mark-Simulacrum Any new insights or suggestions on how to proceed here?

@Mark-Simulacrum
Member

I've recently added metrics tracking for the queue length on perf.rust-lang.org, and I hope that in 1-2 weeks when we have some data there we can merge this and see if it has any significant impact (and then make a more informed decision).

@Mark-Simulacrum
Member

Ok, we have some initial data from the queue length metrics, and it looks like we have, relatively speaking, some downtime and aren't constantly fighting to keep up. I'm going to go ahead and merge this PR, but may back it out if queue times end up being too long or similar.

@@ -41,6 +41,10 @@ These are real programs that are important in some way, and worth tracking.
These are real programs that are known to stress the compiler in interesting
ways.

- **diesel**: A type save SQL query builder. Utilizes the type system to

@SunnyWar SunnyWar Mar 10, 2021


should say, "A type safe SQL query builder."
