-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce ChunkedBitSet
and use it for some dataflow analyses.
#93984
Conversation
This will be a complex one to evaluate, because there are various contradictory metrics results, e.g. some improving, some worsening. Anyway, let's get a perf run started to serve as the basis for any discussion: @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
@bors try |
⌛ Trying commit fd9228607b35e1094193fe9e423e356544b527f2 with merge 9eec9616e3c91ff7231a3fadff261984e83f251f... |
☀️ Try build successful - checks-actions |
Queued 9eec9616e3c91ff7231a3fadff261984e83f251f with parent 52dd59e, future comparison URL. |
Finished benchmarking commit (9eec9616e3c91ff7231a3fadff261984e83f251f): comparison url. Summary: This benchmark run shows 115 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
If you look at instruction counts, it looks bad. If you look at every other metric, it looks good. One of the cases where instruction counts it misleading, I think. A big part of this: x86-64 has "string processing" instructions like |
This also improves the |
BTW, this will fix #54208, if it ends up being merged. |
Not sure if draft status is intended to reflect some unreadiness, but assigning myself to at least take an initial look. |
Thanks! The code quality isn't great right now, there are numerous "njn:" comments indicating things that need to be fixed, and the |
fd92286
to
3ba04f6
Compare
I have updated the code. The second commit contains the use of @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 3ba04f6462ebd682fb50c5b7123c50e0ee748f99 with merge 5a267675a45b7dcaef6afd739f55f3b207eaba3b... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left high-level commentary on data structures for now, didn't look at usage/impls.
☀️ Try build successful - checks-actions |
Queued 5a267675a45b7dcaef6afd739f55f3b207eaba3b with parent 6655109, future comparison URL. |
Finished benchmarking commit (5a267675a45b7dcaef6afd739f55f3b207eaba3b): comparison url. Summary: This benchmark run shows 6 relevant improvements 🎉 but 121 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
Finished benchmarking commit (9003e92b6f7298995cc9772da0ec54c403a09a7e): comparison url. Summary: This benchmark run shows 12 relevant improvements 🎉 but 105 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
As mentioned here and here, this is a rare case where instruction counts are misleading. Instead look at wall-time and cycles (some big wins along with the usual noise) along with max-rss and faults (some massive wins along with the usual noise). @rustbot label: +perf-regression-triaged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good overall
Mostly nits, r=me if you don't make major changes |
This reduces peak memory usage significantly for some programs with very large functions, such as: - `keccak`, `unicode_normalization`, and `match-stress-enum`, from the `rustc-perf` benchmark suite; - `http-0.2.6` from crates.io. The new type is used in the analyses where the bitsets can get huge (e.g. 10s of thousands of bits): `MaybeInitializedPlaces`, `MaybeUninitializedPlaces`, and `EverInitializedPlaces`. Some refactoring was required in `rustc_mir_dataflow`. All existing analysis domains are either `BitSet` or a trivial wrapper around `BitSet`, and access in a few places is done via `Borrow<BitSet>` or `BorrowMut<BitSet>`. Now that some of these domains are `ClusterBitSet`, that no longer works. So this commit replaces the `Borrow`/`BorrowMut` usage with a new trait `BitSetExt` containing the needed bitset operations. The impls just forward these to the underlying bitset type. This required fiddling with trait bounds in a few places. The commit also: - Moves `static_assert_size` from `rustc_data_structures` to `rustc_index` so it can be used in the latter; the former now re-exports it so existing users are unaffected. - Factors out some common "clear excess bits in the final word" functionality in `bit_set.rs`. - Uses `fill` in a few places instead of loops.
d97c79e
to
36b495f
Compare
@bors r=Mark-Simulacrum |
📌 Commit 36b495f has been approved by |
I took some extra measurements on my machine for affected benchmarks:
instructions:uAs mentioned above, these aren't useful for this change, but I've put them here for completeness.
cyclesThe cycle results for this PR were consistently better on CI than they were for my local builds. Maybe because my local builds don't have PGO? Anyway, these aren't all that reliable.
wall-timeSame story for wall-time as for cycles.
max-rssHuge wins here.
faultsHuge wins here, too.
|
☀️ Test successful - checks-actions |
Finished benchmarking commit (bafe8d0): comparison url. Summary: This benchmark run shows 6 relevant improvements 🎉 but 107 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression |
…Simulacrum Remove `HybridBitSet` `HybridBitSet` was introduced under the name `HybridIdxSetBuf` way back in rust-lang#53383 where it was a big win for NLL borrow checker performance. In rust-lang#93984 the more flexible `ChunkedBitSet` was added. Uses of `HybridBitSet` have gradually disappeared (e.g. rust-lang#116152) and there are now few enough that they can be replaced with `BitSet` or `ChunkedBitSet`, and `HybridBitSet` can be removed, cutting more than 700 lines of code. r? `@Mark-Simulacrum`
This reduces peak memory usage significantly for some programs with very
large functions.
r? @ghost