-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize dataflow-const-prop place-tracking infra #110820
Conversation
r? @TaKO8Ki (rustbot has picked a reviewer for you, use r? to override) |
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 3cfa1b1fa6c493ebbf8b4282eb92a161eb8563c7 with merge 8fffd5b5f897bb58af07618610dd7b1cef63d4e2... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (8fffd5b5f897bb58af07618610dd7b1cef63d4e2): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 654.411s -> 654.14s (-0.04%) |
The perf results are expected: those code paths are not enabled by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me with question answered as a comment or addressed otherwise
// We manually iterate instead of using `children` as we need to mutate `self`. | ||
let mut next_child = self.places[root].first_child; | ||
while let Some(child) = next_child { | ||
self.cache_preorder_invoke(child); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how deep can this recursion get? We can have some very long projections from user code. A recursion limit could either cause a miscompilation or a hard error where there shouldn't be one, so maybe use ensure_sufficient_stack
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This recursion is required for correctness, so I added the ensure_sufficient_stack
.
I don't think this is much of an issue in practice: we have a limit on the number of created values in register_with_filter
.
@bors r+ |
📌 Commit e858997ad21c730972d6f3f254ad6821f64f50f6 has been approved by It is now in the queue for this repository. |
⌛ Testing commit e858997ad21c730972d6f3f254ad6821f64f50f6 with merge 390e41af383ca43944ee332c68e3f3cb1e84b122... |
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
@bors r=oli-obk |
⌛ Testing commit ccc1da2 with merge b73d98147a1d4b1f392b6e38c052d8151e1a2dfa... |
💔 Test failed - checks-actions |
@bors retry |
@bors p=50 |
☀️ Test successful - checks-actions |
Finished benchmarking commit (9a767b6): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 658.519s -> 659.961s (0.22%) |
Optimization opportunities found while investigating #110719
Computing places breadth-first ensures that we create short projections before deep projections, since the former are more likely to be propagated.
The most relevant is the pre-computation of flooded places. Callgrind showed
flood_*
methods and especiallypreorder_preinvoke
were especially hot. This PR attempts to pre-compute the set ofValueIndex
thatpreorder_invoke
would visit.Using this information, we make some
PlaceIndex
inaccessible when they contain noValueIndex
, allowing to skip computations for those places.cc @jachris as original author