Optimize fold_ty #107627
Best reviewed one commit at a time.
@bors try @rust-timer queue
⌛ Trying commit a1df2c5d1e9a41e6f4fdceac31a831ba01763108 with merge 14f439f45a195e76c41d576f2e6aeac48603aae3...
☀️ Try build successful - checks-actions
Finished benchmarking commit (14f439f45a195e76c41d576f2e6aeac48603aae3): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
r=me with nits or not, unless you want a review from oli specifically
ty::IntVar(v),
ty::FreshIntTy,
),
#[cfg(debug_assertions)]
Personally think this should stay an "always" assertion
Agreed. To avoid any performance issues, this could call a `#[cold]` function with the `bug!` inside instead of having the formatting inside the main function.
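A minimal sketch of the `#[cold]` suggestion, under stated assumptions: `Kind`, `fold`, and `unexpected_kind` are hypothetical stand-ins rather than rustc's real types, and `panic!` stands in for rustc's `bug!` macro. The idea is that the hot function keeps only a branch and a call, while the message formatting lives in a separate function that is never inlined.

```rust
// Hypothetical stand-in for a type kind; not rustc's real `TyKind`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Int,
    Infer(u32),
    Placeholder,
    Bound,
}

// Out-of-line failure path. `#[cold]` hints that calls are unlikely, and
// `#[inline(never)]` keeps the formatting code out of the caller's body.
#[cold]
#[inline(never)]
fn unexpected_kind(kind: Kind) -> ! {
    // `panic!` is used here in place of rustc's `bug!`.
    panic!("unexpected kind: {:?}", kind)
}

// Hot function: the check is always on, but the failure arm is just a call.
pub fn fold(kind: Kind) -> Kind {
    match kind {
        // Pretend every inference variable resolves to `Int`.
        Kind::Infer(_) => Kind::Int,
        Kind::Placeholder | Kind::Bound => unexpected_kind(kind),
        other => other,
    }
}
```

Whether this is cheap enough in a given hot path is something only a benchmark run can answer.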
I tried changing it back to an always assertion, and it had a noticeable perf impact, e.g. the instruction count for `wg-grammar` increased by 0.7%. I then tried the `#[cold]` function and it made a small improvement, but was still 0.5% worse.
So I will leave this as is, but I will add a comment about it.
}
ty::Generator(..)
Additional question: Does the wildcard have a perf difference over the exhaustive match? Otherwise, I kinda prefer the exhaustive match.
It probably does. Transforming exhaustive matches to a wildcard in the code generation may be a good idea if so.
It depends on the use case. If the match can be converted to a table lookup, the exhaustive match will have one less branch in LLVM, but have a bigger lookup table: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c7b21a7f9d032aea5aa261953a85d735
For actual branching logic, it doesn't really matter. There may be a larger lookup table in LLVM IR, but that will become the same thing at the assembly level
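To make the comparison concrete, here is a small sketch of the two forms being discussed. It does not reproduce the linked playground; `Color` and both functions are hypothetical. Both functions return the same value for every variant; whether either one is lowered to a lookup table, and how large that table is, depends on the compiler version and optimization level.

```rust
// Hypothetical enum used only to contrast the two match styles.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Color {
    Red,
    Green,
    Blue,
    Cyan,
    Magenta,
    Yellow,
}

// Exhaustive form: every variant is named, so adding a new variant
// is a compile error until this match is updated.
pub fn weight_exhaustive(c: Color) -> u32 {
    match c {
        Color::Red => 7,
        Color::Green => 3,
        Color::Blue | Color::Cyan | Color::Magenta | Color::Yellow => 1,
    }
}

// Wildcard form: the "everything else" arm silently covers new variants.
pub fn weight_wildcard(c: Color) -> u32 {
    match c {
        Color::Red => 7,
        Color::Green => 3,
        _ => 1,
    }
}
```

The maintenance difference is visible without looking at generated code: only the exhaustive form forces a decision when a variant is added.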
With the debug assertion for `Placeholder` and `Bound` in place, doing an exhaustive match is awkward, so I've left this unchanged as well. If it helps, this leaves this method not dissimilar to `ShallowResolver::fold_ty`, which has the form `if let ty::Infer(v) = ty.kind() { ... } else { ty }`.
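For readers unfamiliar with that shape, a self-contained sketch of an `if let ... else` fold follows. It is an illustration only: `Ty`, `Resolver`, and the `solutions` table are hypothetical and much simpler than rustc's actual `ShallowResolver`.

```rust
// Hypothetical, simplified type representation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Bool,
    I32,
    Infer(usize), // index of an inference variable
}

// Hypothetical resolver: `solutions[i]` is `Some(ty)` once variable `i`
// has been resolved, and `None` while it is still unknown.
pub struct Resolver {
    pub solutions: Vec<Option<Ty>>,
}

impl Resolver {
    pub fn fold_ty(&self, ty: Ty) -> Ty {
        if let Ty::Infer(v) = ty {
            // Inference variable: substitute the solution if there is one,
            // otherwise keep the variable as it is.
            self.solutions.get(v).copied().flatten().unwrap_or(ty)
        } else {
            // Every other kind of type is returned untouched.
            ty
        }
    }
}
```

With one variant handled specially and all others passed through, the `if let` form states the intent directly and leaves no long or-pattern to maintain.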
(It makes me sad that we discover the need to do micro-optimizations like re-encoding a big or-pattern arm as a wildcard; I, like @compiler-errors, find value in the exhaustive match from the view point of maintenance. Are we keeping track of efforts, if any, to put such a transformation into rustc itself?)
(I could even imagine a `#[rustc_*]` attribute that would tell the compiler to convert a given arm into a wildcard at the end of the match. That would provide a way to make @compiler-errors happy and also ease experiments like this one that @nnethercote is doing, right?)
The problem here is not "wildcard is faster than manually listing the alternatives". The problem is the assertion on `Placeholder` and `Bound`. A debug assertion is faster, which makes sense. And once you have the debug assertion for those variants, having a wildcard is a lot easier.
If that assertion wasn't necessary, then you can do an exhaustive match that is the same speed as a wildcard match. (I just tried it out; same speed.) Though I would argue that an exhaustive match probably isn't appropriate when `ty::Infer` gets treatment A and every other variant gets treatment B.
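The shape described here can be sketched as follows, with hypothetical names (`Kind`, `fold`) rather than the PR's actual code. One variant gets special handling, everything else falls into a wildcard arm, and the check on the two unexpected variants is a `debug_assert!`, so it is compiled out when debug assertions are disabled.

```rust
// Hypothetical stand-in for a type kind; not rustc's real `TyKind`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Bool,
    Int,
    Infer(u32),
    Placeholder,
    Bound,
}

pub fn fold(kind: Kind) -> Kind {
    match kind {
        // Treatment A: pretend every inference variable resolves to `Int`.
        Kind::Infer(_) => Kind::Int,
        // Treatment B: everything else is returned unchanged.
        _ => {
            // Checked only when debug assertions are enabled; builds
            // without them skip the comparison and the message formatting.
            debug_assert!(
                !matches!(kind, Kind::Placeholder | Kind::Bound),
                "unexpected kind: {:?}",
                kind
            );
            kind
        }
    }
}
```

The trade-off is the one discussed in the thread: builds without debug assertions no longer catch the unexpected variants.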
So one doesn't have to be constructed every time.
`!t.has_non_region_infer()` is the test used in `OpportunisticVarResolver`, and catches a few cases that `!t.needs_infer()` misses.
Force-pushed from a1df2c5 to 4aec134.
I addressed most of the suggestions, mostly by adding comments. I couldn't address the ones about the match in Based on @compiler-errors' previous "r=me with nits or not", I will say: Thanks for the reviews!
☀️ Test successful - checks-actions
Finished benchmarking commit (14ea63a): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric.
The post-merge perf run has regressions in @rustbot label: +perf-regression-triaged
Micro-optimizing the heck out of the important `fold_ty` methods.
r? @oli-obk