Range.contains failed to be inlined/optimized #90609
Comments
Are you sure you posted the right links? Inlining does happen in the godbolt examples, there are no |
Maybe I was confusing/incorrect in my use of the term 'inlining'. Yes, the assembly has no call instruction, but the optimizer fails to "merge" the checks when I use |
While I believe this issue has been resolved, I'm seeing an odd discrepancy between versions 1.52 and later versions depending on whether the special value is added at the left or right end of the range (see the first two rows of https://rust.godbolt.org/z/Ea149cjYo). It looks like, starting with version 1.53, there is an improvement at the left end of the range in only one of the two function versions (the one without |
Fixed by the LLVM upgrade in #93577.
It was suggested to me on Stack Overflow (https://stackoverflow.com/questions/69844819/rust-range-contains-failed-to-be-inlined-optimized) that I ask here.
I am aware that optimization in complex situations can fail to apply. However, rather straightforward inlining "in the small" should still apply.
I was running my code through Clippy and it suggested changing the following:
Into
Since it is more readable. Unfortunately, the resulting assembly output is twice as long, even at optimization level 3. Manually inlining it (two levels of nesting down) gives almost the same code as `version1` and is just as efficient.

If I remove the `|| value == SPECIAL_VALUE`, they all resolve to the same code (though with one more instruction added to decrement the parameter value before the compare). Also, if I change `SPECIAL_VALUE` to something not adjacent to the range, they all resolve to the same assembly code as `version2`, which is the reason why I kept it `0` unless I eventually have to change it.

I have a link to Godbolt with the code here: https://rust.godbolt.org/z/d9PWYEKc8
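For readers without access to the Godbolt link, the two variants under discussion probably look roughly like the sketch below. This is a reconstruction from the description, not the code from the issue: the bounds `1..=5` are assumed, and only the names `version1`, `version2`, and `SPECIAL_VALUE` and the value `0` come from the text.

```rust
// Hypothetical reconstruction; the range bounds 1..=5 are assumptions.
pub const SPECIAL_VALUE: u32 = 0;

// Manual comparisons, the form Clippy's `manual_range_contains` lint flags.
#[allow(clippy::manual_range_contains)]
pub fn version1(value: u32) -> bool {
    (value >= 1 && value <= 5) || value == SPECIAL_VALUE
}

// The form Clippy suggests instead: `RangeInclusive::contains`.
pub fn version2(value: u32) -> bool {
    (1..=5).contains(&value) || value == SPECIAL_VALUE
}
```

Both functions compute the same result for every input; the question is only about the machine code generated for them.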
Why is the compiler failing to properly inline/optimize `version2`? Is it an "optimization bug"? Or am I misunderstanding some semantics of Rust? Maybe something about the borrowing prevents the optimization, but can't the compiler assume no mutation of `value` due to the aliasing and referencing rules? Because the optimization is applied in `version1`, it would suggest LLVM knows that, because the value is unsigned, it can simplify the comparison. So it may be that there is a missed optimization opportunity in the Rust frontend?

Trying to do something similar in C++ gives the optimum short assembly in GCC but not in Clang: https://godbolt.org/z/erYPYsvhf
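The simplification being asked about can be written out by hand. The sketch below assumes a hypothetical range of `1..=5`, and the names `reference`, `merged`, and `range_only` are illustrative. Because the value is unsigned and the special value `0` sits directly below the range, the three checks collapse into one comparison. Without the special value, the range check alone becomes a decrement followed by a single unsigned compare, which matches the "one more instruction" mentioned above.

```rust
// Assumed bounds for illustration; only SPECIAL_VALUE = 0 comes from the issue.
pub const LOW: u32 = 1;
pub const HIGH: u32 = 5;
pub const SPECIAL_VALUE: u32 = 0;

// The readable form that the optimizer is expected to simplify.
pub fn reference(value: u32) -> bool {
    (LOW..=HIGH).contains(&value) || value == SPECIAL_VALUE
}

// Hand-merged form: for unsigned values, {0} together with [1, HIGH] is [0, HIGH],
// so a single upper-bound compare is enough.
pub fn merged(value: u32) -> bool {
    value <= HIGH
}

// Range check alone: subtract the lower bound with wraparound, then do one
// unsigned compare. Values below LOW wrap to large numbers and fail the test.
pub fn range_only(value: u32) -> bool {
    value.wrapping_sub(LOW) <= HIGH - LOW
}
```

If `SPECIAL_VALUE` were not adjacent to the range, `merged` would no longer be valid and a separate equality check would have to remain, which is consistent with the behaviour described for a non-adjacent special value.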