Improve codegen for align_offset
#75600
Conversation
At opt-levels <= 1, methods such as `wrapping_mul` are not inlined, which significantly bloats and slows down the implementation at these optimisation levels. By calling the corresponding intrinsics directly, the codegen of this function at -Copt-level=1 becomes the same as at -Copt-level=3.
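As a rough illustration of the idea (not the actual patch), calling the intrinsic directly sidesteps the method wrapper entirely; this sketch only compiles on nightly with the `core_intrinsics` feature:

```rust
#![feature(core_intrinsics)]
// Nightly-only sketch: `core::intrinsics::wrapping_mul` is the primitive
// that `usize::wrapping_mul` forwards to. Calling it directly does not
// depend on the method wrapper being inlined at -Copt-level=1.
use std::intrinsics;

fn mul_wrapping(a: usize, b: usize) -> usize {
    // `wrapping_mul` is on the compiler's list of intrinsics that are
    // safe to call, so no `unsafe` block is needed here.
    intrinsics::wrapping_mul(a, b)
}

fn main() {
    assert_eq!(mul_wrapping(usize::MAX, 2), usize::MAX - 1);
}
```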
Previously, checking for `pmoda == 0` first would get LLVM to generate branchy code, when, for `stride = 1`, the offset can be computed without such a branch by doing, effectively, a `-p % a`. For well-known (constant) alignments, with the new ordering of these conditionals, we end up generating 2 to 3 cheap instructions on x86_64:

```asm
movq %rdi, %rax
negl %eax
andl $7, %eax
```

instead of 5+ as previously. For unknown alignments the new code also generates just 3 instructions:

```asm
negq %rdi
leaq -1(%rsi), %rax
andq %rdi, %rax
```
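For reference, a minimal sketch of the branch-free computation described above (an illustration, not the library code): for a power-of-two alignment `a`, the aligning byte offset of an address `p` is `-p mod a`, which reduces to a negate and a mask:

```rust
/// Branch-free `-p % a` for a power-of-two `a` (illustrative helper,
/// not the actual `align_offset` implementation).
fn offset_to_align(p: usize, a: usize) -> usize {
    debug_assert!(a.is_power_of_two());
    // `a - 1` masks the low bits; `p.wrapping_neg() & (a - 1)` is
    // `-p mod a`, i.e. how many bytes until the next multiple of `a`.
    p.wrapping_neg() & (a - 1)
}

fn main() {
    assert_eq!(offset_to_align(0x1003, 8), 5); // 0x1003 + 5 == 0x1008
    assert_eq!(offset_to_align(0x1000, 8), 0); // already aligned
}
```

With a constant `a = 8` this is exactly the `negl`/`andl $7` pair shown above; with an unknown alignment, computing `a - 1` accounts for the extra `leaq -1(%rsi), %rax`.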
r? @KodrAus (rust_highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue (I don't think we use this much so probably not important, but worth checking)
Awaiting bors try build completion
⌛ Trying commit 5d22b18 with merge 5c6f0b9c5f37b21b2a1dcf91bc2a50a5d6209672...
☀️ Try build successful - checks-actions, checks-azure
Queued 5c6f0b9c5f37b21b2a1dcf91bc2a50a5d6209672 with parent 009551f, future comparison URL.
Finished benchmarking try commit (5c6f0b9c5f37b21b2a1dcf91bc2a50a5d6209672): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Looks good to me! The performance results seem to suggest we do get a slight improvement in those builds with low optimization levels. @bors r+
📌 Commit 5d22b18 has been approved by `KodrAus`
☀️ Test successful - checks-actions, checks-azure
In this PR the `align_offset` implementation is changed/improved to produce better code in certain scenarios, such as when the pointee type has a stride of 1 or when building at low optimisation levels. While these changes do not achieve the "ideal" codegen referenced in #75579, they get significantly closer to it. I'm not sure the codegen can be much better with this function returning the offset, rather than the aligned pointer.
See the individual commit descriptions for further information.
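For context, a small usage example of the stable API whose codegen this PR improves (illustrative only):

```rust
fn main() {
    let buf = [0u8; 64];
    let p = buf.as_ptr();
    // `align_offset` returns how many *elements* (here: bytes, since the
    // stride of `u8` is 1) the pointer must advance to become 8-aligned.
    let off = p.align_offset(8);
    assert!(off < 8);
    assert_eq!((p as usize).wrapping_add(off) % 8, 0);
}
```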