Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use
load
+store
instead ofmemcpy
for small integer arrays #111999Use
load
+store
instead ofmemcpy
for small integer arrays #111999Changes from all commits
cce0b52
e1b020d
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a FIXME so someone may remove this when llvm only has opaque ptrs? (Well, I guess removing this logic would also preclude other backends with typed ptrs, too. In that case, maybe no comment at all.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a bigger question because GCC would need to move, so I'll leave it tracked by conversations like https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/llvm.20bitcasts.20in.20codegen/near/356591425 instead of something specific in this place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do these not match pty? Why not just return a more "accurate" type in
scalar_copy_llvm_type
? Like the actual LLVM type that corresponds to src/dst? (Or am I misunderstanding something?)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Well, the shallow answer is that they don't match because it's
-> Option<Type>
, notshould_load_store_instead_of_memcpy() -> bool
.)Because emitting these as the backend type doesn't necessarily do what we want. Notably, this PR is emitting the load/store as
<4 x i8>
, the vector type, rather than thellvm_backend_type
of[4 x i8]
for arrays.I tried using the LLVM type first, but with arrays that results in exploding out the IR: https://llvm.godbolt.org/z/vjjsdea9e
Optimizing the version that loads/stores arrays
gives
whereas optimizing the version with vectors leaves the operations together
for the backend to legalize instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and I originally expected to do this with LLVM's arbitrary-length integer support https://llvm.godbolt.org/z/hzG9aeqhM, like I did for comparisons back in #85828
But that broke some of the autovectorization codegen tests for operations on arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ok, so we explicitly want to be lowering these moves to vector types specifically because they can optimize better than arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a default impl here may make it not obvious that other backends should emit typed load/store. Is it bad style to just add this to cranelift and codegen_gcc here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, cg_clif doesn't use cg_ssa, so it's not impacted.
I have no practical way to test cg_gcc, being on windows, and I also don't actually have any information that doing something other than
memcpy
for it would actually be an improvement for it, so I figured I'd just leave the default here since it's a semantically sufficient implementation.Notably, GCC apparently doesn't have the
poison
semantics that are what nikic mentioned as being the problem for better optimizing this:rust/compiler/rustc_codegen_gcc/src/common.rs
Lines 76 to 79 in f8447b9
so indeed it might just never need to do this.