-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove my scalar_copy_backend_type
optimization attempt
#123185
Conversation
r? @fee1-dead rustbot has assigned @fee1-dead. Use |
// CHECK: %[[TEMP:.+]] = load <2 x i8>, ptr %a, align 1 | ||
// CHECK: store <2 x i8> %[[TEMP]], ptr %p, align 1 | ||
// CHECK: %[[TEMP:.+]] = load i16, ptr %a, align 1 | ||
// CHECK: store i16 %[[TEMP]], ptr %p, align 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that while we now generate an alloca
and memcpy
s for this (as seen in the array-codegen
file), LLVM is able to remove them.
If LLVM picks i16
for this (and not <2 x i8>
), then great, let's do that.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Remove my `scalar_copy_backend_type` optimization attempt I added this back in rust-lang#111999 , but I no longer think it's a good idea - It had to get scaled back to only power-of-two things to not break a bunch of targets - LLVM seems to be getting better at memcpy removal anyway - Introducing vector instructions has seemed to sometimes (rust-lang#115515 (comment)) make autovectorization worse So this removes it from the codegen crates entirely, and instead just tries to use <https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/builder/trait.BuilderMethods.html#method.typed_place_copy> instead of direct `memcpy` so things will still use load/store when a type isn't `OperandValue::Ref`.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (ab08738): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 667.865s -> 668.349s (0.07%) |
One small secondary regression; everything else an improvement -- even the runtime benchmarks improved. |
@@ -419,7 +418,14 @@ impl<'a, 'tcx, V: CodegenObject> OperandValue<V> { | |||
bx.store_with_flags(val, dest.llval, dest.align, flags); | |||
return; | |||
} | |||
base::memcpy_ty(bx, dest.llval, dest.align, r, source_align, dest.layout, flags) | |||
bx.memcpy_known_size( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 why isn't this also just using typed_place_copy
?
also, memcpy_known_size
has one callsite (this one) -- you should inline it. i don't really see the value of exposing it if it's only being used once, seems to add more confusion to the api imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, that's a really good observation. I thought the answer was that it would be recursion -- that typed_place_copy
ends up here -- but now that I look it doesn't, so I'll make that change 👍
And yup, I'll inline memcpy_known_size
too. Looks like I forgot to think it if was useful after I undid a couple of other changes that didn't work out.
b72e5ad
to
120cb65
Compare
3e5d267
to
ac20c35
Compare
Well @rustbot ready |
r? compiler |
I already started this review anyways r? compiler-errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM seems to be getting better at memcpy removal anyway.
I think we can expect to see more significant changes after llvm/llvm-project#87190.
// OPT3LINX64-NEXT: store <8 x i16> | ||
// OPT3WINX64: load <8 x i16> | ||
// OPT3WINX64-NEXT: call <8 x i16> @llvm.bswap | ||
// OPT3WINX64-NEXT: store <8 x i16> | ||
// CHECK-NEXT: ret void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks! I hadn't noticed that different optimization levels yield different optimization effects.
I believe it makes sense to differentiate between O2 and O3 here.
The changes to O2 seem fine. I'll check again on this later.
…-errors Remove my `scalar_copy_backend_type` optimization attempt I added this back in rust-lang#111999 , but I no longer think it's a good idea - It had to get scaled back to only power-of-two things to not break a bunch of targets - LLVM seems to be getting better at memcpy removal anyway - Introducing vector instructions has seemed to sometimes (rust-lang#115515 (comment)) make autovectorization worse So this removes it from the codegen crates entirely, and instead just tries to use <https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/builder/trait.BuilderMethods.html#method.typed_place_copy> instead of direct `memcpy` so things will still use load/store when a type isn't `OperandValue::Ref`.
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
556b47e
to
593e900
Compare
☀️ Test successful - checks-actions |
Finished benchmarking commit (c2239bc): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 675.526s -> 675.28s (-0.04%) |
Wow, I think that might be the happiest perf has ever been with one of my PRs. Even instruction wins on the runtime benchmarks. |
I added this back in #111999 , but I no longer think it's a good idea
So this removes it from the codegen crates entirely, and instead just tries to use https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/builder/trait.BuilderMethods.html#method.typed_place_copy instead of direct
memcpy
so things will still use load/store when a type isn'tOperandValue::Ref
.