-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid alloca
s in codegen for simple mir::Aggregate
statements
#123886
Conversation
r? @fee1-dead rustbot has assigned @fee1-dead. Use |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Avoid `alloca`s in codegen for simple pairs and simple transparent structs Even something simple like constructing a ```rust #[repr(transparent)] struct Foo(u32); ``` forces an `alloca` to be generated in nightly right now. Certainly LLVM can optimize that away, but it would be nice if it didn't have to. Quick example: ```rust #[repr(transparent)] pub struct Transparent32(u32); #[no_mangle] pub fn make_transparent(x: u32) -> Transparent32 { let a = Transparent32(x); a } ``` on nightly we produce <https://rust.godbolt.org/z/zcvoM79ae> ```llvm define noundef i32 `@make_transparent(i32` noundef %x) unnamed_addr #0 { %a = alloca i32, align 4 store i32 %x, ptr %a, align 4 %0 = load i32, ptr %a, align 4, !noundef !3 ret i32 %0 } ``` but after this PR we produce ```llvm define noundef i32 `@make_transparent(i32` noundef %x) unnamed_addr #0 { start: ret i32 %x } ``` (even before the optimizer runs).
This comment has been minimized.
This comment has been minimized.
dd8ffd0
to
7cb89e3
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Avoid `alloca`s in codegen for simple pairs and simple transparent structs Even something simple like constructing a ```rust #[repr(transparent)] struct Foo(u32); ``` forces an `alloca` to be generated in nightly right now. Certainly LLVM can optimize that away, but it would be nice if it didn't have to. Quick example: ```rust #[repr(transparent)] pub struct Transparent32(u32); #[no_mangle] pub fn make_transparent(x: u32) -> Transparent32 { let a = Transparent32(x); a } ``` on nightly we produce <https://rust.godbolt.org/z/zcvoM79ae> ```llvm define noundef i32 `@make_transparent(i32` noundef %x) unnamed_addr #0 { %a = alloca i32, align 4 store i32 %x, ptr %a, align 4 %0 = load i32, ptr %a, align 4, !noundef !3 ret i32 %0 } ``` but after this PR we produce ```llvm define noundef i32 `@make_transparent(i32` noundef %x) unnamed_addr #0 { start: ret i32 %x } ``` (even before the optimizer runs).
r? compiler |
☀️ Try build successful - checks-actions |
1 similar comment
This comment was marked as duplicate.
This comment was marked as duplicate.
This comment has been minimized.
This comment has been minimized.
// According to `rvalue_creates_operand`, only ZST | ||
// aggregate rvalues are allowed to be operands. | ||
// repat rvalues are allowed to be operands. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repeat
let ty = self.monomorphize(ty); | ||
let layout = self.cx.layout_of(self.monomorphize(ty)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is second call to monomorphize
required?
This comment was marked as outdated.
This comment was marked as outdated.
(I've seen regex and wg-grammar being noisy in other PRs) |
This comment has been minimized.
This comment has been minimized.
693b7bf
to
fb212e8
Compare
@bors try @rust-timer queue |
b104691
to
c38f75c
Compare
@matthewjasper just wanted to make sure you have a notification for this review since the |
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (6e1d947): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 673.906s -> 673.586s (-0.05%) |
Nice binary size reductions. They are almost entirely in debug builds, indicating that "Certainly LLVM can optimize that away" is correct but that only happens in opt builds :) |
Yeah; I was hoping that LLVM not needing to optimize them away would be more of a perf win in opt, but I'll take instruction-neutral improvements to debug build perf too :) |
std::cell::Cell::new(b) | ||
} | ||
|
||
// CHECK-LABLE: { i8, i16 } @make_cell_of_bool_and_short(i1 noundef zeroext %b, i16 noundef %s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo here: "CHECK LABLE" should be "LABEL". The test is presumably not working properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good eye; thank you. I've added a fix for that to #124999
fix few typos in filecheck annotations Inspired by rust-lang#123886 (comment) `rg -g '*.rs' '//\s+?[\w-]+(-[\w]+):' -r '$1' -oNI | sort -u` Should https://llvm.org/docs/CommandGuide/FileCheck.html#cmdoption-FileCheck-ignore-case be used for case-insensetive match for filecheck?
fix few typos in filecheck annotations Inspired by rust-lang/rust#123886 (comment) `rg -g '*.rs' '//\s+?[\w-]+(-[\w]+):' -r '$1' -oNI | sort -u` Should https://llvm.org/docs/CommandGuide/FileCheck.html#cmdoption-FileCheck-ignore-case be used for case-insensetive match for filecheck?
Unify `Rvalue::Aggregate` paths in cg_ssa In rust-lang#123840 and rust-lang#123886 I added two different codepaths for `Rvalue::Aggregate` in `cg_ssa`. This merges them into one, since raw pointers are also immediates that can be built from the immediates of their "fields".
Nice binary size reductions. The regressions are limited to a single benchmark ( @rustbot label: +perf-regression-triaged |
The core idea here is to remove the abstraction penalty of simple newtypes in codegen.
Even something simple like constructing a
forces an
alloca
to be generated in nightly right now.Certainly LLVM can optimize that away, but it would be nice if it didn't have to.
Quick example:
on nightly we produce https://rust.godbolt.org/z/zcvoM79ae
but after this PR we produce
(even before the optimizer runs).