Seemingly inefficient code generated to forward a parameter to a function #22891
Comments
I'm currently working on a PR that makes use of LLVM's
subq $24, %rsp
.Ltmp4:
.cfi_def_cfa_offset 32
movaps 32(%rsp), %xmm0
movups %xmm0, (%rsp)
callq _ZN3bar20h6b5879ee29f32fdeeaaE@PLT
movaps 32(%rsp), %xmm0
movups %xmm0, (%rsp)
callq _ZN3bar20h6b5879ee29f32fdeeaaE@PLT
addq $24, %rsp
retq
And some other improvements also happen, I'll detail them with the PR, once I get the last few bugs fixed. In a later step, we could definitely look into additional improvements like just dropping the |
Good to know. Does dropping |
Since the |
The |
As mentioned in a recent Discourse thread - why does it have to go through a double indirection (pointer to fat pointer to string data) at all, whether or not that can be optimized in forwarding situations? Normally a small structure passed by value would be placed in registers. |
I'm working on a patch that does that (pass fat pointers in registers), and a first iteration shows good results, but I'm still passing the fat pointers as a single struct |
Why isn't it a good idea? I think ABIs are normally good at dealing with structs passed by value, although ARM has a hazard in the case of small structs returned by value (it's slower than returning an equivalently sized scalar). |
The rust ABI isn't good at it, yet. That's why I'm working on the patch to change it. The thing with the first iteration of my patch is that it translates |
LLVM 'happening to pass' (i.e. must pass using standard ABIs on most architectures) structs like multiple values is what I meant by ABIs being good at it. Quite interesting that it slows down LLVM, though. |
Oh, now I see. No, LLVM doesn't automatically handle ABI compliance. It only handles calling conventions. For example, we used to pass a struct consisting of 4 u8 values (think RGBA) directly as the struct type. LLVM did pass the 4 values as 4 individual arguments. But the x86_64 SysV ABI (and by now the rust ABI, too) wants that struct to be passed as a single u32 value. And it's the frontend's (i.e. our) job to handle that. |
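To make the RGBA example above concrete, here is a minimal sketch showing that such a struct occupies exactly four bytes and can be carried losslessly in one `u32`. The names `Rgba`, `pack`, and `unpack` are hypothetical and only illustrate the idea; this is not how rustc lowers arguments internally.

```rust
use std::mem::size_of;

/// Hypothetical RGBA struct with C-compatible layout (an assumption for illustration).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rgba {
    pub r: u8,
    pub g: u8,
    pub b: u8,
    pub a: u8,
}

/// Pack the four bytes into a single 32-bit integer, in memory order.
pub fn pack(c: Rgba) -> u32 {
    u32::from_ne_bytes([c.r, c.g, c.b, c.a])
}

/// Recover the struct from the packed integer.
pub fn unpack(bits: u32) -> Rgba {
    let [r, g, b, a] = bits.to_ne_bytes();
    Rgba { r, g, b, a }
}

fn main() {
    let c = Rgba { r: 1, g: 2, b: 3, a: 4 };
    // The whole struct fits in one u32-sized slot.
    assert_eq!(size_of::<Rgba>(), size_of::<u32>());
    // Packing and unpacking round-trips without loss.
    assert_eq!(unpack(pack(c)), c);
}
```

The point of the sketch is only that one integer register is enough for the whole struct; choosing that representation for a call is the frontend's decision, as the comment above says.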
Oh... wow, I'm really surprised by that. I'd love to know what motivated that decision, considering how close LLVM IR is semantically to C. |
This has a number of advantages compared to creating a copy in memory and passing a pointer. The obvious one is that we don't have to put the data into memory but can keep it in registers. Since we're currently passing a pointer anyway (instead of using e.g. a known offset on the stack, which is what the `byval` attribute would achieve), we only use a single additional register for each fat pointer, but save at least two pointers worth of stack in exchange (sometimes more because more than one copy gets eliminated). On archs that pass arguments on the stack, we save a pointer worth of stack even without considering the omitted copies.

Additionally, LLVM can optimize the code a lot better, to a large degree due to the fact that lots of copies are gone or can be optimized away. Additionally, we can now emit attributes like nonnull on the data and/or vtable pointers contained in the fat pointer, potentially allowing for even more optimizations.

This results in LLVM passes being about 3-7% faster (depending on the crate), and the resulting code is also a few percent smaller, for example:

|text|data|filename|
|----|----|--------|
|5671479|3941461|before/librustc-d8ace771.so|
|5447663|3905745|after/librustc-d8ace771.so|
| | | |
|1944425|2394024|before/libstd-d8ace771.so|
|1896769|2387610|after/libstd-d8ace771.so|

I had to remove a call in the backtrace-debuginfo test, because LLVM can now merge the tails of some blocks when optimizations are turned on, which can't correctly preserve line info.

Fixes #22924
Cc #22891 (at least for fat pointers the code is good now)
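As a rough illustration of why a fat pointer can travel as two scalar values, the sketch below splits a `&str` into its data pointer and length and rebuilds it from them. The helper names `split` and `join` are hypothetical, and the size check reflects how current rustc lays out `&str` in practice rather than a documented guarantee relied on by the PR.

```rust
use std::mem::size_of;
use std::{slice, str};

/// Split a `&str` into the two scalars it is made of: data pointer and byte length.
pub fn split(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

/// Rebuild a `&str` from its two components.
///
/// Safety: `ptr` and `len` must describe live, valid UTF-8 that outlives `'a`,
/// for example a pair obtained from `split` on a string that is still borrowed.
pub unsafe fn join<'a>(ptr: *const u8, len: usize) -> &'a str {
    unsafe { str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)) }
}

fn main() {
    // A `&str` is two machine words wide: pointer plus length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    let s = "hello";
    let (ptr, len) = split(s);
    // Passing `ptr` and `len` separately carries the same information as `s`.
    let t = unsafe { join(ptr, len) };
    assert_eq!(t, s);
}
```

Since both halves are plain word-sized values, each can sit in its own argument register, which is the situation the description above aims for.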
Triage: it's not totally clear to me if this ticket is still valid. A lot has changed between now and then, and I get very different code, but I'm not sure if that's from all the other stuff that's gone on since pre-1.0. |
Yeah as per the above it looks like this is probably not valid anymore, or at least I can't say either way -- closing. If that's not the case, please reopen; I apologize ahead of time! |
The generated code for passing arguments larger than a machine word looks inefficient.
Test case:
On x86_64-unknown-linux-gnu, compiling with `rustc test.rs -O -C no-stack-check --crate-type dylib --emit asm`, I see this code for `foo`:
`foo` receives the address of the `&str` in `%rdi`. It copies it into a new stack location for each call, then passes the address of that location to `bar`.
Could `foo` forward the address of the `&str` along without making stack copies?
If I remove one of the `bar` calls from `foo`, then the function also ought to become a tail call, but it doesn't. Tail call optimization does occur if I replace the `&str` types with `&&str`.
The calling convention for passing `&str` (and other arguments larger than a machine word?) seems to be:
i.e. We seem to be passing values both by-value and by-reference.
With the current convention, I think we could get smaller code by eliding some of the copies. If the copies were instead immutable, I think we could elide more copies.
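The original test case did not survive in this copy of the issue. For orientation only, the forwarding pattern under discussion might look roughly like the sketch below. The function bodies, `foo_tail`, and the `#[inline(never)]` attribute are assumptions made for illustration, not the reporter's code.

```rust
/// Hypothetical callee. `#[inline(never)]` keeps the call visible in the
/// emitted assembly instead of letting it be inlined away.
#[inline(never)]
pub fn bar(s: &str) -> usize {
    s.len()
}

/// Forwards the same `&str` argument to `bar` twice, as described in the report.
pub fn foo(s: &str) -> usize {
    bar(s) + bar(s)
}

/// Variant with a single forwarded call in tail position, the case where a
/// tail call would be expected.
pub fn foo_tail(s: &str) -> usize {
    bar(s)
}

fn main() {
    assert_eq!(foo("hello"), 10);
    assert_eq!(foo_tail("hello"), 5);
}
```

Compiling a file like this with `--emit asm` on a current toolchain is one way to check how the argument is forwarded today; the output will likely differ from the listing quoted in the comments.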
Compiler version: