Compiler fails to optimize out bounds checks on 32-bit x86 and ARM #22924
Comments
The problem goes away if I stick this at the start of `my_write`: `let dst = dst;`
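For context, here is a minimal sketch of the kind of function and workaround being described. The original test case lives in a gist that isn't reproduced in this thread, so the signature and body of `my_write` below are assumptions, not the reporter's exact code:

```rust
// Hypothetical sketch only: the actual test case is not shown in this thread.
pub fn my_write(dst: &mut [u8]) -> Option<()> {
    // Reported workaround: rebinding the fat pointer locally makes the
    // redundant bounds checks disappear on 32-bit x86/ARM.
    let dst = dst;
    if dst.len() < 2 {
        return None;
    }
    // After the explicit length check above, the per-index bounds checks
    // (and the call to panic_bounds_check) should be optimized out; per
    // this issue, they were not on 32-bit x86 and ARM.
    dst[0] = 1;
    dst[1] = 2;
    Some(())
}
```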
kmcallister added the I-slow (Issue: Problems and improvements with respect to performance of generated code) and A-codegen (Area: Code generation) labels on Mar 2, 2015
http://llvm.org/bugs/show_bug.cgi?id=22786

We could work around this by special-casing slices and copying them by doing two loads/stores. Not sure if it's worth it.
@dotdash This should obviously be fixed upstream; it will help more than just us, I assume.
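To illustrate the workaround dotdash mentions above, here is the idea expressed in Rust-level terms. This is only a sketch: the real workaround would live in rustc's codegen, not in user code, and the function below is purely hypothetical.

```rust
use std::slice;

// Sketch: copy a slice's fat pointer as two explicit loads/stores
// (data pointer, then length) instead of one opaque memcpy of the
// two-word pair.
fn copy_slice_ref<'a>(src: &'a [u8]) -> &'a [u8] {
    let ptr = src.as_ptr(); // load/store 1: data pointer
    let len = src.len();    // load/store 2: length
    // Safe here because ptr/len come straight from a valid slice with
    // the same lifetime.
    unsafe { slice::from_raw_parts(ptr, len) }
}
```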
dotdash added a commit to dotdash/rust that referenced this issue on Jun 20, 2015
This has a number of advantages compared to creating a copy in memory and passing a pointer. The obvious one is that we don't have to put the data into memory but can keep it in registers. Since we're currently passing a pointer anyway (instead of using e.g. a known offset on the stack, which is what the `byval` attribute would achieve), we only use a single additional register for each fat pointer, but save at least two pointers worth of stack in exchange (sometimes more because more than one copy gets eliminated). On archs that pass arguments on the stack, we save a pointer worth of stack even without considering the omitted copies.

Additionally, LLVM can optimize the code a lot better, to a large degree due to the fact that lots of copies are gone or can be optimized away. Additionally, we can now emit attributes like `nonnull` on the data and/or vtable pointers contained in the fat pointer, potentially allowing for even more optimizations.

This results in LLVM passes being about 3-7% faster (depending on the crate), and the resulting code is also a few percent smaller, for example:

| text | data | filename |
|---------|---------|------------------------------|
| 5671479 | 3941461 | before/librustc-d8ace771.so |
| 5447663 | 3905745 | after/librustc-d8ace771.so |
| 1944425 | 2394024 | before/libstd-d8ace771.so |
| 1896769 | 2387610 | after/libstd-d8ace771.so |

I had to remove a call in the backtrace-debuginfo test, because LLVM can now merge the tails of some blocks when optimizations are turned on, which can't correctly preserve line info.

Fixes rust-lang#22924
Cc rust-lang#22891 (at least for fat pointers the code is good now)
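As a rough illustration of the change described in the commit message (a sketch only, not the compiler's actual code): a slice argument such as `&[u8]` is a fat pointer, i.e. a data pointer plus a length.

```rust
// Sketch: shows what the fat pointer for a slice argument consists of.
// The actual ABI change is inside rustc's codegen, not in user code.
fn takes_fat_pointer(s: &[u8]) -> usize {
    // `s` is conceptually a (data pointer, length) pair.
    // Before this commit: the pair was written to the caller's stack and
    // its address was passed. After: the data pointer and the length are
    // passed as two separate immediate arguments, so they can stay in
    // registers and LLVM can attach attributes such as `nonnull` to the
    // data pointer.
    s.len()
}

fn main() {
    let buf = [1u8, 2, 3];
    assert_eq!(takes_fat_pointer(&buf), 3);
}
```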
bors added a commit that referenced this issue on Jun 20, 2015
Test case:
The 32-bit x86 and ARM disassembly for `my_write` includes two checks of the slice length and a call to `panic_bounds_check`. The bounds check is optimized away on x86_64.

Here's the x86 assembly:
LLVM bitcode:
Compiler version:
My ARM compiler is older -- 2015-02-18 or so.