Destination register for x86 `bsf` and `bsr` should always be initialized #129659
Comments
@llvm/issue-subscribers-backend-x86 Author: None (CatsAreFluffy)
The x86 instructions `bsf` and `bsr` (which are used to implement `countr_zero` and `countl_zero` respectively on targets that lack `tzcnt` (BMI1) and `lzcnt` (LZCNT)) have a dependency on their output register. Currently, LLVM does not do anything to break this dependency, which can lead to strange slowdowns depending on the whims of register allocation. For example, in [this Rust program](https://godbolt.org/z/45axenMjd), "New isqrt" is much slower than "Stdlib isqrt" on the baseline target but much faster on a modern target, because on the baseline target "New isqrt" has a damaging `bsr`-induced dependency. This dependency should be broken by initializing the output register before `bsr` or `bsf`. (As of #123623, LLVM already does this on 64-bit targets when the argument is not known to be nonzero, but the dependency still exists in other cases.)
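To make the dependency concrete, here is a small software model in Rust, not LLVM's actual lowering. It assumes that `bsr`/`bsf` leave the destination unchanged when the source is zero, which is why the old value of the output register is an input at all. The function names and the preset constants (63 and 32) are illustrative choices for this sketch.

```rust
/// Software model of 32-bit `bsr` (illustrative, not real hardware access).
/// Assumption: a zero source leaves the destination register unchanged,
/// so the result depends on whatever the destination held before.
fn bsr_model(dest_before: u32, src: u32) -> u32 {
    if src == 0 {
        dest_before
    } else {
        // Index of the highest set bit.
        31 - src.leading_zeros()
    }
}

/// Software model of 32-bit `bsf` under the same assumption.
fn bsf_model(dest_before: u32, src: u32) -> u32 {
    if src == 0 {
        dest_before
    } else {
        // Index of the lowest set bit.
        src.trailing_zeros()
    }
}

/// Leading-zero count built on the `bsr` model. Presetting the destination
/// to 63 both initializes it and yields 63 ^ 31 == 32 for a zero input.
fn countl_zero_via_bsr(x: u32) -> u32 {
    // For a bit index b in 0..=31, b ^ 31 == 31 - b.
    bsr_model(63, x) ^ 31
}

/// Trailing-zero count built on the `bsf` model, with the destination
/// preset to 32 so that a zero input returns the operand width.
fn countr_zero_via_bsf(x: u32) -> u32 {
    bsf_model(32, x)
}
```

In this model the preset value is always written before the scan, so the result never depends on stale register contents. Writing the destination first is the same idea the issue proposes for breaking the false dependency when the argument is known to be nonzero.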
Are there CPUs that are affected worse by this?
rust-lang/rust#137786 (comment) has one such example. The missed optimization described here may not account for the entire difference between the two benchmarks though.
We generate
Yes,
Thanks, I didn't realize that the return value between bsr and lzcnt is inverted.