println!() prevents optimization by capturing pointers #50519
Comments
In this reduced example, minss is generated for both cases since Rust 1.25: https://godbolt.org/z/wz8Kmk Replacing the return with println brings the problem back. |
Okay, the relevant difference that Taking the address prevents the conversion of lowest from an alloca into an SSA value, and that's going to inhibit lots of optimizations (including the select formation desired here). The good news is that this is probably not going to affect real code much, though I am concerned about cases where you have conditional debugging code that includes formatting. Two ways this could be fixed:
|
@rkruppe @nagisa @eddyb Any idea what we can do here? I think it's pretty bad that It would be great if we could force a copy of the formatted value before taking the pointer, but I'm not sure how to do that on a technical level. We'd only want to do this for specific types (integers and floats), but println! is expanded long before this type information is available. |
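For illustration, the "copy before taking the pointer" idea can be applied by hand at the call site. This is a minimal sketch, not the code from the report: the function name `print_lowest` is hypothetical, and whether the copy actually restores the desired codegen depends on the rustc and LLVM versions in use.

```rust
/// Hypothetical example: finds the smallest element of a slice and prints it.
/// Returns +infinity for an empty slice.
pub fn print_lowest(values: &[f32]) -> f32 {
    let mut lowest = f32::INFINITY;
    for &v in values {
        if v < lowest {
            lowest = v;
        }
    }
    // The block expression `{ lowest }` evaluates to a temporary copy, so the
    // formatting machinery borrows the temporary rather than the variable
    // that is updated inside the loop.
    println!("{}", { lowest });
    lowest
}
```

The intent is that `lowest` itself never has its address taken, so it remains a candidate for promotion to an SSA value.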
If changing how |
The formatting machinery has been specifically crafted to minimize the size rather than increase the speed (desired for panics), which will eventually come at some cost somewhere, which is what we are seeing here. If we can find ways to improve |
Any news here? I'm getting similar things where passing expressions to |
I don't think inlining the _print function helps, as that function depends on print_to: rust/library/std/src/io/stdio.rs Lines 997 to 999 in 8e73853
|
Sorry, I meant inlining all the way down the call graph. |
Ran into the same issue. In the code below, Link: https://godbolt.org/z/4Yxbf1dox
const N: usize = 20_000_000;
pub fn sieve(vis: &mut [bool; N + 1]) {
    let mut i = 2;
    let mut count = 0;
    while i * i <= N {
        if vis[i] {
            count += 1; // This writes to the stack every time.
            let mut k = 2 * i;
            while k <= N {
                vis[k] = true;
                k += i;
            }
        }
        i += 1;
    }
    // This loop can be optimized into SIMD when using the workaround below.
    while i <= N {
        if vis[i] {
            count += 1; // This writes to the stack every time.
        }
        i += 1;
    }
    // let count = count; // Uncomment this line to work around this issue.
    println!("{}", count);
} |
(FWIW even doing |
@oxalica Interesting case! I would have expected scalar promotion in LICM to sink the store outside the loop for that example. Unfortunately it ends up running afoul of this thread safety check: https://github.com/llvm/llvm-project/blob/64c24f493e5f4637ee193f10f469cdd2695b4ba6/llvm/lib/Transforms/Scalar/LICM.cpp#L2196-L2198 Because you only store count conditionally, there is no guaranteed write on each iteration, and because printf captures it, the pointer may leak to a different thread. |
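To make the "no guaranteed write on each iteration" condition concrete, here is a sketch of a loop in which the counter is written unconditionally. The function `count_set` is a hypothetical example, and this is not a verified fix: whether LICM then promotes the store depends on the LLVM version.

```rust
/// Hypothetical example: counts the `true` entries of a slice and prints the total.
pub fn count_set(flags: &[bool]) -> usize {
    let mut count = 0usize;
    for &flag in flags {
        // Unconditional update: adds 1 when `flag` is true and 0 otherwise,
        // so `count` is written on every iteration.
        count += flag as usize;
    }
    println!("{}", count);
    count
}
```

Compared with `if flag { count += 1; }`, the result is the same, but the write no longer sits behind a branch.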
I think scalar store promotion candidates are sufficiently rare, and the resulting optimization sufficiently valuable, that doing a loop header reachability capture check is probably worth the compile-time cost in this case. |
https://reviews.llvm.org/D100706 for what I mentioned above. This should help any cases that involve a println after a loop. |
Upstream commit: llvm/llvm-project@d440f9a |
This weekend I ran some benchmarks on some of my code. After making a seemingly insignificant code change, I noticed a small but measurable performance regression. While investigating the generated assembly, I stumbled upon a case where the compiler emits suboptimal code.
This minimal example shows the same behaviour (Playground link):
When compiling with the --release flag, the compiler generates the following instructions for the marked block:
However, if I replace those lines with the following:
the compiler emits a strange series of float compare and jump instructions:
As a comparison, both gcc and clang can optimize a similar C++ example:
Both compilers generate minss instructions for both variants. (Godbolt)
I wasn't sure whether rustc or LLVM were responsible for this behaviour, however after a quick glance at the generated LLVM IR, I'm tending towards rustc, since in the first case it emits fcmp and select instructions, while in the latter it generates fcmp and br.
What do you think?