rustc fails to remove dead code in exactly one instance of multiple similar code #48627
Comments
triage: P-medium. Seems curious and worthy of fixing, but not a P-high problem. @glandium, do you believe this optimization (or lack of it) is affecting other things?
Okay, so I confirmed that both this and #48253 have the same regression range, which is the last days before 1.23 made it to beta. 1.23.0.beta.2 exhibits both regressions, and stable-1.23 doesn't. So this would be a regression from #45225, and #48253 is a dupe.
So, to see whether this had an impact on actual code, I tried some work-in-progress code meant to replace https://dxr.mozilla.org/mozilla-central/source/memory/replace/logalloc/, which happened to be much slower when compiled with 1.24 than with 1.23 (670ms vs 550ms to process 1.3M lines). I tried the nightlies before and after #45225, and... both were fast. That is good news of a sort: although #45225 definitely affects codegen, as demonstrated, it didn't seem to have a real impact, at least not on my code. Tracking the slowdown down, however, I found that the regression came from #46623, which is related. That change seems to have made std::fmt::Write functions slower, as well as things related to Option and Result, and probably more. I'm not sure whether I should open a separate issue.
My initial guess is that LLVM is very finicky and sensitive to our exact enum representation. Can we get LLVM IR for the relevant example & version combinations?
LLVM IR for both the testcase here and the one in #48253, with nightly-2017-11-20:
with nightly-2017-11-21:
Those testcases don't seem to have been affected by #46623, so I'll try to reduce something from my code that got 20% slower. At least the LLVM IR makes one thing clear: it's not LLVM's doing.
Here's a reduced example that is not affected by #45225 but is by #46623:

```rust
#[inline(never)]
fn parse_hex(input: &[u8]) -> Option<usize> {
    let mut s = input;
    let mut result: Option<usize> = None;
    loop {
        if let Some((&c, remainder)) = s.split_first() {
            // Convert one hex digit; stop at the first non-hex byte.
            let d = match c {
                d @ b'0'..=b'9' => d - b'0',
                d @ b'a'..=b'f' => d + 10 - b'a',
                d @ b'A'..=b'F' => d + 10 - b'A',
                _ => break,
            };
            // Accumulate, returning None on overflow.
            result = Some(match result {
                Some(r) => r.checked_mul(16usize)
                    .and_then(|r| r.checked_add(d as usize))?,
                None => d as usize,
            });
            s = remainder;
        } else {
            break;
        }
    }
    result
}

fn main() {
    for _ in 1..10000000 {
        match parse_hex("85af342b1".as_bytes()) {
            Some(_) => continue,
            _ => break,
        }
    }
}
```

150ms with nightly-2017-12-15 vs 260ms with nightly-2017-12-16. Note that the `inline(never)` and the otherwise pointless loop in `main` are there to ensure the compiler doesn't eliminate dead code, and to reduce the IR size. A loop that generates random or sequential numbers to parse, without `inline(never)`, shows a perf difference too.
LLVM IR with nightly-2017-12-15:
with nightly-2017-12-16:
And just in case, here is the LLVM IR with rustc built from the commit just before the merge of #46623, which is only trivially different from that of nightly-2017-12-15:
Thank you! My suspicion was correct:

```llvm
@_ZN4test5FUNCS17h4349eb3b1be595e8E = local_unnamed_addr constant { [0 x i8], {}*, [0 x i8], {}*, [0 x i8], {}*, [0 x i8] } { [0 x i8] undef, {}* null, [0 x i8] undef, {}* null, [0 x i8] undef, {}* null, [0 x i8] undef }, align 8
; ...
%0 = load i8*, i8** bitcast ({ [0 x i8], {}*, [0 x i8], {}*, [0 x i8], {}*, [0 x i8] }* @_ZN4test5FUNCS17h4349eb3b1be595e8E to i8**), align 8
%1 = icmp eq i8* %0, null
```

LLVM should constant-fold these two and know that
After a quick skim of the constant-folding helper functions I can't see what causes this, but I have minimized the issue; it appears the problem is the
I've submitted https://reviews.llvm.org/D55169 to fix this issue.
Struct types may have leading zero-size elements like [0 x i32], in which case the "real" element at offset 0 will not necessarily coincide with the 0th element of the aggregate. ConstantFoldLoadThroughBitcast() wants to drill down the element at offset 0, but currently always picks the 0th aggregate element to do so. This patch changes the code to find the first non-zero-size element instead, for the struct case. The motivation behind this change is rust-lang/rust#48627. Rust is fond of emitting [0 x iN] separators between struct elements to enforce alignment, which prevents constant folding in this particular case. The additional tests with [4294967295 x [0 x i32]] check that we don't end up unnecessarily looping over a large number of zero-size elements of a zero-size array. Differential Revision: https://reviews.llvm.org/D55169 llvm-svn=348895
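The change described in the commit message can be illustrated with a toy model. This is a simplified sketch in Rust, not LLVM's actual code: the `Ty` enum and the `alloc_size`, `old_element_at_offset_zero` and `new_element_at_offset_zero` helpers are hypothetical, pointers are assumed to be 8 bytes, and only sizes (not alignment or padding) are modeled. It contrasts always picking element 0 with picking the first element whose size is non-zero, using the struct layout from the IR quoted above.

```rust
// Toy model of LLVM-style aggregate types (hypothetical; sizes in bytes).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int(u64),            // integer of the given byte width, e.g. i8 = Int(1)
    Ptr,                 // pointer, assumed to be 8 bytes
    Array(u64, Box<Ty>), // [N x T]
    Struct(Vec<Ty>),     // { T0, T1, ... }, no padding modeled
}

fn alloc_size(ty: &Ty) -> u64 {
    match ty {
        Ty::Int(bytes) => *bytes,
        Ty::Ptr => 8,
        // Multiplying instead of iterating means a huge array of zero-size
        // elements such as [4294967295 x [0 x i32]] costs nothing to size.
        Ty::Array(n, elem) => n.saturating_mul(alloc_size(elem)),
        Ty::Struct(fields) => fields.iter().map(alloc_size).sum(),
    }
}

// Old behavior: always drill down into element 0 of the struct.
fn old_element_at_offset_zero(fields: &[Ty]) -> Option<usize> {
    if fields.is_empty() { None } else { Some(0) }
}

// Patched behavior: skip leading zero-size elements.
fn new_element_at_offset_zero(fields: &[Ty]) -> Option<usize> {
    fields.iter().position(|f| alloc_size(f) != 0)
}

fn main() {
    // { [0 x i8], {}*, [0 x i8], {}*, [0 x i8], {}*, [0 x i8] }
    let zst = || Ty::Array(0, Box::new(Ty::Int(1)));
    let funcs = vec![zst(), Ty::Ptr, zst(), Ty::Ptr, zst(), Ty::Ptr, zst()];

    // The old rule lands on the [0 x i8] separator, so the load cannot fold.
    assert_eq!(old_element_at_offset_zero(&funcs), Some(0));
    // The new rule lands on the first pointer, which is a foldable null.
    assert_eq!(new_element_at_offset_zero(&funcs), Some(1));
    assert_eq!(alloc_size(&Ty::Struct(funcs.clone())), 24);

    let huge = Ty::Array(4294967295, Box::new(Ty::Array(0, Box::new(Ty::Int(4)))));
    assert_eq!(alloc_size(&huge), 0);
}
```

In the real compiler the load is folded by then recursing into the chosen element; the model stops at choosing the element, which is the part the patch changed.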
Patch landed upstream.
The patch has landed in Rust's LLVM.
The original issue here is fixed, but LLVM is still doing something slightly weird here with the four consecutive ud2 instructions. IR for reference: https://rust.godbolt.org/z/5vRhI1
This is long since resolved, and the redundant ud2s are no longer present either. |
I made a mistake in some code, which essentially made everything statically panic. Which is fine. The problem is that in exactly one instance of the same kind of code, rustc actually failed to remove the non-panicking branch. Note this /might/ be a duplicate of #48253. At least, similarly to #48253, it appears to be a regression in 1.24.0, according to godbolt.
A copy/pastable-to-godbolt and reduced version of the mistaken code is:
This generates the following assembly (skipping the data fields):
Note how `calloc` and `realloc` are properly just directly panicking, because `FUNCS` is immutable and initialized with empty function pointers, leading to `unwrap` always panicking, but the same doesn't happen for `malloc`. Interestingly, removing `realloc` makes `malloc` get optimized the same way as the others.
The code generated by 1.23.0 was: