-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LTO triggers apparent miscompilation on Windows 10 x64 #74498
Comments
@rustbot ping llvm |
Hey LLVM ICE-breakers! This bug has been identified as a good cc @camelid @comex @cuviper @DutchGhost @hanna-kruppe @hdhoang @heyrutvik @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique @vgxbj |
@rustbot ping windows |
Hey Windows Group! This bug has been identified as a good "Windows candidate". cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @retep998 @rylev @sivadeilra |
Works fine on Windows-gnu:
With MSVC it endlessly allocates memory. @rustbot modify labels: -O-windows +O-windows-msvc |
A bit smaller, use std::io::Write;
fn main() {
let mut out = Vec::new();
for val in 0u8..=255 {
out.write_all(&[val, val]).unwrap();
}
out.write_all(&[0]).unwrap();
} [profile.release]
lto = true
codegen-units = 1 |
Another datapoint: adding That flag changed from default true to false about a year ago which kinda lines up with the repro chart above (minus 1.43.1). There was an LLVM bug about that causing some miscompilations last I checked. cc @nikic who I believe worked on that upstream. |
@luqmana Nowrap handling in LFTR post-inc canonicalization was fixed when the CVP flag was enabled, so this is most likely a new issue. Can you share how the optimized IR looks like on Windows? In particular it would be interesting to see whether it already infinitely loops there, or whether this goes wrong in the backend (e.g. in LSR). |
Still fails on my end, unless I'm doing something wrong. |
Assigning |
Reproduced on LLVM master and simplified: Run
The icmp gets changed to comparing the output of the 8-bit addition to 0, even though it will be |
Heh, so this is indeed the same as the LFTR bug, just for LSR. Surprised it took this long to surface. |
Thanks @comex, that's a lot simpler than the repro I ended up getting down to and it also demonstrates it's not windows specific. |
I'm no windows expert, but you might be doing something wrong... Or at least, here is my local transcript:
Having said that, I did see some evidence in some other runs that maybe setting this flag doesn't resolve the issue in all cases. I haven't gotten to the bottom of those observations yet; I just wanted to note that I had observed some sensitivity to the flag. |
@pnkfelix I believe the arg is being set correctly, because:
If RUSTFLAGS wasn't being set properly, then the garbage arg wouldn't cause an issue. Above is using Powershell; I also tried using cmd using your commands:
and received the same failure. I see you are using Rust 1.44.1 in your example. I tried using that arg on Rust 1.44.1 and it does indeed cause a successful run. On 1.45, however, it fails with or without the arg. (of course, there may be some other thing causing the run to succeed/fail; this is way outside my knowledge base) |
Here's the reduced IR of the minimal rust code here: #74498 (comment), a bit simplified and with some renaming target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.26.28806"
define i32 @main() unnamed_addr {
entry:
%proc_heap = tail call i8* @GetProcessHeap()
%vec_ptr0 = tail call i8* @HeapAlloc(i8* %proc_heap, i32 0, i64 8)
br label %loop_hdr
loop_hdr: ; preds = %loop_write, %entry
%vec_ptr1 = phi i8* [ %vec_ptr0, %entry ], [ %vec_ptr2, %loop_write ]
%vec_cap1 = phi i64 [ 8, %entry ], [ %new_vec_cap1, %loop_write ]
%vec_len1 = phi i64 [ 0, %entry ], [ %new_vec_len1, %loop_write ]
%val = phi i8 [ 0, %entry ], [ %1, %loop_write ]
%0 = icmp eq i8 %val, 255
%1 = add nuw i8 %val, 1
%2 = sub i64 %vec_cap1, %vec_len1
%3 = icmp ult i64 %2, 2
br i1 %3, label %check_realloc, label %pre_loop_write
check_realloc: ; preds = %loop_hdr
%len_plus_two_checked = tail call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %vec_len1, i64 2)
%checked_res = extractvalue { i64, i1 } %len_plus_two_checked, 1
br i1 %checked_res, label %exit, label %realloc
realloc: ; preds = %check_realloc
%len_plus_two = extractvalue { i64, i1 } %len_plus_two_checked, 0
%cap_times_two = shl i64 %vec_cap1, 1
%4 = icmp ugt i64 %cap_times_two, %len_plus_two
%new_cap = select i1 %4, i64 %cap_times_two, i64 %len_plus_two
%new_vec_ptr = tail call i8* @HeapReAlloc(i8* %proc_heap, i32 0, i8* %vec_ptr1, i64 %new_cap)
br label %loop_write
pre_loop_write: ; preds = %loop_hdr
%new_vec_len2 = add i64 %vec_len1, 2
br label %loop_write
loop_write: ; preds = %realloc, %pre_loop_write
%new_vec_len1 = phi i64 [ %new_vec_len2, %pre_loop_write ], [ %len_plus_two, %realloc ]
%vec_ptr2 = phi i8* [ %vec_ptr1, %pre_loop_write ], [ %new_vec_ptr, %realloc ]
%new_vec_cap1 = phi i64 [ %vec_cap1, %pre_loop_write ], [ %new_cap, %realloc ]
%vec_store_slot1 = getelementptr inbounds i8, i8* %vec_ptr2, i64 %vec_len1
store i8 %val, i8* %vec_store_slot1, align 1
%vec_store_slot2 = getelementptr inbounds i8, i8* %vec_store_slot1, i64 1
store i8 %val, i8* %vec_store_slot2, align 1
br i1 %0, label %exit, label %loop_hdr
exit: ; preds = %loop_write, %check_realloc
%exit_code = phi i32 [ 0, %loop_write ], [ 1, %check_realloc ]
ret i32 %exit_code
}
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)
declare i8* @GetProcessHeap() unnamed_addr
declare i8* @HeapAlloc(i8*, i32, i64) unnamed_addr
declare i8* @HeapReAlloc(i8*, i32, i8*, i64) unnamed_addr Running this through Running it through loop_hdr:
[...]
%0 = add nuw i8 %val, 1
[...]
loop_write:
[...]
%4 = icmp eq i8 %0, 0
br i1 %4, label %exit, label %loop_hdr This is the same issue @comex mentioned above. LLVM should be clearing the no wrap flag on the add there after transforming to a post-increment check but it doesn't. Then later it just gets rid of the exit condition altogether since it believes that the add will never wrap, it can't possibly be equal to 0. But that doesn't explain why the flag doesn't seem to work post 1.44, looking at the IR (with 1.44.1: %_5.0.i.i.i = add i8 %iter.sroa.0.060, 1 1.45+: %7 = add nuw i8 %iter.sroa.0.063, 1 So in 1.45+ we're not relying on LLVM's CVP to determine that the add has no unsigned wrap but some other mechanism. But after looking through print-after-all for a bit I'm pretty sure this new line in rust/library/core/src/iter/range.rs Line 507 in 98efae8
That boils down to an Since that code was introduced in #69659 that lines up with the behaviour we see in 1.45+. Either way, still an LLVM bug that's just triggered more now. |
thank you for that excellent analysis @luqmana |
Filed https://bugs.llvm.org/show_bug.cgi?id=46943 with LLVM upstream. |
The only remaining question on our end is if we should, for the short term, change that (That, or try to figure out a patch to LLVM to fix the bug on their end.) |
To be clear: my thinking is that we could change the That way, we shouldn't generate any overflow checking code, so I wouldn't expect any significant performance hit. At the same time, I'm not really a big fan of putting in such narrowly targetted hacks to work around LLVM bugs, especially ones that have been confirmed to be bugs on LLVM's end. |
(having said that, this bug does seem to serve as evidence that our code is probably executing an overflowing addition via
I'm going to go in and confirm that this add may indeed be overflowing, and if it is, I probably will push for changing to Update: okay, its hard to see how this could possibly be causing an overflowing addition under Rust's machine model: if self.start < self.end {
// SAFETY: just checked precondition
let n = unsafe { Step::forward_unchecked(self.start.clone(), 1) };
...
} |
Downgrading priority to |
I have a patch that fixes (what I believe to be) the underlying LLVM bug; see https://bugs.llvm.org/show_bug.cgi?id=46943 ; I'm working on making a LLVM unit test and then I'll put up a patch for the llvm-devs to review. |
Now working on nightly 2021-03-04, fixed by the LLVM 12 upgrade. |
I tried this code:
Cargo.toml:
main.rs:
(extremely minimized form of grayscale BMP encoding from
image
crate)Command:
cargo run --release
I expected to see this happen:
1025 bytes written
Instead, this happened:
Meta
rustc --version --verbose
:It appears to work fine on ARM Linux in my limited testing.
This minimized form works/doesn't work on the following versions:
If the LTO and codegen-units settings are removed it works on all above versions.
I originally noticed this when a project that worked on 1.44.1 suddenly started failing with huge memory allocations on 1.45, so this table specifically applies to this repro, not whatever the underlying bug is.
The text was updated successfully, but these errors were encountered: