Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LTO triggers apparent miscompilation on Windows 10 x64 #74498

Closed
AlyoshaVasilieva opened this issue Jul 19, 2020 · 24 comments
Closed

LTO triggers apparent miscompilation on Windows 10 x64 #74498

AlyoshaVasilieva opened this issue Jul 19, 2020 · 24 comments
Assignees
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@AlyoshaVasilieva
Copy link

I tried this code:

Cargo.toml:

[package]
name = "bmp-minimal"
version = "0.1.0"
authors = ["Malloc Voidstar <1284317+AlyoshaVasilieva@users.noreply.github.com>"]
edition = "2018"

[dependencies]

[profile.release]
lto = true
codegen-units = 1

main.rs:

use std::io::Write;

fn main() {
    let mut out = Vec::with_capacity(51200);
    for val in 0u8..=255 {
        out.write_all(&[val, val, val, 0]).unwrap();
    }

    for _ in (0..1).rev() {
        for _ in 0..1 {
            out.write_all(&[0]).unwrap();
        }
    }
    println!("{} bytes written", out.len());
}

(extremely minimized form of grayscale BMP encoding from image crate)

Command: cargo run --release

I expected to see this happen: 1025 bytes written

Instead, this happened:

memory allocation of 26843545600 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)

Meta

rustc --version --verbose:

rustc 1.45.0 (5c1f21c3b 2020-07-13)
binary: rustc
commit-hash: 5c1f21c3b82297671ad3ae1e8c942d2ca92e84f2
commit-date: 2020-07-13
host: x86_64-pc-windows-msvc
release: 1.45.0
LLVM version: 10.0

It appears to work fine on ARM Linux in my limited testing.

This minimized form works/doesn't work on the following versions:

Rust version Works?
1.45 (LLVM 10.0) No
1.44.1 No
1.43.1 Yes
1.42.0 No
1.41.1 No
1.40.0 No
1.39.0 No
1.38.0 (LLVM 9.0) No
1.37.0 Yes
1.36.0 Yes
1.35.0 Yes
1.34.0 Yes

If the LTO and codegen-units settings are removed it works on all above versions.

I originally noticed this when a project that worked on 1.44.1 suddenly started failing with huge memory allocations on 1.45, so this table specifically applies to this repro, not whatever the underlying bug is.

@AlyoshaVasilieva AlyoshaVasilieva added the C-bug Category: This is a bug. label Jul 19, 2020
@retep998 retep998 added the A-codegen Area: Code generation label Jul 19, 2020
@jonas-schievink jonas-schievink added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 19, 2020
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 19, 2020
@LeSeulArtichaut
Copy link
Contributor

@rustbot ping llvm

@rustbot
Copy link
Collaborator

rustbot commented Jul 19, 2020

Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hanna-kruppe @hdhoang @heyrutvik @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique @vgxbj

@rustbot rustbot added the ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group label Jul 19, 2020
@LeSeulArtichaut
Copy link
Contributor

LeSeulArtichaut commented Jul 19, 2020

It appears to work fine on ARM Linux in my limited testing.

@rustbot ping windows

@rustbot rustbot added the O-windows Operating system: Windows label Jul 19, 2020
@rustbot
Copy link
Collaborator

rustbot commented Jul 19, 2020

Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some instructions for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3

cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @retep998 @rylev @sivadeilra

@mati865
Copy link
Contributor

mati865 commented Jul 19, 2020

Works fine on Windows-gnu:

$ cargo run --release
   Compiling bmp-minimal v0.1.0 (C:\Users\mateusz\AppData\Local\Temp\tmp)
    Finished release [optimized] target(s) in 3.71s
     Running `target\release\bmp-minimal.exe`
1025 bytes written

With MSVC it endlessly allocates memory.

@rustbot modify labels: -O-windows +O-windows-msvc

@rustbot rustbot added O-windows-msvc Toolchain: MSVC, Operating system: Windows and removed O-windows Operating system: Windows labels Jul 19, 2020
@connorskees
Copy link
Contributor

A bit smaller,

use std::io::Write;

fn main() {
    let mut out = Vec::new();
    for val in 0u8..=255 {
        out.write_all(&[val, val]).unwrap();
    }

    out.write_all(&[0]).unwrap();
}
[profile.release]
lto = true
codegen-units = 1

@luqmana
Copy link
Member

luqmana commented Jul 20, 2020

Another datapoint: adding -C llvm-args=-cvp-dont-add-nowrap-flags makes it work.

That flag changed from default true to false about a year ago which kinda lines up with the repro chart above (minus 1.43.1). There was an LLVM bug about that causing some miscompilations last I checked.

cc @nikic who I believe worked on that upstream.

@nikic
Copy link
Contributor

nikic commented Jul 20, 2020

@luqmana Nowrap handling in LFTR post-inc canonicalization was fixed when the CVP flag was enabled, so this is most likely a new issue. Can you share how the optimized IR looks like on Windows? In particular it would be interesting to see whether it already infinitely loops there, or whether this goes wrong in the backend (e.g. in LSR).

@AlyoshaVasilieva
Copy link
Author

> $Env:RUSTFLAGS="-C llvm-args=-cvp-dont-add-nowrap-flags"
> cargo run --release
   Compiling bmp-minimal v0.1.0 (C:\X\bmp-minimal)
    Finished release [optimized] target(s) in 2.39s
     Running `target\release\bmp-minimal.exe`
memory allocation of 13421772800 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
> rustc --version
rustc 1.45.0 (5c1f21c3b 2020-07-13)

Still fails on my end, unless I'm doing something wrong.

@JohnTitor
Copy link
Member

Assigning P-critical as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

@JohnTitor JohnTitor added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jul 21, 2020
@comex
Copy link
Contributor

comex commented Jul 21, 2020

Reproduced on LLVM master and simplified: Run opt -loop-reduce -S on this:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

declare void @dummy(i8 zeroext)

define void @test() {
  br label %1

1:
  %2 = phi i8 [ 0, %0 ], [ %4, %1 ]
  call void @dummy(i8 %2)

  %3 = icmp eq i8 %2, -1
  %4 = add nuw i8 %2, 1
  br i1 %3, label %5, label %1

5:
  ret void
}

The icmp gets changed to comparing the output of the 8-bit addition to 0, even though it will be poison if the input was 255 (because the add is marked as nuw).

@nikic
Copy link
Contributor

nikic commented Jul 21, 2020

Heh, so this is indeed the same as the LFTR bug, just for LSR. Surprised it took this long to surface.

@luqmana
Copy link
Member

luqmana commented Jul 21, 2020

Thanks @comex, that's a lot simpler than the repro I ended up getting down to and it also demonstrates it's not windows specific.

@luqmana luqmana removed the O-windows-msvc Toolchain: MSVC, Operating system: Windows label Jul 21, 2020
@spastorino spastorino added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Jul 22, 2020
@pnkfelix pnkfelix self-assigned this Jul 23, 2020
@pnkfelix
Copy link
Member

pnkfelix commented Jul 24, 2020

@AlyoshaVasilieva commented:

> $Env:RUSTFLAGS="-C llvm-args=-cvp-dont-add-nowrap-flags"
> cargo run --release
   Compiling bmp-minimal v0.1.0 (C:\X\bmp-minimal)
    Finished release [optimized] target(s) in 2.39s
     Running `target\release\bmp-minimal.exe`
memory allocation of 13421772800 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
> rustc --version
rustc 1.45.0 (5c1f21c3b 2020-07-13)

Still fails on my end, unless I'm doing something wrong.

I'm no windows expert, but you might be doing something wrong...

Or at least, here is my local transcript:

c:\Users\pfxbo\issue_74498>set RUSTFLAGS=
set RUSTFLAGS=

c:\Users\pfxbo\issue_74498>echo %RUSTFLAGS%
echo %RUSTFLAGS%
%RUSTFLAGS%

c:\Users\pfxbo\issue_74498>cargo run --release
cargo run --release
   Compiling issue_74498 v0.1.0 (C:\Users\pfxbo\issue_74498)
    Finished release [optimized] target(s) in 2.25s
     Running `target\release\issue_74498.exe`
memory allocation of 53687091200 bytes failederror: process didn't exit successfully: `target\release\issue_74498.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)

c:\Users\pfxbo\issue_74498>set RUSTFLAGS=-Cllvm-args=-cvp-dont-add-nowrap-flags
set RUSTFLAGS=-Cllvm-args=-cvp-dont-add-nowrap-flags

c:\Users\pfxbo\issue_74498>echo %RUSTFLAGS%
echo %RUSTFLAGS%
-Cllvm-args=-cvp-dont-add-nowrap-flags

c:\Users\pfxbo\issue_74498>cargo run --release
cargo run --release
   Compiling issue_74498 v0.1.0 (C:\Users\pfxbo\issue_74498)
    Finished release [optimized] target(s) in 2.18s
     Running `target\release\issue_74498.exe`
1025 bytes written

c:\Users\pfxbo\issue_74498>cargo --version
cargo --version
cargo 1.44.1 (88ba85757 2020-06-11)

c:\Users\pfxbo\issue_74498>rustc --version
rustc --version
rustc 1.44.1 (c7087fe00 2020-06-17)

c:\Users\pfxbo\issue_74498>

Having said that, I did see some evidence in some other runs that maybe setting this flag doesn't resolve the issue in all cases. I haven't gotten to the bottom of those observations yet; I just wanted to note that I had observed some sensitivity to the flag.

@AlyoshaVasilieva
Copy link
Author

AlyoshaVasilieva commented Jul 24, 2020

@pnkfelix I believe the arg is being set correctly, because:

> $Env:RUSTFLAGS="-C llvm-args=-cvp-dont-add-nowrap-flags-GARBAGE-GARBAGE"
> cargo run --release
error: failed to run `rustc` to learn about target-specific information

Caused by:
  process didn't exit successfully: `rustc - --crate-name ___ --print=file-names -C llvm-args=-cvp-dont-add-nowrap-flags-GARBAGE-GARBAGE --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit code: 1)
--- stderr
rustc: Unknown command line argument '-cvp-dont-add-nowrap-flags-GARBAGE-GARBAGE'.  Try: 'rustc --help'
rustc: Did you mean '--cvp-dont-add-nowrap-flags'?

> $Env:RUSTFLAGS="-C llvm-args=-cvp-dont-add-nowrap-flags"
> cargo run --release
    Finished release [optimized] target(s) in 0.07s
     Running `target\release\bmp-minimal.exe`
memory allocation of 13421772800 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)

If RUSTFLAGS wasn't being set properly, then the garbage arg wouldn't cause an issue.

Above is using Powershell; I also tried using cmd using your commands:

>set RUSTFLAGS=-Cllvm-args=-cvp-dont-add-nowrap-flags
>echo %RUSTFLAGS%
-Cllvm-args=-cvp-dont-add-nowrap-flags
>cargo run --release
   Compiling bmp-minimal v0.1.0 (C:\X\bmp-minimal)
    Finished release [optimized] target(s) in 4.46s
     Running `target\release\bmp-minimal.exe`
memory allocation of 13421772800 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)

and received the same failure.

I see you are using Rust 1.44.1 in your example. I tried using that arg on Rust 1.44.1 and it does indeed cause a successful run. On 1.45, however, it fails with or without the arg. (of course, there may be some other thing causing the run to succeed/fail; this is way outside my knowledge base)

@luqmana
Copy link
Member

luqmana commented Jul 25, 2020

Here's the reduced IR of the minimal rust code here: #74498 (comment), a bit simplified and with some renaming

target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.26.28806"

define i32 @main() unnamed_addr {
entry:
  %proc_heap = tail call i8* @GetProcessHeap()
  %vec_ptr0 = tail call i8* @HeapAlloc(i8* %proc_heap, i32 0, i64 8)
  br label %loop_hdr
loop_hdr:                                         ; preds = %loop_write, %entry
  %vec_ptr1 = phi i8* [ %vec_ptr0, %entry ], [ %vec_ptr2, %loop_write ]
  %vec_cap1 = phi i64 [ 8, %entry ], [ %new_vec_cap1, %loop_write ]
  %vec_len1 = phi i64 [ 0, %entry ], [ %new_vec_len1, %loop_write ]
  %val = phi i8 [ 0, %entry ], [ %1, %loop_write ]
  %0 = icmp eq i8 %val, 255
  %1 = add nuw i8 %val, 1
  %2 = sub i64 %vec_cap1, %vec_len1
  %3 = icmp ult i64 %2, 2
  br i1 %3, label %check_realloc, label %pre_loop_write
check_realloc:                   ; preds = %loop_hdr
  %len_plus_two_checked = tail call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %vec_len1, i64 2)
  %checked_res = extractvalue { i64, i1 } %len_plus_two_checked, 1
  br i1 %checked_res, label %exit, label %realloc
realloc:                         ; preds = %check_realloc
  %len_plus_two = extractvalue { i64, i1 } %len_plus_two_checked, 0
  %cap_times_two = shl i64 %vec_cap1, 1
  %4 = icmp ugt i64 %cap_times_two, %len_plus_two
  %new_cap = select i1 %4, i64 %cap_times_two, i64 %len_plus_two
  %new_vec_ptr = tail call i8* @HeapReAlloc(i8* %proc_heap, i32 0, i8* %vec_ptr1, i64 %new_cap)
  br label %loop_write
pre_loop_write:                                             ; preds = %loop_hdr
  %new_vec_len2 = add i64 %vec_len1, 2
  br label %loop_write
loop_write:                                       ; preds = %realloc, %pre_loop_write
  %new_vec_len1 = phi i64 [ %new_vec_len2, %pre_loop_write ], [ %len_plus_two, %realloc ]
  %vec_ptr2 = phi i8* [ %vec_ptr1, %pre_loop_write ], [ %new_vec_ptr, %realloc ]
  %new_vec_cap1 = phi i64 [ %vec_cap1, %pre_loop_write ], [ %new_cap, %realloc ]
  %vec_store_slot1 = getelementptr inbounds i8, i8* %vec_ptr2, i64 %vec_len1
  store i8 %val, i8* %vec_store_slot1, align 1
  %vec_store_slot2 = getelementptr inbounds i8, i8* %vec_store_slot1, i64 1
  store i8 %val, i8* %vec_store_slot2, align 1
  br i1 %0, label %exit, label %loop_hdr
exit:                                             ; preds = %loop_write, %check_realloc
  %exit_code = phi i32 [ 0, %loop_write ], [ 1, %check_realloc ]
  ret i32 %exit_code
}
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)
declare i8* @GetProcessHeap() unnamed_addr
declare i8* @HeapAlloc(i8*, i32, i64) unnamed_addr
declare i8* @HeapReAlloc(i8*, i32, i8*, i64) unnamed_addr

Running this through clang -O (I tried v 10.0.0), the resulting binary will continue to allocate memory in an infinite loop.

Running it through opt -loop-reduce will return a slightly different IR:

loop_hdr:
[...]
  %0 = add nuw i8 %val, 1
[...]
loop_write:
[...]
  %4 = icmp eq i8 %0, 0
  br i1 %4, label %exit, label %loop_hdr

This is the same issue @comex mentioned above. LLVM should be clearing the no wrap flag on the add there after transforming to a post-increment check but it doesn't. Then later it just gets rid of the exit condition altogether since it believes that the add will never wrap, it can't possibly be equal to 0.

But that doesn't explain why the flag doesn't seem to work post 1.44, looking at the IR (with -cvp-dont-add-nowrap-flags=true):

1.44.1:

%_5.0.i.i.i = add i8 %iter.sroa.0.060, 1

1.45+:

%7 = add nuw i8 %iter.sroa.0.063, 1

So in 1.45+ we're not relying on LLVM's CVP to determine that the add has no unsigned wrap but some other mechanism. But after looking through print-after-all for a bit I'm pretty sure this new line in Range::next is the reason:

let n = unsafe { Step::forward_unchecked(self.start.clone(), 1) };

That boils down to an unchecked_add which in LLVM becomes LLVMBuildNUWAdd in this case.

Since that code was introduced in #69659 that lines up with the behaviour we see in 1.45+.

Either way, still an LLVM bug that's just triggered more now.

@pnkfelix
Copy link
Member

thank you for that excellent analysis @luqmana

@pnkfelix
Copy link
Member

pnkfelix commented Aug 1, 2020

Filed https://bugs.llvm.org/show_bug.cgi?id=46943 with LLVM upstream.

@pnkfelix
Copy link
Member

pnkfelix commented Aug 1, 2020

The only remaining question on our end is if we should, for the short term, change that range.rs code to not used an unchecked_add, in order to (hopefully) bypass the misoptimization here.

(That, or try to figure out a patch to LLVM to fix the bug on their end.)

@pnkfelix
Copy link
Member

pnkfelix commented Aug 4, 2020

To be clear: my thinking is that we could change the range.rs and/or Step code to use wrapping_add rather than unchecked_add.

That way, we shouldn't generate any overflow checking code, so I wouldn't expect any significant performance hit.

At the same time, I'm not really a big fan of putting in such narrowly targetted hacks to work around LLVM bugs, especially ones that have been confirmed to be bugs on LLVM's end.

@pnkfelix
Copy link
Member

pnkfelix commented Aug 4, 2020

At the same time, I'm not really a big fan of putting in such narrowly targetted hacks to work around LLVM bugs, especially ones that have been confirmed to be bugs on LLVM's end.

(having said that, this bug does seem to serve as evidence that our code is probably executing an overflowing addition via unchecked_add. Even though that "merely" generates poison under this particular codegen path (that should be unused and thus won't cause UB), strictly speaking, it is a violation of the contract written for Rust's unchecked_add:

Returns the result of an unchecked addition, resulting in undefined behavior when x + y > T::MAX or x + y < T::MIN.

I'm going to go in and confirm that this add may indeed be overflowing, and if it is, I probably will push for changing to wrapping_add here instead.


Update: okay, its hard to see how this could possibly be causing an overflowing addition under Rust's machine model:

        if self.start < self.end {
            // SAFETY: just checked precondition
            let n = unsafe { Step::forward_unchecked(self.start.clone(), 1) };
            ...
        }

@spastorino
Copy link
Member

spastorino commented Aug 20, 2020

Downgrading priority to P-high as discussed on the compiler team meeting.

@spastorino spastorino added P-high High priority and removed P-critical Critical priority labels Aug 20, 2020
@camelid camelid added the A-LTO Area: Link-time optimization (LTO) label Oct 27, 2020
@pnkfelix
Copy link
Member

I have a patch that fixes (what I believe to be) the underlying LLVM bug; see https://bugs.llvm.org/show_bug.cgi?id=46943 ; I'm working on making a LLVM unit test and then I'll put up a patch for the llvm-devs to review.

@AlyoshaVasilieva
Copy link
Author

Now working on nightly 2021-03-04, fixed by the LLVM 12 upgrade.

@nikic nikic closed this as completed Mar 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests