Memory unsafety problem in safe Rust #69225

dfyz · 2020-02-17T03:40:22Z

I have a small program (a simplification of a test function from a larger project) that slices a small array and tries to access an out-of-bounds element of the slice. Running it with cargo run --release using the stable 1.41.0 release prints something like this (tested on macOS 10.15 and Ubuntu 19.10):

0 0 3 18446744073709551615
[1]    21065 segmentation fault  cargo run --release

It looks like the resulting slice somehow has length 2**64 - 1, so the bounds checking is omitted, which predictably results in a segfault. On 1.39.0 and 1.40.0 the very same program prints what I would expect:

0 0 3 0
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 16777216', src/main.rs:13:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

The problem goes away if I do any of the following:

remove either of the two do_test(...); calls in main();
remove the for _ in 0..1 { loop;
replace the for y in 0..x { loop with for y in 0..1 {;
remove the z.extend(std::iter::repeat(0).take(x)); line or replace it with z.extend(std::iter::repeat(0).take(1));;
replace the for arr_ref in arr { loop with let arr_ref = &arr[0];;
specify RUSTFLAGS="-C opt-level=2";
specify RUSTFLAGS="-C codegen-units=1".

My best guess is -C opt-level=3 enables a problematic optimization pass in LLVM, which results in miscompilation. This is corroborated by the fact that MIR (--emit mir) and LLVM IR before optimizations (--emit llvm-ir -C no-prepopulate-passes) is the same for both -C opt-level=2 and -C opt-level=3.

Some additional info that might be helpful:

I can't reproduce the problem in the Rust playground (presumably because it uses codegen-units = 1);
I can't reproduce the problem on Windows 10 with the same 1.41.0 release (no idea what makes it different);
cargo-bisect-rustc says the regression first happened in the 2019-12-12 nightly, specifically in this commit. This seems suspicious to me, given that 1.40.0, which does not exhibit the problem, was released after this date.

I'm attaching the program inline in case the GitHub repo doesn't work (if you want to compile it without Cargo, use rustc -C opt-level=3 main.rs):

fn do_test(x: usize) {
    let arr = vec![vec![0u8; 3]];

    let mut z = Vec::new();
    for arr_ref in arr {
        for y in 0..x {
            for _ in 0..1 {
                z.extend(std::iter::repeat(0).take(x));
                let a = y * x;
                let b = (y + 1) * x - 1;
                let slice = &arr_ref[a..b];
                eprintln!("{} {} {} {}", a, b, arr_ref.len(), slice.len());
                eprintln!("{:?}", slice[1 << 24]);
            }
        }
    }
}

fn main() {
    do_test(1);
    do_test(2);
}

The text was updated successfully, but these errors were encountered:

pietroalbini · 2020-02-17T11:07:58Z

cc @rust-lang/compiler
@rustbot ping icebreakers-llvm

The release team is considering making a point release for Rust 1.41 (we briefly discussed it in last week's meeting), and I'd love for this to be included in it if we can get a PR up soon.

rustbot · 2020-02-17T11:08:00Z

Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @comex @DutchGhost @hanna-kruppe @hdhoang @heyrutvik @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique @vgxbj

RalfJung · 2020-02-17T11:32:02Z

Running it with cargo run --release using the stable 1.41.0 release prints something like this (tested on macOS 10.15 and Ubuntu 19.10):

I cannot reproduce this on playground. The program works fine there on 1.41.0 in release mode.

EDIT: Ah, you already said that.
Also the program is fine in Miri, so this is likely not UB but a miscompilation.

BurntSushi · 2020-02-17T12:46:30Z

Just to add a data point, I can reproduce this on Linux with the latest nightly:

[andrew@krusty rust-69225]$ rustc --version
rustc 1.43.0-nightly (5e7af4669 2020-02-16)

[andrew@krusty rust-69225]$ cat main.rs
fn do_test(x: usize) {
    let arr = vec![vec![0u8; 3]];

    let mut z = Vec::new();
    for arr_ref in arr {
        for y in 0..x {
            for _ in 0..1 {
                z.extend(std::iter::repeat(0).take(x));
                let a = y * x;
                let b = (y + 1) * x - 1;
                let slice = &arr_ref[a..b];
                eprintln!("{} {} {} {}", a, b, arr_ref.len(), slice.len());
                eprintln!("{:?}", slice[1 << 24]);
            }
        }
    }
}

fn main() {
    do_test(1);
    do_test(2);
}

[andrew@krusty rust-69225]$ rustc -C opt-level=3 main.rs

[andrew@krusty rust-69225]$ ./main
0 0 3 18446744073709551615
zsh: segmentation fault (core dumped)  ./main

I was able to reproduce the above with the exact same output with Rust 1.41 stable. Rust 1.40 stable does not exhibit the problem:

$ ./main
0 0 3 0
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 16777216', main.rs:13:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

I think this is all consistent with @dfyz's report, except this at least confirms that it isn't macOS specific.

SimonSapin · 2020-02-17T13:20:30Z

cargo-bisect-rustc says the regression first happened in the 2019-12-12 nightly, specifically in this commit. This seems suspicious to me, given that 1.40.0, which does not exhibit the problem, was released after this date.

This is expected. 1.40.0 was released on 2019-12-19 based on what was then the beta branch, which was branched from master six weeks earlier (around the time of the 1.39.0 release). See https://doc.rust-lang.org/book/appendix-07-nightly-rust.html for more about release channels.

RalfJung · 2020-02-17T13:24:30Z

If I had to guess, I'd say #67015 is the likely culprit. This fixed 3 codegen issues, so it touches codegen-critical code.
Cc @osa1 @oli-obk @wesleywiser

osa1 · 2020-02-17T13:34:01Z

Thanks for the ping. I'm currently making a build to investigate. It's possible that this is a bug introduced with #67015. Another possibility is that #67015 just revealed an existing bug.

osa1 · 2020-02-17T13:54:56Z

I managed to reproduce the segfault using rustc nightly on Linux, but not with my build generated with this config.toml:

config.toml

[llvm]

[build]

[install]

[rust]
optimize = true
debug = true
codegen-units = 0
debug-assertions = true
debuginfo-level = 2

[target.x86_64-unknown-linux-gnu]
llvm-config = "/usr/bin/llvm-config-9"

[dist]

Using rustc nightly I checked MIRs before and after ConstProp and the MIRs are identical. So if this is caused by ConstProp then it's because of a difference in generated code of a library, not of this program.

hellow554 · 2020-02-17T15:16:28Z

Regression in 033662d

@rustbot modify labels: -E-needs-bisection

@osa1 the debug-assertions = true is probably to blame. When I try to compile (with a vanilla nightly compiler) the program with -C debug-assertions=y the program panics instead of the segfault

shahn · 2020-02-17T17:40:40Z

I think I solved it! Reverting a983e05 fixes the issue. It seemed like the culprit to me all day, but I couldn't test it at work :) Can someone help me how to best make a PR for this issue? I think the best way would be to add a testcase that must fail, but it seems this is very platform-specific etc, so maybe that is not a good idea after all? Any ideas here? Thanks!

lqd · 2020-02-17T17:44:43Z

(This was already bisected by the OP)

I managed to reproduce this with a local build with debug-assertions turned off (thanks for that @hellow554).

Some of the PRs in the rollup cause conflicts when reverted because we have since then rustfmt-ed everything, but I believe this issue is because of #67174.

edit: it seems we found this at the same time @shahn :)

peterhj · 2020-02-17T23:57:17Z

To add another data point, the problem disappears when codegen-units is set to 3 or less (assuming release profile, with incremental=false). In other words I'm able to reproduce when codegen-units is 4 or greater.

tmiasko · 2020-02-18T00:09:28Z

The call to panic_bounds_check disappears after LLVM Jump Threading pass. I can reproduce the issue with opt from LLVM 9, but not from LLVM 10.

dfyz · 2020-02-18T02:46:41Z

So, I checked out and built a stage 1 rustc (./x.py build -i --stage 1), rebuilt libstd (./x.py build -i --stage 1 --keep-stage 0 src/libstd) without #67174, and recompiled the segfaulting program with four codegen units (rustc +stage1 -C opt-level=3 -C codegen-units=4 main.rs). As expected, this made the segfault go away. If I apply #67174 back again, the segfault returns.

This means I now have two compilers which only differ in the standard library they use. Let's call these compilers GOOD (no segfault) and BAD (segfault).

I then noticed that the 4 unoptimized *.ll files generated by GOOD (-C no-prepopulate-passes) are pretty much the same as the ones generated by BAD (the only difference I saw were the different random IDs in function names), but the optimized *.ll files (no -C no-prepopulate-passes) are wildly different. I'm not a compiler expert in any way, but since the program being compiled is exactly the same in both cases, both compilers proper are exactly the same, and the only difference is in a precompiled standard library, I think LTO might be involved.

Indeed, if I pass any of -Z thinlto=no, -C lto=no, -C lto=yes, -C lto=thin (at this point I'm really not sure, but I guess all forms of -C lto are different from ThinLTO, which is used by default) to BAD, the segfault once again disappears.

Does it look likely that LTO might be at fault here?

This fixes a a segfault in safe code, a stable regression. Reported in \rust-lang#69225. This reverts commit a983e05. Also adds a test for the expected behaviour.

dfyz · 2020-02-24T04:09:38Z

I think I just found a way to reproduce the issue even after reverting #67174. Here's a slightly longer, but still safe program that reliably segfaults on Windows, Linux and macOS using the latest nightly with #67174 reverted:

fn do_test(x: usize) {
    let mut arr = vec![vec![0u8; 3]];

    let mut z = vec![0];
    for arr_ref in arr.iter_mut() {
        for y in 0..x {
            for _ in 0..1 {
                z.reserve_exact(x);
                let iterator = std::iter::repeat(0).take(x);
                let mut cnt = 0;
                iterator.for_each(|_| {
                    z[0] = 0;
                    cnt += 1;
                });
                let a = y * x;
                let b = (y + 1) * x - 1;
                let slice = &mut arr_ref[a..b];
                slice[1 << 24] += 1;
            }
        }
    }
}

fn main() {
    do_test(1);
    do_test(2);
}

Windows:

PS> rustup run nightly rustc --version
rustc 1.43.0-nightly (6d0e58bff 2020-02-23)
PS> rustup run nightly cargo run --release
    Finished release [optimized] target(s) in 0.01s
     Running `target\release\rust-segfault.exe`
error: process didn't exit successfully: `target\release\rust-segfault.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

Linux:

$ rustup run nightly rustc --version
rustc 1.43.0-nightly (6d0e58bff 2020-02-23)
$ rustup run nightly cargo run --release
    Finished release [optimized] target(s) in 1.13s
     Running `target/release/rust-segfault`
Segmentation fault (core dumped)

macOS:

λ rustup run nightly rustc --version
rustc 1.43.0-nightly (6d0e58bff 2020-02-23)
λ rustup run nightly cargo run --release
    Finished release [optimized] target(s) in 0.01s
     Running `target/release/rust-segfault`
[1]    24331 segmentation fault  rustup run nightly cargo run --release

This program doesn't depend on the number of codegen units, so it segfaults in the Playground, too (on stable, beta, and nightly). I also reproduced this by compiling rustc from master (with #67174 reverted) linked against LLVM 9.

The underlying LLVM bug is still the same, so upgrading to LLVM 10 or cherry-picking the LLVM fix makes the segfault go away.

I really wish I understood what's going on better. It does look like the bounds checks are elided because of the extra nuw, which comes from incorrectly cached SCEV values (just like in the C program from the thread @nikic linked to). But by the time the bad optimization happens, I can barely recognize my simple program through the layers of LLVM basic blocks; worse, any seemingly no-op change in the source code (e.g., removing the cnt variable) leads to a very differently looking LLVM IR and makes the problem disappear.

dfyz · 2020-02-24T20:46:00Z

My impression is that 1.41.1 has just been finalized in #69359 (bad timing on my part), so there is not much that can be done at this point. Is it at least a good idea to update the comment in Layout::repeat() with a more detailed explanation of the LLVM issue? If so, I can send a PR.

pietroalbini · 2020-02-24T21:40:21Z

My impression is that 1.41.1 has just been finalized in #69359 (bad timing on my part), so there is not much that can be done at this point.

If the patch we included in 1.41.1 doesn't actually fix the problem we should reconsider whether we want to backport the new fix and rebuild the release. There was consensus in the release team meeting not to backport the LLVM fix, but I personally think ~~another~~ this new PoC could warrant another discussion on the topic.

cc @Mark-Simulacrum @rust-lang/release

pietroalbini · 2020-02-24T21:55:48Z

@dfyz we'll try to get another build of 1.41.1 with the LLVM fix backported, while we wait for consensus on actually shipping that.

cuviper · 2020-02-24T22:34:30Z

FWIW, for me the new reproducer works as expected (index out of bounds) on stable 1.38.0 and earlier, but segfaults on 1.39.0 and later. There's not a whole lot of difference in LLVM between 1.38 and 1.39 (rust-lang/llvm-project@71fe7ec...8adf9bd), but it could be any sort of Rust-generated difference along the way too.

@ghost

…oalbini [stable] Cherry-pick LLVM fix for 1.41.1 This PR cherry-picks the LLVM patch required to properly fix #69225 on 1.41.1, and adds a test to make sure the fix works. r? @ghost

dfyz · 2020-02-25T00:46:11Z

the new reproducer works as expected (index out of bounds) on stable 1.38.0

I (accidentally) found out that setting -C codegen-units=1 on 1.38.0 reproduces the segfault. 1.37.0 seems safe to me (no combination of options I tried produces a segfault).

Disregard that, 1.37.0 uses LLVM 8.
~~Curiously, the LLVM IR diff between 1.37.0 and 1.38.0 (with -C codegen-units=1) is just one line:~~

- %71 = icmp eq {}* %70, null
+ %71 = icmp ule {}* %70, null

~~(where %70 is derived from the result of <core::slice::IterMut<T> as core::iter::traits::iterator::Iterator>::next())~~

~~This alone is enough to trick LLVM into adding the dreaded nuw to add nuw i64 %x, -1.~~

cuviper · 2020-02-25T00:51:33Z

1.37.0 seems safe to me (no combination of options I tried produces a segfault).

That's using LLVM 8, so the blamed SCEV change shouldn't exist at all.

dfyz · 2020-02-25T00:56:12Z

That's using LLVM 8

My bad, sorry for the confusion (I was so happy to reduce it to a one-line diff I didn't even bother checking the LLVM version).

pietroalbini · 2020-02-25T11:02:04Z

We prepared new 1.41.1 artifacts with the LLVM fix cherry-picked in it. You can test them locally with:

RUSTUP_DIST_SERVER=https://dev-static.rust-lang.org rustup update stable

@kraai

Cherry-pick the LLVM fix for #69225 An additional reproducer was provided in #69225 -- the new testcase here -- which still crashes even after #69241 reverted #67174. Now this pull request updates LLVM with the cherry-picked reversion of its own. This is also going to stable in #69444. I have not tried to reapply #67174 yet -- cc @kraai @shahn

petrochenkov · 2020-02-26T19:14:40Z

ping in #69225 (comment)

[triagebot] The issue was successfully resolved without any involvement from the pinged compiler team.
Bretty good.

dfyz · 2020-02-27T18:16:07Z

1.41.1 is out, I guess it's time to finally close this issue.

@ghost

[beta] backports This backports the following PRs: * ci: switch macOS builders to 10.15 #68863 * Backport release notes of 1.41.1 #69468 * Cherry-pick the LLVM fix for #69225 #69450 * `lit_to_const`: gracefully bubble up type errors. #69330 * [beta] bootstrap from 1.41.1 stable #69518 * bootstrap: Configure cmake when building sanitizer runtimes #69104 r? @ghost

dfyz added the C-bug Category: This is a bug. label Feb 17, 2020

rustbot added the ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group label Feb 17, 2020

XAMPPRocky mentioned this issue Feb 17, 2020

Ping teams for marker teams rust-lang/triagebot#345

Open

jonas-schievink added the E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc label Feb 17, 2020

rustbot removed the E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc label Feb 17, 2020

This comment has been minimized.

Sign in to view

shahn mentioned this issue Feb 17, 2020

Revert "Remove checked_add in Layout::repeat" #69241

Merged

cuviper mentioned this issue Feb 24, 2020

Revert "[SCEV] add no wrap flag for SCEVAddExpr." rust-lang/llvm-project#39

Merged

cuviper mentioned this issue Feb 24, 2020

Revert "[SCEV] add no wrap flag for SCEVAddExpr." rust-lang/llvm-project#40

Closed

pietroalbini mentioned this issue Feb 24, 2020

[stable] Cherry-pick LLVM fix for 1.41.1 #69444

Merged

cuviper added a commit to cuviper/rust that referenced this issue Feb 25, 2020

Cherry-pick the LLVM fix for rust-lang#69225

6e84699

cuviper mentioned this issue Feb 25, 2020

Cherry-pick the LLVM fix for #69225 #69450

Merged

cuviper mentioned this issue Feb 26, 2020

Rust 1.41.1 announcement rust-lang/blog.rust-lang.org#525

Merged

dfyz closed this as completed Feb 27, 2020

Mark-Simulacrum pushed a commit to Mark-Simulacrum/rust that referenced this issue Feb 28, 2020

Cherry-pick the LLVM fix for rust-lang#69225

6f7372b

Mark-Simulacrum mentioned this issue Feb 28, 2020

[beta] backports #69535

Merged

This was referenced Feb 9, 2021

reduce threads spawned by ui-tests #81942

Merged

Safe code (ui/issues/issue-69225-SCEVAddExpr-wrap-flag.rs) miscompiles under llvm9 + codegen-units=1 #81946

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory unsafety problem in safe Rust #69225

Memory unsafety problem in safe Rust #69225

dfyz commented Feb 17, 2020

pietroalbini commented Feb 17, 2020

rustbot commented Feb 17, 2020

RalfJung commented Feb 17, 2020 •

edited

Loading

BurntSushi commented Feb 17, 2020

SimonSapin commented Feb 17, 2020

RalfJung commented Feb 17, 2020

osa1 commented Feb 17, 2020

osa1 commented Feb 17, 2020

hellow554 commented Feb 17, 2020

shahn commented Feb 17, 2020

lqd commented Feb 17, 2020 •

edited

Loading

This comment has been minimized.

peterhj commented Feb 17, 2020 •

edited

Loading

tmiasko commented Feb 18, 2020 •

edited

Loading

dfyz commented Feb 18, 2020

dfyz commented Feb 24, 2020

dfyz commented Feb 24, 2020

pietroalbini commented Feb 24, 2020 •

edited

Loading

pietroalbini commented Feb 24, 2020

cuviper commented Feb 24, 2020

dfyz commented Feb 25, 2020 •

edited

Loading

cuviper commented Feb 25, 2020

dfyz commented Feb 25, 2020

pietroalbini commented Feb 25, 2020

petrochenkov commented Feb 26, 2020

dfyz commented Feb 27, 2020 •

edited

Loading

Memory unsafety problem in safe Rust #69225

Memory unsafety problem in safe Rust #69225

Comments

dfyz commented Feb 17, 2020

pietroalbini commented Feb 17, 2020

rustbot commented Feb 17, 2020

RalfJung commented Feb 17, 2020 • edited Loading

BurntSushi commented Feb 17, 2020

SimonSapin commented Feb 17, 2020

RalfJung commented Feb 17, 2020

osa1 commented Feb 17, 2020

osa1 commented Feb 17, 2020

hellow554 commented Feb 17, 2020

shahn commented Feb 17, 2020

lqd commented Feb 17, 2020 • edited Loading

This comment has been minimized.

peterhj commented Feb 17, 2020 • edited Loading

tmiasko commented Feb 18, 2020 • edited Loading

dfyz commented Feb 18, 2020

dfyz commented Feb 24, 2020

dfyz commented Feb 24, 2020

pietroalbini commented Feb 24, 2020 • edited Loading

pietroalbini commented Feb 24, 2020

cuviper commented Feb 24, 2020

dfyz commented Feb 25, 2020 • edited Loading

cuviper commented Feb 25, 2020

dfyz commented Feb 25, 2020

pietroalbini commented Feb 25, 2020

petrochenkov commented Feb 26, 2020

dfyz commented Feb 27, 2020 • edited Loading

RalfJung commented Feb 17, 2020 •

edited

Loading

lqd commented Feb 17, 2020 •

edited

Loading

peterhj commented Feb 17, 2020 •

edited

Loading

tmiasko commented Feb 18, 2020 •

edited

Loading

pietroalbini commented Feb 24, 2020 •

edited

Loading

dfyz commented Feb 25, 2020 •

edited

Loading

dfyz commented Feb 27, 2020 •

edited

Loading