rustc segfaults if out-of-disk is encountered inside LLVM #122089

Open
saethlin opened this issue Mar 6, 2024 · 7 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@saethlin
Member

saethlin commented Mar 6, 2024

In a recent crater run I found a suspicious number of segfaults from rustc (search all crater logs for SIGSEGV: invalid memory reference): #121282 (comment). Examples:
https://crater-reports.s3.amazonaws.com/pr-121282/try%23d073071d77ce0f93b4fd8cc567a1e2b9e1b22126%2Brustflags=-Copt-level=3/gh/tgnottingham.valgrind-ebpf-rodata-bug/log.txt
https://crater-reports.s3.amazonaws.com/pr-121282/try%23d073071d77ce0f93b4fd8cc567a1e2b9e1b22126%2Brustflags=-Copt-level=3/gh/wyatt-herkamp.auto_project/log.txt
https://crater-reports.s3.amazonaws.com/pr-121282/try%23d073071d77ce0f93b4fd8cc567a1e2b9e1b22126%2Brustflags=-Copt-level=3/gh/wrsturgeon.gamecontroller3-nix/log.txt

It's a bit suspicious that these all appear after a cascade of LLVM ERROR output. To test this situation, I've modified the test rig I posted in #119510:

new evil write impl

// cargo add libc backtrace

use std::sync::atomic::{AtomicBool, Ordering::SeqCst};

static DISK_FULL: AtomicBool = AtomicBool::new(false);

fn check() -> bool {
    let mut found_llvm = false;

    backtrace::trace(|frame| {
        backtrace::resolve_frame(frame, |symbol| {
            let Some(name) = symbol.name() else {
                return;
            };
            let Some(name) = name.as_str() else {
                return;
            };
            if name.contains("llvm") {
                found_llvm = true;
            }
        });
        !found_llvm
    });

    found_llvm
}

#[no_mangle]
pub extern "C" fn write(
    fd: libc::c_int,
    buf: *const libc::c_void,
    count: libc::size_t,
) -> libc::ssize_t {
    if fd > 2 && (DISK_FULL.load(SeqCst) || check()) {
        DISK_FULL.store(true, SeqCst);
        unsafe {
            *libc::__errno_location() = libc::ENOSPC;
        }
        return -1;
    } else {
        unsafe {
            // libc::syscall already returns -1 and sets errno on failure,
            // so forward its result instead of rewriting errno from it.
            libc::syscall(libc::SYS_write, fd as usize, buf as usize, count as usize) as isize
        }
    }
}

Then build any sizable project with that write impl:

LD_PRELOAD=/home/ben/evil/target/release/libevil.so cargo +stable build

and you should be greeted with a huge pile of errors:

output from building exa

╭ ➜ ben@archlinux:~/rustc-perf/collector/compile-benchmarks/exa-0.10.1
╰ ➤ LD_PRELOAD=/home/ben/evil/target/release/libevil.so cargo +stable build
   Compiling libc v0.2.93
   Compiling pkg-config v0.3.19
   Compiling unicode-width v0.1.8
   Compiling tinyvec_macros v0.1.0
   Compiling matches v0.1.8
   Compiling log v0.4.14
   Compiling cfg-if v1.0.0
   Compiling percent-encoding v2.1.0
   Compiling bitflags v1.2.1
   Compiling byteorder v1.4.3
   Compiling lazy_static v1.4.0
   Compiling scoped_threadpool v0.1.9
   Compiling ansi_term v0.12.1
   Compiling natord v1.0.9
   Compiling number_prefix v0.4.0
   Compiling glob v0.3.0
   Compiling tinyvec v1.2.0
LLVM ERROR: IO failure on output stream: No space left on device
LLVM ERROR: IO failure on output stream: No space left on device
   Compiling unicode-bidi v0.3.5
error: could not compile `cfg-if` (lib)
warning: build failed, waiting for other jobs to finish...
LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `tinyvec_macros` (lib)
LLVM ERROR: IO failure on output stream: No space left on device
LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `log` (build script)
error: could not compile `lazy_static` (lib)
error: could not compile `matches` (lib)
LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `unicode-width` (lib)
LLVM ERROR: IO failure on output stream: No space left on device
LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `number_prefix` (lib)
LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `natord` (lib)
error: could not compile `percent-encoding` (lib)
LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `bitflags` (build script)
LLVM ERROR: IO failure on output stream: No space left on deviceLLVM ERROR: IO failure on output stream: No space left on device

LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `libc` (build script)
error: could not compile `scoped_threadpool` (lib)
LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `byteorder` (lib)
LLVM ERROR: IO failure on output stream: No space left on deviceLLVM ERROR: IO failure on output stream: No space left on device

LLVM ERROR: IO failure on output stream: No space left on device
LLVM ERROR: IO failure on output stream: No space left on device
error: rustc interrupted by SIGSEGV, printing backtrace

/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so(+0x2c31aa6)[0x7e8f4da31aa6]
/usr/lib/libc.so.6(+0x3c770)[0x7e8f4ac5a770]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc(+0x6c232)[0x63c5eff78232]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc(+0x1c635)[0x63c5eff28635]
/usr/lib/libc.so.6(+0x3ebb0)[0x7e8f4ac5cbb0]
/usr/lib/libc.so.6(+0x3ec80)[0x7e8f4ac5cc80]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so(+0x38094f9)[0x7e8f4e6094f9]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-17-rust-1.76.0-stable.so(_ZN4llvm18report_fatal_errorERKNS_5TwineEb+0x127)[0x7e8f46e03f07]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-17-rust-1.76.0-stable.so(+0x70869ce)[0x7e8f4a0869ce]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so(LLVMRustWriteOutputFile+0x20b)[0x7e8f4f8163d3]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so(+0x4a15f95)[0x7e8f4f815f95]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so(+0x4a13713)[0x7e8f4f813713]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so(+0x4a133c2)[0x7e8f4f8133c2]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so(+0x4aabbf1)[0x7e8f4f8abbf1]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so(+0x4aab6c2)[0x7e8f4f8ab6c2]
/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libstd-66d8041607d2929b.so(rust_metadata_std_79729d9c385e1623+0xbe8e5)[0x7e8f50b4a8e5]
/usr/lib/libc.so.6(+0x8b55a)[0x7e8f4aca955a]
/usr/lib/libc.so.6(+0x108a3c)[0x7e8f4ad26a3c]

note: we would appreciate a report at https://github.com/rust-lang/rust
error: could not compile `ansi_term` (lib)
LLVM ERROR: IO failure on output stream: No space left on deviceLLVM ERROR: IO failure on output stream: No space left on device

error: could not compile `pkg-config` (lib)
LLVM ERROR: IO failure on output stream: No space left on device
LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `unicode-bidi` (lib)
LLVM ERROR: IO failure on output stream: No space left on device
error: could not compile `tinyvec` (lib)
error: could not compile `glob` (lib)

Caused by:
  process didn't exit successfully: `/home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc --crate-name glob /home/ben/.cargo/registry/src/index.crates.io-6f17d22bba15001f/glob-0.3.0/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --diagnostic-width=128 --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C metadata=75c7fd398ccc3c18 -C extra-filename=-75c7fd398ccc3c18 --out-dir /home/ben/rustc-perf/collector/compile-benchmarks/exa-0.10.1/target/debug/deps -L dependency=/home/ben/rustc-perf/collector/compile-benchmarks/exa-0.10.1/target/debug/deps --cap-lints allow` (signal: 11, SIGSEGV: invalid memory reference)

gdb reports this backtrace from the core dump:

#0  0x000063c5eff78232 in tcache_bin_flush_match (edata=0x0, 
    cur_arena_ind=<error reading variable: Incompatible types on DWARF stack>, cur_binshard=<optimized out>, small=true)
    at src/tcache.c:432
#1  tcache_bin_flush_impl (tsd=0x7e8f301ffc78, cache_bin=0x7e8f302001d0, binind=21, nflush=16, small=true, 
    tcache=<optimized out>, ptrs=<optimized out>) at src/tcache.c:434
#2  tcache_bin_flush_bottom (tsd=0x7e8f301ffc78, tsd@entry=0x0, tcache=<optimized out>, cache_bin=0x7e8f302001d0, 
    cache_bin@entry=0xa, binind=21, rem=<optimized out>, small=true) at src/tcache.c:519
#3  _rjem_je_tcache_bin_flush_small (tsd=tsd@entry=0x7e8f301ffc78, tcache=<optimized out>, 
    cache_bin=cache_bin@entry=0x7e8f302001d0, binind=21, rem=<optimized out>) at src/tcache.c:529
#4  0x000063c5eff28635 in tcache_dalloc_small (tsd=0x7e8f301ffc78, tcache=0x8, ptr=0x7e8f42cb2d00, binind=26, slow_path=false)
    at include/jemalloc/internal/tcache_inlines.h:157
#5  arena_dalloc (tsdn=0x7e8f301ffc78, ptr=0x7e8f42cb2d00, tcache=0x8, slow_path=false, caller_alloc_ctx=<optimized out>)
    at include/jemalloc/internal/arena_inlines_b.h:331
#6  idalloctm (tsdn=0x7e8f301ffc78, ptr=0x7e8f42cb2d00, tcache=0x8, is_internal=false, slow_path=false, 
    alloc_ctx=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:120
#7  ifree (tsd=0x7e8f301ffc78, ptr=0x7e8f42cb2d00, tcache=0x8, slow_path=false) at src/jemalloc.c:2887
#8  _rjem_je_free_default (ptr=0x7e8f42cb2d00) at src/jemalloc.c:3014
#9  0x00007e8f4ac5cbb0 in __run_exit_handlers (status=101, listp=0x7e8f4adf6680 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:123
#10 0x00007e8f4ac5cc80 in __GI_exit (status=<optimized out>) at exit.c:138
#11 0x00007e8f4e6094f9 in FatalErrorHandler(void*, char const*, bool) ()
   from /home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so
#12 0x00007e8f46e03f07 in llvm::report_fatal_error(llvm::Twine const&, bool) ()
   from /home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-17-rust-1.76.0-stable.so
#13 0x00007e8f4a0869ce in llvm::raw_fd_ostream::~raw_fd_ostream() [clone .cold.0] ()
   from /home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-17-rust-1.76.0-stable.so
#14 0x00007e8f4f8163d3 in LLVMRustWriteOutputFile ()
   from /home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so
#15 0x00007e8f4f815f95 in rustc_codegen_llvm::back::write::write_output_file ()
   from /home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so
#16 0x00007e8f4f813713 in rustc_codegen_llvm::back::write::codegen ()
   from /home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so
#17 0x00007e8f4f8133c2 in rustc_codegen_ssa::back::write::finish_intra_module_work::<rustc_codegen_llvm::LlvmCodegenBackend> ()
   from /home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so
#18 0x00007e8f4f8abbf1 in std::sys_common::backtrace::__rust_begin_short_backtrace::<<rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::ExtraBackendMethods>::spawn_named_thread<rustc_codegen_ssa::back::write::spawn_work<rustc_codegen_llvm::LlvmCodegenBackend>::{closure#0}, ()>::{closure#0}, ()> ()
   from /home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so
#19 0x00007e8f4f8ab6c2 in <<std::thread::Builder>::spawn_unchecked_<<rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::ExtraBackendMethods>::spawn_named_thread<rustc_codegen_ssa::back::write::spawn_work<rustc_codegen_llvm::LlvmCodegenBackend>::{closure#0}, ()>::{closure#0}, ()>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0}
    () from /home/ben/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-ef0b2e016afc8182.so
#20 0x00007e8f50b4a8e5 in alloc::boxed::{impl#47}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2015
#21 alloc::boxed::{impl#47}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2015
#22 std::sys::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/unix/thread.rs:108
#23 0x00007e8f4aca955a in start_thread (arg=<optimized out>) at pthread_create.c:447
#24 0x00007e8f4ad26a3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

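This looks like allocator state corruption (a data race?) while running atexit handlers: LLVM's fatal error handler calls exit() from a codegen worker thread, and jemalloc then crashes in free() under __run_exit_handlers with edata=0x0.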

Note that though I found this in a yet-to-be-merged PR, this trivially reproduces on stable.
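
For readers skimming the shim, its decision logic can be modeled without libc or the backtrace crate. The following is only a sketch: `should_fail` and its `called_from_llvm` parameter are hypothetical names standing in for the condition inside `write` and the result of the backtrace check, and nothing here intercepts real writes.

```rust
use std::sync::atomic::{AtomicBool, Ordering::SeqCst};

/// Latch that stays set once the simulated disk has "filled up".
static DISK_FULL: AtomicBool = AtomicBool::new(false);

/// Hypothetical helper: returns true if a write to `fd` should fail with ENOSPC.
/// `called_from_llvm` stands in for the backtrace check in the real shim.
fn should_fail(fd: i32, called_from_llvm: bool) -> bool {
    // stdin/stdout/stderr (fds 0..=2) are never affected.
    if fd > 2 && (DISK_FULL.load(SeqCst) || called_from_llvm) {
        DISK_FULL.store(true, SeqCst);
        true
    } else {
        false
    }
}

fn main() {
    // Before any LLVM write, ordinary file writes succeed.
    assert!(!should_fail(5, false));
    // Writes to stderr never fail, even from LLVM.
    assert!(!should_fail(2, true));
    // The first LLVM write to a file fails and sets the latch.
    assert!(should_fail(5, true));
    // Every later file write now fails, regardless of the caller.
    assert!(should_fail(7, false));
    // stderr stays usable so the errors can still be printed.
    assert!(!should_fail(2, false));
}
```

Because the latch is a process-wide static, each rustc process spawned by cargo starts with a "non-full" disk and only begins failing once one of its own writes originates inside LLVM.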

@saethlin saethlin added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. labels Mar 6, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 6, 2024
@saethlin saethlin added the I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. label Mar 6, 2024
@Noratrieb Noratrieb added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Mar 6, 2024
@dianqk
Member

dianqk commented Mar 7, 2024

Neither adding #[inline(never)] to inside_llvm nor using -Copt-level=0 reproduces it.

@saethlin
Member Author

saethlin commented Mar 7, 2024

> Neither adding #[inline(never)] to inside_llvm nor using -Copt-level=0 reproduces it.

Ah. That's because the checking function then shows up as its own frame in the backtrace, and since its name contains "llvm", every backtrace is detected as "inside llvm". I'll change the function name in the description.
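
The false positive can be illustrated with plain strings. This is a sketch under assumptions: the symbol names in the arrays are made up for illustration, and `any_frame_mentions_llvm` is a hypothetical helper that applies the same substring test as the shim to a list of names rather than to a live backtrace.

```rust
/// Applies the shim's substring test to a list of already-resolved
/// symbol names instead of a live backtrace.
fn any_frame_mentions_llvm(frames: &[&str]) -> bool {
    frames.iter().any(|name| name.contains("llvm"))
}

fn main() {
    // Hypothetical stack for an ordinary write made by rustc itself.
    // With the old name, the checker's own frame matches "llvm".
    let old = ["evil::inside_llvm", "write", "std::io::Write::write_all"];
    assert!(any_frame_mentions_llvm(&old));

    // After renaming the checker, the same stack no longer matches.
    let renamed = ["evil::check", "write", "std::io::Write::write_all"];
    assert!(!any_frame_mentions_llvm(&renamed));

    // A stack that really passes through LLVM still matches.
    let llvm = ["evil::check", "write", "llvm::raw_fd_ostream::write_impl"];
    assert!(any_frame_mentions_llvm(&llvm));
}
```

When the checker is inlined into `write`, it has no frame of its own, which would explain why the optimized build did not show the problem.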

@saethlin
Member Author

saethlin commented Mar 7, 2024

Ah I think I see what you meant actually. A debug build of the write shim only sometimes produces the problem. I'm getting a segfault about 1 in 3 builds of exa with an unoptimized shim.

I wouldn't be surprised if a debug build adds enough slowness around write to space out calls and prevent a data race from exploding somewhere.

@dianqk
Member

dianqk commented Mar 7, 2024

> Ah I think I see what you meant actually. A debug build of the write shim only sometimes produces the problem. I'm getting a segfault about 1 in 3 builds of exa with an unoptimized shim.

I was just wondering if write was mis-compiled. But it doesn't look like it.

@dianqk
Member

dianqk commented Mar 7, 2024

My guess is that this is caused by several fatal errors occurring at once.
I'm still not clear on the exact reason, but I think the same object is being freed more than once at some point.
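
If the speculation above is right, one general pattern for keeping several failing threads from all entering the shutdown path is a "first caller wins" latch. The sketch below is not rustc's or LLVM's code and is not a confirmed fix for this issue; `FatalLatch` and `count_shutdown_callers` are hypothetical names, and the "shutdown" is simulated by a counter rather than a real call to exit.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};
use std::sync::Arc;
use std::thread;

/// Hypothetical latch that lets exactly one thread handle a fatal error.
struct FatalLatch(AtomicBool);

impl FatalLatch {
    fn new() -> Self {
        FatalLatch(AtomicBool::new(false))
    }

    /// Returns true only for the first caller.
    fn claim(&self) -> bool {
        // swap returns the previous value, so only one thread sees `false`.
        !self.0.swap(true, SeqCst)
    }
}

/// Spawns `n` workers that all "fail" and counts how many would go on to
/// run the process-wide shutdown path.
fn count_shutdown_callers(n: usize) -> usize {
    let latch = Arc::new(FatalLatch::new());
    let callers = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let latch = Arc::clone(&latch);
            let callers = Arc::clone(&callers);
            thread::spawn(move || {
                if latch.claim() {
                    // A real handler would print the error and exit here.
                    callers.fetch_add(1, SeqCst);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    callers.load(SeqCst)
}

fn main() {
    // No matter how many workers fail, only one claims the shutdown path.
    assert_eq!(count_shutdown_callers(8), 1);
}
```

In a real process the losing threads would also have to stop touching shared state (for example by blocking) while the winner shuts down; the sketch only shows the claiming step.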

@dianqk
Member

dianqk commented Mar 8, 2024

Also, I'm curious how this issue relates to #121282. It should not block #121282. Or is there a more important issue to address here?

@saethlin
Member Author

saethlin commented Mar 8, 2024

It doesn't block that PR. The PR isn't moving just yet because I'm trying to write a check for int-to-reference transmutes.
