SIGSEGV from rustc while building crate legion
#77869
Building
|
Possible duplicate of #77849 |
I'm able to reproduce this, although it is finicky. I'm able to reproduce on stable, and as far back as 1.43. I've been having a hard time bisecting to a specific change, since it is a little inconsistent (it can take a few hundred incremental builds before it fails). The failures seem to start around 126ad2b (#68708), although it might be earlier. I can reproduce only on my main Linux system; I can't seem to reproduce on a VM. It seems to always fail with a call to
I might keep poking at it for a bit, but I think I'm unlikely to make any breakthroughs. |
Just out of curiosity, are there conditions that could accelerate the "reproducibility"? For example, if it's memory exhaustion and allocations fail, could that theoretically happen sooner on a system (hand-wavy speaking) with resources artificially kept busy? |
@ehuss: What commit of |
@apiraino I don't think it has anything to do with resource exhaustion. So far I have 0 clues. I tried running on valgrind overnight, but it wouldn't fail. @Aaron1011 I'm on 0733aa39b253b3404544afc3485d332429009799 (v0.3.1). @alex5nader Can you include which model of CPU you are using? |
@ehuss I'm using a Ryzen 5 1600. |
I've been getting exactly this same bug for many Rust versions, both stable and nightly (currently on 1.47), over the past ~6 months or so that I've been trying legion, from legion 2.4 to 3.0 to its git version, using a Ryzen 7. Even a freshly created
Here's a GDB backtrace of the SIGSEGV (which happens on thread 2):
#0 free (ptr=0x48c2df416aec23d6) at ../jemalloc/src/jemalloc.c:2393
#1 0x00007ffff3513bcc in <smallvec::SmallVec<A> as core::ops::drop::Drop>::drop () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#2 0x00007ffff35c1011 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_local () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#3 0x00007ffff35c0822 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_block () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#4 0x00007ffff35c27d0 in <rustc_resolve::late::LateResolutionVisitor as rustc_ast::visit::Visitor>::visit_fn () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#5 0x00007ffff354bf9c in rustc_ast::visit::walk_assoc_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#6 0x00007ffff35ce54e in rustc_resolve::late::LateResolutionVisitor::with_generic_param_rib () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#7 0x00007ffff35c4f39 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#8 0x00007ffff355896e in rustc_ast::visit::walk_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#9 0x00007ffff35c4473 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#10 0x00007ffff355896e in rustc_ast::visit::walk_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#11 0x00007ffff35c4473 in rustc_resolve::late::LateResolutionVisitor::resolve_item () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#12 0x00007ffff3549d42 in rustc_ast::visit::walk_crate () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#13 0x00007ffff358aac7 in rustc_resolve::Resolver::resolve_crate () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#14 0x00007ffff08f5c97 in rustc_interface::passes::configure_and_expand_inner () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#15 0x00007ffff08d26c9 in rustc_interface::passes::configure_and_expand::{{closure}} () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#16 0x00007ffff08acecf in rustc_data_structures::box_region::PinnedGenerator<I,A,R>::new () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#17 0x00007ffff08f4965 in rustc_interface::passes::configure_and_expand () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#18 0x00007ffff0915f73 in rustc_interface::queries::Queries::expansion () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#19 0x00007ffff05bf887 in rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#20 0x00007ffff0553f27 in rustc_span::with_source_map () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#21 0x00007ffff05c1513 in rustc_interface::interface::create_compiler_and_run () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#22 0x00007ffff059f9fa in scoped_tls::ScopedKey<T>::set () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#23 0x00007ffff05b4957 in std::sys_common::backtrace::__rust_begin_short_backtrace () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#24 0x00007ffff053ddae in core::ops::function::FnOnce::call_once{{vtable-shim}} () from /home/overminddl1/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-ff4ec557f69b94a7.so
#25 0x00007fffef94bf5a in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#26 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/alloc/src/boxed.rs:1042
#27 std::sys::unix::thread::Thread::new::thread_start () at library/std/src/sys/unix/thread.rs:87
#28 0x00007fffef83e669 in start_thread (arg=<optimized out>) at pthread_create.c:479
#29 0x00007fffef7642b3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
I can very reliably reproduce this. No other known issues with the system, everything else compiles without issue, everything runs without issue, memtest and other stress tests run without issue. |
I've uploaded my no-op project that constantly reproduces on my system to: https://github.com/OvermindDL1/legion_testing Just |
Memory allocation appears to be fairly minimal at the point of the crash: 365megs of VIRT and 348megs of RES, with 99004 of SHM. As far as I can see, this is not resource exhaustion of any kind. |
I cloned |
After testing of a few things, I found if I removed the In the small test project, leaving out the default features (which should leave out the Note, it's |
So
EDIT1: Commenting out the entirety of
EDIT2: Commenting out all of its dependencies still causes a compilation failure...
EDIT3: Also commenting out
EDIT4: Commenting out
EDIT5: Removed all optional dependencies and it's still failing to compile, even after a clean.
EDIT6: Slowly commenting out large swaths of legion and replacing them with no-ops and got it down to something in the
EDIT7: So far I've got it down to
EDIT8: And got it down to this macro call
EDIT9: Got it down to this line in the macro:
EDIT10: Okay so the macros seem fine, however the argument count to
EDIT11: Interestingly, if I try to remove some of the entirely empty modules that I completely commented out then it compiles again...
EDIT12: Got it down to just an empty |
So far the only code left uncommented is in
macro_rules! cons {
() => (
()
);
($head:tt) => (
($head, ())
);
($head:tt, $($tail:tt),*) => (
($head, cons!($($tail),*))
);
}
fn blah() {
let cons!(A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z) = todo!();
}
And in
pub mod cons;
Apparently it's getting more random when it happens the more code I remove, it still happens about 50% of the time though.
And in
mod internals;
Going to try pulling this into its own project now to see if I can replicate it more standalone... |
I have reduced the code significantly; the error is now:
$ cargo build
Compiling legion_testing v0.1.0 (/home/overminddl1/rust/legion_testing)
error: could not compile `legion_testing`.
Caused by:
process didn't exit successfully: `rustc --crate-name legion_testing --edition=2018 src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 -C metadata=668da26770ceeea9 -C extra-filename=-668da26770ceeea9 --out-dir /home/overminddl1/rust/legion_testing/target/debug/deps -C incremental=/home/overminddl1/rust/legion_testing/target/debug/incremental -L dependency=/home/overminddl1/rust/legion_testing/target/debug/deps` (signal: 11, SIGSEGV: invalid memory reference)
I have updated the https://github.com/OvermindDL1/legion_testing project to remove legion and just have the code that tests it. I'm trying to reduce it further but I may be hitting the limit. If I manage to reduce it further then I'll update that repo and post here. |
I've reduced it a little more, I've noticed that the more arguments I remove from the
$ rustc --version --verbose
rustc 1.47.0 (18bf6b4f0 2020-10-07)
binary: rustc
commit-hash: 18bf6b4f01a6feaf7259ba7cdae58031af1b7b39
commit-date: 2020-10-07
host: x86_64-unknown-linux-gnu
release: 1.47.0
LLVM version: 11.0
$ cargo --version --verbose
cargo 1.47.0 (f3c7e066a 2020-08-28)
release: 1.47.0
commit-hash: f3c7e066ad66e05439cf8eab165a2de580b41aaf
commit-date: 2020-08-28
Could anyone else above who was having an issue compiling legion try out this minimal repo and
The current reproducing code is (in
macro_rules! cons {
($head:tt) => (
($head, ())
);
($head:tt, $($tail:tt),*) => (
($head, cons!($($tail),*))
);
}
fn blah() {
let cons!(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, z) = todo!();
} |
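For readers following the reduction, here is a minimal sketch of what the reduced `cons!` macro does with a short argument list. The macro arms are copied from the snippet above; the helper `sum_three` and the three-element list are hypothetical and only for illustration. With this few arguments the nesting is shallow, so this sketch is not expected to trigger the crash; it only shows the right-nested tuple shape that the 25-argument version produces.

```rust
// Same arms as the reduced `cons!` macro from the thread: it folds a
// comma-separated list into right-nested pairs terminated by `()`.
macro_rules! cons {
    ($head:tt) => (
        ($head, ())
    );
    ($head:tt, $($tail:tt),*) => (
        ($head, cons!($($tail),*))
    );
}

// Hypothetical helper (not from the thread): uses the macro in both
// expression position and pattern position.
fn sum_three() -> i32 {
    // Expression position: builds the value (1, (2, (3, ()))).
    let nested = cons!(1, 2, 3);
    // Pattern position: destructures with the pattern (a, (b, (c, ()))),
    // which is the same kind of `let` the reproducer uses.
    let cons!(a, b, c) = nested;
    a + b + c
}

fn main() {
    println!("{}", sum_three());
}
```

Each extra argument adds one more level of tuple nesting to the pattern, which matches the observation in the thread that the crash rate depends on the argument count.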
Oh, and as a note, it still happens if you replace |
@OvermindDL1 if you expand out the macro does it still crash? Or does it require using the macro? |
@Aaron1011 I'm curious about your CPU and OS. |
@OvermindDL1: I'm running Arch Linux with an Intel Core i9-8950HK |
@Aaron1011 I'm able to reproduce with the reduced macro rules example. It can take a fair number of runs for it to fail (for me, anywhere from 1 to 500 runs). I can't seem to get Just using gdb with the core dump, it's pretty much the same error as before. Inside |
@Aaron1011 So not a Ryzen, so far it seems everyone this happens to has a Ryzen, interesting... For note, remotely from my phone over ssh I'm trying to do what I can, even
#![feature(prelude_import)]
#[prelude_import]
use std::prelude::v1::*;
#[macro_use]
extern crate std;
macro_rules! cons {
($ head : ident) => (($ head, ())) ;
($ head : ident, $ ($ tail : ident), *) =>
(($ head, cons ! ($ ($ tail), *))) ;
}
fn blah() {
let (a,
(b,
(c,
(d,
(e,
(f,
(g,
(h,
(i,
(j,
(k,
(l,
(m,
(n,
(o,
(p,
(q,
(r,
(s,
(t,
(u, (v, (w, (x, (z, ()))))))))))))))))))))))))) =
{ ::std::rt::begin_panic("not yet implemented") };
}
And compiling it via
fn blah() {
let (a,
(b,
(c,
(d,
(e,
(f,
(g,
(h,
(i,
(j,
(k,
(l,
(m,
(n,
(o,
(p,
(q,
(r,
(s,
(t,
(u, (v, (w, (x, (z, ()))))))))))))))))))))))))) =
{ ::std::rt::begin_panic("not yet implemented") };
}
Again, reducing the depth of the tuples significantly lowers the chance that it happens: it crashes very rarely if I remove 2 levels, and more commonly if I add more. Can replace
fn blah() {
let (a,
(b,
(c,
(d,
(e,
(f,
(g,
(h,
(i,
(j,
(k,
(l,
(m,
(n,
(o,
(p,
(q,
(r,
(s,
(t,
(u, (v, (w, (x, (z, (aa, ())))))))))))))))))))))))))) = ();
}
And it still happens about 50% of the time for me. It's hard to do much from my phone, but I will try more later as I can. |
This crash feels very similar to https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20 but I'm very unsure. Unrelated, but is there a way to get rustc with a newer jemalloc or built without jemalloc just as a test? |
@OvermindDL1 I've been testing without jemalloc, and get the same results, so I don't think it is an issue. If you build rustc from source ( |
@ehuss Very cool, thanks for checking without jemalloc. What's your CPU and OS? You said AMD, but is it a Ryzen? I have multiple machines here to test with, most of them AMD; only one is a Ryzen, and that one is the only one that has an issue. Unfortunately it's also my fastest CPU by a significant margin, so it's the system I usually use as a build host. |
I have a Ryzen Threadripper 2950X, on Ubuntu 20.04. I'm in the same boat, this is the only machine where it reproduces, but it is also by far the fastest one, so I'm still not sure if it is AMD-specific. |
It always happens on a different thread than the main thread, so I'm actually quite curious if it's some kind of race condition with many core CPUs. Is there a way to specify the number of threads that rustc is allowed to use? I would love to test with a single thread, two threads, on up until I can reproduce it. I guess I can just load it with a forced cpu core affinity, I'll try to do that the next opportunity I get but it might not be for a little while, so if someone else is able to do before me that would probably be better. |
For the most part, rustc is single threaded, it just runs everything on a dedicated thread for various reasons. It only uses multiple threads for code generation (in llvm), and this crash is happening far earlier than that. |
rustc always spawns a thread for the purpose of controlling stack size. If that's hindering your debugging, then you can patch rustc like in #48575. |
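As a rough illustration of the pattern described in the two comments above (doing the real work on a dedicated thread in order to control its stack size), here is a minimal sketch using `std::thread::Builder::stack_size`. This is not rustc's actual code; the helper `run_with_stack`, the thread name, the 64 MiB figure, and the recursive `depth` function are all assumptions made up for the example.

```rust
use std::thread;

// Hypothetical helper: run `work` on a fresh thread with an explicit
// stack size and hand back its result (or the panic payload).
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> thread::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("worker".to_string())
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn worker thread")
        .join()
}

// Deliberately recursive so that the available stack depth matters.
fn depth(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + depth(n - 1)
    }
}

fn main() {
    // 64 MiB is an arbitrary size chosen for this sketch.
    let result = run_with_stack(64 * 1024 * 1024, || depth(50_000));
    println!("{:?}", result.map_err(|_| "worker panicked"));
}
```

With this structure a crash in the work closure shows up on a non-main thread even though nothing runs concurrently, which is consistent with the backtrace earlier in the thread pointing at thread 2.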
@ehuss for rr on Zen you have to use one of workarounds from here: https://github.com/mozilla/rr/wiki/Zen |
Yea, I implemented the workaround, and the script printed |
@ehuss @OvermindDL1 impressive work done here to try to reproduce. Can we now set some facts about it? I'm trying to square the issue for the compiler team. Is the latest snippet in this comment a good reproducible example at least in some range of conditions? Second fact, can we rule out a CPU vendor specific issue? What else can we say about this to help reproducing it reliably? thanks! |
The simplification listed above fails on some versions, but not all. It seems to be really sensitive and will pass where the original
I did a fair bit of investigation, but did not find anything terribly useful. It is very sensitive to the exact code layout and optimization settings of rustc. For example, compiling
I cannot rule out that it is AMD-specific because I don't have easy access to a fast Intel system. I was unable to repro in a virtual machine on an Intel machine. I was also unable to repro on macOS (Intel) or Windows (AMD). If someone can reproduce on an Intel Linux system, that would help rule out anything CPU-specific. If they can get it to fail, then running
The script I use to run is:
#!/bin/bash
# Run with RUSTUP_TOOLCHAIN=<toolchain name> to test different toolchains.
ulimit -c unlimited
set -e
rustc -V
for i in {1..1000}
do
echo $i
rustc --crate-type rlib foo.rs --emit=metadata
# Change to this if testing a cargo project:
# touch src/lib.rs
# cargo check
done
tput bel |
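For anyone who would rather drive the loop from Rust and see which signal killed the compiler, here is a sketch of the same idea as the script above. It is Unix-only, since it relies on `std::os::unix::process::ExitStatusExt::signal`. The `Outcome` enum and the helpers `classify` and `first_failure` are hypothetical names introduced for this example, and treating signal 11 as SIGSEGV assumes Linux. The rustc arguments are copied from the script.

```rust
use std::os::unix::process::ExitStatusExt; // Unix-only: exposes the terminating signal
use std::process::Command;

/// Outcome of one compiler invocation (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Outcome {
    Ok,
    ExitCode(i32),
    Signal(i32),
}

/// Turn an exit code / signal pair into an `Outcome`; a signal takes priority.
fn classify(code: Option<i32>, signal: Option<i32>) -> Outcome {
    match (signal, code) {
        (Some(sig), _) => Outcome::Signal(sig),
        (None, Some(0)) => Outcome::Ok,
        (None, Some(c)) => Outcome::ExitCode(c),
        (None, None) => Outcome::ExitCode(-1),
    }
}

/// Call `run` up to `max_runs` times; return the first non-Ok outcome and its run number.
fn first_failure(max_runs: u32, mut run: impl FnMut(u32) -> Outcome) -> Option<(u32, Outcome)> {
    (1..=max_runs).find_map(|i| match run(i) {
        Outcome::Ok => None,
        other => Some((i, other)),
    })
}

fn main() {
    let result = first_failure(1000, |i| {
        println!("{}", i);
        // Same invocation as in the shell script; `foo.rs` must exist in the current directory.
        match Command::new("rustc")
            .args(&["--crate-type", "rlib", "foo.rs", "--emit=metadata"])
            .status()
        {
            Ok(status) => classify(status.code(), status.signal()),
            Err(e) => {
                eprintln!("could not run rustc: {}", e);
                Outcome::ExitCode(-1)
            }
        }
    });
    match result {
        // Assumption: signal 11 is SIGSEGV, as on Linux.
        Some((i, Outcome::Signal(11))) => println!("run {}: SIGSEGV", i),
        Some((i, other)) => println!("run {}: {:?}", i, other),
        None => println!("no failure observed"),
    }
}
```

Unlike the script, this stops at the first failing run and reports the run number, which makes it easier to compare failure rates between toolchains.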
Assigning P-medium as discussed as part of the Prioritization Working Group procedure and removing I-prioritize. Also assigning I-nominate so we can try to get eyes on the root cause of the issue. |
Triage: does anyone know if this is still a problem? Note that the latest snippet doesn't compile on latest stable
|
I did some testing, and as best as I can tell the SIGSEGV stopped on nightly-2020-12-24. This includes:
commit[0] 2020-12-22: Auto merge of #80177 - tgnottingham:foreign_defpathhash_registration, r=Aaron1011
My best guess would be that #80262 changed things enough to mask the underlying issue. I tried building latest master from source without PGO, and wasn't able to reproduce.
I recommend closing since I don't suspect we'll be able to uncover whatever the issue was, and it doesn't seem to be exhibiting itself anymore. |
Thanks. Yeah, I'm going to close this in favor of a new issue if this is still a problem because of lack of repro for a long time. |
Code
I am not sure what part of legion is causing this. I have not encountered this issue for any other crates.
Meta
rustc --version --verbose:
Error output
Backtrace