Compiler crashes/ICEs on new aarch64 GHA runners (and/or Azure's Cobalt 100 VMs) #135867

Open
weiznich opened this issue Jan 22, 2025 · 25 comments
Labels
  • C-bug: Category: This is a bug.
  • C-external-bug: Category: issue that is caused by bugs in software beyond our control
  • I-crash: Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics.
  • I-ICE: Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️
  • O-AArch64: Armv8-A or later processors in AArch64 mode
  • S-needs-repro: Status: This issue has no reproduction and needs a reproduction to make progress.
  • T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@weiznich
Contributor

weiznich commented Jan 22, 2025

Code

git clone https://github.com/sgrif/pq-sys
cd pq-sys
git checkout 4e4f24c8f35abec47927ba20d144b7a8172f1f98
cargo check --no-default-features --features "bundled"

This happened once on a GitHub CI runner. I am nevertheless filing this as a report, since the output asked for it. It might be a hardware issue, as it went away with a rebuild.

CI LOG: https://github.com/sgrif/pq-sys/actions/runs/12903477183/job/35978791282?pr=73#step:14:26

Meta

rustc --version --verbose:

rustc 1.84.0 (9fc6b4312 2025-01-07)
binary: rustc
commit-hash: 9fc6b43126469e3858e2fe86cafb4f0fd5068869
commit-date: 2025-01-07
host: aarch64-unknown-linux-gnu
release: 1.84.0
LLVM version: 19.1.5

Error output

error: rustc interrupted by SIGSEGV, printing backtrace

/home/runner/.rustup/toolchains/stable-aarch64-unknown-linux-gnu/bin/../lib/librustc_driver-bedc4a794a543ce8.so(+0xbf78ec)[0xff0bdc5f78ec]
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xff0be59597e0]

note: we would appreciate a report at https://github.com/rust-lang/rust
help: you can increase rustc's stack size by setting RUST_MIN_STACK=16777216
error: could not compile `vcpkg` (lib)
@weiznich weiznich added C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 22, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 22, 2025
@weiznich
Contributor Author

Hit another different error: https://github.com/sgrif/pq-sys/actions/runs/12903663719/job/35979328082#step:15:166

range start index 503566387083609839 out of range for slice of length 967484
stack backtrace:
   0:     0xffc7be21186c - std::backtrace_rs::backtrace::libunwind::trace::h41252309c76992b5
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
   1:     0xffc7be21186c - std::backtrace_rs::backtrace::trace_unsynchronized::hf225d125cca37a81
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0xffc7be21186c - std::sys::backtrace::_print_fmt::hf329c51b3772454a
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/sys/backtrace.rs:66:9
   3:     0xffc7be21186c - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h00b09a4bf39b2d85
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/sys/backtrace.rs:39:26
   4:     0xffc7be25e62c - core::fmt::rt::Argument::fmt::heddda1cce7046e46
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/fmt/rt.rs:177:76
   5:     0xffc7be25e62c - core::fmt::write::h7ceb4b8480f8c249
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/fmt/mod.rs:1189:21
   6:     0xffc7be2058e0 - std::io::Write::write_fmt::h85dd460e8a217df7
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/io/mod.rs:1884:15
   7:     0xffc7be211720 - std::sys::backtrace::BacktraceLock::print::h9a7ef9a5429d1e7e
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/sys/backtrace.rs:42:9
   8:     0xffc7be213a60 - std::panicking::default_hook::{{closure}}::h0c27709af3f8d4c6
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:268:22
   9:     0xffc7be2138a8 - std::panicking::default_hook::h896b14ac97871c98
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:295:9
  10:     0xffc7b8035e7c - <alloc[b23c587ff69f26b0]::boxed::Box<rustc_driver_impl[233ea3aee64cd688]::install_ice_hook::{closure#0}> as core[2b1f2bbd3b605c]::ops::function::Fn<(&dyn for<'a, 'b> core[2b1f2bbd3b605c]::ops::function::Fn<(&'a std[a32bae77f370052a]::panic::PanicHookInfo<'b>,), Output = ()> + core[2b1f2bbd3b605c]::marker::Sync + core[2b1f2bbd3b605c]::marker::Send, &std[a32bae77f370052a]::panic::PanicHookInfo)>>::call
  11:     0xffc7be2141d0 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::he48809e66f217622
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/alloc/src/boxed.rs:1986:9
  12:     0xffc7be2141d0 - std::panicking::rust_panic_with_hook::h660add0bcc8a80ea
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:809:13
  13:     0xffc7be213f8c - std::panicking::begin_panic_handler::{{closure}}::h83d17f92f6c8c9b6
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:674:13
  14:     0xffc7be211d68 - std::sys::backtrace::__rust_end_short_backtrace::hab5cb706f3ff2287
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/sys/backtrace.rs:170:18
  15:     0xffc7be213c4c - rust_begin_unwind
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
  16:     0xffc7b7eed6cc - core::panicking::panic_fmt::h8267c22f618f324d
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
  17:     0xffc7be26c80c - core::slice::index::slice_start_index_len_fail::do_panic::runtime::hcf2d4900b63c7fab
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panic.rs:219:21
  18:     0xffc7b7eedb00 - core::slice::index::slice_start_index_len_fail::do_panic::h7135aa2b2663528e
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/intrinsics/mod.rs:3535:9
  19:     0xffc7b7eedb00 - core::slice::index::slice_start_index_len_fail::he172df34be0ef418
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panic.rs:224:9
  20:     0xffc7bd63e8e4 - <rustc_metadata[bc7dd363cf28a80b]::creader::CrateMetadataRef>::def_key
  21:     0xffc7bc882224 - <rustc_middle[2c6500b4474def4b]::ty::context::TyCtxt>::def_key::<rustc_span[d6938c8799c78ecd]::def_id::DefId>
  22:     0xffc7bc77a1b4 - <rustc_resolve[803f36c130519f4a]::Resolver>::get_module
  23:     0xffc7bc77a034 - <rustc_resolve[803f36c130519f4a]::Resolver>::expect_module
  24:     0xffc7bc83712c - <rustc_resolve[803f36c130519f4a]::build_reduced_graph::BuildReducedGraphVisitor as rustc_ast[ba5aba888c7ba019]::visit::Visitor>::visit_item
  25:     0xffc7bc7e99cc - <rustc_expand[448f8608944c7a44]::expand::AstFragment>::visit_with::<rustc_resolve[803f36c130519f4a]::build_reduced_graph::BuildReducedGraphVisitor>
  26:     0xffc7bc79cd74 - <rustc_resolve[803f36c130519f4a]::Resolver as rustc_expand[448f8608944c7a44]::base::ResolverExpand>::visit_ast_fragment_with_placeholders
  27:     0xffc7bd707048 - <rustc_expand[448f8608944c7a44]::expand::MacroExpander>::collect_invocations
  28:     0xffc7bd705330 - <rustc_expand[448f8608944c7a44]::expand::MacroExpander>::fully_expand_fragment
  29:     0xffc7bd70298c - <rustc_expand[448f8608944c7a44]::expand::MacroExpander>::expand_crate
  30:     0xffc7b81bf2c8 - <rustc_session[8fe50856523cfc81]::session::Session>::time::<rustc_ast[ba5aba888c7ba019]::ast::Crate, rustc_interface[45f6ee680972b70c]::passes::configure_and_expand::{closure#1}>
  31:     0xffc7b822d8cc - rustc_interface[45f6ee680972b70c]::passes::resolver_for_lowering_raw
  32:     0xffc7bcb0086c - rustc_query_impl[b3a5340e321ea8a9]::plumbing::__rust_begin_short_backtrace::<rustc_query_impl[b3a5340e321ea8a9]::query_impl::resolver_for_lowering_raw::dynamic_query::{closure#2}::{closure#0}, rustc_middle[2c6500b4474def4b]::query::erase::Erased<[u8; 16usize]>>
  33:     0xffc7bcb53d70 - <rustc_query_impl[b3a5340e321ea8a9]::query_impl::resolver_for_lowering_raw::dynamic_query::{closure#2} as core[2b1f2bbd3b605c]::ops::function::FnOnce<(rustc_middle[2c6500b4474def4b]::ty::context::TyCtxt, ())>>::call_once
  34:     0xffc7bcd782c4 - rustc_query_system[4379a6dd4de40329]::query::plumbing::try_execute_query::<rustc_query_impl[b3a5340e321ea8a9]::DynamicConfig<rustc_query_system[4379a6dd4de40329]::query::caches::SingleCache<rustc_middle[2c6500b4474def4b]::query::erase::Erased<[u8; 16usize]>>, false, false, false>, rustc_query_impl[b3a5340e321ea8a9]::plumbing::QueryCtxt, false>
  35:     0xffc7bcbffd58 - rustc_query_impl[b3a5340e321ea8a9]::query_impl::resolver_for_lowering_raw::get_query_non_incr::__rust_end_short_backtrace
  36:     0xffc7bdb42454 - <rustc_middle[2c6500b4474def4b]::ty::context::TyCtxt>::resolver_for_lowering
  37:     0xffc7b7febd7c - <rustc_middle[2c6500b4474def4b]::ty::context::GlobalCtxt>::enter::<rustc_driver_impl[233ea3aee64cd688]::run_compiler::{closure#0}::{closure#1}::{closure#3}, &rustc_data_structures[c3ef3f3d456babe4]::steal::Steal<(rustc_middle[2c6500b4474def4b]::ty::ResolverAstLowering, alloc[b23c587ff69f26b0]::sync::Arc<rustc_ast[ba5aba888c7ba019]::ast::Crate>)>>
  38:     0xffc7b7fc3840 - <rustc_interface[45f6ee680972b70c]::interface::Compiler>::enter::<rustc_driver_impl[233ea3aee64cd688]::run_compiler::{closure#0}::{closure#1}, core[2b1f2bbd3b605c]::result::Result<core[2b1f2bbd3b605c]::option::Option<rustc_interface[45f6ee680972b70c]::queries::Linker>, rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>>
  39:     0xffc7b8029100 - rustc_span[d6938c8799c78ecd]::create_session_globals_then::<core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>, rustc_interface[45f6ee680972b70c]::util::run_in_thread_with_globals<rustc_interface[45f6ee680972b70c]::util::run_in_thread_pool_with_globals<rustc_interface[45f6ee680972b70c]::interface::run_compiler<core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>, rustc_driver_impl[233ea3aee64cd688]::run_compiler::{closure#0}>::{closure#1}, core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>>::{closure#0}, core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>>::{closure#0}::{closure#0}::{closure#0}>
  40:     0xffc7b8023c04 - std[a32bae77f370052a]::sys::backtrace::__rust_begin_short_backtrace::<rustc_interface[45f6ee680972b70c]::util::run_in_thread_with_globals<rustc_interface[45f6ee680972b70c]::util::run_in_thread_pool_with_globals<rustc_interface[45f6ee680972b70c]::interface::run_compiler<core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>, rustc_driver_impl[233ea3aee64cd688]::run_compiler::{closure#0}>::{closure#1}, core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>>::{closure#0}, core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>>::{closure#0}::{closure#0}, core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>>
  41:     0xffc7b80264bc - <<std[a32bae77f370052a]::thread::Builder>::spawn_unchecked_<rustc_interface[45f6ee680972b70c]::util::run_in_thread_with_globals<rustc_interface[45f6ee680972b70c]::util::run_in_thread_pool_with_globals<rustc_interface[45f6ee680972b70c]::interface::run_compiler<core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>, rustc_driver_impl[233ea3aee64cd688]::run_compiler::{closure#0}>::{closure#1}, core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>>::{closure#0}, core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>>::{closure#0}::{closure#0}, core[2b1f2bbd3b605c]::result::Result<(), rustc_span[d6938c8799c78ecd]::ErrorGuaranteed>>::{closure#1} as core[2b1f2bbd3b605c]::ops::function::FnOnce<()>>::call_once::{shim:vtable#0}
  42:     0xffc7be21d94c - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h8f59f5b3c43759d1
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/alloc/src/boxed.rs:1972:9
  43:     0xffc7be21d94c - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h7541037a2f8924a1
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/alloc/src/boxed.rs:1972:9
  44:     0xffc7be21d94c - std::sys::pal::unix::thread::Thread::new::thread_start::hd1bb28d1647e61f1
                               at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/sys/pal/unix/thread.rs:105:17
  45:     0xffc7b72c597c - <unknown>
  46:     0xffc7b732ba4c - <unknown>
  47:                0x0 - <unknown>

error: the compiler unexpectedly panicked. this is a bug.

@jieyouxu jieyouxu added the S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. label Jan 22, 2025
@jieyouxu
Member

Hmm, this might be something about incremental

thread 'rustc' panicked at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/compiler/rustc_serialize/src/opaque.rs:269:45:

@messense
Contributor

Similar issue here: https://github.com/PyO3/maturin-action/actions/runs/12907444142/job/35991179531

thread 'rustc' panicked at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/compiler/rustc_serialize/src/opaque.rs:269:45:
range start index 56654325204204048 out of range for slice of length 39963722
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panicking.rs:76:14
   2: core::slice::index::slice_start_index_len_fail::do_panic::runtime
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panic.rs:219:21
   3: core::slice::index::slice_start_index_len_fail::do_panic
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/intrinsics/mod.rs:3535:9
   4: core::slice::index::slice_start_index_len_fail
             at /rustc/9fc6b43126469e3858e2fe86cafb4f0fd5068869/library/core/src/panic.rs:224:9
   5: <rustc_metadata::creader::CrateMetadataRef>::item_ident
   6: <rustc_metadata::creader::CStore>::load_macro_untracked
   7: <rustc_resolve::Resolver>::get_macro_by_def_id
   8: <rustc_resolve::Resolver>::try_define
   9: <rustc_resolve::Resolver>::build_reduced_graph_external
  10: <rustc_resolve::Module>::for_each_child::<rustc_resolve::build_reduced_graph::BuildReducedGraphVisitor, <rustc_resolve::build_reduced_graph::BuildReducedGraphVisitor>::process_macro_use_imports::{closure#2}>
  11: <rustc_resolve::build_reduced_graph::BuildReducedGraphVisitor as rustc_ast::visit::Visitor>::visit_item
  12: <rustc_expand::expand::AstFragment>::visit_with::<rustc_resolve::build_reduced_graph::BuildReducedGraphVisitor>
  13: <rustc_resolve::Resolver as rustc_expand::base::ResolverExpand>::visit_ast_fragment_with_placeholders
  14: <rustc_expand::expand::MacroExpander>::collect_invocations
  15: <rustc_expand::expand::MacroExpander>::fully_expand_fragment
  16: <rustc_expand::expand::MacroExpander>::expand_crate
  17: <rustc_session::session::Session>::time::<rustc_ast::ast::Crate, rustc_interface::passes::configure_and_expand::{closure#1}>
  18: rustc_interface::passes::resolver_for_lowering_raw
      [... omitted 2 frames ...]
  19: <rustc_middle::ty::context::TyCtxt>::resolver_for_lowering
  20: <rustc_middle::ty::context::GlobalCtxt>::enter::<rustc_driver_impl::run_compiler::{closure#0}::{closure#1}::{closure#3}, &rustc_data_structures::steal::Steal<(rustc_middle::ty::ResolverAstLowering, alloc::sync::Arc<rustc_ast::ast::Crate>)>>
  21: <rustc_interface::interface::Compiler>::enter::<rustc_driver_impl::run_compiler::{closure#0}::{closure#1}, core::result::Result<core::option::Option<rustc_interface::queries::Linker>, rustc_span::ErrorGuaranteed>>
  22: rustc_span::create_session_globals_then::<core::result::Result<(), rustc_span::ErrorGuaranteed>, rustc_interface::util::run_in_thread_with_globals<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<core::result::Result<(), rustc_span::ErrorGuaranteed>, rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}, core::result::Result<(), rustc_span::ErrorGuaranteed>>::{closure#0}, core::result::Result<(), rustc_span::ErrorGuaranteed>>::{closure#0}::{closure#0}::{closure#0}>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.84.0 (9fc6b4312 2025-01-07) running on aarch64-unknown-linux-gnu

note: compiler flags: --crate-type lib -C embed-bitcode=no -C debuginfo=2

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
#0 [resolver_for_lowering_raw] getting the resolver for lowering
end of query stack

@saethlin
Member

Hmm, this might be something about incremental

I'm quite sure it isn't. We print the compiler flags at the bottom of the ICE message and there is no -C incremental.
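
The check described here can be done mechanically. Below is a minimal sketch, not rustc's or cargo's own logic: it looks for the `note: compiler flags:` line in an ICE report and reports whether any flag mentions `incremental`. The function name `ice_mentions_incremental` is hypothetical, and the sketch assumes the flags are space-separated on a single line, as in the report above.

```rust
/// Returns `Some(true)` if the ICE report's "compiler flags" note mentions
/// an incremental flag, `Some(false)` if the note is present without one,
/// and `None` if the report contains no such note.
///
/// Hypothetical helper for illustration; not part of rustc or cargo.
fn ice_mentions_incremental(report: &str) -> Option<bool> {
    const PREFIX: &str = "note: compiler flags:";

    // Find the first line carrying the flags note and keep what follows it.
    let flags = report
        .lines()
        .find_map(|line| line.trim_start().strip_prefix(PREFIX))?;

    let tokens: Vec<&str> = flags.split_whitespace().collect();
    let found = tokens.iter().enumerate().any(|(i, tok)| {
        // Joined form: `-Cincremental=...`
        tok.starts_with("-Cincremental")
            // Separated form: `-C incremental=...`
            || (tok.starts_with("incremental") && i > 0 && tokens[i - 1] == "-C")
    });
    Some(found)
}
```

For the flags line quoted in the report above (`--crate-type lib -C embed-bitcode=no -C debuginfo=2`), this returns `Some(false)`. Since the report also notes that some cargo-provided flags are hidden, a negative result is only as complete as that note.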

@saethlin
Member

In addition, the slice index here is telling. Very often we get these ICEs because the slice that's being indexed into is being decoded from an artifact that should have been invalidated but wasn't, or because the slice was truncated. But here the index is just completely bogus. Successfully accessing a (non-ZST) slice index of 503566387083609839 would imply at least a 447 PB allocation.
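
The arithmetic behind that figure can be reproduced with a short sketch. It assumes one byte per element (the smallest non-ZST size) and reads "PB" as pebibytes (2^50 bytes); the function name `min_allocation_pib` is made up for illustration.

```rust
/// Number of bytes in one pebibyte (2^50).
const PIB: u128 = 1 << 50;

/// Smallest allocation, in whole PiB (rounded down), that a slice must span
/// for a range starting at `index` to be in bounds, given `elem_size` bytes
/// per element.
///
/// Hypothetical helper for illustration only.
fn min_allocation_pib(index: u64, elem_size: u64) -> u128 {
    // A range start of `index` is in bounds only if the slice has at least
    // `index` elements. Widen to u128 so the multiplication cannot overflow.
    let bytes = index as u128 * elem_size as u128;
    bytes / PIB
}
```

With the index from the first panic, `min_allocation_pib(503566387083609839, 1)` gives 447. The index from the maturin-action report, 56654325204204048, gives 50 by the same calculation. Both are far beyond any real allocation, while the reported slice lengths are under 40 MB.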

@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 23, 2025
@EliahKagan

EliahKagan commented Jan 23, 2025

Correction: I used the wrong beta here, which was much older than intended. Oddly, that seems only to have made a small difference. See #135867 (comment) below, and GitoxideLabs/gitoxide#1790 (comment), for details.


I'm not sure how useful this will be, since based on #135867 (comment) it looks like the problem might already be understood, but I figured I'd report this anyway, in case it provides useful information about the environments in which the problem does and does not occur.

This happened in a ubuntu-24.04-arm job in gitoxide as well, described at GitoxideLabs/gitoxide#1790. Both the SIGSEGV (which sometimes had accompanying SIGBUS errors) and the "range start index" problems occurred repeatedly, but not always, when running rustc on that runner.

There was also one occurrence of "free(): invalid next size (fast)":

Run rustup toolchain install stable --profile minimal --no-self-update
  rustup toolchain install stable --profile minimal --no-self-update
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    CARGO_HOME: /home/runner/.cargo
free(): invalid next size (fast)
/home/runner/work/_temp/30098f9b-bd1e-4e34-af34-465eb6bb47f8.sh: line 1:  5597 Aborted                 (core dumped) rustup toolchain install stable --profile minimal --no-self-update

actions/checkout also failed with no apparent error message a few times. This was much rarer than either the SIGSEGV or "range start index" error. I didn't get any details from it, and it may not be a memory error, but even if so, actions/checkout does not use Rust.

I was not able to reproduce any of the errors on an Azure cloud instance, also running rustc 1.84.0 from the stable channel of the aarch64-unknown-linux-gnu toolchain. I tried running cargo nextest run --workspace --no-fail-fast in the gitoxide repository on two ARM64 systems, one with Ubuntu 24.04 LTS and another with Ubuntu 24.10.

In addition, in GitHub Actions, as described along with some more details in GitoxideLabs/gitoxide#1790 (comment), the errors never seem to occur on the ubuntu-22.04-arm runner, which was made available together with the ubuntu-24.04-arm runner. All these errors were observed occasionally on ubuntu-24.04-arm and not at all in the same number of runs on ubuntu-22.04-arm, in two experiments:

  1. GitoxideLabs/gitoxide@e71b0cf - workflow run details
  2. GitoxideLabs/gitoxide@5a71963 - workflow run details

Experiment 1 tested ubuntu-22.04-arm and ubuntu-24.04-arm; with stable, beta, and nightly Rust channels; with and without RUST_MIN_STACK=16777216. Experiment 2 tested ubuntu-22.04-arm and ubuntu-24.04-arm, with stable and beta Rust channels, with more runs. Both included a number of identical runs, so as to try and distinguish correlation from happenstance: experiment 1 ran each combination 16 times, and experiment 2 ran each combination 64 times.
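
To illustrate why the repeat counts matter, here is a rough sketch of the chance of seeing no failures at all in a batch of runs. It assumes runs are independent and share one per-run failure probability `p`; the thread does not report such a rate, so any value passed in is hypothetical, as is the function name.

```rust
/// Probability of observing zero failures in `runs` independent runs when
/// each run fails with probability `p` (0.0 ..= 1.0).
///
/// Illustrative only: `p` is a made-up parameter, not a measured rate.
fn prob_no_failures(p: f64, runs: u32) -> f64 {
    (1.0 - p).powi(runs as i32)
}
```

For a hypothetical `p` of 0.1, 16 clean runs would still happen by chance a fair share of the time, whereas 64 clean runs would be very unlikely. That is the sense in which the larger experiment separates correlation from happenstance more sharply.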

However, while both showed the problem only ever to occur on ubuntu-24.04-arm and not ubuntu-22.04-arm, strongly suggesting that the problem may be specific to the runner images or some part of the CI infrastructure that hosts the runners, they also showed that only the stable builds seemed affected. In both experiments, an equal number of runs with beta channel Rust toolchains were performed, and none of these problems occurred in them.

Experiment 1 also included nightly channel runs, and these problems did not occur, but other problems sometimes caused failures, and I did not continue testing the nightly channel in the larger experiment 2.

Therefore, I don't know what's going on.

  • Switching from ubuntu-24.04-arm to ubuntu-22.04-arm seems to work around the problem completely.
  • But so does switching from the stable to the beta channel, even when continuing to use ubuntu-24.04-arm.

(The ubuntu-24.04-arm runner also has an issue about a SIGILL crash, actions/partner-runner-images#36, with further details at this discussion. That is different from anything described here or that I observed, but some cases of SIGILL can occur due to memory errors or, I think, due to compiler bugs that also lead to memory errors.)

@saethlin
Member

saethlin commented Jan 23, 2025

some cases of SIGILL can occur due to memory errors or, I think, due to compiler bugs that also lead to memory errors.

Compiler bugs often result in programs with absurd execution behavior. SIGILL just means it tried to execute at an offset that doesn't form a valid instruction or was a ud2, and SIGSEGV just means it tried to access memory that is forbidden or not mapped. There's usually no sense in reading into how exactly a miscompiled program crashes. A sufficiently mangled program will encounter invalid instructions and invalid memory access eventually.

it looks like the problem might already be understood,

I don't understand what's going on here, but something is very broken in a rather novel way. Thank you for your report, it was very informative.

I think there are basically two possibilities, either the stable toolchain for linux-aarch64 is incredibly broken and somehow people are noticing at exactly the same time as GitHub is making free linux-aarch64 runners available... or the newly-available runners are subtly buggy. At the moment, I think it's more likely the runners are buggy. If that's the case, GitHub people are probably scrambling to do something about these reports, so for now I'd just wait. The runners are a public beta, finding bugs is expected.

@saethlin saethlin added I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. C-external-bug Category: issue that is caused by bugs in software beyond our control labels Jan 23, 2025
@workingjubilee
Member

However, while both showed the problem only ever to occur on ubuntu-24.04-arm and not ubuntu-22.04-arm, strongly suggesting that the problem may be specific to the runner images or some part of the CI infrastructure that hosts the runners, they also showed that only the stable builds seemed affected. In both experiments, an equal number of runs with beta channel Rust toolchains were performed, and none of these problems occurred in them.

Experiment 1 also included nightly channel runs, and these problems did not occur, but other problems sometimes caused failures, and I did not continue testing the nightly channel in the larger experiment 2.

I wonder if something changed in how we build the aarch64 compiler...

@saethlin saethlin changed the title Compiler crashes with SIGSEGV Compiler crashes with SIGSEGV on aarch64-unknown-linux-gnu Jan 23, 2025
@Kobzol
Contributor

Kobzol commented Jan 23, 2025

Yes, we started optimizing it with LTO and PGO. But that's not on stable yet.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Jan 23, 2025
When using `dtolnay/rust-toolchain` with the `toolchain` key to
specify a channel, the action version should be given as `@master`.
But I accidentally kept it at `@stable`! This caused `beta` and
`nightly` to refer to the most recent beta and nightly builds
*prior* to the current stable version. That made the conclusions
about beta and nightly builds inaccurate. This rectifies that
error and repeats the experiment.

See e71b0cf (1f3f6b5), GitoxideLabs#1790, and rust-lang/rust#135867 for context.

(I made this mistake in both experiment 1 and experiment 2, having
wrongly thought I'd changed `@stable` to `@master` for experiment
1. This commit just repeats experiment 1, but experiment 2 should
also be repeated for the same reason.)
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Jan 23, 2025
As noted in the preceding commit, when I ran experiments 1 and 2
the first time, I accidentally used `dtolnay/rust-toolchain@stable`
instead of `dtolnay/rust-toolchain@master`, even though the latter
is needed to use current values of the `toolchain` key rather than
the builds they referred to at the time the most recent stable
build was updated. The preceding commit redid experiment 1 with
that fixed.

This commit redoes experiment 2 with the same fix.

See 5a71963 (1b3e2cd), GitoxideLabs#1790, and rust-lang/rust#135867 for context.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Jan 23, 2025
This varies:

- `ubuntu-22.04-arm` vs. `ubuntu-24.04-arm` GHA runner.
- Installing Rust via the `rust-toolchain` action vs. with curl.sh.
- Installing the stable vs. beta Rust toolchain.
- Installing nextest via `install-action` quickinstall/binstall.

*If* this also confirms that the only fully consistent factor in
whether errors happen is `ubuntu-22.04-arm` vs. `ubuntu-24.04-arm`,
then that will make it clearer that the problem is likely specific
to the `ubuntu-24.04-arm` runner.

See GitoxideLabs#1790 and rust-lang/rust#135867 for context.
@edmorley
Contributor

edmorley commented Jan 23, 2025

We've been using ARM on GHA successfully for several months using their "larger runners" feature (where GitHub still manages the runners for you, unlike self-hosted, but allows you to customise the specs/architecture/... etc).

Today I switched to the new ubuntu-24.04-arm runners and immediately encountered this ICE:
#135939

And shortly after a colleague encountered this crash after they switched the ARM job of their repo from the larger ARM runner to the public ARM runner:

$ rustup update
free(): invalid next size (fast)
/home/runner/work/_temp/93acf597-1bda-46c0-942b-47faf8d8f6f3.sh: line 1:  3144 Aborted                 (core dumped) rustup update
Error: Process completed with exit code 134.

(https://github.com/heroku/buildpacks-dotnet/actions/runs/12937396345/job/36085223385?pr=185#step:4:7)
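
The exit code in that log lines up with the "Aborted" message. As a small sketch, assuming the usual shell convention of reporting 128 plus the signal number for a process killed by a signal, and Linux signal numbering, the code can be decoded like this (both function names are hypothetical):

```rust
/// If `code` looks like a shell-style "killed by signal" exit status
/// (128 + N), returns the signal number N.
///
/// Assumes the common shell convention; illustrative helper only.
fn signal_from_exit_code(code: i32) -> Option<i32> {
    if code > 128 && code < 256 {
        Some(code - 128)
    } else {
        None
    }
}

/// Names for the signals mentioned in this thread, using Linux numbering.
fn signal_name(signal: i32) -> Option<&'static str> {
    match signal {
        4 => Some("SIGILL"),
        6 => Some("SIGABRT"),
        7 => Some("SIGBUS"),
        11 => Some("SIGSEGV"),
        _ => None,
    }
}
```

Exit code 134 decodes to signal 6, SIGABRT, which is what glibc raises after printing a heap-corruption message such as `free(): invalid next size (fast)`.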

This would suggest that a difference in image or machine type between GitHub's larger runners ARM offering and the new public runner ARM offering is the cause of the ICE/crash.

In particular, I've just found out from actions/partner-runner-images#36 (comment) that the CPU type has changed:

The ubuntu-24.04-arm image uses the Cobalt 100 processor (Neoverse N2), as described in the ChangeLog. The custom runner pool used the previous generation of hardware (Neoverse N1).

...so it seems that this could be an issue specific to the ARM Cobalt 100 / Neoverse N2 CPU?

driftluo pushed a commit to driftluo/crossbeam that referenced this issue Jan 26, 2025
@EliahKagan

Does this also happen with an older toolchain, e.g. 1.81?

I'm not sure, because when I test now, it seems to happen less often overall, though that may very well just be due to chance.

Testing 1.81, 1.82, 1.83, and 1.84, I saw it once on 1.83 and not on versions earlier than that. (The other failure, on a separate 1.83 run, was in the runner software itself.)

This was at EliahKagan/gitoxide@cca8f00 (workflow run details). A subsequent experiment at EliahKagan/gitoxide@844c6bd (workflow run details) is likewise inconclusive.

@workingjubilee workingjubilee added the O-AArch64 Armv8-A or later processors in AArch64 mode label Jan 28, 2025
@ehuss
Contributor

ehuss commented Jan 28, 2025

We should check what the exact feature differences are between these CPUs, to start.

The N1 runner's CPU info is:

BogoMIPS   : 50.00
Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x3
CPU part   : 0xd0c
CPU revision   : 1

The N2 runner is:

BogoMIPS	: 2000.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd49
CPU revision	: 0
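
To start on that comparison, the two `Features` lines can be diffed mechanically. The following is a minimal Python sketch, assuming the relevant lines of the two dumps above are pasted in as strings; `parse_cpuinfo` and `features` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
# Relevant lines copied from the two /proc/cpuinfo dumps above.
N1_CPUINFO = """\
Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU part   : 0xd0c
"""

N2_CPUINFO = """\
Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16
CPU part   : 0xd49
"""


def parse_cpuinfo(text):
    """Return a dict mapping each 'key : value' field to its stripped value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not 'key : value' pairs
            fields[key.strip()] = value.strip()
    return fields


def features(text):
    """Return the set of feature flags listed on the 'Features' line."""
    return set(parse_cpuinfo(text)["Features"].split())


n1 = features(N1_CPUINFO)
n2 = features(N2_CPUINFO)

only_n2 = sorted(n2 - n1)  # flags the N2 runner reports that the N1 runner does not
only_n1 = sorted(n1 - n2)  # flags the N1 runner reports that the N2 runner does not

print("CPU part:", parse_cpuinfo(N1_CPUINFO)["CPU part"], "vs", parse_cpuinfo(N2_CPUINFO)["CPU part"])
print(f"only on N2 ({len(only_n2)}):", " ".join(only_n2))
print(f"only on N1 ({len(only_n1)}):", " ".join(only_n1) or "(none)")
```

On the dumps above, this reports 27 flags present only on the N2 runner (including `sve`, `sve2`, `paca`, and `pacg`) and none present only on the N1 runner. It only lists the differences; it does not say which of them, if any, is involved in the crashes.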

taiki-e added a commit to taiki-e/upload-rust-binary-action that referenced this issue Jan 28, 2025
seungukshin pushed a commit to seungukshin/guest-components that referenced this issue Jan 29, 2025
rust on aarch64-unknown-linux-gnu has a bug which faces SIGSEGV
intermittently (rust-lang/rust#135867)
with 1.83.0 or later.
rust 1.82.0 will be used until the above issue is resolved.

Signed-off-by: Seunguk Shin <seunguk.shin@arm.com>
@saethlin saethlin marked this as a duplicate of #136260 Jan 29, 2025
@saethlin saethlin pinned this issue Jan 29, 2025
@saethlin saethlin marked this as a duplicate of #135939 Jan 29, 2025
@saethlin saethlin marked this as a duplicate of #136162 Jan 29, 2025
@saethlin saethlin marked this as a duplicate of #136267 Jan 30, 2025
@saethlin saethlin changed the title Compiler crashes with SIGSEGV on aarch64-unknown-linux-gnu Compiler crashes/ICEs on aarch64-unknown-linux-gnu Jan 30, 2025
@saethlin saethlin changed the title Compiler crashes/ICEs on aarch64-unknown-linux-gnu Compiler crashes/ICEs on new aarch64 GHA runners Jan 30, 2025
@edmorley
Contributor

I tried opening a GitHub support ticket to raise awareness of this issue, but was directed back to the public discussion group (where we've not yet had a reply or acknowledgement from GitHub) because the ARM runners are in preview.

Could everyone use the discussion group upvote arrow on this thread to raise its visibility?
https://github.com/orgs/community/discussions/148648#discussioncomment-11890717

@jamesx-improving

Upvoted in the discussion group, as my issue was closed as a duplicate of this one.

Also, I can confirm that for me, as a workaround, switching to ubuntu-22.04-arm (from 24.04) seems to have fixed the issue.

@lqd lqd marked this as a duplicate of #136342 Jan 31, 2025
@DenuxPlays

DenuxPlays commented Feb 1, 2025

We are also hitting this issue on non-ARM runners when Docker is emulating linux/arm64 builds.
Locally (running Fedora 41 or Ubuntu 22) it works, even with Docker QEMU.

Running Rust 1.84.1

Note:

  • after downgrading to ubuntu-22.04, it works!

@purplesyringa
Contributor

@DenuxPlays Let me clarify: building financrr-app on x86-64 GitHub runners while simulating arm64 with Docker QEMU occasionally leads to SIGSEGVs, but doing the same thing on local conventional x86-64 hardware doesn't seem to trigger SIGSEGV under realistic conditions. Did I get that right?

@DenuxPlays

Yes

It has been happening since we updated to Rust 1.84.1 (1.84.0 works).

@DenuxPlays

Or maybe since we started using Ubuntu 24.

I am not sure which one caused it.

@saethlin
Member

saethlin commented Feb 1, 2025

I provisioned myself a D8plsv6 VM on Azure (so that's a Cobalt 100 CPU, the same that these new GHA runners are using) running Ubuntu 24.04 and I got one of these crashes by running the compiler's test suite. x test got all the way to the incremental suite before running into:

thread 'rustc' panicked at /home/ubuntu/rust/compiler/rustc_serialize/src/opaque.rs:269:45:
range start index 1934374219505146277 out of range for slice of length 8016303
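
One quick way to inspect a bogus index like this is to dump it as hex and as raw bytes, to see whether it resembles text or some other recognizable pattern. This is only a minimal Python sketch using the two numbers from the panic message; the variable names are mine, and it does not by itself identify where the corruption comes from.

```python
# Values copied from the panic message above.
start_index = 1934374219505146277
slice_len = 8016303

# The index fits in 64 bits but is vastly larger than the buffer it indexes.
print(f"index / length ratio: {start_index / slice_len:.3e}")

# Dump the index as hex and as little-endian bytes
# (aarch64 Linux targets are little-endian).
raw = start_index.to_bytes(8, "little")
print(f"hex:   {start_index:#018x}")
print(f"bytes: {raw.hex(' ')}")

# Show printable ASCII bytes as characters, everything else as '.'.
printable = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
print(f"ascii: {printable}")
```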

@saethlin saethlin changed the title Compiler crashes/ICEs on new aarch64 GHA runners Compiler crashes/ICEs on new aarch64 GHA runners (and/or Azure's Cobalt 100 VMs) Feb 1, 2025
@saethlin
Member

saethlin commented Feb 1, 2025

Oh I see, I'm a fool: I didn't look at the numerous experiments that @EliahKagan has already done, which basically pointed us in this direction.

It looks to me like you provoked >20 crashes, mostly on 1.84 but one or two on 1.83, and 100% of the crashes are on Ubuntu 24, even though you did an identical number of runs on Ubuntu 22. Is that right?

I do see some CI failures of yours on Ubuntu 22, but they all look like

        FAIL [   5.812s] gix-macros::macros momo::ux

as opposed to the smattering of segfaults, ICEs, and heap corruption that is happening on Ubuntu 24.
