
Cross-compiling Rust to s390x yields a faulty toolchain #80810

Closed
Jakob-Naucke opened this issue Jan 8, 2021 · 33 comments
Assignees
Labels
C-bug Category: This is a bug. O-SystemZ Target: SystemZ processors (s390x) P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Milestone

Comments

@Jakob-Naucke

Code

I tried this code on s390x:

trait T1: {}
trait T2: {
    type Foo: T1; 
}
trait T3<T>: {
    fn f(&self) -> T::Foo where T: T2; 
}
fn main() {}

(originally discovered in gimli and reduced to a minimal working example) with all of the following:

I expected to see this happen: a successful compiler run, as on x86_64

Instead, this happened:

error[E0220]: associated type `Foo` not found for `T` 
 --> demo.rs:6:23
  |
6 |     fn f(&self) -> T::Foo where T: T2; 
  |                       ^^^ associated type `Foo` not found

error: aborting due to previous error

For more information about this error, try `rustc --explain E0220`.

Furthermore, when building Rust with x.py on s390x after the switch to the aforementioned 2020-12-30 stage0 (introduced in fe031180), an ICE occurs:

Output

$ RUST_BACKTRACE=1 ./x.py build
Updating only changed submodules
Submodules updated in 0.02 seconds
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
Building stage0 std artifacts (s390x-unknown-linux-gnu -> s390x-unknown-linux-gnu)
   Compiling core v0.0.0 (/home/fedora/rust/library/core)
error: internal compiler error: compiler/rustc_privacy/src/lib.rs:500:25: item Item { ident: #0, hir_id: HirId { owner: DefId(0:472 ~ core[5990]::num::flt2dec::{misc#0}), local_id: 0 }, attrs: [], kind: Use(Path { span: library/core/src/num/flt2dec/mod.rs:125:9: 125:70 (#0), res: Err, segments: [PathSegment { ident: self#0, hir_id: Some(HirId { owner: DefId(0:472 ~ core[5990]::num::flt2dec::{misc#0}), local_id: 1 }), res: Some(Err), args: None, infer_args: false }, PathSegment { ident: decoder#0, hir_id: Some(HirId { owner: DefId(0:472 ~ core[5990]::num::flt2dec::{misc#0}), local_id: 2 }), res: Some(Def(Mod, DefId(0:478 ~ core[5990]::num::flt2dec::decoder))), args: None, infer_args: false }] }, ListStem), vis: Spanned { node: Inherited, span: library/core/src/num/flt2dec/mod.rs:125:9: 125:9 (#0) }, span: library/core/src/num/flt2dec/mod.rs:125:1: 125:71 (#0) } with DefKind Struct

thread 'rustc' panicked at 'Box<Any>', compiler/rustc_errors/src/lib.rs:958:9
stack backtrace:
   0: std::panicking::begin_panic
   1: rustc_errors::HandlerInner::bug
   2: rustc_errors::Handler::bug
   3: rustc_middle::util::bug::opt_span_bug_fmt::{{closure}}
   4: rustc_middle::ty::context::tls::with_opt::{{closure}}
   5: rustc_middle::ty::context::tls::with_opt
   6: rustc_middle::util::bug::opt_span_bug_fmt
   7: rustc_middle::util::bug::bug_fmt
   8: rustc_privacy::EmbargoVisitor::update_macro_reachable_def
   9: rustc_privacy::EmbargoVisitor::update_macro_reachable
  10: rustc_privacy::EmbargoVisitor::update_macro_reachable_def
  11: rustc_privacy::EmbargoVisitor::update_macro_reachable
  12: rustc_privacy::EmbargoVisitor::update_macro_reachable_def
  13: rustc_privacy::EmbargoVisitor::update_macro_reachable
  14: <rustc_privacy::EmbargoVisitor as rustc_hir::intravisit::Visitor>::visit_macro_def
  15: rustc_privacy::privacy_access_levels
  16: rustc_middle::ty::query::<impl rustc_query_system::query::config::QueryAccessors<rustc_middle::ty::context::TyCtxt> for rustc_middle::ty::query::queries::privacy_access_levels>::compute
  17: rustc_query_system::dep_graph::graph::DepGraph<K>::with_eval_always_task
  18: rustc_data_structures::stack::ensure_sufficient_stack
  19: rustc_query_system::query::plumbing::get_query_impl
  20: rustc_query_system::query::plumbing::ensure_query_impl
  21: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  22: __rust_try.llvm.16454507701973609965
  23: rustc_session::utils::<impl rustc_session::session::Session>::time
  24: rustc_interface::passes::analysis
  25: rustc_middle::ty::query::<impl rustc_query_system::query::config::QueryAccessors<rustc_middle::ty::context::TyCtxt> for rustc_middle::ty::query::queries::analysis>::compute
  26: rustc_query_system::dep_graph::graph::DepGraph<K>::with_eval_always_task
  27: rustc_data_structures::stack::ensure_sufficient_stack
  28: rustc_query_system::query::plumbing::get_query_impl
  29: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter
  30: rustc_span::with_source_map
  31: rustc_interface::interface::create_compiler_and_run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.50.0-beta.1 (05b602367 2020-12-29) running on s390x-unknown-linux-gnu

note: compiler flags: -Z macro-backtrace -Z binary-dep-depinfo -Z force-unstable-if-unmarked -C opt-level=3 -C embed-bitcode=no -C debuginfo=0 -C link-args=-Wl,-rpath,$ORIGIN/../lib -C prefer-dynamic --crate-type lib

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
#0 [privacy_access_levels] privacy access levels
#1 [analysis] running analysis passes on this crate
end of query stack
error: aborting due to previous error

error: could not compile `core`

To learn more, run the command again with --verbose.
command did not execute successfully: "/home/fedora/rust/build/s390x-unknown-linux-gnu/stage0/bin/cargo" "build" "--target" "s390x-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "4" "--release" "--features" "panic-unwind backtrace compiler-builtins-c" "--manifest-path" "/home/fedora/rust/library/test/Cargo.toml" "--message-format" "json-render-diagnostics"
expected success, got: exit code: 101
failed to run: /home/fedora/rust/build/bootstrap/debug/bootstrap build
Build completed unsuccessfully in 0:00:06

Rust compiled on s390x natively (immediately prior to that introduction) does not have this issue.

Since the Rust dists are built through cross-compilation for Tier 2 architectures, I believe that there is a problem specific to the cross-compilation with an s390x target.

Versions it worked on

It most recently worked on:

Versions with regression

e.g.

$ rustc --verbose --version
rustc 1.50.0-beta.5 (ff5998292 2021-01-05)
binary: rustc
commit-hash: ff59982926d98c8508008f0559f8a055260ac05e
commit-date: 2021-01-05
host: s390x-unknown-linux-gnu
release: 1.50.0-beta.5

Backtrace

No additional info for the code example; the ICE output above was already generated with backtraces enabled.

@rustbot modify labels: +regression-from-stable-to-beta -regression-untriaged

@rustbot rustbot added regression-from-stable-to-beta Performance or correctness regression from stable to beta. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jan 8, 2021
@rylev
Member

rylev commented Jan 8, 2021

Assigning P-high as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

@rylev rylev added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jan 8, 2021
@cuviper
Member

cuviper commented Jan 8, 2021

I believe that there is a problem specific to the cross-compilation with an s390x target.

This has come up before when some part of the crate metadata is accidentally written in native-endian order -- so x86_64 rustc writes some little-endian data in the standard library rlibs, and s390x rustc reads that as big-endian.
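cuviper's point can be sketched in a few lines (a hypothetical illustration, not the actual rustc metadata encoder): a field read with the host's native byte order round-trips only on same-endian hosts, whereas normalizing to little-endian on both the write and read side is portable across architectures.

```rust
use std::convert::TryInto;

// Portable: always write and read little-endian.
fn write_len_le(buf: &mut Vec<u8>, len: u32) {
    buf.extend_from_slice(&len.to_le_bytes());
}

fn read_len_le(buf: &[u8]) -> u32 {
    u32::from_le_bytes(buf[..4].try_into().unwrap())
}

// Buggy pattern: reading with the host's native byte order. On big-endian
// s390x this misinterprets data written by a little-endian x86_64 host.
fn read_len_native(buf: &[u8]) -> u32 {
    u32::from_ne_bytes(buf[..4].try_into().unwrap())
}

fn main() {
    let mut buf = Vec::new();
    write_len_le(&mut buf, 0x0102_0304);
    assert_eq!(read_len_le(&buf), 0x0102_0304); // correct on every host
    if cfg!(target_endian = "little") {
        // On x86_64 the native read happens to agree with the writer.
        assert_eq!(read_len_native(&buf), 0x0102_0304);
    } else {
        // On a big-endian host the same bytes decode as 0x04030201.
        assert_eq!(read_len_native(&buf), 0x0403_0201);
    }
    println!("ok");
}
```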

@cuviper
Member

cuviper commented Jan 8, 2021

I can reproduce this on RHEL 7 s390x, but not ppc64, so it would seem to be something other than endian order.
I will try to bisect this.

@camelid camelid added the O-SystemZ Target: SystemZ processors (s390x) label Jan 8, 2021
@camelid
Member

camelid commented Jan 8, 2021

I believe that there is a problem specific to the cross-compilation with an s390x target.

This has come up before when some part of the crate metadata is accidentally written in native-endian order -- so x86_64 rustc writes some little-endian data in the standard library rlibs, and s390x rustc reads that as big-endian.

I know you said based on testing it doesn't seem to be endian order that's causing this issue, but in general can't we just standardize on either little-endian or big-endian somehow?

@cuviper
Member

cuviper commented Jan 8, 2021

searched nightlies: from nightly-2020-10-01 to nightly-2021-01-08
regressed nightly: nightly-2020-12-04
searched commits: from f4db9ff to 5be3f9f
regressed commit: b4def89

bisected with cargo-bisect-rustc v0.6.0

Host triple: s390x-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --start 2020-10-01 --end 2021-01-08 -- check

That's the merge for #79637, a revert, which doesn't really make sense to me...
cc @spastorino @nikomatsakis @Mark-Simulacrum

@camelid
Member

camelid commented Jan 8, 2021

Since this may be an rlib-related issue, did you have cargo-bisect-rustc run clean in between each toolchain run? Not sure if that could be the issue.

@cuviper
Member

cuviper commented Jan 8, 2021

@camelid

I know you said based on testing it doesn't seem to be endian order that's causing this issue, but in general can't we just standardize on either little-endian or big-endian somehow?

I think we do normalize to/from little endian, but it's possible something was missed.

@cuviper
Member

cuviper commented Jan 8, 2021

did you have cargo-bisect-rustc run clean in between each toolchain run?

Doesn't it use a new target dir by default? That seems implied by the option to --preserve-target anyway.

@camelid
Member

camelid commented Jan 8, 2021

I think you're right – I forgot about that :)

@spastorino
Member

spastorino commented Jan 11, 2021

searched nightlies: from nightly-2020-10-01 to nightly-2021-01-08
regressed nightly: nightly-2020-12-04
searched commits: from f4db9ff to 5be3f9f
regressed commit: b4def89
bisected with cargo-bisect-rustc v0.6.0

That's the merge for #79637, a revert, which doesn't really make sense to me...
cc @spastorino @nikomatsakis @Mark-Simulacrum

Sorry, I don't have time right now to check this properly, but a quick answer I can give you now is:
Could it be that this was "tested" between when my original PR was merged and when the revert was merged, and that's why it "regressed"? If so, it's not really a regression: my original PR fixed a very old issue, and the revert undid it because it uncovered a different bug. In that case, #80732 will fix the issue again; maybe try the compiler generated for that PR and see what happens.

@pnkfelix pnkfelix self-assigned this Jan 14, 2021
@Mark-Simulacrum Mark-Simulacrum added this to the 1.50.0 milestone Jan 14, 2021
@spastorino
Member

@Jakob-Naucke @cuviper is it easy for you to test against #80732? We did a try build, so you can install the compiler with that PR included using rustup-toolchain-install-master.

@Jakob-Naucke
Author

@spastorino I'm afraid the try build does not generate s390x artifacts:

downloading <https://ci-artifacts.rust-lang.org/rustc-builds/c963187c6f959417cbb13a33e9eaea4607696fc4/rustc-nightly-s390x-unknown-linux-gnu.tar.xz>...
error: missing component `rustc` on toolchain `c963187c6f959417cbb13a33e9eaea4607696fc4` on channel `nightly` for target `s390x-unknown-linux-gnu`

(but x86_64 works)
I can try it with my x86_64 and s390x machines anyways.

@Jakob-Naucke
Author

@spastorino yes, that's working. Built with DEPLOY=1 src/ci/docker/run.sh dist-s390x-linux on x86_64:

$ rustc --version --verbose
rustc 1.51.0-nightly (455a0e1d9 2021-01-05)
binary: rustc
commit-hash: 455a0e1d91d3f56c4752a9f035e3622c614b7240
commit-date: 2021-01-05
host: s390x-unknown-linux-gnu
release: 1.51.0-nightly
$ cat demo.rs
trait T1: {}
trait T2: {
        type Foo: T1; 
}
trait T3<T>: {
        fn f(&self) -> T::Foo where T: T2; 
}
fn main() {}
$ rustc demo.rs && echo "no complaints"
no complaints

@cuviper
Member

cuviper commented Jan 15, 2021

I do worry that we're still missing a root cause, because it doesn't look like that code should behave differently for cross compiling, let alone s390x in particular. That could be a totally unrelated codegen bug though, which just happens to get tickled the wrong way by this change.

@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 20, 2021
@apiraino
Contributor

@Jakob-Naucke checking in on this issue. Do you need any feedback? Do we have enough info to assess the context? I'm trying to figure out the status of this issue and the next steps.

thanks!

@Jakob-Naucke
Author

@apiraino By context, do you mean a category label? I'd say that this is a bug. There is a PR that fixes this, but it's blocked at the moment since it possibly breaks salsa, and @cuviper rightfully noted that it might not fix the root issue. As for feedback -- I'm not sure because I'm not suggesting a change here? I'm sorry, I'm not proficient in rust-lang development :) It would be good if current beta didn't become stable as-is.

@cuviper
Member

cuviper commented Jan 28, 2021

It would be good if current beta didn't become stable as-is.

Even if it lands soon, #80732 isn't nominated for a beta backport, and it looks like it may be too invasive for that.

I wonder if you can run a dist-s390x-linux build from a native Z host? I've never tried this, but on the surface it seems like it could work -- I don't see why that docker build would technically require the host to be x86_64. There's a docker_dir in src/ci/docker/run.sh that's based on uname, but changing that to force host-x86_64 might work.

So if that build works with otherwise identical toolchain bits, then maybe we can look for binary differences with the cross-compiled build. If that native build fails, maybe we can find a difference from your "normal" native toolchain.
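The tweak cuviper suggests can be sketched as a one-line substitution (demonstrated here on a throwaway copy rather than the real src/ci/docker/run.sh; the exact variable layout in that script is an assumption):

```shell
# Throwaway stand-in for the line in src/ci/docker/run.sh that picks the
# docker dir based on the build host's architecture.
printf 'docker_dir="${ci_dir}/host-$(uname -m)"\n' > run_sh_snippet

# Force the host-x86_64 container definitions regardless of the build host.
sed -i 's/host-$(uname -m)/host-x86_64/' run_sh_snippet

cat run_sh_snippet
```

After patching the real script the same way, the dist build would be invoked as before, e.g. DEPLOY=1 src/ci/docker/run.sh dist-s390x-linux.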

@cuviper cuviper added the C-bug Category: This is a bug. label Jan 29, 2021
@Jakob-Naucke
Author

Using host-x86_64 for docker_dir instead of host-$(uname -m) like you said, the pass-2 compiler install fails:

Output

[INFO ]  Installing pass-2 core C gcc compiler
[ERROR]    checking for suffix of executables... configure: error: in `/tmp/build/.build/s390x-ibm-linux-gnu/build/build-cc-gcc-core-pass-2/s390x-ibm-linux-gnu/libgcc':
[ERROR]    configure: error: cannot compute suffix of object files: cannot compile
[ERROR]    checking whether it is safe to define __EXTENSIONS__... make[1]: *** [configure-target-libgcc] Error 1
[ERROR]   
[ERROR]  >>  
[ERROR]  >>  Build failed in step 'Installing pass-2 core C gcc compiler'
[ERROR]  >>        called in step '(top-level)'
[ERROR]  >>  
[ERROR]  >>  Error happened in: CT_DoExecLog[scripts/functions@257]
[ERROR]  >>        called from: do_gcc_core_backend[scripts/build/cc/100-gcc.sh@537]
[ERROR]  >>        called from: do_gcc_core_pass_2[scripts/build/cc/100-gcc.sh@160]
[ERROR]  >>        called from: do_cc_core_pass_2[scripts/build/cc.sh@42]
[ERROR]  >>        called from: main[scripts/crosstool-NG.sh@646]
[ERROR]  >>  
[ERROR]  >>  For more info on this error, look at the file: 'build.log'
[ERROR]  >>  There is a list of known issues, some with workarounds, in: 
[ERROR]  >>      '/usr/local/share/doc/crosstool-ng//B - Known issues.txt'

I initially thought I was running into a known crosstool-NG issue in version 1.22, which is the version used here.
Since that was fixed in a later version, I tried using the crosstool-ng-1.24.sh script.
If you're doing this, you'll want docker build --ulimit nofiles=2048 (crosstool-NG will try to increase the limit itself, which you can't do inside Docker).
You'll also want to run ct-ng upgradeconfig before ct-ng build (see https://github.com/crosstool-ng/crosstool-ng/issues/913).
However, there was no change in output (save for the slightly modified error reporting in the newer version).

It turns out that the file /tmp/build/.build/s390x-ibm-linux-gnu/build/build-cc-gcc-core-pass-2/gcc/cc1 is not generated, which leads to this in libgcc's config.log:

configure:3474:  /tmp/build/.build/s390x-ibm-linux-gnu/build/build-cc-gcc-core-pass-2/./gcc/xgcc -B/tmp/build/.build/s390x-ibm-linux-gnu/build/build-cc-gcc-core-pass-2/./gcc/ -B/tmp/build/.build/s390x-ibm-linux-gnu/buildtools/s390x-ibm-linux-gnu/bin/ -B/tmp/build/.build/s390x-ibm-linux-gnu/buildtools/s390x-ibm-linux-gnu/lib/ -isystem /tmp/build/.build/s390x-ibm-linux-gnu/buildtools/s390x-ibm-linux-gnu/include -isystem /tmp/build/.build/s390x-ibm-linux-gnu/buildtools/s390x-ibm-linux-gnu/sys-include    -o conftest -O2 -g -I/tmp/build/.build/s390x-ibm-linux-gnu/buildtools/include   -g -Os   conftest.c  >&5
xgcc: error trying to exec 'cc1': execvp: No such file or directory

This file is indeed generated when building on x86_64 for s390x. When building for x86_64 on x86_64, the file .build/x86_64-unknown-linux-gnu/build/build-cc-gcc-core-pass-2/gcc/cc1 is also generated, i.e. running crosstool-NG for the native architecture does not appear to be broken in general. @cuviper WDYT? Is it worth investigating further to try to get the "cross-compile" running on s390x?

@cuviper
Member

cuviper commented Jan 29, 2021

I literally just got to that point of missing cc1 too. Unfortunately, I don't know why that would be -- I haven't delved that deeply into crosstool-ng and gcc bootstrapping before.

Maybe we can try just a plain native build of GCC 5.2.0 and see how that compares. The C/C++ toolchain might not matter much anyway, since the problem we're seeing is in the type system, firmly in Rust's domain. So I guess a potential codegen issue would just be in LLVM, less likely to be a two-level miscompilation of LLVM causing miscompilation of Rust.

Sorry for all the guesswork -- I'm just trying to find traction on the underlying issue. I think I will next try a detailed comparison of running the native beta vs. cross-compiled beta, regardless of the system toolchain difference.

@apiraino apiraino added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Feb 11, 2021
@cuviper
Member

cuviper commented Feb 11, 2021

Well, I just encountered this on a native build of the complete toolchain for s390x:
https://koji.fedoraproject.org/koji/taskinfo?taskID=61725573
(The other arches in the parent task all completed successfully.)

Building stage2 tool rustfmt (s390x-unknown-linux-gnu)
   Compiling cc v1.0.60
   Compiling either v1.6.0
error[E0220]: associated type `Item` not found for `L`
   --> /builddir/build/BUILD/rustc-1.50.0-src/vendor/either/src/lib.rs:394:35
    |
394 |         R: IntoIterator<Item = L::Item>,
    |                                   ^^^^ associated type `Item` not found
error[E0220]: associated type `IntoIter` not found for `L`
   --> /builddir/build/BUILD/rustc-1.50.0-src/vendor/either/src/lib.rs:391:41
    |
391 |     pub fn into_iter(self) -> Either<L::IntoIter, R::IntoIter>
    |                                         ^^^^^^^^ associated type `IntoIter` not found
error[E0220]: associated type `IntoIter` not found for `R`
   --> /builddir/build/BUILD/rustc-1.50.0-src/vendor/either/src/lib.rs:391:54
    |
391 |     pub fn into_iter(self) -> Either<L::IntoIter, R::IntoIter>
    |                                                      ^^^^^^^^ associated type `IntoIter` not found
error: aborting due to 3 previous errors
For more information about this error, try `rustc --explain E0220`.
error: could not compile `either`

either also failed in the builds of clippy-driver and rls, and futures also failed similarly in rls.

Before, when I had tried native, I was only building rustc+std and then attempting the OP reproducer code, which was fine. So I'm not sure whether this native failure has been lingering all along or something in Fedora tickled it in a new way.


There is also an ICE in that log, but it doesn't look directly related:

 Documenting core v0.0.0 (/builddir/build/BUILD/rustc-1.50.0-src/library/core)
error: internal compiler error: compiler/rustc_privacy/src/lib.rs:500:25: item Item { ident: #0, hir_id: HirId { owner: DefId(0:474 ~ core[8787]::num::flt2dec::{misc#0}), local_id: 0 }, attrs: [], kind: Use(Path { span: library/core/src/num/flt2dec/mod.rs:125:9: 125:70 (#0), res: Err, segments: [PathSegment { ident: self#0, hir_id: Some(HirId { owner: DefId(0:474 ~ core[8787]::num::flt2dec::{misc#0}), local_id: 1 }), res: Some(Err), args: None, infer_args: false }, PathSegment { ident: decoder#0, hir_id: Some(HirId { owner: DefId(0:474 ~ core[8787]::num::flt2dec::{misc#0}), local_id: 2 }), res: Some(Def(Mod, DefId(0:480 ~ core[8787]::num::flt2dec::decoder))), args: None, infer_args: false }] }, ListStem), vis: Spanned { node: Inherited, span: library/core/src/num/flt2dec/mod.rs:125:9: 125:9 (#0) }, span: library/core/src/num/flt2dec/mod.rs:125:1: 125:71 (#0) } with DefKind Struct
thread 'rustc' panicked at 'Box<Any>', compiler/rustc_errors/src/lib.rs:958:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: Unrecognized option: 'markdown-css'
error: aborting due to previous error
error: could not document `core`

@Jakob-Naucke
Author

@cuviper fe03118 used to introduce the ICE from the OP for me, but now it introduces the ICE you're mentioning. That's weird.

@cuviper
Member

cuviper commented Feb 15, 2021

I ran a new native bisection using the either failure as the test, and it blamed eba5432, which was #81257 backported to beta in #81774. That's a totally different area of code than the previous bisection to #79637, so I think it's fair to guess that neither are really at fault -- probably just perturbed the compiler's build in a way that exposed a codegen issue.

@cuviper cuviper self-assigned this Feb 15, 2021
@cuviper
Member

cuviper commented Feb 19, 2021

I noticed that a bunch of ui tests were failing too, so I started bisecting based on that, and I got to 0183b41 this time, from #79547 (cc @erikdesjardins @nagisa). I hope this is finally a real smoking gun, because that change could definitely have far-reaching codegen effects. That's not to say the change is necessarily wrong, but it may have exposed a latent LLVM bug for SystemZ.

@nagisa
Member

nagisa commented Feb 19, 2021

Hm, I think it is fine if we revert that PR, as it was an attempt at optimizing the ABI, but we should also see if we can make a minimal reproducer so that we can pinpoint exactly what component is to blame here and if its LLVM – report it upstream.

@cuviper
Member

cuviper commented Feb 19, 2021

FWIW, after a successful scratch build, I pushed this revert patch to Fedora so we can ship 1.50.

I'll continue trying to narrow down the effect before taking action here in rust-lang.

mtreinish added a commit to mtreinish/retworkx that referenced this issue Feb 21, 2021
Since Rust 1.50.0, the s390x CI test job has been failing. This looks
to be caused by rust-lang/rust#80810. Until the issue is resolved in a
released version of rust this pins the rust version used in the job to
the previous release 1.49.0 which does not have this issue and should
work fine.
mtreinish added a commit to Qiskit/rustworkx that referenced this issue Feb 21, 2021
* Pin rust in s390x test job

Since Rust 1.50.0, the s390x CI test job has been failing. This looks
to be caused by rust-lang/rust#80810. Until the issue is resolved in a
released version of rust this pins the rust version used in the job to
the previous release 1.49.0 which does not have this issue and should
work fine.

* Call rustup in the before_install stage
@cuviper
Member

cuviper commented Feb 22, 2021

Filed: https://bugs.llvm.org/show_bug.cgi?id=49322

That came from analysis of a stage1 testing failure in ui/proc-macro/macro-rules-derive, specifically from its macro here:

#[proc_macro_attribute]
pub fn first(_attr: TokenStream, item: TokenStream) -> TokenStream {
    let tokens: TokenStream = "#[derive(Second)]".parse().unwrap();
    let wrapped = TokenTree::Group(Group::new(Delimiter::None, item.into_iter().collect()));
    tokens.into_iter().chain(std::iter::once(wrapped)).collect()
}

I see _attr starts as TokenStream handle 1, item is 2, and tokens gets 3. When the second collect is called, that iterator is 12 bytes and now gets passed in LLVM IR as one i96 argument. That argument is converted in SystemZ back to being indirect on the stack, and this overflows to clobber _attr to now look like handle 3 too. Then the _attr drop panics:

thread 'rustc' panicked at 'use-after-free in `proc_macro` handle', /root/rust/library/proc_macro/src/bridge/handle.rs:35:30

This might not be the only issue, but it was the first thing that I could positively isolate.
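To make the failure mode concrete, here is a hedged sketch (the struct and function are hypothetical stand-ins, not the actual proc_macro bridge types): a 12-byte aggregate small enough for the by-value threshold from #79547 gets lowered to a single i96 scalar argument, and it was the SystemZ backend's conversion of that scalar back to an indirect stack slot that overflowed into a neighboring local.

```rust
// Hypothetical 12-byte aggregate, the same size as the chained-iterator
// state that started being passed as a single i96 after #79547.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TwelveBytes {
    a: u32,
    b: u32,
    c: u32,
}

// #[inline(never)] keeps a real call-boundary ABI in play.
#[inline(never)]
fn pass_by_value(v: TwelveBytes) -> u32 {
    // On the buggy SystemZ lowering, spilling the i96 argument back to an
    // indirect stack slot wrote past the slot, clobbering an adjacent value
    // (the _attr handle in the original reproducer).
    v.a.wrapping_add(v.b).wrapping_add(v.c)
}

fn main() {
    assert_eq!(std::mem::size_of::<TwelveBytes>(), 12);
    let v = TwelveBytes { a: 1, b: 2, c: 3 };
    assert_eq!(pass_by_value(v), 6);
    println!("ok");
}
```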

@alex
Member

alex commented Feb 22, 2021

Excellent job with the debugging! Very impressive minimization.

@cuviper
Member

cuviper commented Feb 23, 2021

On the Rust side, maybe max_by_val_size should be target-specific? LLVM is converting back to indirect in this case, so it's kind of a waste even if the bug is fixed.

@cuviper
Member

cuviper commented Mar 9, 2021

A fix for LLVM 49322 has landed, and I confirmed it does fix my minimization, but that Rust test still fails. It seems I must have minimized too much, so I'll have to step back and see what else went wrong.

@cuviper
Member

cuviper commented Mar 9, 2021

but that Rust test still fails.

Scratch that, I think something in my build failed to re-link, even though I thought I had cleaned it sufficiently. I couldn't figure out why my rustc was still clobbering stack in that test, while its build of llc did the right thing, so I ran a completely clean rebuild with the fix D97514 applied and now it is fine. I confirmed both on the regressed commit 0183b41 and on 1.50.0 final. So, yay! 🎉

I'll prepare a backport for rust-lang/llvm-project which we can update into 1.52-nightly, and then we can decide whether to also backport for 1.51-beta before that moves to stable.

@uweigand
Contributor

@cuviper thanks for tracking this down! If possible, it would be great to have the fix in 1.51 ...

@cuviper
Member

cuviper commented Mar 11, 2021

@uweigand #82996 will pull the fix into 1.51-beta.

@cuviper
Member

cuviper commented Mar 12, 2021

I've confirmed that both are working on the original reproducer and on the either crate. I was also able to build a native rustc bootstrapped using that beta as the stage0 compiler.
