-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cross-compiling Rust to s390x yields a faulty toolchain #80810
Comments
Assigning P-high as discussed as part of the Prioritization Working Group procedure and removing I-prioritize. |
This has come up before when some part of the crate metadata is accidentally written in native-endian order -- so x86_64 rustc writes some little-endian data in the standard library rlibs, and s390x rustc reads that as big-endian. |
I can reproduce this on RHEL 7 s390x, but not ppc64, so it would seem to be something other than endian order. |
I know you said based on testing it doesn't seem to be endian order that's causing this issue, but in general can't we just standardize on either little-endian or big-endian somehow? |
searched nightlies: from nightly-2020-10-01 to nightly-2021-01-08 bisected with cargo-bisect-rustc v0.6.0Host triple: s390x-unknown-linux-gnu cargo bisect-rustc --start 2020-10-01 --end 2021-01-08 -- check That's the merge for #79637, a revert, which doesn't really make sense to me... |
Since this may be an rlib-related issue, did you have cargo-bisect-rustc run |
I think we do normalize to/from little endian, but it's possible something was missed. |
Doesn't it use a new target dir by default? That seems implied by the option to |
I think you're right – I forgot about that :) |
Sorry, I don't have time right now to check this properly but a quick answer I can give you now is: |
@Jakob-Naucke @cuviper is it easy for you to test against #80732?, we did a try build so you can install the compiler with that PR included using rustup-toolchain-install-master. |
@spastorino I'm afraid try does not generate an s390x target:
(but |
@spastorino yes, that's working. Built with
|
I do worry that we're still missing a root cause, because it doesn't look like that code should behave differently for cross compiling, let alone s390x in particular. That could be a totally unrelated codegen bug though, which just happens to get tickled the wrong way by this change. |
@Jakob-Naucke checkin in on this issue. Do you need a feedback? Do we have enough info to assess a context? I'm trying to figure out the status of this issue and the next steps. thanks! |
@apiraino By context, do you mean a category label? I'd say that this is a bug. There is a PR that fixes this, but it's blocked at the moment since it possibly breaks |
Even if it lands soon, #80732 isn't nominated for a beta backport, and it looks like it may be too invasive for that. I wonder if you can run a So if that build works with otherwise identical toolchain bits, then maybe we can look for binary differences with the cross-compiled build. If that native build fails, maybe we can find a difference from your "normal" native toolchain. |
Using Output
It turns out that the file
This file is indeed generated when building on x86_64 for s390x. When building for x86_64 on x86_64, the file |
I literally just got to that point of missing Maybe we can try just a plain native build of GCC 5.2.0 and see how that compares. The C/C++ toolchain might not matter much anyway, since the problem we're seeing is in the type system, firmly in Rust's domain. So I guess a potential codegen issue would just be in LLVM, less likely to be a two-level miscompilation of LLVM causing miscompilation of Rust. Sorry for all the guesswork -- I'm just trying to find traction on the underlying issue. I think I will next try a detailed comparison of running the native beta vs. cross-compiled beta, regardless of the system toolchain difference. |
Well, I just encountered this on a native build of the complete toolchain for s390x:
Before when I had tried native, I was only building There is also an ICE in that log, but it doesn't look directly related:
|
I ran a new native bisection using the |
I noticed that a bunch of |
Hm, I think it is fine if we revert that PR, as it was an attempt at optimizing the ABI, but we should also see if we can make a minimal reproducer so that we can pinpoint exactly what component is to blame here and if its LLVM – report it upstream. |
FWIW, after a successful scratch build, I pushed this revert patch to Fedora so we can ship 1.50. I'll continue trying to narrow down the effect before taking action here in rust-lang. |
Since the Rust 1.50.0 the s390x CI test job has been failing. This looks to be caused by rust-lang/rust#80810. Until the issue is resolved in a released version of rust this pins the rust version used in the job to the previous release 1.49.0 which does not have this issue and should work fine.
* Pin rust in s390x test job Since the Rust 1.50.0 the s390x CI test job has been failing. This looks to be caused by rust-lang/rust#80810. Until the issue is resolved in a released version of rust this pins the rust version used in the job to the previous release 1.49.0 which does not have this issue and should work fine. * Call rustup in the before_install stage
Filed: https://bugs.llvm.org/show_bug.cgi?id=49322 That came from analysis of a stage1 testing failure in rust/src/test/ui/proc-macro/auxiliary/first-second.rs Lines 10 to 15 in 15598a8
I see
This might not be the only issue, but it was the first thing that I could positively isolate. |
Excellent job with the debugging! Very impressive minimization. |
On the Rust side, maybe |
A fix for LLVM 49322 has landed, and I confirmed it does fix my minimization, but that Rust test still fails. It seems I must have minimized too much, so I'll have to step back and see what else went wrong. |
Scratch that, I think something in my build failed to re-link, even though I thought I had cleaned it sufficiently. I couldn't figure out why my rustc was still clobbering stack in that test, while its build of llc did the right thing, so I ran a completely clean rebuild with the fix D97514 applied and now it is fine. I confirmed both on the regressed commit 0183b41 and on 1.50.0 final. So, yay! 🎉 I'll prepare a backport for rust-lang/llvm-project which we can update into 1.52-nightly, and then we can decide whether to also backport for 1.51-beta before that moves to stable. |
@cuviper thanks for tracking this down! If possible, it would be great to the have the fix in 1.51 ... |
I've confirmed that both are working on the original reproducer and on the |
Code
I tried this code on s390x:
(originally discovered in and derived to make an MWE from gimli) with all of the following:
DEPLOY=1 ./src/ci/docker/run.sh dist-s390x-linux
on x86_64 and installed on s390xI expected to see this happen: a successful compiler run, as is on x86_64
Instead, this happened:
Furthermore, when building Rust with
x.py
on s390x after the switch to the aforementioned 2020-12-30 stage0 (introduced infe031180
), an ICE is claimed:Output
Rust compiled on s390x natively (immediately prior to that introduction) does not have this issue.
Since the Rust dists are built through cross-compilation for Tier 2 architectures, I believe that there is a problem specific to the cross-compilation with an s390x target.
Versions it worked on
It most recently worked on:
Versions with regression
e.g.
Backtrace
no additional info for code example, ICE output had backtrace enabled
@rustbot modify labels: +regression-from-stable-to-beta -regression-untriaged
The text was updated successfully, but these errors were encountered: