hash collision on /usr/lib/rust/lib-1.84.1/librustc_driver-???.so #136701

Closed
clan opened this issue Feb 7, 2025 · 59 comments · Fixed by #137036
Assignees
Labels
C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)

Comments

@clan

clan commented Feb 7, 2025

https://bugs.gentoo.org/949374

We have user reports of rustc failing to run after building Rust 1.84.0 and 1.84.1 (no reports for earlier versions). We strongly suspect a collision on librustc_driver-???.so: the two versions (1.84.0 and 1.84.1) use the same filename (in different paths), so rustc 1.84.1 loads the librustc_driver from Rust 1.84.0 and then fails with this error:

/usr/lib/rust/1.84.1/bin/rustc: symbol lookup error: /usr/lib/rust/1.84.1/bin/rustc: undefined symbol: _ZN3std2rt19lang_start_internal17h12de313c8fa04a78E
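
For readers unfamiliar with the symbol name in that error: it appears to follow Rust's legacy mangling scheme, in which each path segment is length-prefixed and the last segment is a hash. Below is a minimal sketch of a decoder. The function name `demangle_legacy` is made up for illustration, and only this simple `_ZN...E` shape is handled.

```rust
/// Demangle a legacy (`_ZN...E`) Rust symbol into a `::`-separated path.
/// Returns `None` if the input does not follow that shape.
fn demangle_legacy(symbol: &str) -> Option<String> {
    let mut rest = symbol.strip_prefix("_ZN")?.strip_suffix('E')?;
    let mut segments = Vec::new();
    while !rest.is_empty() {
        // Each segment is a decimal length followed by that many bytes.
        let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
        if digits == 0 {
            return None;
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        segments.push(rest.get(digits..end)?);
        rest = &rest[end..];
    }
    if segments.is_empty() {
        None
    } else {
        Some(segments.join("::"))
    }
}

fn main() {
    let symbol = "_ZN3std2rt19lang_start_internal17h12de313c8fa04a78E";
    println!("{:?}", demangle_legacy(symbol));
}
```

The trailing `h12de313c8fa04a78` segment is a hash component, so a librustc_driver from a different build may export the same function under a different mangled name, which would be consistent with the lookup failure above.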

lddtree confirms:

> demize@yveltal ~ $ lddtree /usr/bin/rustc-*
> rustc-1.84.0 => /usr/bin/rustc-1.84.0 (interpreter => /lib64/ld-linux-x86-64.so.2)
>     librustc_driver-6f6761e2f1b0d7ca.so => /usr/lib/rust/lib-1.84.0/librustc_driver-6f6761e2f1b0d7ca.so
>         libLLVM.so.19.1+libcxx => /usr/lib/llvm/19/lib64/libLLVM.so.19.1+libcxx
>             libffi.so.8 => /usr/lib64/libffi.so.8
>             libz.so.1 => /usr/lib64/libz.so.1
>             libzstd.so.1 => /usr/lib64/libzstd.so.1
>         libc++.so.1 => /usr/lib64/libc++.so.1
>         libc++abi.so.1 => /usr/lib64/libc++abi.so.1
>         libunwind.so.1 => /usr/lib64/libunwind.so.1
>         libm.so.6 => /usr/lib64/libm.so.6
>         ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
>     libc.so.6 => /usr/lib64/libc.so.6
> rustc-1.84.1 => /usr/bin/rustc-1.84.1 (interpreter => /lib64/ld-linux-x86-64.so.2)
>     librustc_driver-6f6761e2f1b0d7ca.so => /usr/lib/rust/lib-1.84.0/librustc_driver-6f6761e2f1b0d7ca.so
>         libLLVM.so.19.1+libcxx => /usr/lib/llvm/19/lib64/libLLVM.so.19.1+libcxx
>             libffi.so.8 => /usr/lib64/libffi.so.8
>             libz.so.1 => /usr/lib64/libz.so.1
>             libzstd.so.1 => /usr/lib64/libzstd.so.1
>         libc++.so.1 => /usr/lib64/libc++.so.1
>         libc++abi.so.1 => /usr/lib64/libc++abi.so.1
>         libunwind.so.1 => /usr/lib64/libunwind.so.1
>         libm.so.6 => /usr/lib64/libm.so.6
>         ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
>     libc.so.6 => /usr/lib64/libc.so.6

Both rustc 1.84.0 and 1.84.1 use a librustc_driver.so with the same name, 'librustc_driver-6f6761e2f1b0d7ca.so'.
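
A quick way to spot this situation is to group the installed driver libraries by file name. The sketch below is only an illustration: `find_collisions` is a hypothetical helper, and the `lib-1.84.1` path in `main` is assumed from the issue title rather than taken from the lddtree output.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Group library paths by file name and keep only the names that
/// occur in more than one location.
fn find_collisions(paths: &[PathBuf]) -> BTreeMap<String, Vec<PathBuf>> {
    let mut by_name: BTreeMap<String, Vec<PathBuf>> = BTreeMap::new();
    for path in paths {
        if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
            by_name.entry(name.to_string()).or_default().push(path.clone());
        }
    }
    // A name seen only once cannot be confused with another toolchain's copy.
    by_name.retain(|_, locations| locations.len() > 1);
    by_name
}

fn main() {
    // Assumed install locations, for illustration only.
    let installed = vec![
        PathBuf::from("/usr/lib/rust/lib-1.84.0/librustc_driver-6f6761e2f1b0d7ca.so"),
        PathBuf::from("/usr/lib/rust/lib-1.84.1/librustc_driver-6f6761e2f1b0d7ca.so"),
    ];
    for (name, locations) in find_collisions(&installed) {
        println!("{} appears {} times: {:?}", name, locations.len(), locations);
    }
}
```

When both directories are on the dynamic linker's search path, whichever copy is found first wins, regardless of which rustc binary asked for it.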

My questions are:

  1. Is 6f6761e2f1b0d7ca a hash value?
  2. If so, how is it calculated?
  3. What is the probability of a collision?
  4. How can we avoid a collision if that is what this is?

Thanks.

@clan clan added C-bug Category: This is a bug. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Feb 7, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 7, 2025
@onur-ozkan onur-ozkan removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 7, 2025
@jyn514
Member

jyn514 commented Feb 8, 2025

that hash is a -C metadata value calculated by cargo (based on a bunch of things, one of which is the version number).

i would be extremely surprised to see a collision here. how did you install 1.84.0 and .1? are you sure they are actually different compilers?

@workingjubilee
Member

1.84.0:

        librustc_driver-7bea7fca18409d2b.so => /home/jubilee/.rustup/toolchains/1.84.0-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-7bea7fca18409d2b.so (0x00007479a6c00000)

1.84.1:

        librustc_driver-cbb5ad48aac6e327.so => /home/jubilee/.rustup/toolchains/1.84.1-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-cbb5ad48aac6e327.so (0x0000791fb9e00000)

@clan I believe the gentoo build system has a bug.

@jieyouxu
Member

jieyouxu commented Feb 8, 2025

Yeah I was about to say, they're both searching under 1.84.0 dir which is sus, but then it's the same driver...

@clan
Author

clan commented Feb 8, 2025

yes, it turns out the reason is 'channel = "nightly"' in config.toml
so the questions are:

  1. how is it calculated? where is the code?
  2. what's the best method to make it non-unique between each release even with channel = "nightly"? "-C metadata" is an option.

updated: I meant non-unique but first wrote unique :(

@clan
Author

clan commented Feb 8, 2025

> that hash is a -C metadata value calculated by cargo (based on a bunch of things, one of which is the version number).
>
> i would be extremely surprised to see a collision here. how did you install 1.84.0 and .1? are you sure they are actually different compilers?

it seems the version is cut if channel = "nightly"?

@clan
Author

clan commented Feb 8, 2025

> 1.84.0:
>
>         librustc_driver-7bea7fca18409d2b.so => /home/jubilee/.rustup/toolchains/1.84.0-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-7bea7fca18409d2b.so (0x00007479a6c00000)
>
> 1.84.1:
>
>         librustc_driver-cbb5ad48aac6e327.so => /home/jubilee/.rustup/toolchains/1.84.1-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-cbb5ad48aac6e327.so (0x0000791fb9e00000)
>
> @clan I believe the gentoo build system has a bug.

it's probably the first time gentoo has had both 1.X.0 and 1.X.1, hence this bug.

@clan
Author

clan commented Feb 8, 2025

> Why is gentoo using a nightly channel when building stable versions? Can you try building stable toolchains via the dist profile and avoid overriding the channel?

Gentoo is a source-based distribution; users can control features themselves through so-called USE flags, which is why it took some time to find the cause.

@workingjubilee
Member

@clan The last time I counseled someone about a bug they were having with Gentoo's build system, it became apparent Gentoo is doing some pretty... opinionated... things with how y'all are building our software.

In other words, it seems to me the default way you are building our things is not what I would do if I was configuring rustc for a source-based distribution.

I would not make it an option to build rustc with a different channel, I would simply make nightly rustc a separate package.

@jieyouxu
Member

jieyouxu commented Feb 8, 2025

If the user used (heh) a USE flag that overrides the channel of a dist build with nightly, then this will occur AFAIK.

It's possible that this might be a different issue. But without a repro that rules out the gentoo build system it's hard to say.

@jieyouxu jieyouxu added C-discussion Category: Discussion or questions that doesn't represent real issues. and removed C-bug Category: This is a bug. labels Feb 8, 2025
@workingjubilee
Member

workingjubilee commented Feb 8, 2025

From our view, only the date XOR the semver is truly relevant.

  • Nightly means the date is relevant.
  • Stable means the semver is relevant.

So the idea of "1.84.1 nightly" is not something we're going to plan around, and using our build system to produce such a beast will provide you with such "entertaining" results as you have seen.

From the perspective of "someone who would like to be able to triage bug reports from Gentoo users", it is most preferable for us if all nightly versions are actual nightly versions, and all stable versions are actual stable versions. Stable versions can wind up with patches that a nightly version will never have, so their commit hashes would not be equal, and the dates would be mismatched. Of course, if there are additional patches on top, I would not object, but it would be best if they at least used the source tarball that matched that stable or nightly version.

@jieyouxu
Member

jieyouxu commented Feb 8, 2025

Oh, the gentoo build system based on https://gitweb.gentoo.org/repo/gentoo.git/tree/dev-lang/rust/rust-1.84.0.ebuild#n373 seems to be setting a dist profile then also conditionally overriding the channel. I think that might cause issues?

@workingjubilee
Member

what the tux

@jieyouxu jieyouxu added S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. requires-custom-config This issue requires custom config/build for rustc in some way labels Feb 8, 2025
@clan
Author

clan commented Feb 8, 2025

My current understanding is that "nightly" has more features than stable (I don't know whether that is still true for released tarballs), so we provide an option that lets users try it if they choose.

From the distribution's side, we'd like to find a solution to make this hash not unique in any case (try best).

We've found that "-C metadata" might be an option, but since I don't know how this hash is calculated now, it may or may not be the best solution.

@workingjubilee
Member

> From the distribution's side, we'd like to find a solution to make this hash not unique in any case (try best).

...why do you want to make the hash not unique?

@jieyouxu
Member

jieyouxu commented Feb 8, 2025

I'll take a look.

@clan is there a smaller repro (maybe with the exact config.toml resolved) I can use to test locally without having to deal with another layer of Gentoo's build system on top of bootstrap (which is its own build system)?

@jieyouxu jieyouxu added C-bug Category: This is a bug. E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status and removed C-discussion Category: Discussion or questions that doesn't represent real issues. labels Feb 8, 2025
@clan
Author

clan commented Feb 8, 2025

let me prepare a config.toml for 1.84.1

> I'll take a look.
>
> @clan is there a smaller repro (maybe with the exact config.toml resolved) I can use to test locally without having to deal with another layer of Gentoo's build system on top of bootstrap (which is its own build system)?

@clan
Author

clan commented Feb 8, 2025

> Huh...
>
> @clan can you try something: are you able to build a working 1.84.1 toolchain without specifying [install] section, just try to see if you can produce a working toolchain under a local build directory?

I'll see, but not very soon, build take long time, ...

@jieyouxu
Member

jieyouxu commented Feb 8, 2025

> I'll see, but not very soon, build take long time, ...

Try a minimal config against 1.84.1 sources like:

profile = "dist"
change-id = 999999


[build]
build-stage = 1
test-stage = 1
extended = false
tools = []

[llvm]
download-ci-llvm = true

[rust]
channel = "nightly"
description = "meow"

codegen-backends = ["llvm"]
download-rustc = false
debug = false
debuginfo-level = 1

and just ./x build, and see if you can produce a working stage 1 rustc built via 1.84.1 sources.

@clan
Author

clan commented Feb 8, 2025

> > I'll see, but not very soon, build take long time, ...
>
> Try a minimal config against 1.84.1 sources like:
>
> profile = "dist"
> change-id = 999999
>
> [build]
> build-stage = 1
> test-stage = 1
> extended = false
> tools = []
>
> [llvm]
> download-ci-llvm = true
>
> [rust]
> channel = "nightly"
> description = "meow"
>
> codegen-backends = ["llvm"]
> download-rustc = false
> debug = false
> debuginfo-level = 1
>
> and just ./x build, and see if you can produce a working stage 1 rustc built via 1.84.1 sources.

ok, i'll try this when free, thanks for the help

@Noratrieb
Member

I did a bit of investigation on where this hash comes from in the first place.

  • In rustc itself, it's just taken from -Cextra-filename. rustc does not compute anything.
    let libname = format!("{}{}", crate_name, sess.opts.cg.extra_filename);
  • Our own distributed nightly-2025-02-07 and nightly-2025-02-08 have the same filename (librustc_driver-0fc0bb987c85669a.so). A much older nightly-2024-09-28 does have a different filename.
  • Our own distributed 1.83.0, 1.84.0, and 1.84.1 all have different filenames.
  • Bootstrap (our own custom cargo wrapper) does not set -Cextra-filename. It's only set by cargo, and I do not know which logic it uses. But somehow it ends up the same for your builds but different for our builds.
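
To make the first bullet concrete, here is a small sketch of how the file name follows from the crate name and the `-Cextra-filename` value. The helper `dylib_file_name` is hypothetical, and the `lib` prefix and `.so` suffix are assumed from the Linux file names shown in this thread.

```rust
/// Build a shared-library file name from the crate name and the value
/// passed as -Cextra-filename. Nothing else feeds into the name here.
fn dylib_file_name(crate_name: &str, extra_filename: &str) -> String {
    // Mirrors the `format!("{}{}", crate_name, extra_filename)` line quoted above.
    let libname = format!("{}{}", crate_name, extra_filename);
    // "lib" + name + ".so" is the usual Linux convention (an assumption here).
    format!("lib{}.so", libname)
}

fn main() {
    // Two different source trees that receive the same extra-filename
    // end up with the same file name.
    let from_1_84_0 = dylib_file_name("rustc_driver", "-6f6761e2f1b0d7ca");
    let from_1_84_1 = dylib_file_name("rustc_driver", "-6f6761e2f1b0d7ca");
    println!("{} {}", from_1_84_0, from_1_84_1);
}
```

So whether two toolchains collide depends entirely on whether whoever sets `-Cextra-filename` produces distinct values for them.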

@Noratrieb
Member

The two most recent nightlies having the same filename seems like a bug too (not that I think that it will cause issues in practice, but I think fixing that would also fix this).

@jieyouxu
Member

jieyouxu commented Feb 8, 2025

> Our own distributed nightly-2025-02-07 and nightly-2025-02-08 have the same filename (librustc_driver-0fc0bb987c85669a.so). A much older nightly-2024-09-28 does have a different filename.

nightly-2025-02-06 has a different librustc_driver.so filename too.

@jyn514
Member

jyn514 commented Feb 8, 2025

I expected this collision to be because rust.omit-git-hash ended up being set. But you are not setting it explicitly, https://github.com/rust-lang/rust/blob/1.84.1/src/bootstrap/src/core/config/config.rs#L1699-L1700 is only set for channel = dev (not for nightly), and https://github.com/rust-lang/rust/blob/1.84.1/src/bootstrap/defaults/config.dist.toml is not overriding it.

I wish we had added a command to bootstrap for debugging the resolved value of each config, not just the ones that have been explicitly overridden. @onur-ozkan i think that would be useful in general, not just for this issue - maybe you are interested in adding it?

The rustc_driver itself is being built with a different version number (this can be seen when you overrode the .so file with LD_LIBRARY_PATH). And you said you are building from a tarball, so you end up with a GitInfo::RecordedForTarball (as an aside, read_commit_info_file has a bug, it does not respect omit_git_hash). The only relevant things I can see that are different based only on the channel are:

I think that second bit of code is to blame, and I think it will be fixed by appending the version number to that metadata string. If I am correct, this bug will also be present if you use beta. @clan does this also reproduce with channel = beta? or only with nightly?

@jyn514
Member

jyn514 commented Feb 8, 2025

yes ok, this happens because cargo hashes the version of rustc used during the build (see hash_rustc_version). but it does not hash the version number of the code being built. that seems like a bug in cargo; but we could work around it in rustc with that __CARGO_DEFAULT_LIB_METADATA variable.

@bjorn3
Member

bjorn3 commented Feb 8, 2025

> but it does not hash the version number of the code being built.

For the nightly channel it is intentional that -Cextra-filename omits the version. This is to prevent artifacts from multiple nightly versions accumulating in the target dir if you update nightly often. Because all nightly versions use the same -Cextra-filename, the artifacts for each nightly version overwrite each other.

@jyn514
Member

jyn514 commented Feb 8, 2025

I am slandering cargo. The real problem is the version number is always set to 0.0.0: https://github.com/rust-lang/rust/blob/1.84.1/compiler/rustc_driver/Cargo.toml#L3

We should be overriding that with the resolved version from src/version.

> For the nightly channel it is intentional that -Cextra-filename omits the version.

Yes, you're right. That happens here: https://github.com/rust-lang/cargo/blob/66221abdeca2002d318fde6efff516aab091df0e/src/cargo/core/compiler/build_runner/compilation_files.rs#L692-L711

That seems a little suspicious, but I think we are getting "lucky" that cargo is considering the version number of the build compiler; that only works because gentoo is using 1.84.0 to build 1.84.1; it wouldn't help us if they were building with 1.83.0 (which is explicitly supported). The right fix is not to use 0.0.0 for rustc_driver unconditionally.
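
A toy model of the behaviour described in the last few comments may help. It is not cargo's algorithm: `toy_extra_filename` is a made-up function, `DefaultHasher` is only a stand-in for whatever hasher cargo uses, and the output will not match any real file name. It only assumes what the thread states: the package version is hashed, a stable build compiler contributes its full version, and a pre-release build compiler contributes only its channel name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for a cargo-like metadata hash (assumptions in the lead-in).
fn toy_extra_filename(package_name: &str, package_version: &str, compiler_release: &str) -> String {
    let mut hasher = DefaultHasher::new();
    package_name.hash(&mut hasher);
    package_version.hash(&mut hasher);
    match compiler_release.split_once('-') {
        // Pre-release compiler such as "1.84.1-nightly": hash only "nightly".
        Some((_, channel)) => channel.hash(&mut hasher),
        // Stable compiler such as "1.84.1": hash the whole release string.
        None => compiler_release.hash(&mut hasher),
    }
    format!("-{:016x}", hasher.finish())
}

fn main() {
    for release in ["1.84.0-nightly", "1.84.1-nightly"] {
        // rustc_driver's package version is 0.0.0 in both source trees.
        let suffix = toy_extra_filename("rustc_driver", "0.0.0", release);
        println!("{}: librustc_driver{}.so", release, suffix);
    }
}
```

In this model the two nightly-channel builds get the same suffix because nothing that differs between them is hashed, while giving the package a real version (1.84.0 versus 1.84.1) separates them again.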

@jieyouxu
Member

jieyouxu commented Feb 8, 2025

> I am slandering cargo. The real problem is the version number is always set to 0.0.0: 1.84.1/compiler/rustc_driver/Cargo.toml#L3

I thought this is overridden?

    // Set some configuration variables picked up by build scripts and
    // the compiler alike
    cargo
        .env("CFG_RELEASE", builder.rust_release())
        .env("CFG_RELEASE_CHANNEL", &builder.config.channel)
        .env("CFG_VERSION", builder.rust_version());
    // Some tools like Cargo detect their own git information in build scripts. When omit-git-hash
    // is enabled in config.toml, we pass this environment variable to tell build scripts to avoid
    // detecting git information on their own.
    if builder.config.omit_git_hash {
        cargo.env("CFG_OMIT_GIT_HASH", "1");
    }

@jyn514
Member

jyn514 commented Feb 8, 2025

No. Those env variables are only read by code in compiler/, not by cargo.

@jieyouxu
Member

jieyouxu commented Feb 8, 2025

Oh oops sorry yeah, I misread that 😅

@jieyouxu jieyouxu removed the E-needs-investigation Call for partcipation: This issues needs some investigation to determine current status label Feb 8, 2025
@jieyouxu
Member

jieyouxu commented Feb 8, 2025

> I wish we had added a command to bootstrap for debugging the resolved value of each config, not just the ones that have been explicitly overridden.

Tracked separately in #136738.

@clan
Author

clan commented Feb 8, 2025

> > I'll see, but not very soon, build take long time, ...
>
> Try a minimal config against 1.84.1 sources like:
>
> profile = "dist"
> change-id = 999999
>
> [build]
> build-stage = 1
> test-stage = 1
> extended = false
> tools = []
>
> [llvm]
> download-ci-llvm = true
>
> [rust]
> channel = "nightly"
> description = "meow"
>
> codegen-backends = ["llvm"]
> download-rustc = false
> debug = false
> debuginfo-level = 1
>
> and just ./x build, and see if you can produce a working stage 1 rustc built via 1.84.1 sources.

build success:

    Finished `release` profile [optimized + debuginfo] target(s) in 43.34s
Build completed successfully in 0:05:24

@clan
Author

clan commented Feb 8, 2025

So can I confirm that there is a bug in the hash calculation for librustc_driver-???.so that yields the same hash for builds from different sources?

What are the workarounds to avoid this hash collision? Thanks a lot.

@jyn514
Member

jyn514 commented Feb 8, 2025

@clan a workaround would be to patch compiler/rustc_driver/Cargo.toml to use version = 1.84.1 (and .0, respectively) instead of 0.0.0. You will have to update that patch for each new release until this gets fixed upstream.

(I have not tested that patch, if it doesn't work please let me know on this issue.)
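
As a rough illustration of that workaround, the sketch below rewrites the version line of a manifest given as text. Everything here is an assumption for illustration: `set_package_version` is a hypothetical helper, the sample manifest in `main` is abbreviated, and the function simply replaces the first `version = ...` line it finds, written as a quoted TOML string.

```rust
/// Replace the first `version = ...` line in a Cargo.toml's text.
/// All other lines are copied through unchanged.
fn set_package_version(manifest: &str, new_version: &str) -> String {
    let mut replaced = false;
    let mut out = String::with_capacity(manifest.len());
    for line in manifest.lines() {
        // Match only a key that is exactly `version`, not e.g. `rust-version`.
        let is_version_key = line
            .split_once('=')
            .map_or(false, |(key, _)| key.trim() == "version");
        if is_version_key && !replaced {
            out.push_str(&format!("version = \"{}\"", new_version));
            replaced = true;
        } else {
            out.push_str(line);
        }
        out.push('\n');
    }
    out
}

fn main() {
    // Abbreviated, illustrative manifest text.
    let manifest = "[package]\nname = \"rustc_driver\"\nversion = \"0.0.0\"\n";
    print!("{}", set_package_version(manifest, "1.84.1"));
}
```

A distribution would more likely carry this as a plain patch file or a sed call in its build recipe; the point is only that the version string changes per release.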

@jieyouxu
Member

jieyouxu commented Feb 8, 2025

Like jyn said, probably (at least) two things need to be done to address your issue:

  1. dist profile should probably make it so that omit_git_hash is not used.
  2. In the short run, as an immediate band-aid, the cargo used to build rustc crates likely needs to be fed the version info via __CARGO_DEFAULT_LIB_METADATA to influence the crate name hash.

@jieyouxu jieyouxu removed the requires-custom-config This issue requires custom config/build for rustc in some way label Feb 8, 2025
@jieyouxu jieyouxu self-assigned this Feb 8, 2025
@jyn514
Member

jyn514 commented Feb 8, 2025

@jieyouxu omit-git-hash is unrelated. This happens any time you use a channel other than stable.

@workingjubilee
Member

Even if we "fix" the "bug", the reality is that we will never test anything like Gentoo's configuration, which allows you to put two incredibly patched nightly rustcs into the same PATH and just hope that works out, adequately. You simply cannot just jam two rustcs into /usr/bin and pray for success. That will bite you in the ass.

@workingjubilee
Member

workingjubilee commented Feb 9, 2025

Provisional update: gentoo's RPATH was not being set in the config.toml that they used. I speculate that this may be because of this confusing advisory in our config.example.toml which suggests distros may want to disable it:

rust/config.example.toml

Lines 640 to 644 in 73bf794

# By default the `rustc` executable is built with `-Wl,-rpath` flags on Unix
# platforms to ensure that the compiler is usable by default from the build
# directory (as it links to a number of dynamic libraries). This may not be
# desired in distributions, for example.
#rpath = true

This advisory seems... unfortunate, because in the absence of an ELF-configured RPATH, Gentoo has developed... alternate solutions for the problem that RPATH solves. These solutions seem like they could have been adequately replaced by simply using an rpath = true build. A Gentoo maintainer has provisionally tested whether that is true (thank you!).

Even if it does not work out in larger-scale testing, I suspect we should stop telling people their business: the default way we build uses a relative RPATH, $ORIGIN/../lib, which seems like it will work out for a lot of distros. It will probably work out better in most cases where you have more than one rustc on the system. If we want to provide guidance, we could simply describe the use cases that disabling it enables.
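
To illustrate why a relative RPATH keeps toolchains apart, here is a sketch of how `$ORIGIN/../lib` maps to a directory for a given executable path. `resolve_rpath` is a hypothetical helper, and it normalises the path lexically, which is a simplification: the real dynamic loader works on the filesystem, and symlinks can change the result.

```rust
use std::path::{Component, Path, PathBuf};

/// Expand `$ORIGIN` in an RPATH entry to the executable's directory and
/// lexically collapse `.` and `..` components (a simplification).
fn resolve_rpath(rpath: &str, executable: &Path) -> Option<PathBuf> {
    let origin = executable.parent()?;
    let expanded = rpath.replace("$ORIGIN", origin.to_str()?);
    let mut resolved = PathBuf::new();
    for component in Path::new(&expanded).components() {
        match component {
            Component::ParentDir => {
                resolved.pop();
            }
            Component::CurDir => {}
            other => resolved.push(other.as_os_str()),
        }
    }
    Some(resolved)
}

fn main() {
    for exe in ["/usr/lib/rust/1.84.0/bin/rustc", "/usr/lib/rust/1.84.1/bin/rustc"] {
        println!("{} -> {:?}", exe, resolve_rpath("$ORIGIN/../lib", Path::new(exe)));
    }
}
```

Under this scheme each rustc looks next to itself first, so two toolchains with identically named driver libraries would still each find their own copy.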

@demize

demize commented Feb 9, 2025

> @clan a workaround would be to patch compiler/rustc_driver/Cargo.toml to use version = 1.84.1 (and .0, respectively) instead of 0.0.0. You will have to update that patch for each new release until this gets fixed upstream.
>
> (I have not tested that patch, if it doesn't work please let me know on this issue.)

Tested this on my machine, I can confirm this works. I'll follow up with @clan on bgo.

9 participants