Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not hash absolute sysroot path into stdlib crates metadata. #14951

Merged
merged 1 commit into from
Dec 18, 2024

Conversation

Dirbaio
Copy link
Contributor

@Dirbaio Dirbaio commented Dec 17, 2024

stdlib crates get a SourceId which has an absolute path pointing into the sysroot. This makes the metadata hash change depending on where you've installed Rust. This is causing problems because the different hashes snowball into the optimizer making different decisions which ends up changing the binary size.

(Some context: at work we're working with embedded devices with little flash storage so it often happens that a binary builds locally and then fails to fit in flash in CI, just because CI has installed rustc to a different path. Improving binary size is not a goal of this PR, after the fix the size will be whatever, but at least it won't change based on the rustc path anymore)

Overview of the fix:

  • For libstd crates, the metadata hash now contains the path relative to the sysroot, instead of the absolute path.
  • The absolute path is still hashed into the fingerprint (not the metadata) so moving the rustc installation triggers a rebuild. This ensures stdlib crates are rebuilt when upgrading nightly versions.
  • The rustc version is still hashed into the metadata as usual, so upgrading Rust releases (not nightly versions) does cause a metadata change.

Repro of the bug:

$ git clone https://github.com/embassy-rs/embassy --branch cargo-nondet-repro
$ cd embassy/
$ cd examples/nrf52840
$ RUSTUP_HOME=~/.rustup1 cargo build --release --bin wifi_esp_hosted
....
    Finished `release` profile [optimized + debuginfo] target(s) in 13.33s
$ llvm-size target/thumbv7em-none-eabi/release/wifi_esp_hosted
   text	   data	    bss	    dec	    hex	filename
 114500	     80	  48116	 162696	  27b88	target/thumbv7em-none-eabi/release/wifi_esp_hosted
$ RUSTUP_HOME=~/.rustup2 cargo build --release --bin wifi_esp_hosted
....
    Finished `release` profile [optimized + debuginfo] target(s) in 9.64s
$ llvm-size target/humbv7em-none-eabi/release/wifi_esp_hosted
   text	   data	    bss	    dec	    hex	filename
 114272	     80	  48116	 162468	  27aa4	target/thumbv7em-none-eabi/release/wifi_esp_hosted

@rustbot
Copy link
Collaborator

rustbot commented Dec 17, 2024

r? @epage

rustbot has assigned @epage.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-layout Area: target output directory layout, naming, and organization S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 17, 2024
@Dirbaio
Copy link
Contributor Author

Dirbaio commented Dec 17, 2024

An alternative approach I tried is making stable_hash() strip the sysroot prefix, just like it does now with the workspace root. Diff is here. It felt a bit less clean to me because it requires threading the sysroot through a few places, but OTOH it doesn't require special-casing is_std. If that approach seems preferable let me know.

@Dirbaio Dirbaio changed the title Do not hash absolute sysroot path into build-std metadata. Do not hash absolute sysroot path into stdlib crates metadata. Dec 17, 2024
Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a definitive answer here. Overall the more reproducibility the better. OTOH, it is also a niche use case like this one, and relies on how the unstable -Zbuild-std work underneath.

src/cargo/core/compiler/build_runner/compilation_files.rs Outdated Show resolved Hide resolved
// to pull crates from anywhere without worrying about conflicts.
unit.pkg
.package_id()
.stable_hash(bcx.ws.root())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach to this: if a unit is from -Zbuild-std, pass its workspace root path to stable_hash instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's cleaner yes! I've changed the code to do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I got paranoid that this could be applied to the same logic, for consistency.

pub fn target_short_hash(&self, unit: &Unit) -> String {
let hashable = unit.pkg.package_id().stable_hash(self.ws.root());
util::short_hash(&(METADATA_VERSION, hashable))
}

(It doesn't really matter though. The hash is for target directory layout for fingerprint and build-script output…)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind. It needs more tweaks and seems meaningless.

stdlib crates get a SourceId which has an absolute path pointing into the sysroot. This
makes the metadata hash change depending on where you've installed Rust. This is causing
problems because the different hashes snowball into the optimizer making different
decisions which ends up changing the binary size.

(Some context: at work we're working with embedded devices with little flash storage
so it often happens that a binary builds locally and then fails to fit in flash in CI,
just because CI has installed rustc to a different path. Improving binary size is
*not* a goal of this PR, after the fix the size will be whatever, but at least it
won't change based on the rustc path anymore)

Overview of the fix:
- For libstd crates, the metadata hash now contains the path relative to the sysroot, instead of the absolute path.
- The absolute path is still hashed into the fingerprint (not the metadata) so moving the rustc installation triggers a rebuild. This ensures stdlib crates are rebuilt when upgrading nightly versions.
- The rustc version is still hashed into the metadata as usual, so upgrading Rust releases (not nightly versions) does cause a metadata change.

Repro of the bug:

```
$ git clone https://github.com/embassy-rs/embassy --branch cargo-nondet-repro
$ cd embassy/
$ cd examples/nrf52840
$ RUSTUP_HOME=~/.rustup1 cargo build --release --bin wifi_esp_hosted
....
    Finished `release` profile [optimized + debuginfo] target(s) in 13.33s
$ llvm-size target/thumbv7em-none-eabi/release/wifi_esp_hosted
   text	   data	    bss	    dec	    hex	filename
 114500	     80	  48116	 162696	  27b88	target/thumbv7em-none-eabi/release/wifi_esp_hosted
$ RUSTUP_HOME=~/.rustup2 cargo build --release --bin wifi_esp_hosted
....
    Finished `release` profile [optimized + debuginfo] target(s) in 9.64s
$ llvm-size target/humbv7em-none-eabi/release/wifi_esp_hosted
   text	   data	    bss	    dec	    hex	filename
 114272	     80	  48116	 162468	  27aa4	target/thumbv7em-none-eabi/release/wifi_esp_hosted
```
Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dirbaio
I am about to merge this. Could you update the PR description a bit to reflect the latest change? Thank you!

@weihanglo weihanglo added Z-build-std Nightly: build-std A-reproducibility Area: reproducible / deterministic builds labels Dec 18, 2024
// SourceId for stdlib crates is an absolute path inside the sysroot.
// Pass the sysroot as workspace root so that we hash a relative path.
// This avoids the metadata hash changing depending on where the user installed rustc.
&bcx.target_data.get_info(unit.kind).unwrap().sysroot
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note for future self: sysroot is something like
/home/user/.rustup/toolchains/nightly-aarch64-apple-darwin

It is not the real workspace root path of the standard library.
While this seems a bit out of sync, the relative path inside sysroot should be fairly deterministic. It's fine now and I don't think we need any hardcoded path logic like this.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Dec 18, 2024

@weihanglo Could you update the PR description a bit to reflect the latest change? Thank you!

done!

@weihanglo weihanglo added this pull request to the merge queue Dec 18, 2024
Merged via the queue into rust-lang:master with commit 58b2d60 Dec 18, 2024
20 checks passed
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 21, 2024
Update cargo

10 commits in 99dff6d77db779716dda9ca3b29c26addd02c1be..652623b779c88fe44afede28bf7f1c9c07812511
2024-12-18 00:55:17 +0000 to 2024-12-20 15:44:42 +0000
- fix(package): use relpath to cwd for vcs dirtiness report (rust-lang/cargo#14970)
- Enable triagebot merge conflict notifications (rust-lang/cargo#14972)
- fixed the error message for a user to open the crate (rust-lang/cargo#14969)
- fix(package): show dirty filepaths relative to git workdir (rust-lang/cargo#14968)
- Add the `test` cfg as a well known cfg before of compiler change (rust-lang/cargo#14963)
- refactor(cargo-package): let-else to flatten code (rust-lang/cargo#14959)
- fix(cargo-package): add more traces (rust-lang/cargo#14960)
- Do not hash absolute sysroot path into stdlib crates metadata. (rust-lang/cargo#14951)
- docs: add missing argument to Rustup Cargo workaround (rust-lang/cargo#14954)
- fix(cargo-rustc): stabilize higher precedence trailing flags (rust-lang/cargo#14900)
@rustbot rustbot added this to the 1.85.0 milestone Dec 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-layout Area: target output directory layout, naming, and organization A-reproducibility Area: reproducible / deterministic builds S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. Z-build-std Nightly: build-std
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants