Skip to content

Commit

Permalink
feat: use stable hash from rustc-stable-hash
Browse files Browse the repository at this point in the history
This helps `-Ztrim-paths` build a stable cross-platform path for the
registry and git sources. Sources files then can be found from the same
path when debugging.

See rust-lang#13171 (comment)

A few caveats:

* This will invalidate the current downloaded caches.
  Need to put this in the Cargo CHANGELOG.
* As a consequence of changing how `SourceId` is hashed, the global cache
  tracker is also affected because Cargo writes source identifiers (e.g.
  `index.crates.io-6f17d22bba15001f`) to SQLite.
  * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/global_cache_tracker.rs#L388-L391
* The performance of rustc-stable-hash is slightly worse than the old
  SipHasher in std on short things like `SourceId`, but for long stuff
  like fingerprint. See appendix.

StableHasher is used in several places (some might not be needed?):

* Rebuild detection (fingerprints)
  * Rustc version, including all the CLI args running `rustc -vV`.
    * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L326
    * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/util/rustc.rs#L381
  * Build caches
    * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/fingerprint/mod.rs#L1456
* Compute rustc `-C metadata`
  * stable hash for SourceId
    * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/package_id.rs#L207
  * Also read and hash contents from custom target JSON file.
    * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/compile_kind.rs#L81-L91
* `UnitInner::dep_hash`
  * This is to distinguish same units having different features set between normal and build dependencies.
    * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_compile/mod.rs#L627
* Hash file contents for `cargo package` to verify if files were modified before and after the build.
  * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/ops/cargo_package.rs#L999
* Rusc diagnostics deduplication
  * https://github.com/rust-lang/cargo/blob/6e236509b2331eef64df844b7bbc8ed352294107/src/cargo/core/compiler/job_queue/mod.rs#L311
* Places using `SourceId` identifier like `registry/src` path,
  and `-Zscript` target directories.

Appendix
--------

Benchmark on x86_64-unknown-linux-gnu

```
bench_hasher/RustcStableHasher/URL
                        time:   [33.843 ps 33.844 ps 33.845 ps]
                        change: [-0.0167% -0.0049% +0.0072%] (p = 0.44 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low severe
  3 (3.00%) high mild
  2 (2.00%) high severe
bench_hasher/SipHasher/URL
                        time:   [18.954 ns 18.954 ns 18.955 ns]
                        change: [-0.1281% -0.0951% -0.0644%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low severe
  4 (4.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
bench_hasher/RustcStableHasher/lorem ipsum
                        time:   [659.18 ns 659.20 ns 659.22 ns]
                        change: [-0.0192% -0.0062% +0.0068%] (p = 0.34 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe
bench_hasher/SipHasher/lorem ipsum
                        time:   [1.2006 µs 1.2008 µs 1.2010 µs]
                        change: [+0.0117% +0.0467% +0.0808%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
```
  • Loading branch information
weihanglo committed Nov 16, 2024
1 parent 69e5959 commit c98308e
Show file tree
Hide file tree
Showing 8 changed files with 67 additions and 57 deletions.
2 changes: 1 addition & 1 deletion src/cargo/core/compiler/build_runner/compilation_files.rs
Original file line number Diff line number Diff line change
Expand Up @@ -681,7 +681,7 @@ fn compute_metadata(
}

Metadata {
meta_hash: UnitHash(hasher.finish()),
meta_hash: UnitHash(Hasher::finish(&hasher)),
use_extra_filename: use_extra_filename(bcx, unit),
}
}
Expand Down
2 changes: 1 addition & 1 deletion src/cargo/core/compiler/compile_kind.rs
Original file line number Diff line number Diff line change
Expand Up @@ -195,6 +195,6 @@ impl CompileTarget {
self.name.hash(&mut hasher);
}
}
hasher.finish()
Hasher::finish(&hasher)
}
}
2 changes: 1 addition & 1 deletion src/cargo/core/compiler/fingerprint/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1552,7 +1552,7 @@ fn calculate_normal(
local: Mutex::new(local),
memoized_hash: Mutex::new(None),
metadata,
config: config.finish(),
config: Hasher::finish(&config),
compile_kind,
rustflags: extra_flags,
fs_status: FsStatus::Stale,
Expand Down
83 changes: 56 additions & 27 deletions src/cargo/core/source_id.rs
Original file line number Diff line number Diff line change
Expand Up @@ -782,70 +782,99 @@ mod tests {
// Otherwise please just leave a comment in your PR as to why the hash value is
// changing and why the old value can't be easily preserved.
//
// The hash value depends on endianness and bit-width, so we only run this test on
// little-endian 64-bit CPUs (such as x86-64 and ARM64) where it matches the
// well-known value.
// The hash value should be stable across platforms, and doesn't depend on
// endianness and bit-width. One caveat is that absolute paths is inherently
// different on Windows than on Unix-like platforms. Unless we omit or strip
// the prefix components (e.g. `C:`), there is not way to have a
// cross-platform stable hash for absolute paths.
#[test]
#[cfg(all(target_endian = "little", target_pointer_width = "64"))]
fn test_cratesio_hash() {
let gctx = GlobalContext::default().unwrap();
let crates_io = SourceId::crates_io(&gctx).unwrap();
assert_eq!(crate::util::hex::short_hash(&crates_io), "1ecc6299db9ec823");
assert_eq!(crate::util::hex::short_hash(&crates_io), "83d63c3e13aca8cc");
}

// See the comment in `test_cratesio_hash`.
//
// Only test on non-Windows as paths on Windows will get different hashes.
#[test]
#[cfg(all(target_endian = "little", target_pointer_width = "64", not(windows)))]
fn test_stable_hash() {
use std::hash::Hasher;
use std::path::Path;

use crate::util::StableHasher;

#[cfg(not(windows))]
let ws_root = Path::new("/tmp/ws");
#[cfg(windows)]
let ws_root = Path::new(r"C:\\tmp\ws");

let gen_hash = |source_id: SourceId| {
let mut hasher = std::collections::hash_map::DefaultHasher::new();
source_id.stable_hash(Path::new("/tmp/ws"), &mut hasher);
hasher.finish()
let mut hasher = StableHasher::new();
source_id.stable_hash(ws_root, &mut hasher);
Hasher::finish(&hasher)
};

let source_id = SourceId::crates_io(&GlobalContext::default().unwrap()).unwrap();
assert_eq!(gen_hash(source_id), 7062945687441624357);
assert_eq!(crate::util::hex::short_hash(&source_id), "25cdd57fae9f0462");

let url = "https://my-crates.io".into_url().unwrap();
let source_id = SourceId::for_registry(&url).unwrap();
assert_eq!(gen_hash(source_id), 18108075011063494626);
assert_eq!(crate::util::hex::short_hash(&source_id), "fb60813d6cb8df79");
assert_eq!(gen_hash(source_id), 8310250053664888498);
assert_eq!(crate::util::hex::short_hash(&source_id), "b2d65deb64f05373");

let url = "https://your-crates.io".into_url().unwrap();
let source_id = SourceId::for_alt_registry(&url, "alt").unwrap();
assert_eq!(gen_hash(source_id), 12862859764592646184);
assert_eq!(crate::util::hex::short_hash(&source_id), "09c10fd0cbd74bce");
assert_eq!(gen_hash(source_id), 14149534903000258933);
assert_eq!(crate::util::hex::short_hash(&source_id), "755952de063f5dc4");

let url = "sparse+https://my-crates.io".into_url().unwrap();
let source_id = SourceId::for_registry(&url).unwrap();
assert_eq!(gen_hash(source_id), 8763561830438022424);
assert_eq!(crate::util::hex::short_hash(&source_id), "d1ea0d96f6f759b5");
assert_eq!(gen_hash(source_id), 16249512552851930162);
assert_eq!(crate::util::hex::short_hash(&source_id), "327cfdbd92dd81e1");

let url = "sparse+https://your-crates.io".into_url().unwrap();
let source_id = SourceId::for_alt_registry(&url, "alt").unwrap();
assert_eq!(gen_hash(source_id), 5159702466575482972);
assert_eq!(crate::util::hex::short_hash(&source_id), "135d23074253cb78");
assert_eq!(gen_hash(source_id), 6156697384053352292);
assert_eq!(crate::util::hex::short_hash(&source_id), "64a713b6a6fb7055");

let url = "file:///tmp/ws/crate".into_url().unwrap();
let source_id = SourceId::for_git(&url, GitReference::DefaultBranch).unwrap();
assert_eq!(gen_hash(source_id), 15332537265078583985);
assert_eq!(crate::util::hex::short_hash(&source_id), "73a808694abda756");

let path = Path::new("/tmp/ws/crate");
assert_eq!(gen_hash(source_id), 473480029881867801);
assert_eq!(crate::util::hex::short_hash(&source_id), "199e591d94239206");

let path = &ws_root.join("crate");
let source_id = SourceId::for_local_registry(path).unwrap();
assert_eq!(gen_hash(source_id), 18446533307730842837);
assert_eq!(crate::util::hex::short_hash(&source_id), "52a84cc73f6fd48b");
#[cfg(not(windows))]
{
assert_eq!(gen_hash(source_id), 11515846423845066584);
assert_eq!(crate::util::hex::short_hash(&source_id), "58d73c154f81d09f");
}
#[cfg(windows)]
{
assert_eq!(gen_hash(source_id), 6146331155906064276);
assert_eq!(crate::util::hex::short_hash(&source_id), "946fb2239f274c55");
}

let source_id = SourceId::for_path(path).unwrap();
assert_eq!(gen_hash(source_id), 8764714075439899829);
assert_eq!(crate::util::hex::short_hash(&source_id), "e1ddd48578620fc1");
assert_eq!(gen_hash(source_id), 215644081443634269);
#[cfg(not(windows))]
assert_eq!(crate::util::hex::short_hash(&source_id), "64bace89c92b101f");
#[cfg(windows)]
assert_eq!(crate::util::hex::short_hash(&source_id), "01e1e6c391813fb6");

let source_id = SourceId::for_directory(path).unwrap();
assert_eq!(gen_hash(source_id), 17459999773908528552);
assert_eq!(crate::util::hex::short_hash(&source_id), "6568fe2c2fab5bfe");
#[cfg(not(windows))]
{
assert_eq!(gen_hash(source_id), 6127590343904940368);
assert_eq!(crate::util::hex::short_hash(&source_id), "505191d1f3920955");
}
#[cfg(windows)]
{
assert_eq!(gen_hash(source_id), 10423446877655960172);
assert_eq!(crate::util::hex::short_hash(&source_id), "6c8ad69db585a790");
}
}

#[test]
Expand Down
2 changes: 1 addition & 1 deletion src/cargo/ops/cargo_compile/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -656,7 +656,7 @@ fn traverse_and_share(
.collect();
// Here, we have recursively traversed this unit's dependencies, and hashed them: we can
// finalize the dep hash.
let new_dep_hash = dep_hash.finish();
let new_dep_hash = Hasher::finish(&dep_hash);

// This is the key part of the sharing process: if the unit is a runtime dependency, whose
// target is the same as the host, we canonicalize the compile kind to `CompileKind::Host`.
Expand Down
25 changes: 3 additions & 22 deletions src/cargo/util/hasher.rs
Original file line number Diff line number Diff line change
@@ -1,24 +1,5 @@
//! Implementation of a hasher that produces the same values across releases.
//! A hasher that produces the same values across releases and platforms.
//!
//! The hasher should be fast and have a low chance of collisions (but is not
//! sufficient for cryptographic purposes).
#![allow(deprecated)]
//! This is a wrapper around [`rustc_stable_hash::StableHasher`].
use std::hash::{Hasher, SipHasher};

pub struct StableHasher(SipHasher);

impl StableHasher {
pub fn new() -> StableHasher {
StableHasher(SipHasher::new())
}
}

impl Hasher for StableHasher {
fn finish(&self) -> u64 {
self.0.finish()
}
fn write(&mut self, bytes: &[u8]) {
self.0.write(bytes)
}
}
pub use rustc_stable_hash::StableSipHasher128 as StableHasher;
4 changes: 2 additions & 2 deletions src/cargo/util/hex.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ pub fn to_hex(num: u64) -> String {
pub fn hash_u64<H: Hash>(hashable: H) -> u64 {
let mut hasher = StableHasher::new();
hashable.hash(&mut hasher);
hasher.finish()
Hasher::finish(&hasher)
}

pub fn hash_u64_file(mut file: &File) -> std::io::Result<u64> {
Expand All @@ -23,7 +23,7 @@ pub fn hash_u64_file(mut file: &File) -> std::io::Result<u64> {
}
hasher.write(&buf[..n]);
}
Ok(hasher.finish())
Ok(Hasher::finish(&hasher))
}

pub fn short_hash<H: Hash>(hashable: &H) -> String {
Expand Down
4 changes: 2 additions & 2 deletions src/cargo/util/rustc.rs
Original file line number Diff line number Diff line change
Expand Up @@ -381,7 +381,7 @@ fn rustc_fingerprint(
_ => (),
}

Ok(hasher.finish())
Ok(Hasher::finish(&hasher))
}

fn process_fingerprint(cmd: &ProcessBuilder, extra_fingerprint: u64) -> u64 {
Expand All @@ -391,5 +391,5 @@ fn process_fingerprint(cmd: &ProcessBuilder, extra_fingerprint: u64) -> u64 {
let mut env = cmd.get_envs().iter().collect::<Vec<_>>();
env.sort_unstable();
env.hash(&mut hasher);
hasher.finish()
Hasher::finish(&hasher)
}

0 comments on commit c98308e

Please sign in to comment.