Skip to content

Commit 76c500e

Browse files
committed
Auto merge of rust-lang#81635 - michaelwoerister:structured_def_path_hash, r=pnkfelix
Let a portion of DefPathHash uniquely identify the DefPath's crate. This allows to directly map from a `DefPathHash` to the crate it originates from, without constructing side tables to do that mapping -- something that is useful for incremental compilation where we deal with `DefPathHash` instead of `DefId` a lot. It also allows to reliably and cheaply check for `DefPathHash` collisions which allows the compiler to gracefully abort compilation instead of running into a subsequent ICE at some random place in the code. The following new piece of documentation describes the most interesting aspects of the changes: ```rust /// A `DefPathHash` is a fixed-size representation of a `DefPath` that is /// stable across crate and compilation session boundaries. It consists of two /// separate 64-bit hashes. The first uniquely identifies the crate this /// `DefPathHash` originates from (see [StableCrateId]), and the second /// uniquely identifies the corresponding `DefPath` within that crate. Together /// they form a unique identifier within an entire crate graph. /// /// There is a very small chance of hash collisions, which would mean that two /// different `DefPath`s map to the same `DefPathHash`. Proceeding compilation /// with such a hash collision would very probably lead to an ICE and, in the /// worst case, to a silent mis-compilation. The compiler therefore actively /// and exhaustively checks for such hash collisions and aborts compilation if /// it finds one. /// /// `DefPathHash` uses 64-bit hashes for both the crate-id part and the /// crate-internal part, even though it is likely that there are many more /// `LocalDefId`s in a single crate than there are individual crates in a crate /// graph. Since we use the same number of bits in both cases, the collision /// probability for the crate-local part will be quite a bit higher (though /// still very small). /// /// This imbalance is not by accident: A hash collision in the /// crate-local part of a `DefPathHash` will be detected and reported while /// compiling the crate in question. Such a collision does not depend on /// outside factors and can be easily fixed by the crate maintainer (e.g. by /// renaming the item in question or by bumping the crate version in a harmless /// way). /// /// A collision between crate-id hashes on the other hand is harder to fix /// because it depends on the set of crates in the entire crate graph of a /// compilation session. Again, using the same crate with a different version /// number would fix the issue with a high probability -- but that might be /// easier said then done if the crates in questions are dependencies of /// third-party crates. /// /// That being said, given a high quality hash function, the collision /// probabilities in question are very small. For example, for a big crate like /// `rustc_middle` (with ~50000 `LocalDefId`s as of the time of writing) there /// is a probability of roughly 1 in 14,750,000,000 of a crate-internal /// collision occurring. For a big crate graph with 1000 crates in it, there is /// a probability of 1 in 36,890,000,000,000 of a `StableCrateId` collision. ``` Given the probabilities involved I hope that no one will ever actually see the error messages. Nonetheless, I'd be glad about some feedback on how to improve them. Should we create a GH issue describing the problem and possible solutions to point to? Or a page in the rustc book? r? `@pnkfelix` (feel free to re-assign)
2 parents 234781a + 9e50544 commit 76c500e

29 files changed

+345
-124
lines changed

compiler/rustc_ast/src/lib.rs

-1
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,6 @@ pub mod util {
4242
pub mod ast;
4343
pub mod ast_like;
4444
pub mod attr;
45-
pub mod crate_disambiguator;
4645
pub mod entry;
4746
pub mod expand;
4847
pub mod mut_visit;

compiler/rustc_data_structures/src/fingerprint.rs

+25-3
Original file line numberDiff line numberDiff line change
@@ -7,19 +7,30 @@ use std::hash::{Hash, Hasher};
77
use std::mem::{self, MaybeUninit};
88

99
#[derive(Eq, PartialEq, Ord, PartialOrd, Debug, Clone, Copy)]
10+
#[repr(C)]
1011
pub struct Fingerprint(u64, u64);
1112

1213
impl Fingerprint {
1314
pub const ZERO: Fingerprint = Fingerprint(0, 0);
1415

16+
#[inline]
17+
pub fn new(_0: u64, _1: u64) -> Fingerprint {
18+
Fingerprint(_0, _1)
19+
}
20+
1521
#[inline]
1622
pub fn from_smaller_hash(hash: u64) -> Fingerprint {
1723
Fingerprint(hash, hash)
1824
}
1925

2026
#[inline]
2127
pub fn to_smaller_hash(&self) -> u64 {
22-
self.0
28+
// Even though both halves of the fingerprint are expected to be good
29+
// quality hash values, let's still combine the two values because the
30+
// Fingerprints in DefPathHash have the StableCrateId portion which is
31+
// the same for all DefPathHashes from the same crate. Combining the
32+
// two halfs makes sure we get a good quality hash in such cases too.
33+
self.0.wrapping_mul(3).wrapping_add(self.1)
2334
}
2435

2536
#[inline]
@@ -92,8 +103,19 @@ impl<H: Hasher> FingerprintHasher for H {
92103
impl FingerprintHasher for crate::unhash::Unhasher {
93104
#[inline]
94105
fn write_fingerprint(&mut self, fingerprint: &Fingerprint) {
95-
// `Unhasher` only wants a single `u64`
96-
self.write_u64(fingerprint.0);
106+
// Even though both halves of the fingerprint are expected to be good
107+
// quality hash values, let's still combine the two values because the
108+
// Fingerprints in DefPathHash have the StableCrateId portion which is
109+
// the same for all DefPathHashes from the same crate. Combining the
110+
// two halfs makes sure we get a good quality hash in such cases too.
111+
//
112+
// Since `Unhasher` is used only in the context of HashMaps, it is OK
113+
// to combine the two components in an order-independent way (which is
114+
// cheaper than the more robust Fingerprint::to_smaller_hash()). For
115+
// HashMaps we don't really care if Fingerprint(x,y) and
116+
// Fingerprint(y, x) result in the same hash value. Collision
117+
// probability will still be much better than with FxHash.
118+
self.write_u64(fingerprint.0.wrapping_add(fingerprint.1));
97119
}
98120
}
99121

compiler/rustc_hir/src/definitions.rs

+51-20
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,16 @@
55
//! expressions) that are mostly just leftovers.
66
77
pub use crate::def_id::DefPathHash;
8-
use crate::def_id::{CrateNum, DefId, DefIndex, LocalDefId, CRATE_DEF_INDEX, LOCAL_CRATE};
8+
use crate::def_id::{
9+
CrateNum, DefId, DefIndex, LocalDefId, StableCrateId, CRATE_DEF_INDEX, LOCAL_CRATE,
10+
};
911
use crate::hir;
1012

11-
use rustc_ast::crate_disambiguator::CrateDisambiguator;
1213
use rustc_data_structures::fx::FxHashMap;
1314
use rustc_data_structures::stable_hasher::StableHasher;
15+
use rustc_data_structures::unhash::UnhashMap;
1416
use rustc_index::vec::IndexVec;
17+
use rustc_span::crate_disambiguator::CrateDisambiguator;
1518
use rustc_span::hygiene::ExpnId;
1619
use rustc_span::symbol::{kw, sym, Symbol};
1720

@@ -27,6 +30,7 @@ use tracing::debug;
2730
pub struct DefPathTable {
2831
index_to_key: IndexVec<DefIndex, DefKey>,
2932
def_path_hashes: IndexVec<DefIndex, DefPathHash>,
33+
def_path_hash_to_index: UnhashMap<DefPathHash, DefIndex>,
3034
}
3135

3236
impl DefPathTable {
@@ -39,6 +43,35 @@ impl DefPathTable {
3943
};
4044
self.def_path_hashes.push(def_path_hash);
4145
debug_assert!(self.def_path_hashes.len() == self.index_to_key.len());
46+
47+
// Check for hash collisions of DefPathHashes. These should be
48+
// exceedingly rare.
49+
if let Some(existing) = self.def_path_hash_to_index.insert(def_path_hash, index) {
50+
let def_path1 = DefPath::make(LOCAL_CRATE, existing, |idx| self.def_key(idx));
51+
let def_path2 = DefPath::make(LOCAL_CRATE, index, |idx| self.def_key(idx));
52+
53+
// Continuing with colliding DefPathHashes can lead to correctness
54+
// issues. We must abort compilation.
55+
//
56+
// The likelyhood of such a collision is very small, so actually
57+
// running into one could be indicative of a poor hash function
58+
// being used.
59+
//
60+
// See the documentation for DefPathHash for more information.
61+
panic!(
62+
"found DefPathHash collsion between {:?} and {:?}. \
63+
Compilation cannot continue.",
64+
def_path1, def_path2
65+
);
66+
}
67+
68+
// Assert that all DefPathHashes correctly contain the local crate's
69+
// StableCrateId
70+
#[cfg(debug_assertions)]
71+
if let Some(root) = self.def_path_hashes.get(CRATE_DEF_INDEX) {
72+
assert!(def_path_hash.stable_crate_id() == root.stable_crate_id());
73+
}
74+
4275
index
4376
}
4477

@@ -108,13 +141,10 @@ pub struct DefKey {
108141
}
109142

110143
impl DefKey {
111-
fn compute_stable_hash(&self, parent_hash: DefPathHash) -> DefPathHash {
144+
pub(crate) fn compute_stable_hash(&self, parent: DefPathHash) -> DefPathHash {
112145
let mut hasher = StableHasher::new();
113146

114-
// We hash a `0u8` here to disambiguate between regular `DefPath` hashes,
115-
// and the special "root_parent" below.
116-
0u8.hash(&mut hasher);
117-
parent_hash.hash(&mut hasher);
147+
parent.hash(&mut hasher);
118148

119149
let DisambiguatedDefPathData { ref data, disambiguator } = self.disambiguated_data;
120150

@@ -127,19 +157,13 @@ impl DefKey {
127157

128158
disambiguator.hash(&mut hasher);
129159

130-
DefPathHash(hasher.finish())
131-
}
160+
let local_hash: u64 = hasher.finish();
132161

133-
fn root_parent_stable_hash(
134-
crate_name: &str,
135-
crate_disambiguator: CrateDisambiguator,
136-
) -> DefPathHash {
137-
let mut hasher = StableHasher::new();
138-
// Disambiguate this from a regular `DefPath` hash; see `compute_stable_hash()` above.
139-
1u8.hash(&mut hasher);
140-
crate_name.hash(&mut hasher);
141-
crate_disambiguator.hash(&mut hasher);
142-
DefPathHash(hasher.finish())
162+
// Construct the new DefPathHash, making sure that the `crate_id`
163+
// portion of the hash is properly copied from the parent. This way the
164+
// `crate_id` part will be recursively propagated from the root to all
165+
// DefPathHashes in this DefPathTable.
166+
DefPathHash::new(parent.stable_crate_id(), local_hash)
143167
}
144168
}
145169

@@ -295,6 +319,12 @@ impl Definitions {
295319
self.table.def_path_hash(id.local_def_index)
296320
}
297321

322+
#[inline]
323+
pub fn def_path_hash_to_def_id(&self, def_path_hash: DefPathHash) -> LocalDefId {
324+
let local_def_index = self.table.def_path_hash_to_index[&def_path_hash];
325+
LocalDefId { local_def_index }
326+
}
327+
298328
/// Returns the path from the crate root to `index`. The root
299329
/// nodes are not included in the path (i.e., this will be an
300330
/// empty vector for the crate root). For an inlined item, this
@@ -332,7 +362,8 @@ impl Definitions {
332362
},
333363
};
334364

335-
let parent_hash = DefKey::root_parent_stable_hash(crate_name, crate_disambiguator);
365+
let stable_crate_id = StableCrateId::new(crate_name, crate_disambiguator);
366+
let parent_hash = DefPathHash::new(stable_crate_id, 0);
336367
let def_path_hash = key.compute_stable_hash(parent_hash);
337368

338369
// Create the root definition.

compiler/rustc_hir/src/lib.rs

+3
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,9 @@ mod stable_hash_impls;
3030
mod target;
3131
pub mod weak_lang_items;
3232

33+
#[cfg(test)]
34+
mod tests;
35+
3336
pub use hir::*;
3437
pub use hir_id::*;
3538
pub use lang_items::{LangItem, LanguageItems};

compiler/rustc_hir/src/tests.rs

+39
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
use crate::definitions::{DefKey, DefPathData, DisambiguatedDefPathData};
2+
use rustc_data_structures::fingerprint::Fingerprint;
3+
use rustc_span::crate_disambiguator::CrateDisambiguator;
4+
use rustc_span::def_id::{DefPathHash, StableCrateId};
5+
6+
#[test]
7+
fn def_path_hash_depends_on_crate_id() {
8+
// This test makes sure that *both* halves of a DefPathHash depend on
9+
// the crate-id of the defining crate. This is a desirable property
10+
// because the crate-id can be more easily changed than the DefPath
11+
// of an item, so, in the case of a crate-local DefPathHash collision,
12+
// the user can simply "role the dice again" for all DefPathHashes in
13+
// the crate by changing the crate disambiguator (e.g. via bumping the
14+
// crate's version number).
15+
16+
let d0 = CrateDisambiguator::from(Fingerprint::new(12, 34));
17+
let d1 = CrateDisambiguator::from(Fingerprint::new(56, 78));
18+
19+
let h0 = mk_test_hash("foo", d0);
20+
let h1 = mk_test_hash("foo", d1);
21+
22+
assert_ne!(h0.stable_crate_id(), h1.stable_crate_id());
23+
assert_ne!(h0.local_hash(), h1.local_hash());
24+
25+
fn mk_test_hash(crate_name: &str, crate_disambiguator: CrateDisambiguator) -> DefPathHash {
26+
let stable_crate_id = StableCrateId::new(crate_name, crate_disambiguator);
27+
let parent_hash = DefPathHash::new(stable_crate_id, 0);
28+
29+
let key = DefKey {
30+
parent: None,
31+
disambiguated_data: DisambiguatedDefPathData {
32+
data: DefPathData::CrateRoot,
33+
disambiguator: 0,
34+
},
35+
};
36+
37+
key.compute_stable_hash(parent_hash)
38+
}
39+
}

compiler/rustc_metadata/src/creader.rs

+28-2
Original file line numberDiff line numberDiff line change
@@ -6,11 +6,11 @@ use crate::rmeta::{CrateDep, CrateMetadata, CrateNumMap, CrateRoot, MetadataBlob
66

77
use rustc_ast::expand::allocator::AllocatorKind;
88
use rustc_ast::{self as ast, *};
9-
use rustc_data_structures::fx::FxHashSet;
9+
use rustc_data_structures::fx::{FxHashMap, FxHashSet};
1010
use rustc_data_structures::svh::Svh;
1111
use rustc_data_structures::sync::Lrc;
1212
use rustc_expand::base::SyntaxExtension;
13-
use rustc_hir::def_id::{CrateNum, LocalDefId, LOCAL_CRATE};
13+
use rustc_hir::def_id::{CrateNum, LocalDefId, StableCrateId, LOCAL_CRATE};
1414
use rustc_hir::definitions::Definitions;
1515
use rustc_index::vec::IndexVec;
1616
use rustc_middle::middle::cstore::{CrateDepKind, CrateSource, ExternCrate};
@@ -42,6 +42,10 @@ pub struct CStore {
4242
allocator_kind: Option<AllocatorKind>,
4343
/// This crate has a `#[global_allocator]` item.
4444
has_global_allocator: bool,
45+
46+
/// This map is used to verify we get no hash conflicts between
47+
/// `StableCrateId` values.
48+
stable_crate_ids: FxHashMap<StableCrateId, CrateNum>,
4549
}
4650

4751
pub struct CrateLoader<'a> {
@@ -194,6 +198,11 @@ impl<'a> CrateLoader<'a> {
194198
metadata_loader: &'a MetadataLoaderDyn,
195199
local_crate_name: &str,
196200
) -> Self {
201+
let local_crate_stable_id =
202+
StableCrateId::new(local_crate_name, sess.local_crate_disambiguator());
203+
let mut stable_crate_ids = FxHashMap::default();
204+
stable_crate_ids.insert(local_crate_stable_id, LOCAL_CRATE);
205+
197206
CrateLoader {
198207
sess,
199208
metadata_loader,
@@ -207,6 +216,7 @@ impl<'a> CrateLoader<'a> {
207216
injected_panic_runtime: None,
208217
allocator_kind: None,
209218
has_global_allocator: false,
219+
stable_crate_ids,
210220
},
211221
used_extern_options: Default::default(),
212222
}
@@ -313,6 +323,20 @@ impl<'a> CrateLoader<'a> {
313323
res
314324
}
315325

326+
fn verify_no_stable_crate_id_hash_conflicts(
327+
&mut self,
328+
root: &CrateRoot<'_>,
329+
cnum: CrateNum,
330+
) -> Result<(), CrateError> {
331+
if let Some(existing) = self.cstore.stable_crate_ids.insert(root.stable_crate_id(), cnum) {
332+
let crate_name0 = root.name();
333+
let crate_name1 = self.cstore.get_crate_data(existing).name();
334+
return Err(CrateError::StableCrateIdCollision(crate_name0, crate_name1));
335+
}
336+
337+
Ok(())
338+
}
339+
316340
fn register_crate(
317341
&mut self,
318342
host_lib: Option<Library>,
@@ -334,6 +358,8 @@ impl<'a> CrateLoader<'a> {
334358
// Claim this crate number and cache it
335359
let cnum = self.cstore.alloc_new_crate_num();
336360

361+
self.verify_no_stable_crate_id_hash_conflicts(&crate_root, cnum)?;
362+
337363
info!(
338364
"register crate `{}` (cnum = {}. private_dep = {})",
339365
crate_root.name(),

compiler/rustc_metadata/src/locator.rs

+8
Original file line numberDiff line numberDiff line change
@@ -888,6 +888,7 @@ crate enum CrateError {
888888
MultipleMatchingCrates(Symbol, FxHashMap<Svh, Library>),
889889
SymbolConflictsCurrent(Symbol),
890890
SymbolConflictsOthers(Symbol),
891+
StableCrateIdCollision(Symbol, Symbol),
891892
DlOpen(String),
892893
DlSym(String),
893894
LocatorCombined(CombinedLocatorError),
@@ -970,6 +971,13 @@ impl CrateError {
970971
`-C metadata`. This will result in symbol conflicts between the two.",
971972
root_name,
972973
),
974+
CrateError::StableCrateIdCollision(crate_name0, crate_name1) => {
975+
let msg = format!(
976+
"found crates (`{}` and `{}`) with colliding StableCrateId values.",
977+
crate_name0, crate_name1
978+
);
979+
sess.struct_span_err(span, &msg)
980+
}
973981
CrateError::DlOpen(s) | CrateError::DlSym(s) => sess.struct_span_err(span, &s),
974982
CrateError::LocatorCombined(locator) => {
975983
let crate_name = locator.crate_name;

compiler/rustc_metadata/src/rmeta/decoder.rs

+4
Original file line numberDiff line numberDiff line change
@@ -635,6 +635,10 @@ impl CrateRoot<'_> {
635635
self.hash
636636
}
637637

638+
crate fn stable_crate_id(&self) -> StableCrateId {
639+
self.stable_crate_id
640+
}
641+
638642
crate fn triple(&self) -> &TargetTriple {
639643
&self.triple
640644
}

compiler/rustc_metadata/src/rmeta/encoder.rs

+1
Original file line numberDiff line numberDiff line change
@@ -653,6 +653,7 @@ impl<'a, 'tcx> EncodeContext<'a, 'tcx> {
653653
triple: tcx.sess.opts.target_triple.clone(),
654654
hash: tcx.crate_hash(LOCAL_CRATE),
655655
disambiguator: tcx.sess.local_crate_disambiguator(),
656+
stable_crate_id: tcx.def_path_hash(LOCAL_CRATE.as_def_id()).stable_crate_id(),
656657
panic_strategy: tcx.sess.panic_strategy(),
657658
edition: tcx.sess.edition(),
658659
has_global_allocator: tcx.has_global_allocator(LOCAL_CRATE),

compiler/rustc_metadata/src/rmeta/mod.rs

+2-1
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ use rustc_data_structures::svh::Svh;
77
use rustc_data_structures::sync::MetadataRef;
88
use rustc_hir as hir;
99
use rustc_hir::def::{CtorKind, DefKind};
10-
use rustc_hir::def_id::{DefId, DefIndex, DefPathHash};
10+
use rustc_hir::def_id::{DefId, DefIndex, DefPathHash, StableCrateId};
1111
use rustc_hir::definitions::DefKey;
1212
use rustc_hir::lang_items;
1313
use rustc_index::{bit_set::FiniteBitSet, vec::IndexVec};
@@ -203,6 +203,7 @@ crate struct CrateRoot<'tcx> {
203203
extra_filename: String,
204204
hash: Svh,
205205
disambiguator: CrateDisambiguator,
206+
stable_crate_id: StableCrateId,
206207
panic_strategy: PanicStrategy,
207208
edition: Edition,
208209
has_global_allocator: bool,

compiler/rustc_session/src/session.rs

+1-1
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,6 @@ use crate::parse::ParseSess;
88
use crate::search_paths::{PathKind, SearchPath};
99

1010
pub use rustc_ast::attr::MarkedAttrs;
11-
pub use rustc_ast::crate_disambiguator::CrateDisambiguator;
1211
pub use rustc_ast::Attribute;
1312
use rustc_data_structures::flock;
1413
use rustc_data_structures::fx::{FxHashMap, FxHashSet};
@@ -23,6 +22,7 @@ use rustc_errors::json::JsonEmitter;
2322
use rustc_errors::registry::Registry;
2423
use rustc_errors::{Applicability, Diagnostic, DiagnosticBuilder, DiagnosticId, ErrorReported};
2524
use rustc_lint_defs::FutureBreakage;
25+
pub use rustc_span::crate_disambiguator::CrateDisambiguator;
2626
use rustc_span::edition::Edition;
2727
use rustc_span::source_map::{FileLoader, MultiSpan, RealFileLoader, SourceMap, Span};
2828
use rustc_span::{sym, SourceFileHashAlgorithm, Symbol};

0 commit comments

Comments
 (0)