Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid hashing more than is strictly necessary, in the compiler. #56308

Open
eddyb opened this issue Nov 28, 2018 · 13 comments
Open

Avoid hashing more than is strictly necessary, in the compiler. #56308

eddyb opened this issue Nov 28, 2018 · 13 comments
Labels
C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@eddyb
Copy link
Member

eddyb commented Nov 28, 2018

It should be possible to build cheaper interners than what we currently have, using raw_entry.
cc @rust-lang/compiler @gankro @Amanieu

@Zoxc
Copy link
Contributor

Zoxc commented Nov 28, 2018

I have a branch which adds functions for interning based on this PR. I suggest that further work be based on that.

cc @nnethercote

Also relevant #55514 #56241

@eddyb
Copy link
Member Author

eddyb commented Nov 28, 2018

@Zoxc That is, change the implementation of the HashSetInterner trait to use the new APIs?
That seems good, and I'd even call the abstraction HashingInterner, InternByHash or something similar (since the "set" aspect is less relevant than the fact that this is based on hashing).

@Zoxc
Copy link
Contributor

Zoxc commented Nov 28, 2018

Sure. I think we have to change things to use a HashMap anyway. HashSet hides all the fun bits.

@eddyb
Copy link
Member Author

eddyb commented Nov 28, 2018

Oh, another place where we could take advantage of computing the hash only once is the query engine, in case we had to compute a query, and we're inserting the result.

@michaelwoerister
Copy link
Member

Another note, not directly related to interners: We have some things in the compiler that already contain high-quality hash values of themselves. E.g. it's unnecessary work to hash a DefPathHash. There's at least one such map here:

pub def_path_hash_to_def_id: Option<FxHashMap<DefPathHash, DefId>>,

There are probably a few additional, similar cases.

@eddyb
Copy link
Member Author

eddyb commented Nov 29, 2018

@michaelwoerister I think we can maybe use a wrapper for the hash state that uses specialization to avoid double-hashing, in that case, instead of relying on raw_entry.

@michaelwoerister
Copy link
Member

I think the raw_entry API would be the nicer way to do it. It seems to be tailored to support a case just like this (among other use cases).

@eddyb
Copy link
Member Author

eddyb commented Nov 29, 2018

@michaelwoerister Yes but it would require making an entire wrapper of the HashMap public API which carefully avoids letting anything hash the key (going as far as wrapping the key in a type that panics in its Hash impl).

@michaelwoerister
Copy link
Member

What would your solution look like?

kennytm added a commit to kennytm/rust that referenced this issue Nov 30, 2018
@eddyb eddyb changed the title Use raw_entry in the compiler for interners. Avoid hashing more than is strictly necessary, in the compiler. Dec 2, 2018
@eddyb
Copy link
Member Author

eddyb commented Dec 2, 2018

I'll reopen this because the discussion mentioned things other than interners.

Also, I still have to write those code examples for @michaelwoerister.

@eddyb eddyb reopened this Dec 2, 2018
@jonas-schievink jonas-schievink added C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 27, 2019
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 2, 2020
Avoid rehashing Fingerprint as a map key

This introduces a no-op `Unhasher` for map keys that are already hash-
like, for example `Fingerprint` and its wrapper `DefPathHash`. For these
we can directly produce the `u64` hash for maps. The first use of this
is `def_path_hash_to_def_id: Option<UnhashMap<DefPathHash, DefId>>`.

cc rust-lang#56308
r? @eddyb
@Noratrieb
Copy link
Member

The interners currently do use the raw entry APIs:

let entry = shard.raw_entry_mut().from_key_hashed_nocheck(hash, &value);

@oli-obk
Copy link
Contributor

oli-obk commented May 11, 2023

We should still look for other uses of hash maps with keys that are already hashes

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 17, 2024
Use UnhashMap for a few more maps

This avoids a few cases of hashing data that's already hashed.

cc rust-lang#56308
@Mark-Simulacrum
Copy link
Member

I ran some analysis to find the maps changed in #120076 by looking for things put into HashMaps that looked like they were already hashes, using a few heuristics (e.g., similar number of 1 and 0 bits). I think that found most or all of the problem places. I don't see a good way to assert we don't regress there -- with a TypeId for non 'static things, we could assert in Hash64/Hash128 etc impls that we're not hashing them with FxHash (or that we're only hashing them with Unhash). But that assert doesn't exist today, and I don't think adding e.g. specialization or a doc(hidden) method on Hasher in std makes sense for this at this time.

I think the next big opportunity for reducing hashing would be maps with dense, relatively small integer valued keys, where we actually don't need to hash because they're sufficiently dense that a vec makes more sense. We saw big wins with #119977 from both (a) removing the hashing and (b) improved cache locality due to keeping things closer together. But those typically aren't as easy of a transformation to make, and to some extent are a time/space tradeoff.

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 18, 2024
WIP: vendor rustc-hash and override str/length_prefix methods

Locally avoiding length and str sentinel hashing adds 2,611,200 cases of hashes with just one add_to_hash call (out of roughly 43,063,296 total cases). Unclear what kind of effect this will have on runtime performance at this time, but seems like it's worth trying!

cc rust-lang#56308
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 19, 2024
Use UnhashMap for a few more maps

This avoids a few cases of hashing data that's already hashed.

cc rust-lang#56308
bors added a commit to rust-lang-ci/rust that referenced this issue Jan 20, 2024
WIP: vendor rustc-hash and override str/length_prefix methods

Locally avoiding length and str sentinel hashing adds 2,611,200 cases of hashes with just one add_to_hash call (out of roughly 43,063,296 total cases). Unclear what kind of effect this will have on runtime performance at this time, but seems like it's worth trying!

cc rust-lang#56308
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
Use UnhashMap for a few more maps

This avoids a few cases of hashing data that's already hashed.

cc rust-lang/rust#56308
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
Use UnhashMap for a few more maps

This avoids a few cases of hashing data that's already hashed.

cc rust-lang/rust#56308
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants