-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Early exit for empty HashMap (issue #38880) #48035
Changes from 2 commits
9bc5986
dcdd2c4
29f7148
fd78621
a295ec1
94c3c84
f3330ce
e034ddd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -397,9 +397,21 @@ pub struct HashMap<K, V, S = RandomState> { | |
resize_policy: DefaultResizePolicy, | ||
} | ||
|
||
/// Search for a pre-hashed key when the hash map is known to be non-empty. | ||
#[inline] | ||
fn search_hashed_nonempty<K, V, M, F>(table: M, hash: SafeHash, is_match: F) | ||
-> InternalEntry<K, V, M> | ||
where M: Deref<Target = RawTable<K, V>>, | ||
F: FnMut(&K) -> bool | ||
{ | ||
// Do not check the capacity as an extra branch could slow the lookup. | ||
search_hashed_body(table, hash, is_match) | ||
} | ||
|
||
/// Search for a pre-hashed key. | ||
/// If you don't already know the hash, use search or search_mut instead | ||
#[inline] | ||
fn search_hashed<K, V, M, F>(table: M, hash: SafeHash, mut is_match: F) -> InternalEntry<K, V, M> | ||
fn search_hashed<K, V, M, F>(table: M, hash: SafeHash, is_match: F) -> InternalEntry<K, V, M> | ||
where M: Deref<Target = RawTable<K, V>>, | ||
F: FnMut(&K) -> bool | ||
{ | ||
|
@@ -410,6 +422,16 @@ fn search_hashed<K, V, M, F>(table: M, hash: SafeHash, mut is_match: F) -> Inter | |
return InternalEntry::TableIsEmpty; | ||
} | ||
|
||
search_hashed_body(table, hash, is_match) | ||
} | ||
|
||
/// The body of the search_hashed[_nonempty] functions | ||
#[inline] | ||
fn search_hashed_body<K, V, M, F>(table: M, hash: SafeHash, mut is_match: F) | ||
-> InternalEntry<K, V, M> | ||
where M: Deref<Target = RawTable<K, V>>, | ||
F: FnMut(&K) -> bool | ||
{ | ||
let size = table.size(); | ||
let mut probe = Bucket::new(table, hash); | ||
let mut displacement = 0; | ||
|
@@ -550,17 +572,25 @@ impl<K, V, S> HashMap<K, V, S> | |
where K: Borrow<Q>, | ||
Q: Eq + Hash | ||
{ | ||
let hash = self.make_hash(q); | ||
search_hashed(&self.table, hash, |k| q.eq(k.borrow())) | ||
if self.table.capacity() != 0 { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't this be checking for size() for the optimization to apply for any empty map? My understanding is that checking for 0 capacity only works for not yet allocated maps. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems to be the explanation for that: #38880 (comment) As it is written, it's a rearrangement of code that should result in no extra branches. If I understand correctly, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is what I wanted to do, but as @bluss points out, the problem is that when the map is empty but allocated (as opposed to empty and unallocated) and it doesn't find a key, it is supposed to return This is also a problem for linear search as suggested in #38880. (I did not fully realise this problem until after writing this pull request.) Since an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. N.B. I did actually try the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. HashMaps are hard 😄 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Regarding the linear search, for that to yield a net win the backing table would need to support real fast iteration (packed KVs), otherwise iterating the buckets gets more expensive than the hashing. Potentially of interest for https://github.com/bluss/ordermap There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did some benchmarks for a linear search, and I think it might still be advantageous for some cases even if the map is not densely packed. Here's my reasoning – tell me what you think: Let m be the allocated size of the hashmap and let n be the number of items it contains. Let S be the cost of accessing the next bucket in the hashmap (regardless of if said bucket is empty), let E be the cost of calculating equality for two elements of a given type T, and let H be the cost of calculating the hash for an element of a given type T. Assume that E and H are constant across all elements of type T (this is not true for all types, e.g. strings). If mS + nE < H then it is faster to perform a linear search lookup for some key than to compute the hash and perform a direct lookup. The average case of a linear search for a key that exists in the map is half of the worst case. Now if you have a simple type (e.g. integers) then equality is very cheap and constant time, and computing the hash is (surprisingly) expensive (perhaps because of the DoS resilience?), so a linear search is faster for integer-keyed hashmaps with size <= ~ 20 items. It gets more complicated for other types, such as strings, because the equality operation doesn't take constant time and/or the hash operation doesn't take constant time. (Although for strings I think it is still the case that the hash operation always takes longer than the equality operation.) Of course all of this is moot if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe, it depends on a lot of variables, specially an expensive hasher. You can try something like this, but I'm afraid calculating VacantEntryState on misses is too expensive (edit: or not, as you don't need to compare keys anymore).
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So then it would be faster when searching for keys that exist, but slower when searching for keys that don't exist, right? I don't think that optimisation is worth it if it slows down the case when a key is not found. |
||
let hash = self.make_hash(q); | ||
search_hashed_nonempty(&self.table, hash, |k| q.eq(k.borrow())) | ||
} else { | ||
InternalEntry::TableIsEmpty | ||
} | ||
} | ||
|
||
#[inline] | ||
fn search_mut<'a, Q: ?Sized>(&'a mut self, q: &Q) -> InternalEntry<K, V, &'a mut RawTable<K, V>> | ||
where K: Borrow<Q>, | ||
Q: Eq + Hash | ||
{ | ||
let hash = self.make_hash(q); | ||
search_hashed(&mut self.table, hash, |k| q.eq(k.borrow())) | ||
if self.table.capacity() != 0 { | ||
let hash = self.make_hash(q); | ||
search_hashed_nonempty(&mut self.table, hash, |k| q.eq(k.borrow())) | ||
} else { | ||
InternalEntry::TableIsEmpty | ||
} | ||
} | ||
|
||
// The caller should ensure that invariants by Robin Hood Hashing hold | ||
|
@@ -1009,9 +1039,7 @@ impl<K, V, S> HashMap<K, V, S> | |
pub fn entry(&mut self, key: K) -> Entry<K, V> { | ||
// Gotta resize now. | ||
self.reserve(1); | ||
let hash = self.make_hash(&key); | ||
search_hashed(&mut self.table, hash, |q| q.eq(&key)) | ||
.into_entry(key).expect("unreachable") | ||
self.search_mut(&key).into_entry(key).expect("unreachable") | ||
} | ||
|
||
/// Returns the number of elements in the map. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this
search_hashed_nonempty
? Avoiding the extra function.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right