Add low-level HashTable API #466
I wonder if it would make sense to have |
I think it's hardly possible to predict what type users will want to store in a table. It can be |
In both I think that this is not possible to fix though: it is similar to arrays where there can only be one mutable reference to an element at a time (although there is
The use case I was wondering about from #450 was if you want to modify an entry's key (possibly changing its hash), but also check if the new key is already in the map before actually?
☔ The latest upstream changes (presumably #468) made this pull request unmergeable. Please resolve the merge conflicts. |
I played with actually converting `indexmap` to this. Apart from `get_many_mut` noted below, I think I would also need something like `Occupied`/`VacantEntry::into_table(self) -> &mut HashTable`, because my `OccupiedEntry::remove` methods need to adjust other indices in the table.
It would be useful to have fallible variants of |
Optimize hash map operations in the query system

This optimizes hash map operations in the query system by explicitly passing hashes and using more optimal operations. `find_or_find_insert_slot` in particular saves a hash table lookup over `entry`. It's not yet available in a safe API, but will be in rust-lang/hashbrown#466.

<table>
<tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></td><td colspan="2"><b>After</b></td></tr>
<tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</td></tr>
<tr><td>🟣 <b>clap</b>:check</td><td align="right">1.6189s</td><td align="right">1.6129s</td><td align="right"> -0.37%</td></tr>
<tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2353s</td><td align="right">0.2337s</td><td align="right"> -0.67%</td></tr>
<tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9344s</td><td align="right">0.9289s</td><td align="right"> -0.59%</td></tr>
<tr><td>🟣 <b>syn</b>:check</td><td align="right">1.4693s</td><td align="right">1.4652s</td><td align="right"> -0.28%</td></tr>
<tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.6606s</td><td align="right">5.6439s</td><td align="right"> -0.30%</td></tr>
<tr><td>Total</td><td align="right">9.9185s</td><td align="right">9.8846s</td><td align="right"> -0.34%</td></tr>
<tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9956s</td><td align="right"> -0.44%</td></tr>
</table>

r? `@cjgillot`
You mean fallible allocations? Isn't that already addressed by |
I mean variants which won't grow, like |
I could see an |
It does make sense for |
    eq: impl FnMut(&T) -> bool,
    hasher: impl Fn(&T) -> u64,
) -> Entry<'_, T, A> {
    match self.raw.find_or_find_insert_slot(hash, eq, hasher) {
I'm playing with HashTable and came across a small problem. Can we directly provide some version of the `find_or_find_insert_slot` (checked insert) function?
It's very annoying that every time I need to insert a value I have to use the entry syntax (which is not that cheap) or use `find` + `insert_unchecked`, which is slow.
It can be something like this one:
pub fn insert<V>(
    &mut self,
    hash: u64,
    value: T,
    mut eq: impl FnMut(&T, &T) -> bool,
    hasher: impl Fn(&T) -> u64,
    replace: impl FnOnce(&mut T, T) -> V,
) -> Option<V> {
    match self
        .raw
        .find_or_find_insert_slot(hash, |found_val| eq(found_val, &value), hasher)
    {
        // An equal element already exists: let the caller decide how to merge the new value into it.
        Ok(bucket) => Some(replace(unsafe { bucket.as_mut() }, value)),
        // No match: insert into the slot that the same probe already located.
        Err(slot) => {
            unsafe {
                self.raw.insert_in_slot(hash, slot, value);
            }
            None
        }
    }
}
Then it can be used like:
pub struct NewMap<K, V, S = DefaultHashBuilder, A: Allocator = Global> {
    pub(crate) hash_builder: S,
    pub(crate) table: HashTable<(K, V), A>,
}

impl<K, V, S, A> NewMap<K, V, S, A>
where
    K: Eq + core::hash::Hash,
    S: core::hash::BuildHasher,
    A: Allocator,
{
    pub fn insert(&mut self, k: K, v: V) -> Option<V> {
        let hash = make_hash::<K, S>(&self.hash_builder, &k);
        let hasher = make_hasher::<_, V, S>(&self.hash_builder);
        self.table.insert(
            hash,
            (k, v),
            |found, new| found.0 == new.0,
            hasher,
            |(_, val_ref), (_, val)| core::mem::replace(val_ref, val),
        )
    }
}
`entry` already maps directly to `find_or_find_insert_slot`. You can then use `Entry::insert` to unconditionally overwrite an existing value, or `Entry::or_insert` to only insert a new value if an old one doesn't already exist.
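The difference between the two entry operations in this reply can be illustrated with the analogous entry API on `std::collections::HashMap`. This is only an analogy, assuming the semantics carry over: the `HashTable` entry in this PR additionally takes an explicit hash, an equality closure and a hasher, and the helper names `overwrite` and `insert_if_absent` below are hypothetical.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Unconditionally store `value`, returning the previous value if there was one.
/// (Hypothetical helper; roughly the "overwrite an existing value" case.)
fn overwrite(map: &mut HashMap<String, u32>, key: &str, value: u32) -> Option<u32> {
    match map.entry(key.to_string()) {
        // Key present: replace the stored value and hand back the old one.
        Entry::Occupied(mut e) => Some(e.insert(value)),
        // Key absent: fill the vacant slot found by the same lookup.
        Entry::Vacant(e) => {
            e.insert(value);
            None
        }
    }
}

/// Store `value` only if `key` is absent; return whatever the map holds afterwards.
/// (Hypothetical helper; roughly the "only insert if absent" case.)
fn insert_if_absent(map: &mut HashMap<String, u32>, key: &str, value: u32) -> u32 {
    *map.entry(key.to_string()).or_insert(value)
}

fn main() {
    let mut map = HashMap::new();
    assert_eq!(overwrite(&mut map, "a", 1), None);
    assert_eq!(overwrite(&mut map, "a", 2), Some(1));
    // "a" already exists, so the stored 2 is kept and 3 is dropped.
    assert_eq!(insert_if_absent(&mut map, "a", 3), 2);
    assert_eq!(insert_if_absent(&mut map, "b", 4), 4);
}
```

In both helpers the lookup happens once, and the resulting entry is then either overwritten or filled, which is the property the reply attributes to `entry`.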
The primary use case for this type over [`HashMap`] or [`HashSet`] is to support types that do not implement the [`Hash`] and [`Eq`] traits, but instead require additional data not contained in the key itself to compute a hash and compare two elements for equality.

`HashTable` has some similarities with `RawTable`, but has a completely safe API. It is intended as a replacement for the existing raw entry API, with the intent of deprecating the latter and eventually removing it.

Examples of when this can be useful include:

- An `IndexMap` implementation where indices into a `Vec` are stored as elements in a `HashTable<usize>`. Hashing and comparing the elements requires indexing the associated `Vec` to get the actual value referred to by the index.
- Avoiding re-computing a hash when it is already known.
- Mutating the key of an element in a way that doesn't affect its hash.

To achieve this, `HashTable` methods that search for an element in the table require a hash value and equality function to be explicitly passed in as arguments. The method will then iterate over the elements with the given hash and call the equality function on each of them, until a match is found.
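The `IndexMap` use case and the "explicit hash plus equality function" lookup can be sketched with the standard library alone. The code below is a toy model, not hashbrown's implementation or API: `ToyTable`, `ToyIndexSet`, `find`, `push_index` and `hash_of` are hypothetical names, and simple chained buckets stand in for the real probing scheme.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Toy stand-in for a table of `usize` elements (hypothetical, not hashbrown).
/// The table never hashes anything itself: callers pass the hash in.
struct ToyTable {
    buckets: Vec<Vec<usize>>,
}

impl ToyTable {
    fn new(n: usize) -> Self {
        ToyTable { buckets: vec![Vec::new(); n] }
    }

    /// Visit only the elements in the bucket selected by `hash`,
    /// calling `eq` on each until one matches.
    fn find(&self, hash: u64, mut eq: impl FnMut(&usize) -> bool) -> Option<&usize> {
        let b = (hash % self.buckets.len() as u64) as usize;
        self.buckets[b].iter().find(|i| eq(*i))
    }

    /// Insert without looking for duplicates; the caller has already checked.
    fn push_index(&mut self, hash: u64, index: usize) {
        let b = (hash % self.buckets.len() as u64) as usize;
        self.buckets[b].push(index);
    }
}

/// Index-set sketch: keys live in a `Vec`, the table stores only positions.
struct ToyIndexSet {
    keys: Vec<String>,
    table: ToyTable,
    state: RandomState,
}

impl ToyIndexSet {
    fn new() -> Self {
        ToyIndexSet { keys: Vec::new(), table: ToyTable::new(8), state: RandomState::new() }
    }

    fn hash_of(&self, key: &str) -> u64 {
        let mut h = self.state.build_hasher();
        key.hash(&mut h);
        h.finish()
    }

    /// Returns the position of `key`, inserting it if it is new.
    fn insert(&mut self, key: &str) -> usize {
        // The hash is computed once and reused for the lookup and the insertion.
        let hash = self.hash_of(key);
        let keys = &self.keys;
        // Equality needs data outside the element: the index is resolved via `keys`.
        if let Some(&i) = self.table.find(hash, |&i| keys[i] == key) {
            return i;
        }
        let i = self.keys.len();
        self.keys.push(key.to_string());
        self.table.push_index(hash, i);
        i
    }

    fn get_index(&self, key: &str) -> Option<usize> {
        let hash = self.hash_of(key);
        self.table.find(hash, |&i| self.keys[i] == key).copied()
    }
}

fn main() {
    let mut set = ToyIndexSet::new();
    assert_eq!(set.insert("a"), 0);
    assert_eq!(set.insert("b"), 1);
    assert_eq!(set.insert("a"), 0); // already present, position unchanged
    assert_eq!(set.get_index("b"), Some(1));
    assert_eq!(set.get_index("z"), None);
}
```

The stored `usize` values cannot implement `Hash` or `Eq` in a meaningful way on their own, which is why both the hash and the comparison are supplied by the caller.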
@bors r+ |
@bors r+ |
☀️ Test successful - checks-actions |