-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: add raw_entry API to HashMap #50821
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
@gankro pick a reviewer who is competent pls |
The actual impl is ~trivial since everything that was needed is already in hashmap for implementation reasons. So this mostly just needs API review from the @rust-lang/libs team. |
I'm a bit worried removing the |
Oh. Not requiring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks and sorry for the delay @gankro! Since these are all starting as unstable it's ok to not get a full libs-team sign-off because we'll do that before stabilization anyway. I do think that we'll want these in the long run so it seems good to add them to libstd to start experimenting with them.
I'll admit though that I haven't followed the RFC too too closely so this was the first time I was looking at a number of these APIs. I was expecting something to be unsafe
due to the "raw" terminology but it ended up not being so! I was then a little perplexed how this was sort of a standalone API from the existing Entry
API, but I think that's because Entry
basically implies an owned key stored internally, right?
src/libstd/collections/hash/map.rs
Outdated
/// | ||
/// Immutable raw entries have very limited use; you might instead want `raw_entry`. | ||
#[unstable(feature = "raw_entry", issue = "42069")] | ||
pub fn raw_entry_immut(&self) -> RawImmutableEntryBuilder<K, V, S> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming-wise could this perhaps be raw_entry
and the one above be raw_entry_mut
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm split on it
raw_entry
/raw_entry_mut
is absolutely more idiomatic, but entry
matches raw_entry_mut
and raw_entry_mut
is the really important one.
I'm definitely being "weird" here, and am willing to relent if anyone feels strong about it.
src/libstd/collections/hash/map.rs
Outdated
/// assert_eq!(map["poneyland"], 22); | ||
/// ``` | ||
#[unstable(feature = "raw_entry", issue = "42069")] | ||
pub fn or_insert(self, default_key: K, default_val: V) -> (&'a mut K, &'a mut V) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW Entry::or_insert
only returns &mut V
so this is somewhat inconsistent with that, albeit more general
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's definitely an intentional design decision, but I could be convinced to go back
src/libstd/collections/hash/map.rs
Outdated
/// use std::collections::hash_map::Entry; | ||
/// | ||
/// let mut map: HashMap<&str, u32> = HashMap::new(); | ||
/// map.entry("poneyland").or_insert(12); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A number of these doc strings I've noticed are using entry
, but I think they may want to move towards raw_entry
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, this was just copy-pasted junk i hadn't updated yet
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Ping from triage, @gankro ! You got some test failures and had a question from your reviewer. |
@gankro: We haven't heard from you in two weeks, so we're closing this PR for now. Feel free to re-open in the future. |
will pick this up again in a bit, just had a productive meeting with @aturon:
|
@gankro Are you still working on this? I am interested in taking over if you are too busy to continue working on it. |
/// In particular, the hash used to initialized the raw entry must still be | ||
/// consistent with the hash of the key that is ultimately stored in the entry. | ||
/// This is because implementations of HashMap may need to recompute hashes | ||
/// when resizing, at which point only the keys are available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is worth pointing out that this specific case isn't actually a concern for the current implementation because it always stores the full hash. Rather the main issue is the failure case described below in which the entry is in the wrong place and thus becomes "lost".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is true for the current implementation, this might change in the future and code should not rely on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make sure that this section doesn't come across as saying that bogus hashes are OK as long as the HashMap isn't resized.
Actually, reading that paragraph again, it isn't clear to me why the case described would be a problem: if the HashMap is current in an invalid state due to bogus hashes then rehashing everything would fix it, not break it further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider a type Foo(u32, u32)
, who's Hash impl is just Foo.0.hash(), and you use this API to actually feed in Foo.1.hash().
If resizing triggers a rehash, then your code will work perfectly reasonably until a resize triggers, at which point everything will be moved to the location indicated by Foo.0, while you're still performing lookups with Foo.1. All existing keys will appear to "vanish" from the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see what you are saying. If you exclusively use the raw entry API with a bogus hash function things will work until elements are rehashed using the real hash function. Using the normal get/insert/entry functions would be have the opposite problem of working only once the resize happened.
@Amanieu please do! |
Also here is an example of usage: https://gist.github.com/Gankro/fb0bfe6f6770aba09b9a1cdf0ecf47e0 |
mem::replace(self.get_mut(), value) | ||
} | ||
|
||
/// Sets the value of the entry, and returns the entry's old value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: value -> key
How would people feel extending this API to provide a This addition would probably also require the |
Is this API available in some extern crate by any chance? :-) I'd like to see if I can implement a string interner on top of it as struct Interner {
/// Concatenation of all interned strings
data: String
/// A `HashSet<&'data str>` of interned strings,
/// but without those annoying ( :) ) lifetimes.
mapping: HashSet<(u32, u32)>
} |
@fintelia no that would expose too many implementation details that we're not willing to commit to. |
@gankro could you clarify which implementation details you'd be concerned about committing to? The very first line of the HashMap documentation already states that it is:
Unless I'm missing something, it seems like that should be sufficient to guarantee that there is at least some way to map from indices to raw entries. |
That sentence is documenting the current implementation, not guaranteeing that will be the implementation for all time. |
@Amanieu Have you had a chance to look at taking this PR and fixing the remaining problems with it? |
@Mark-Simulacrum Actually after talking with @gankro I won't be working on this. Someone else can take over if they want. |
where M: DerefMut<Target = RawTable<K, V>>, | ||
F: FnMut(&mut K) -> bool | ||
{ | ||
// This is the only function where capacity can be zero. To avoid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function and the original search_hashed functions are the only two where capacity can be zero. Both comments should be updated...
I'm willing to take over this PR. What is still remaining to do? @jonhoo sadly, the PR in the current form cannot support rahashmap's central feature: there is no way to get a random element from the map. Specifically, a raw_entry can't be constructed without knowing its exact hash in advance, though if RawVacantEntry were to have a next() function that might be enough... |
@alexcrichton @gankro What is required for me to take over this PR? |
@fintelia The easiest is probably to fetch this PR’s branch, rebase/rework it as needed, open a new PR, and link the new PR from here. Thanks for volunteering! |
I think my comments/checklist hold accurate |
Add raw_entry API to HashMap This is a continuation of #50821.
Add raw_entry API to HashMap This is a continuation of #50821.
This is an implementation of https://internals.rust-lang.org/t/pre-rfc-abandonning-morals-in-the-name-of-performance-the-raw-entry-api/7043 with some minor tweaks.
TODO:
K: Eq
requirement from raw_entry? (currently exists to satisfy internal APIs, but I don't think it's strictly needed)