Implement placement-in protocol for HashMap
#40390
While this works technically, the implementation is not correct. The point of the placement-in protocol is to put the value directly into some place, in this case into the To implement this you will likely need to make some internal changes to the Entry(-ies), so it would be possible to obtain a pointer for both
Thank you for the review comment! cc @arthurprs Please correct me if anything is wrong. I used a temporary field to store the value because of panic safety. AFAIK, if the I'm looking forward to your suggestions.
Your suggestion sounds OK to me. It will avoid unnecessary V copies for Entry::Vacant. To avoid any unnecessary V copies for Entry::Occupied you probably need a variant of robin_hood that makes space without copying the uninitialized V into the bucket. For rollback you can implement Drop for EntryPlace (drop still runs in case of panics) and use pop_internal to fix the table if it comes to that (forget what it returns). BinaryHeap uses a similar strategy to avoid corrupting the structure if a T comparison panics.
Thanks for your suggestion. I have updated the implementation. For now, it avoids the unnecessary V copy for Entry::Vacant. I'll continue to investigate further optimizations.
src/libstd/collections/hash/map.rs
           issue = "30172")]
pub struct EntryPlace<'a, K: 'a, V: 'a> {
    bucket: Option<FullBucketMut<'a, K, V>>,
    panicked: Cell<bool>,
I suggest using a `finalized` flag instead. Also, the flag should probably be the last field, as it may save 7 bytes of stack.
You are unlikely to have more than one of these per hashmap alive at a time, so this is not very concerning. Also we’re getting field reordering soon, which will do this for everybody automatically.
As suggested below, using `forget` can avoid the flag.
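The point about field order can be checked with `size_of`. The sketch below is only an illustration: the three structs and their field types are hypothetical stand-ins, not the real `EntryPlace` layout, and the exact sizes depend on the target. `#[repr(C)]` pins the declaration order, while the default representation leaves the compiler free to reorder fields.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for a place with a flag; not the real EntryPlace.
#[allow(dead_code)]
#[repr(C)]
struct FlagFirst {
    flag: bool,  // followed by padding up to the alignment of `bucket`
    bucket: u64,
    index: u32,
}

#[allow(dead_code)]
#[repr(C)]
struct FlagLast {
    bucket: u64,
    index: u32,
    flag: bool,  // shares the tail padding that `index` would need anyway
}

// Default representation: field order is up to the compiler.
#[allow(dead_code)]
struct Reorderable {
    flag: bool,
    bucket: u64,
    index: u32,
}

fn main() {
    println!("repr(C), flag first: {} bytes", size_of::<FlagFirst>());
    println!("repr(C), flag last:  {} bytes", size_of::<FlagLast>());
    println!("default repr:        {} bytes", size_of::<Reorderable>());
}
```

With declaration order pinned, moving the flag to the end never makes the struct larger, and on targets where `u64` is 8-byte aligned it makes it smaller.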
src/libstd/collections/hash/map.rs
           reason = "struct name and placement protocol is subject to change",
           issue = "30172")]
pub struct EntryPlace<'a, K: 'a, V: 'a> {
    bucket: Option<FullBucketMut<'a, K, V>>,
I'm probably missing something obvious but do we really need to wrap the bucket with Option?
I was just lazy and wanted to use the existing `FullBucket::take` to remove the entry. It takes a `self` parameter, but in the `drop` method there is only `&mut self`, so the bucket field can't be moved out.
This is fixed by adding another method, `FullBucket::remove`, which takes a `&mut self` parameter. In the `drop` method, I can now call this `remove`.
impl<'a, K, V> InPlace<V> for EntryPlace<'a, K, V> {
    type Owner = ();

    unsafe fn finalize(self) {
Thinking a bit more about this, you can `forget(self)` here, avoiding the flag altogether.
Good suggestion. Fixed.
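A minimal sketch of the pattern discussed in this thread: a guard whose `Drop` rolls back a half-finished insertion, and a `finalize` that calls `mem::forget(self)` so no flag is needed. `SlotGuard`, `insert_with`, and the `Vec<Option<String>>` storage are hypothetical names chosen for illustration; this runs on stable Rust and is not the PR's `EntryPlace` code.

```rust
use std::mem;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical guard: reserves a slot at the end of `slots` and removes it
/// again if the guard is dropped before `finalize` is called.
struct SlotGuard<'a> {
    slots: &'a mut Vec<Option<String>>,
}

impl<'a> SlotGuard<'a> {
    fn reserve(slots: &'a mut Vec<Option<String>>) -> Self {
        slots.push(None);
        SlotGuard { slots }
    }

    fn fill(&mut self, value: String) {
        *self.slots.last_mut().expect("slot was reserved") = Some(value);
    }

    /// Commit: skipping the destructor means the rollback never runs.
    fn finalize(self) {
        mem::forget(self);
    }
}

impl<'a> Drop for SlotGuard<'a> {
    fn drop(&mut self) {
        // Also runs during unwinding: undo the reservation.
        self.slots.pop();
    }
}

fn insert_with<F: FnOnce() -> String>(slots: &mut Vec<Option<String>>, make: F) {
    let mut guard = SlotGuard::reserve(slots);
    guard.fill(make()); // if `make` panics, `guard` is dropped and rolls back
    guard.finalize();
}

fn main() {
    let mut slots = Vec::new();
    insert_with(&mut slots, || "ok".to_string());

    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        insert_with(&mut slots, || panic!("boom"));
    }));

    assert!(result.is_err());
    assert_eq!(slots, vec![Some("ok".to_string())]);
}
```

Because the guard only holds a reference, forgetting it leaks nothing; the only effect is that the rollback in `drop` is skipped.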
Nice improvement over the previous version! @arthurprs’ notes seem very relevant (and they are also much more familiar with the `HashMap` code), so these should be fixed.
I realised there is one possible alternative behaviour. The current implementation tries to recover the previous value if the placement expression fails; however, it is not obvious to me whether this is a better approach compared to, say, simply making the key vacant in case of panic. Here are some points in favour of leaving the entry vacant instead of restoring the value if a panic happens:
Very good points, leaving a previously filled bucket empty on panic sounds reasonable.
cc @rust-lang/libs
cc @rust-lang/libs, anyone have feedback on @nagisa's last comment?
I agree that the precise state of the value being modified doesn't matter too much.
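The "leave the key vacant on panic" behaviour can be modelled on stable Rust. In the sketch below, `replace_with` is a hypothetical helper, not a std API, and it moves the value rather than constructing it in place, so it only shows the observable state of the map after a panic.

```rust
use std::collections::HashMap;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical helper: replace the value under `key` with the result of
/// `make`, dropping the old value before `make` runs.
fn replace_with<F>(map: &mut HashMap<u32, String>, key: u32, make: F)
where
    F: FnOnce() -> String,
{
    map.remove(&key);   // the old value is dropped here
    let value = make(); // if this panics, `key` simply stays vacant
    map.insert(key, value);
}

fn main() {
    let mut map = HashMap::new();
    map.insert(0, "old".to_string());

    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        replace_with(&mut map, 0, || panic!("boom"));
    }));
    assert!(result.is_err());
    assert!(!map.contains_key(&0)); // vacant, the old value is not restored

    replace_with(&mut map, 0, || "new".to_string());
    assert_eq!(map.get(&0).map(String::as_str), Some("new"));
}
```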
src/libstd/collections/hash/table.rs
self.table.size -= 1;
unsafe {
    *self.raw.hash = EMPTY_BUCKET;
    ptr::read(self.raw.pair); // drop right now
This is possibly incorrect. I think you’ll notice why if you add a test that looks like this (you should probably add one similar to it):
struct Banana<'a>(&'a mut bool);
impl<'a> Drop for Banana<'a> {
    fn drop(&mut self) {
        if !*self.0 { panic!("double drop!"); }
        *self.0 = false;
    }
}
let mut hm = HashMap::new();
let mut can_drop = true;
hm.insert(0, Banana(&mut can_drop));
hm.entry(0) <- panic!("boom");
// first drop happens in `make_place`, where the `Banana(true)` gets dropped and `can_drop` is set to false
// then a `*place.pointer() = panic!("boom")` is executed, which unwinds, thus dropping the place
// place destructor drops the `Banana(false)`, and thus double-panic occurs and the process aborts.
//
// In other words, current implementation of Drop reads uninitialized memory.
Note that the code might not reproduce exactly the way I described it, but it is still reading uninitialized memory.
Yeah! Good point. Fixed.
2 more, likely final, tweaks.
src/libstd/collections/hash/map.rs
let b = match self {
    Occupied(mut o) => {
        let uninit = unsafe { mem::uninitialized() };
        o.insert(uninit);
You can avoid doing this `mem::uninitialized` dance by simply doing a `std::ptr::drop_in_place(o.elem.bucket.read_mut().1)`.
Fixed.
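As a rough illustration of the `drop_in_place` suggestion, the sketch below drops the old value where it sits and then writes the new one over it with `ptr::write`, counting drops to confirm that each value is dropped exactly once. `Counted` and `overwrite` are hypothetical names. The function is marked `unsafe` on the assumption that the old value's destructor does not unwind, since otherwise the slot would be left logically uninitialized.

```rust
use std::cell::Cell;
use std::ptr;

/// Increments a shared counter every time a value is dropped.
struct Counted<'a>(&'a Cell<u32>);

impl<'a> Drop for Counted<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Overwrite `*slot` without creating a dummy value first.
///
/// # Safety
/// Dropping the old value must not unwind; otherwise `*slot` would be left
/// logically uninitialized and its owner would drop it a second time.
unsafe fn overwrite<T>(slot: &mut T, value: T) {
    let p: *mut T = slot;
    unsafe {
        ptr::drop_in_place(p); // old value dropped; `*p` is now logically uninitialized
        ptr::write(p, value);  // moves `value` in without dropping the old contents
    }
}

fn main() {
    let drops = Cell::new(0);
    {
        let mut slot = Counted(&drops);
        unsafe { overwrite(&mut slot, Counted(&drops)) };
        assert_eq!(drops.get(), 1); // only the old value has been dropped
    }
    assert_eq!(drops.get(), 2); // the new value is dropped once, at scope end
}
```

The window between `drop_in_place` and `ptr::write` is exactly where the place in this PR can be dropped by a panic, which is why its destructor must not touch the value.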
src/libstd/collections/hash/map.rs
           issue = "30172")]
impl<'a, K, V> Drop for EntryPlace<'a, K, V> {
    fn drop(&mut self) {
        self.bucket.remove();
I think this will drop an uninitialized V, as you only inserted the key?
Yes, nagisa has mentioned this. I'm fixing it.
I’ve only got nits left. Marking the internal functions as `unsafe` makes sense, as they leave around uninitialized data which the caller should handle appropriately.
r=me once nits are fixed
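The reasoning behind the `unsafe` markers can be sketched with today's `MaybeUninit`, which did not exist when this PR was written. `Bucket`, `with_key`, `write_value`, and `into_pair` are hypothetical names, not the `table.rs` API. The idea is the same: whichever function relies on the caller having initialized the value is the one that carries `unsafe` and a documented contract.

```rust
use std::mem::MaybeUninit;

/// Hypothetical bucket: the key is always initialized, the value only
/// after `write_value` has been called.
struct Bucket<K, V> {
    key: K,
    value: MaybeUninit<V>,
}

impl<K, V> Bucket<K, V> {
    /// Stores the key and leaves the value uninitialized.
    fn with_key(key: K) -> Self {
        Bucket { key, value: MaybeUninit::uninit() }
    }

    /// Initializes the value in the bucket's own storage.
    fn write_value(&mut self, value: V) {
        self.value = MaybeUninit::new(value);
    }

    /// # Safety
    /// `write_value` must have been called on this bucket; otherwise this
    /// reads uninitialized memory.
    unsafe fn into_pair(self) -> (K, V) {
        (self.key, unsafe { self.value.assume_init() })
    }
}

fn main() {
    let mut bucket = Bucket::with_key("answer");
    bucket.write_value(42u32);
    // Sound: the value was written just above.
    let (key, value) = unsafe { bucket.into_pair() };
    assert_eq!((key, value), ("answer", 42));
}
```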
src/libstd/collections/hash/map.rs
assert_eq!(map.len(), 9);
assert!(!map.contains_key(&100));

// correctly drop
This can probably be factored out into a separate test (i.e. a different `#[test]` function).
Fixed.
src/libstd/collections/hash/table.rs
/// Remove this bucket's key and value from the hashtable.
/// Only used for inplacement insertion.
pub fn remove_key(&mut self) {
Similarly here, the whole function should be `unsafe`.
Fixed.
src/libstd/collections/hash/table.rs
/// Puts given key, remain value uinitialized.
/// It is only used for inplacement insertion.
pub fn put_key(mut self, hash: SafeHash, key: K) -> FullBucket<K, V, M> {
I’d probably make this whole function `unsafe` (i.e. `pub unsafe fn put_key`).
Fixed.
src/libstd/collections/hash/map.rs
}

// Only used for InPlacement insert. Avoid unnecessary value copy.
fn insert_key(self) -> FullBucketMut<'a, K, V> {
This should probably be `unsafe fn` too.
Fixed.
@bors r+
Oh, bors didn’t notice the delegation above :/
@bors delegate=nagisa
✌️ @nagisa can now approve this pull request
@bors r+
📌 Commit 584c798 has been approved by
Implement placement-in protocol for `HashMap` CC rust-lang#30172 r? @nagisa