Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACP: key_mut method for BTreeMap entries #356

Open
clarfonthey opened this issue Mar 18, 2024 · 3 comments
Open

ACP: key_mut method for BTreeMap entries #356

clarfonthey opened this issue Mar 18, 2024 · 3 comments

Comments

@clarfonthey
Copy link

Proposal

Problem statement

This is an ACP for a version of rust-lang/rust#112896 that supports both safe (checked) and unsafe (unchecked) mutations in one API.

Motivating examples or use cases

Essentially: we would like to be able to have both a safe and an unsafe version, but would like to make it easy for a user to directly mutate a key rather than having to pass in a closure.

This would be similar to the BinaryHeap::peek_mut wrapper which fulfills a similar task.

Solution sketch

For now, unless I can see another alternative, we'll have to have three types for each entry: one VacantKeyMut, one OccupiedKeyMut, and one KeyMut that is an enum of the two. I haven't investigated the internals of how BTreeMap works well enough to be able to determine if we actually can get away with a single KeyMut API, so, I think the ACP should assume that there would be three of them.

Similar to BinaryHeap::peek_mut, we can obtain the handle using a mutable borrow:

impl<...> Entry<...> {
    pub fn key_mut(&mut self) -> KeyMut<'_, ...> { ... }
}

And, also similarly to BinaryHeap::peek_mut, we would have to perform a bit of trickery to ensure that the handle works correctly when leaked. So, this is effectively how it would go:

  1. When DerefMut is called, part of the map or entry would be removed and stored temporarily in the handle. If the handle is leaked before this happens, no part of the map is leaked, but if it's leaked after this happens, some of the map will be leaked. The result is that regardless of what happens, the state of the map is still consistent. Additionally, any future operations on the entry will likely be part of the leaked map and not the actual map.
  2. When Drop is called, if the handle contains part of the map, the key is verified and the map is restored into the correct state based upon the new value of the key. This means that a vacant entry would have to recompute the new location that the key would be inserted into, and an occupied entry would have to move the existing node to the appropriate position in the map.

However, there's an important third step which is unique to this handle, where this entire validation can be skipped with a method that consumes the handle and restores the map without performing any extra checks:

impl<...> KeyMut<...> {
    pub fn skip_key_validation(self) { ... }
}

Currently, all existing APIs for key_mut use unsafe, but after reading more closely, I don't think that this method should be unsafe, since it's already stated that interior mutability can cause the ordering to break, and this isn't unsafe, just a logic error. We could make it unsafe to indicate that you shouldn't do this, but at least for now, I'd like to propose making it safe and then we can talk about it.


A final point of discussion here is whether this is acceptable from the unchecked point of view, since there are additional operations added compared to the plain key_mut method. I believe that these can easily be optimised out, since the order of operations would be roughly:

  1. spot_in_entry = Some(mem::take(&mut spot_in_map))
  2. *key = new_key
  3. if let Some(thing) = spot_in_entry { spot_in_map = mem::take(spot_in_entry) }

Which could reorder to:

  1. *key = new_key
  2. spot_in_entry = Some(mem::take(&mut spot_in_map))
  3. if let Some(thing) = spot_in_entry { spot_in_map = mem::take(&mut spot_in_entry) }

Remove the branch:

  1. *key = new_key
  2. spot_in_entry = Some(mem::take(&mut spot_in_map))
  3. spot_in_map = mem::take(&mut spot_in_entry) }

Notice the double-swap, and then remove it. Of course, we'd have to be careful about how the code is written to help facilitate this, but I think that at worst, we'd probably end up with a constant number of extra writes, which is still substantially faster than O(log(N)) comparisons.

Alternatives

A few here:

  1. We don't add any key mutation APIs. I disagree with this approach, since these APIs were one of the key motivating factors for the cursors API in BTreeMap cursors #141.
  2. We make the skip_key_validation method unsafe. Again, I disagree because this can't actually cause UB, but this is a valid alternative.
  3. We go with the previous plan, which is to add a separate key_mut_unchecked method offering &mut K and a key_mut method which accepts a closure. I'm not a fan of the closure-based API, and BinaryHeap::peek_mut does provide precedent here, but this is an alternative.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@programmerjake
Copy link
Member

I know it was mentioned somewhere else, but one possible reason that unsafe would still be desired is that allows unsafe code to depend on BTreeMap<TrustedType, T> being sorted, duplicate free, etc. if it can unsafely trust TrustedType (e.g. if TrustedType is really i32)

@clarfonthey
Copy link
Author

clarfonthey commented May 15, 2024

I think that's a reasonable argument, but it feels like an unprecedented way to define unsafety in the standard library. The weird reliance on this particular method being unsafe to uphold a safety contract when other related methods that also must uphold that contract are not unsafe (particularly, the Ord implementation), feels just wrong.

Additionally, it denies the opportunity to use the method in code bases that wants to avoid unsafe code, since it cannot actually trigger unsafety on its own and would otherwise be fine to use in these code bases.

@clarfonthey
Copy link
Author

clarfonthey commented Dec 8, 2024

Okay, so, I was poking at maybe implementing this, and realised that there was an issue with my original idea.

Basically, the issue arises when you mutate a key such that it now overlaps with one of the keys in the map. This kind of prevents us from having a fn(&mut self) -> KeyMut<'_, …> method as proposed, since when the handle is dropped, it can't return control to a valid entry since the entry is no longer valid. Panicking in this situation would also not be desired.

I'm thinking up some solutions to this, but the one that seems the most reasonable is to simply take the entry by value and then offer methods to get the entry back after you've mutated the key. Sketching this out:

impl OccupiedKeyMut {
    fn skip_validation(self) -> OccupiedEntry;
    fn move_entry(self) -> Result<OccupiedEntry, KeyMut>;
    fn replace_entry(self) -> (OccupiedEntry, Option<V>);
}
impl VacantKeyMut {
    fn skip_validation(self) -> VacantEntry;
    fn move_entry(self) -> Entry;
}
impl KeyMut {
    fn skip_validation(self) -> Entry;
    fn move_entry(self) -> Result<Entry, KeyMut>;
    fn replace_entry(self) -> (Entry, Option<V>);
}

Note that because we're consuming the entry, vacant entries can simply become occupied entries if they move onto an already-used key, but since the map can't have duplicates, occupied entries will fail to overlap with each other.

For vacant entries, the implementation is fairly simple: mutating the key inside the entry does nothing because you need to either explicitly validate it, or ignore the validation, to get a new entry. So, even if we mutate the key, we can't disrupt the ordering of the map, since we never mutated inside the map anyway.

For occupied entries, the implementation is a bit more complicated. What seems like the easiest solution (i.e., one that can be most likely optimized out) is to immediately store the map's root node inside the smart ref whenever the key is mutated. If the handle is leaked at this point, the entire map is leaked, but the resulting empty map is still valid. Even though the leaked map may not have the right ordering, this is fine because it's leaked and thus inaccessible.

I'm thinking that the default for just dropping the KeyMut is to call replace_entry, which will overwrite any overlapping entry and return its value. If you drop the KeyMut without calling this method, it just drops the value from that entry, which feels fair. We can mark the structure with #[must_use] to help push people toward actually calling replace_entry or move_entry depending on what they want to do.


Also, after writing this, there's another option: keep a copy of the original key for the entry in occupied entries too. This increases the size of occupied entries, but not by much, and it just delays when we drop it in most cases. However, it means that we can safely mutate this key inside the entry without affecting the map, and then dropping or forgetting the mutable key handle will just have the effect of doing nothing to the actual map. This feels like a better option, but I figure I might as well present both.


Also, final mention: it feels like this API is getting complicated to the point where having multiple methods to directly mutate the key, e.g. with a closure, might be more desirable. Again, will leave that up to libs-api to decide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants