Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for map_try_insert #82766

Open
1 of 3 tasks
Tracked by #16
m-ou-se opened this issue Mar 4, 2021 · 38 comments
Open
1 of 3 tasks
Tracked by #16

Tracking Issue for map_try_insert #82766

m-ou-se opened this issue Mar 4, 2021 · 38 comments
Labels
A-collections Area: `std::collection` C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@m-ou-se
Copy link
Member

m-ou-se commented Mar 4, 2021

Feature gate: #![feature(map_try_insert)]

This is a tracking issue for BTreeMap::try_insert and HashMap::try_insert.

Unlike .insert(), .try_insert() does not overwrite existing values, and has a meaningful error type with more context. See #82764

Public API

// alloc::collections::btree_map

impl<K: Ord, V> BTreeMap<K, V> {
    pub fn try_insert(&mut self, key: K, value: V) -> Result<&mut V, OccupiedError<'_, K, V>>;
}

pub struct OccupiedError<'a, K: 'a, V: 'a> {
    pub entry: OccupiedEntry<'a, K, V>,
    pub value: V,
}

impl<K: Debug + Ord, V: Debug> Debug for OccupiedError<'_, K, V>;
impl<'a, K: Debug + Ord, V: Debug> Display for OccupiedError<'a, K, V>;
impl<'a, K: Debug + Ord, V: Debug> Error for OccupiedError<'a, K, V>;

// std::collections::hash_map

impl<K: Eq + Hash, V, S: BuildHasher> HashMap<K, V, S> {
    pub fn try_insert(&mut self, key: K, value: V) -> Result<&mut V, OccupiedError<'_, K, V>>;
}

pub struct OccupiedError<'a, K: 'a, V: 'a> {
    pub entry: OccupiedEntry<'a, K, V>,
    pub value: V,
}

impl<K: Debug, V: Debug> Debug for OccupiedError<'_, K, V>;
impl<'a, K: Debug, V: Debug> fmt::Display for OccupiedError<'a, K, V>;
impl<'a, K: Debug, V: Debug> Error for OccupiedError<'a, K, V>;

Steps / History

Unresolved Questions

@m-ou-se m-ou-se added A-collections Area: `std::collection` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Mar 4, 2021
@m-ou-se m-ou-se changed the title Tracking Issue for XXX Tracking Issue for map_try_insert Mar 4, 2021
@alecmocatta
Copy link
Contributor

Is there a reason why this isn't implemented for {Hash,BTree}Set as well?

@m-ou-se
Copy link
Member Author

m-ou-se commented Mar 21, 2021

Is there a reason why this isn't implemented for {Hash,BTree}Set as well?

@alecmocatta The sets have a different kind of insert method which doesn't replace any existing value. (They have a separate replace method for that.) So in a way, their insert is already a try_insert, except it returns a bool rather than a Result.

@alecmocatta
Copy link
Contributor

alecmocatta commented Mar 21, 2021

Personally I feel the motivation still stands.

let x = map.insert(0, ());
assert!(x.is_none());
let x = set.insert(0);
assert!(x);

vs

map.try_insert(0, ()).unwrap();
set.try_insert(0).unwrap();

The latter communicate "panic on duplicate" much more obviously IMO. Plus HashSet::try_insert would save me having to look up "does true mean it's a duplicate, or is it false?" every single time.

@nagisa
Copy link
Member

nagisa commented Aug 17, 2021

I think it is important that this API returns the K value that was supplied to the function. Right now it is Dropped internally and the key from the entry in the map is returned.

@linclelinkpart5
Copy link

Seems like this could benefit from a lazy try_insert_with variant as well, where the value is computed from a closure only if the key isn't found?

@Luro02
Copy link
Contributor

Luro02 commented Sep 27, 2021

Is the name of the method still open for debate?

I found myself confusing it with the try methods of for example Vec.
I expected HashMap::try_insert to return an error if it failed to allocate the required space for the new item, so I was a bit astonished that this method has a different use.

This is what I expected the method to look like;

pub fn try_insert(&mut self, key: K, value: V) -> Result<Option<V>, TryReserveError>;

Maybe it would be better to rename it to something like HashMap::try_insert_or_keep?

@safinaskar
Copy link
Contributor

@Luro02 . I agree. If we ever add method, which will report both "already exists" and "allocation error" errors, it will be called try_try_insert, and this is very bad

@c410-f3r
Copy link
Contributor

c410-f3r commented Jan 21, 2022

HashMap::try_insert -> Tries to insert a key-value pair into the map (https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.try_insert).

From my experience, any try_* function/method from any situation simply means fallible. If there is a desire to segregate different prefixes from different contexts then the majority of all possible scenarios should be considered (which I personally don't think is worth it).

@Stargateur
Copy link
Contributor

I think insert_if_vacant is way better, try is quite confusing with allocation error on collection. Also, try_ is too much generic on this case of specific "only if vacant".

The downside of this is it's rather long. Since even VacantEntry use insert() I think we should keep insert() key word.

My proposition is insert_vacant(). (aka insert if vacant entry).

@joshtriplett
Copy link
Member

I do think we need some way to handle the out-of-memory case. We don't have to solve that with this method, but I think we should think about naming with that in mind.

One naming possibility: insert_mut, since it's like insert but returns a &mut (or an error for the occupied case).

@cuviper
Copy link
Member

cuviper commented Jan 23, 2022

Maybe insert_new?

@Stargateur
Copy link
Contributor

I do think we need some way to handle the out-of-memory case. We don't have to solve that with this method, but I think we should think about naming with that in mind.

One naming possibility: insert_mut, since it's like insert but returns a &mut (or an error for the occupied case).

I think it's a terrible idea:

  • this pattern is always used for something with the same behavior but just return a mut reference, this feature doesn't share the same behavior of insert()
  • doesn't hint about the behavior "set only if vacant" (critical for me)

I think such name would be error prone.

@Mathnerd314
Copy link

I offer the name nonoverwriting_insert, it's a bit long but IMO more clear than the names proposed so far.

@senneh
Copy link

senneh commented May 24, 2022

Option<T> has three methods called get_or_insert, get_or_insert_default and get_or_insert_with

I propose we take the same naming. Essentially they do the same thing, which is: get the thing already inside or insert a new one, except the naming for this is clearer. It also help to have consisted naming across different types.

I'm also not convinced that returning an OccupiedError is necessary. It seems to overly complicate the api without any real benefit. If you want to know if a new insertion happened you could achieve the same result with.

let mut occupied = false;
let val : &mut Value = my_map.get_or_insert_with(key, ||{ occupied = true; Value::new() });

@oberien
Copy link
Contributor

oberien commented Jun 23, 2022

I'm also not convinced that returning an OccupiedError is necessary. It seems to overly complicate the api without any real benefit. If you want to know if a new insertion happened you could achieve the same result with [HashMap::get_or_insert_with]

You can already do that with map.entry(key).or_insert_with(|| …). The try_insert function's main reason to exist is to make exactly the use-case of inserting only if something doesn't exist and getting back a Result easier.

@cyqsimon
Copy link
Contributor

cyqsimon commented Nov 3, 2022

In my opinion there are several issues that need to be addressed:

  1. I'm heavily in favour of renaming to insert_vacant for its descriptiveness. Also, if it is your intention to only insert when the KV-pair is absent, then the case where the KV-pair exists is not really an error semantically per-se (which is a little inconsistent with other try_* functions).
  2. I am in agreement with many here that it is overkill to have a separate OccupiedError type. This is more than sufficient:
// returns Some(value) if KV-pair already exists
pub fn insert_vacant(&mut self, key: K, value: V) -> Option<V>;
  • There is precedence for using Option this way: HashSet::replace and BTreeSet::replace (although the operation is semantically opposite).
  • You might have noticed the return type does not contain a mutable reference, which brings me to the next issue:
  1. Why return a mutable reference? This behaviour is not conveyed or implied by the function name try_insert (or insert_vacant), nor is it consistent with the behaviour of insert. Much better is to simply return "nothing" (semantically) when insert is successful.

I do understand that in some situations it would be very useful to get a reference to the inserted value. In fact this need is exactly what brought me here. That being said, I think it should most definitely be a separate function with a more descriptive name. I have put together a RFC for this which you can find here.

@JohnTheCoolingFan
Copy link
Contributor

Suggestion: returned error might contain the owned key as well as the value.

@Stargateur
Copy link
Contributor

Suggestion: returned error might contain the owned key as well as the value.

That something we could add in https://doc.rust-lang.org/std/collections/hash_map/struct.OccupiedError.html

@martinthomson
Copy link

I find that I often need something like this, but it isn't a great fit. The pattern I see in a few places is maybe fundamentally different to this, but maybe more like.

impl<K: Eq + Hash, V, S> HashMap<K, V, S> {
    pub fn get_or<Q: ?Sized>(&mut self, k: &Q, v: V) -> Option<&V>where K: Borrow<Q>, Q: Hash + Eq { ... }
    pub fn get_mut_or<Q: ?Sized>(&mut self, k: &Q, v: V) -> Option<&mut V>where K: Borrow<Q>, Q: Hash + Eq { 
        if let Some(v) = self.get(k) {
            v
        } else {
            self.entry(k.to_owned()).or_insert(v)
        }
    }
    pub fn get_or_with<Q: ?Sized, F: FnOnce() -> V>(&mut self, k: &Q, f: F) -> Option<&V>where K: Borrow<Q>, Q: Hash + Eq { ... }
    pub fn get_mut_or_with<Q: ?Sized, F: FnOnce() -> V>(&mut self, k: &Q, f: F) -> Option<&mut V>where K: Borrow<Q>, Q: Hash + Eq { ... }
impl<K: Eq + Hash, V: Default, S> HashMap<K, V, S> {
    pub fn get_or_default<Q: ?Sized>(&mut self, k: &Q) -> Option<&V>where K: Borrow<Q>, Q: Hash + Eq { ... }
    pub fn get_mut_or_default<Q: ?Sized>(&mut self, k: &Q) -> Option<&mut V>where K: Borrow<Q>, Q: Hash + Eq { ... }
}

That is, I want to Entry::or_insert[_with|_default] without having to pass an owned key, as HashMap::entry() requires. That's fine when K : Copy, but I don't always have that.

@c410-f3r
Copy link
Contributor

What is the current status of this feature?

@jacks0n9

This comment was marked as off-topic.

@John-Nagle
Copy link

What, it's 2024 now and we still don't have either try_insert or insert_if_vacant? This is embarrassing. I still have to code a get followed by an insert.

@IsaacShelton
Copy link

IsaacShelton commented Mar 4, 2024

Personally, I'd love to see something like this that takes a function both for the key and the value

let my_map: HashMap<String, Vec<String>> = HashMap::default();
my_map.get_or_insert_with(&firstname, |key| key.clone(), || Vec::with_capacity(1024));

Since currently, there isn't a way to do "get_or_insert" without looking the key up twice or pre-cloning the key.

@JohnTheCoolingFan
Copy link
Contributor

Usually the naming for such method would be get_or_insert_with

@IsaacShelton
Copy link

IsaacShelton commented Mar 4, 2024

Yeah naming it get_or_insert_with would be better, some others have been suggesting a get_or_insert_with as taking only a function to compute the value and not the key, but I think this version makes more sense for that name.

Then get_or_insert_with_value or get_or_insert_with_key could also be a thing (although probably not needed)

@ChrisJefferson
Copy link
Contributor

ChrisJefferson commented Apr 15, 2024

I like this functionality, I wouldn't mind a name change, but nothing else is required.

However, I would personally perfer OccupiedError be changed from:

pub struct OccupiedError<'a, K: 'a, V: 'a, A: Allocator + Clone = Global> {
    /// The entry in the map that was already occupied.
    pub entry: OccupiedEntry<'a, K, V, A>,
    /// The value which was not inserted, because the entry was already occupied.
    pub value: V,
}

To:

pub struct OccupiedError<K, V> {
    /// The key in the map that was not inserted, as it was already present.
    pub entry: K,
    /// The value which was not inserted, because the entry was already occupied.
    pub value: V,
}

I can see both are useful -- depends on if someone wants the entry, to maybe do something else with, or they want the key they passed in.

One big advantage of this change it it removes the lifetime. With the current OccupiedError, I find it hard to use ? to return the error, or wrap it in anyhow, as now I need to make sure the HashMap lasts as long as the error.

@programmerjake
Copy link
Member

why not return both the key/value you tried to insert and the OccupiedEntry? i'd expect the key to just be dropped by entry anyways, so returning both wouldn't need to clone the key

@pronebird
Copy link

Yeah naming it get_or_insert_with would be better, some others have been suggesting a get_or_insert_with as taking only a function to compute the value and not the key, but I think this version makes more sense for that name.

Then get_or_insert_with_value or get_or_insert_with_key could also be a thing (although probably not needed)

In my experience get_or_insert do not error and have a signature akin to this:

fn get_or_insert<T>(&mut self, default: T) -> &T

try_insert() on the other side, errors if the entry already exists, which is helpful in scenarios when only one entry can exist and should be immutable, i.e:

if map.contains_key(&key) {
  Err(KeyExistsError)
} else {
  let old_value = map.insert(key, new_value);
  assert!(old_value.is_none())
  Ok(())
}

With try_insert() this becomes a oneliner.

@Quba1
Copy link

Quba1 commented Aug 18, 2024

Pardon my beginner question, but why the proposed try_insert() return &mut V as ok value?

As I understand it's supposed to be analogous to insert() but without overwriting and using Result instead of Option. If that's the case then insert() doesn't return anything upon successful insertion and try_insert() counterpart should be Ok(()).

This doesn't change much as &mut V can be simply left unused. But I'm trying to understand the reason for that API to maybe learn something new or if that return type ends up not needed then it could help with naming.

@Stargateur
Copy link
Contributor

Stargateur commented Aug 18, 2024

The thread seem to confuse the purpose of this function, that why a renaming is more than vital, it seems get_or_insert would be a terrible idea. to be clear the goal is to be analog to:

use std::collections::hash_map::{Entry, HashMap};

fn main() -> Result<(), ()> {
    let mut foo = HashMap::new();

    // try_insert
    let _v = match foo.entry("toto") {
        Entry::Occupied(_) => return Err(()),
        Entry::Vacant(v) => Ok(v.insert(())),
    }?;
    
    Ok(())
}

Someone can use entry like that to avoid double lookup.

@Quba1 it's always a good practice to return the thing the user created in a collection. The user can then ignore it or not. You push an item on a vector but want to use it afterward you need to do a last() unwrapping the result even if you know there is an item since you just push it. It's very functional and a good thing to return the item. Compiler job will do the necessary optimization if you don't use the value.

@Quba1
Copy link

Quba1 commented Aug 18, 2024

How about try_insert_without_overwrite()?

Maybe it's a good idea to make a poll to move the naming issue forward?

@Stargateur that makes a lot of sense. thanks for explaining

@GF-Huang

This comment was marked as spam.

@jacks0n9
Copy link

jacks0n9 commented Sep 1, 2024

why was this marked as spam lol

@GF-Huang
Copy link

GF-Huang commented Sep 1, 2024

why was this marked as spam lol

I believe that Musk is right to lay off 90% of his employees. This problem has not been solved for 3 years.

@jacks0n9
Copy link

jacks0n9 commented Sep 1, 2024

yeah how are they still talking about what to name it? just hover over the function

@camsteffen
Copy link
Contributor

Today I went to use this with Option but then I looked it up and saw that it's actually a method on HashMap.

@giorgiga
Copy link

This function adds an entry, erroring out if the new key conflicts with an existing one... That's called insert :)

The current insert actually performs what other languages (those that don't have a special syntax for it) call "put" (Java) or "add" (OCaml), and I would probably just call "set" (*).

With this I'm not saying the current insert should be renamed (I guess it would be too impractical?)... I just want to point out that if insert_new and other proposals sound a bit weird, it's probably because the "natural" name of this operation is already taken.


(*) SQL calls this same operation "merge" (or sometimes "upsert"), but I'd say those names make little sense in a hashmap with only two "columns".

@jacks0n9
Copy link

you all are still talking about what to name this? you've been talking about this for three years... i think people are fine with it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collection` C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests