Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding an Iterator-like Entry API to give sugary methods #353

Closed
huonw opened this issue Oct 5, 2014 · 17 comments
Closed

Consider adding an Iterator-like Entry API to give sugary methods #353

huonw opened this issue Oct 5, 2014 · 17 comments

Comments

@huonw
Copy link
Member

huonw commented Oct 5, 2014

The deprecation of methods like find_or_insert resulted in more verbose code (needing more imports too), e.g. hash_map.find_or_insert(key, vec![]).push(value) becomes

use std::collections::hashmap;

match hash_map.entry() {
     hashmap::Occupied(o) => o.into_mut().push(value),
     hashmap::Vacant(v) => { v.set(vec![value]); }
}

which is slightly annoying. We could have an Entry trait that has an interface sufficient to have default methods for functionality like the old find_or_insert (like Iterator just has the next method which is enough for things like map & collect etc), so the above would become

hash_map.entry().get_or_insert(vec![]).push(value);

This would also mean that all entries get these methods, avoiding the historical problem of HashMap having everything, with TreeMap missing a lot.

(We should keep the requirements for the basic Entry trait as minimal as possible, so it can be used as widely as possible; if there is fancy functionality that needs more restrictions, we can theoretically have extra traits, similar to DoubleEndedIterator etc.)

@huonw
Copy link
Member Author

huonw commented Oct 5, 2014

cc @gankro, @aturon.

(I don't have time to properly consider what the non-default-method interface of Entry should be right now, but for now I just want to avoid forgetting this idea after it came up in passing on IRC.)

@Gankra
Copy link
Contributor

Gankra commented Oct 5, 2014

Yeah providing convenience methods on Entry is something that's come up a few times. It was part of the RFC discussion, where I vaguely recall it was deemed "too much". Possibly just a pre-1.0 thing. I actually implemented a mock find_or_insert on entry as a proof of concept in one of the branches I posted in the RFC. It's super easy to do. Providing find_or_insert would be super reasonable in my mind, since from the code I've seen, you can get away with 99.9% of the functionality you want with that and a dash of clever sequencing. All the others are better off not being included IMO. They're either complicated enough that they should just be done with the full API, or they can be easily written on top of find_or_insert to do the heavy lifting.

However I don't think this merits a trait. We're avoiding adding traits in libcollections at the moment, and if it has a single method, it's not a huge deal to maintain concretely? Dunno if this work should wait for post-1.0 or not.

@huonw
Copy link
Member Author

huonw commented Oct 5, 2014

However I don't think this merits a trait. We're avoiding adding traits in libcollections at the moment, and if it has a single method, it's not a huge deal to maintain concretely?

The point of a trait is to get the implementations automatically, for all entries; just like all iterators automatically get map, fold etc.. With manual methods on each entry type, you can't reuse the code easily and each implementor has to remember to provide all those methods, at the risk of leaving the APIs inconsistent between contains as they are now. It's not a huge problem if all the data structures have entry APIs (unlike now, where this sort of functionality doesn't exist in efficient form on types like TreeMap and TrieMap at all), but it will be annoying.

E.g.

trait Entry<'a, K, V> {
    /// Returns Some if there's already data here.
    fn into_mut(self) -> Result<&'a mut V, Self>;
    fn set(self, value: V) -> &'a mut V;

    fn find_or_insert(self, value: V) -> &'a mut V {
        match self.into_mut() {
            Ok(x) => x,
            Err(e) => e.set(value)
        }
    }
}

Now that I think about it, it's not obvious to me how many other convenience methods there are that make sense (as you say); that said, find_or_insert does more work than necessary due to value not being lazy (yes, I know the collections reform possibly addresses this, although unboxed closures seem to throw a spanner in the works).

(Also, side note: I just noticed, there's no equivalent to find_or_insert_with with the Entry API; that is, if you need to do any computations using the key, you're now sunk... you're forced to compute unconditionally before calling entry, since that consumes the key.)

@Gankra
Copy link
Contributor

Gankra commented Oct 5, 2014

See here for some of my notes on what's up with the current treatment of keys in maps. At very least in the rust repo, the key functionality was never actually used with any of the find_or_* stuff.

Also having to precompute the value when using find_or_insert isn't as big of a deal as you would think. Two of the biggest use cases I saw were "maintain a count" and "maintain a Vec of seen values". Both of those are basically free to make unconditionally (empty Vec doesn't allocate), and can be updated to the right thing unconditionally after the find_or_insert. The other big one "ensure there's some value here for me to do unconditional work with afterwards" is the one where it might matter. But then they can use the full API when it matters.


Ah, I misunderstood precisely what you're proposing. You want to introduce three traits, one for Entry, one for VacantEntry, and one for OccupiedEntry. The latter two would be concretely impled, and the former would be automatically impled.

In fact, you can do one better and only have VacantEntry and OccupiedEntry. Entry can be a concrete type in libcollections shared by all the maps:

enum Entry<K, V, Vacant, Occupied> {
    Vacant(Vacant),
    Occupied(Occupied),
}

This would be the optimal code-reuse configuration. It would also have a nice documentation benefit in that there would be a reasonable place to put the Entry API docs. I'm not opposed to this, although I don't have particularly strong feelings either way.

@huonw
Copy link
Member Author

huonw commented Oct 5, 2014

See here for some of my notes on what's up with the current treatment of keys in maps. At very least in the rust repo, the key functionality was never actually used with any of the find_or_* stuff.

That discussion is not particularly relevant, something like find_or_insert_with just feeds the passed-in key back into the closure to get around the ownership issues, when creating a new value (i.e. there's no old key to worry about). There are actually concrete uses of this functionality in the wild (there's a few just random tests, but most seem to be somewhat "legitimate").

You want to introduce three traits, one for Entry, one for VacantEntry, and one for OccupiedEntry

Well, I was actually only thinking of one trait, which gets implemented for each top level Entry. That is, (ignore the details about file names & import paths)

// hashmap.rs
use std::collections;

enum Entry<'a, K, V> {
    Vacant(VacantEntry<'a, K,V>),
    Occupied(Occupied<'a, K, V>)
}
impl<'a, K, V> collections::Entry<'a, K, V> for Entry<'a, K, V> {
    fn into_mut(self) -> Result<&'a mut V, Self> {
        match self {
            Occupied(o) => Ok(o.into_mut()),
            Vacant(v) => Err(Vacant(v))
        }
    }
    fn set /* etc. */
}

(I don't know if that precise trait definition is sensible for the one-trait scheme.)

But maybe having the generic Entry with two traits is a better approach?

@aturon
Copy link
Member

aturon commented Oct 6, 2014

I want to be very careful here, because:

  • A large part of the motivation for Entry was to cut down the somewhat bewildering group of convenience methods.
  • In looking at the roll-out of Entry throughout std and rustc, it seemed that code did become a bit longer, but also (to my eyes) more clear.

That said, and ignoring the issue of whether to try to put this in a trait, it seems like if you add the suggested basic methods to Entry (which are intuitive, easy to remember, and basically match the API elsewhere):

impl<'a, K, V> Entry<'a, K, V> {
    /// Returns Some if there's already data here.
    fn into_mut(self) -> Result<&'a mut V, VacantEntry<'a, K, V>>;
    fn set(self, value: V) -> &'a mut V;
}

then hash_map.find_or_insert(key, vec![]).push(value) would become:

hash_map.entry(key).into_mut()
    .unwrap_or_else(|v| v.set(vec![]))
    .push(value)

which is getting pretty ergonomic (and computes the default value lazily).

(Note that I adjusted the type of into_mut here to return a VacantEntry on error.)

(Update: I meant unwrap_or_else, not or_else. Note that the lazy variants come of out Result in this strategy; we don't have to provide the same duplication in Entry convenience methods.)

@sfackler
Copy link
Member

sfackler commented Oct 6, 2014

@aturon 👍, though into_mut doesn't seem to be quite the right name for that.

@aturon
Copy link
Member

aturon commented Oct 6, 2014

@sfackler, interesting point -- we don't have an explicit convention for conversions that can fail, but from_utf8 is an important example that just uses the normal conversion convention. Food for thought.

@huonw
Copy link
Member Author

huonw commented Oct 6, 2014

Hm, I wonder if we could use Result<Occupied, Vacant> as our "Entry" types directly, so we get all the convenience methods on it directly. (That is fn entry(&mut self, Key) -> Result<Occupied, Vacant>.)

The largest downside I can think of is maybe Err is a little harsh for Vacant (that is, it's not an error to be empty)? Oh, and not having whole types to which to attach documentation.

@aturon
Copy link
Member

aturon commented Oct 6, 2014

That's worth thinking about, although it would actually worsen the ergonomics in this case:

This

hash_map.entry(key)
    .into_mut()
    .unwrap_or_else(|v| v.set(vec![]))
    .push(value)

would become this:

hash_map.entry(key)
    .map(|o| o.into_mut())
    .unwrap_or_else(|v| v.set(vec![]))
    .push(value)

@Gankra
Copy link
Contributor

Gankra commented Oct 11, 2014

I'm tempted to let this sit for a month or so to let people really explore the usage state space (I've already seen some cool stuff I never anticipated before).

That said, I think the proposed into_mut is a pretty reasonable extension, since I think it captures the standard "make sure something's here" pattern. Although that means there's three different ways to express the same idea:

// Disjoint style
let stuff = match map.entry(key) {
  Vacant(entry) => entry.set(vec![data]),
  Occupied(entry) => {
    *entry.get_mut().push(data);
    entry.into_mut()
  }
}
// Minimal init style
let stuff = match map.entry(key) {
  Vacant(entry) => entry.set(vec![]),
  Occupied(entry) => entry.into_mut(),
}
stuff.push(data);
// Functional style
let stuff = map.entry(key)
              .into_mut()
              .unwrap_or_else(|e| e.set(vec![]));
stuff.push(data);

Which is a bit unfortunate, although I'm starting to lean against the "disjoint" style as awkward.

With into_mut it would mean not having to import the enum variants to get the basic functionality, which is a huge win IMO. However with my proposed refactor it would mean importing the Vacant trait. @sfackler is working on a proposal that may resolve that issue though (a mechanism for making some trait impls "inherent"). As a shim, @huonw and @sfackler have suggested a decent hack to simulate that behaviour:

impl Foo {
  pub fn bar(&self) { ... }
}

impl Bar for Foo {
  // calls concrete version because inherent impls win in concrete case.
  fn bar(&self) { self.bar() }
}

@Gankra
Copy link
Contributor

Gankra commented Nov 5, 2014

@cmr has also suggested OccupiedEntry impl'ing Deref(Mut). This is an interesting idea.

@aturon
Copy link
Member

aturon commented Nov 5, 2014

@gankro I remember hearing that suggestion before -- maybe by @huonw? I think it's worth trying out and seeing how it makes the client code feel.

@Gankra
Copy link
Contributor

Gankra commented Nov 25, 2014

Just a heads up to those interested. This is my working proposal API 2.0:

The old Entry API:

impl Map<K, V>
    fn entry<'a>(&'a mut self, key: K) -> Entry<'a, K, V>
pub enum Entry<'a, K: 'a, V: 'a> {
    Occupied(OccupiedEntry<'a, K, V>),
    Vacant(VacantEntry<'a, K, V>),
}
impl<'a, K, V> VacantEntry<'a, K, V>
    fn set(self, value: V) -> &'a mut V

impl<'a, K, V> OccupiedEntry<'a, K, V>
    fn get(&self) -> &V
    fn get_mut(&mut self) -> &mut V
    fn into_mut(self) -> &'a mut V
    fn set(&mut self, value: V) -> V
    fn take(self) -> V

Based on feedback and collections reform landing, we propose the following new API:

impl Map<K, V>
    fn entry<O: ToOwned<K>>(&'a mut self, key: O) -> Entry<'a, O, V>
pub enum Entry<'a, O: 'a, V: 'a> {
    Occupied(OccupiedEntry<'a, O, V>),
    Vacant(VacantEntry<'a, O, V>),
}
impl Entry<'a, O: 'a, V:'a> {
    get(self) -> Result<&'a mut V, VacantEntry<'a, O, V>>
}
impl<'a, K, V> VacantEntry<'a, K, V>
    fn insert(self, value: V) -> &'a mut V

impl<'a, K, V> OccupiedEntry<'a, K, V>
    fn into_mut(self) -> &'a mut V
    fn insert(&mut self, value: V) -> V // aturon: not sure about naming here
    fn remove(self) -> V

impl Deref<V> for OccupiedEntry<'a, O, V> 
impl DerefMut<V> for OccupiedEntry<'a, O, V>

@utkarshkukreti
Copy link

FYI: Cargo uses .entry 13 times as of commit 70f5205, 9 of which can be converted into 1 liners using get_or_insert, 2 into get_or_insert_with 2 liners, and 2 use statements can be removed from 6 files. (raw data)

@Gankra
Copy link
Contributor

Gankra commented Feb 3, 2015

We got get on the enum in collections v2. Can this be closed?

@aturon
Copy link
Member

aturon commented Mar 24, 2015

Closing, this is done.

@aturon aturon closed this as completed Mar 24, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants