RFC: Add item recovery collection APIs #1194
CC @cmr @eddyb @bluss @seppo0010 (can't remember all the people interested...) |
# Alternatives
Do nothing.
Hold on, to be clear: "Do nothing" here means "Do nothing and let users write such caches via, e.g., HashMap<T, ()>" ... right?
I don't particularly mind adding the functionality described here to HashSet, but I'm also not sure it's strictly necessary, unless I have missed something with how HashMap<T, ()> would work.
Update: Ah, re-reading the RFC, I now see that our current HashMap API would not support that.
It's not possible to use HashMap that way, because it doesn't provide any methods that return &K (or K) other than via its iterators.
It'd be cool if you could add another copy of the code that demonstrates the problem but showing how to use your proposed APIs to resolve it. |
@blaenk I'd actually like to use a more concrete motivating example, but I'll add the revised code as well. |
@gankro Do you have any ideas for a better motivating example, like an algorithm that uses a set as a cache? |
@apasel422 that's the usecase in the compiler: sets of hundreds of thousands of elements, used for interning/caching, that have to be identity maps right now, wasting some memory space. |
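A rough sketch of the interning use case described above, assuming a set `get` method of the shape discussed in this thread (std's `HashSet::get` has this shape today). The `Interner` type and its `intern` method are hypothetical names used only for illustration; this is not the compiler's interner.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical string interner: each distinct string is allocated once
/// and shared through `Rc<str>` handles.
#[derive(Default)]
pub struct Interner {
    strings: HashSet<Rc<str>>,
}

impl Interner {
    /// Returns the shared handle for `s`, allocating only the first time
    /// a given string is seen.
    pub fn intern(&mut self, s: &str) -> Rc<str> {
        // `get` hands back a reference to the element already stored in
        // the set, which is exactly the "item recovery" being asked for.
        if let Some(existing) = self.strings.get(s) {
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(s);
        self.strings.insert(Rc::clone(&fresh));
        fresh
    }

    /// Number of distinct strings interned so far.
    pub fn len(&self) -> usize {
        self.strings.len()
    }
}
```

Without a way to get the stored element back, the same cache has to be written as an identity map from each element to itself, which duplicates every key.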
@eddyb Does that mean, if this RFC is implemented, the compiler can be tweaked to use less memory? |
I've added a WIP implementation of this RFC for |
@apasel422 It could be argued by metaphor to the current naming in #1195 that these methods could just be called |
This is a bit more dubious for HashMap; but not crazy. |
@gankro I actually had that same thought a little while ago, but I don't think it fully translates, and would be even weirder for entries:
impl<'a, K, V> OccupiedEntry<'a, K, V> {
    fn get(&self) -> &V;
    fn get_eq(&self) -> (&K, &V); // what does eq have to do with this?
    ...
}
|
Hmm... Entry does seem to mess things up. That said... is it a tragedy if it's a bit misaligned from everything else? |
I think they should be consistent. Here are some options that work for both maps and occupied entries (in addition to
And here are some options for sets (assuming that the changes in rust-lang/rust#27135 canonicalize "element" over "value" when referring to sets):
|
One common optimization that can't be done in java because of set item recovery is to just store hashes instead of elements, for the case where identity-mapping is not desirable. Are you going to give up this special case? |
@i30817 Rust's |
Mmm makes sense. Still, it's a somewhat common optimization, maybe a bloom filter type could be added to the language. |
@i30817 Off-topic, but https://crates.io/search?q=bloom%20filter |
I'm definitely in favor of the ideas laid out here. I have the same motivating problem - a cache of strings. I've used some unsafe code to avoid double-allocating the strings, but I still have essentially a |
How would you feel about modifying |
@bkoropoff That could be done, but it has the problem that a new key is only present for I'm therefore more inclined to add that kind of key-recovery functionality as pub enum Entry<'a, K: 'a, V: 'a> {
Occupied(OccupiedEntry<'a, K, V>, K),
Vacant(VacantEntry<'a, K, V>),
} instead of pub struct OccupiedEntry<'a, K: 'a, V: 'a> {
new_key: K,
// ...
}
impl<'a, K, V> OccupiedEntry<'a, K, V> {
/// Returns the key that was used to acquire this entry.
// This could return `Option<K>` in order to better model the `max_entry` situation
pub fn into_new_key(self) -> K { self.new_key }
} but that would not be a backwards-compatible change. Additionally, storing the new key in the struct itself has the benefit of allowing us to provide an additional impl<'a, K, V> OccupiedEntry<'a, K, V> {
/// Replaces the entry's key with the one that was used to acquire this entry, if any, and
/// returns the old key.
///
/// This method always returns `None` after the first call to it and for all entries
/// acquired through `max_entry` etc.
pub fn replace_key(&mut self) -> Option<K>;
} This adds some complexity to the API surface and makes it harder to reason about what the behavior is. It's possible that we could add what you're proposing in a subsequent RFC instead. |
🔔 HERE YE HERE YE THIS RFC IS ENTERING ITS FINAL COMMENT PERIOD 🔔 |
Sorry to be late to this party (I also had to miss the libs meeting this week). I'm on board with the basic motivation here, and regret the stabilization of the That said, I feel like the RFC is proposing significantly more API expansion than is actually needed to solve the original problem -- in particular, I don't see why any changes to the entry API are needed. Could we instead take the following as a starting point (bikesheds painted in my favorite colors): impl<T> Set<T> {
// Like `contains`, but returns a reference to the element if the set contains it.
fn get<Q: ?Sized>(&self, element: &Q) -> Option<&T>;
// Like `remove`, but returns the element if the set contained it.
fn take<Q: ?Sized>(&mut self, element: &Q) -> Option<T>;
// Like `insert`, but replaces the element with the given one and returns the previous element
// if the set contained it.
fn replace(&mut self, element: T) -> Option<T>;
}
impl<K, V> Map<K, V> {
// Like `get`, but additionally returns a reference to the entry's key.
fn key_value<Q: ?Sized>(&self, key: &Q) -> Option<(&K, &V)>;
// Like `get_mut`, but additionally returns a reference to the entry's key.
fn key_value_mut<Q: ?Sized>(&mut self, key: &Q) -> Option<(&K, &mut V)>;
// Like `remove`, but additionally returns the entry's key.
fn remove_key_value<Q: ?Sized>(&mut self, key: &Q) -> Option<(K, V)>;
// Like `insert`, but additionally replaces the key with the given one and returns the previous
// key and value if the map contained it.
fn replace(&mut self, key: K, value: V) -> Option<(K, V)>;
} In particular, the fact that the entry APIs need an owned key to use (today, at least) seems to make the key-accessing functionality questionable. But maybe I'm missing something? |
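For comparison, here is a small sketch of key recovery on a map. It uses `HashMap::get_key_value` and `HashMap::remove_entry`, which are the names std's `HashMap` uses today for this kind of functionality; they are not the `key_value` / `remove_key_value` names proposed in the comment above, and the `demo` function is purely illustrative.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Illustrative only: returns whether the recovered key is the originally
/// inserted allocation, the removed value, and the final map length.
pub fn demo() -> (bool, Option<u32>, usize) {
    let mut counts: HashMap<Rc<str>, u32> = HashMap::new();
    let original: Rc<str> = Rc::from("alpha");
    counts.insert(Rc::clone(&original), 1);

    // Look up with a plain `&str` and recover the stored key alongside
    // the value.
    let same_allocation = match counts.get_key_value("alpha") {
        Some((key, _count)) => Rc::ptr_eq(key, &original),
        None => false,
    };

    // Remove the entry, getting both the owned key and the value back.
    let removed_value = counts.remove_entry("alpha").map(|(_key, value)| value);

    (same_allocation, removed_value, counts.len())
}
```

Both lookups take a borrowed `&str`, so no owned key has to be built just to recover the one stored in the map.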
@aturon We will presumably want the entry methods once #1195 is accepted, but they could be omitted for now. I think that both RFCs need to be considered together though, and it probably makes sense to avoid a proliferation of impl<K, V> Map<K, V> {
fn get_pair(&self, key: &Q) -> Option<(&K, &V)>;
fn get_entry(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>;
fn replace(&mut self, key: K, val: V) -> Option<(K, V)>;
fn get_max(&self) -> Option<(&K, &V)>;
fn max_entry(&mut self) -> Option<OccupiedEntry<K, V>>;
fn get_lt(&self, key: &Q) -> Option<(&K, &V)>;
fn lt_entry(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>;
// get_* and *_entry for le, ge, gt, min
}
impl<'a, K, V> OccupiedEntry<'a, K, V> {
fn pair(&self) -> (&K, &V);
fn pair_mut(&mut self) -> (&K, &mut V);
fn into_pair_mut(self) -> (&'a K, &'a mut V);
fn take(self) -> (K, V);
} |
The libs team discussed this RFC today, and our conclusion was that it may be best to hone this down to what's precisely necessary to satisfy the motivation in the outset. To that end would it be possible to only include the set methods? Specifically: impl<T> Set<T> {
fn get<Q: ?Sized>(&self, element: &Q) -> Option<&T>;
fn take<Q: ?Sized>(&mut self, element: &Q) -> Option<T>;
fn replace(&mut self, element: T) -> Option<T>;
} |
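To make the difference between these three methods and `contains` / `remove` / `insert` concrete, here is a minimal sketch against std's `HashSet`, which provides `get`, `take`, and `replace` with these shapes. The `Item` type and the `demo` function are assumptions for illustration: `Item` compares and hashes by `id` only, so two equal elements can still be told apart by their `label`.

```rust
use std::borrow::Borrow;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical element whose equality and hash look only at `id`.
#[derive(Debug)]
pub struct Item {
    pub id: u32,
    pub label: &'static str,
}

impl PartialEq for Item {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}
impl Eq for Item {}
impl Hash for Item {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id.hash(state)
    }
}
// Allows lookups by a bare `u32`; consistent with the `Eq`/`Hash` impls above.
impl Borrow<u32> for Item {
    fn borrow(&self) -> &u32 {
        &self.id
    }
}

/// Records the label observed after each operation.
pub fn demo() -> Vec<&'static str> {
    let mut seen = Vec::new();
    let mut set = HashSet::new();
    set.insert(Item { id: 1, label: "first" });
    // `insert` keeps the stored element when an equal one is already present.
    set.insert(Item { id: 1, label: "second" });
    // `get` recovers the stored element rather than just reporting membership.
    seen.push(set.get(&1u32).map_or("none", |item| item.label));
    // `replace` swaps in the new element and returns the old one.
    seen.push(
        set.replace(Item { id: 1, label: "third" })
            .map_or("none", |item| item.label),
    );
    // `take` removes the element and returns it by value.
    seen.push(set.take(&1u32).map_or("none", |item| item.label));
    seen.push(if set.is_empty() { "empty" } else { "non-empty" });
    seen
}
```

Calling `demo()` yields `["first", "first", "third", "empty"]`: the second `insert` leaves the original element in place, while `replace` actually swaps it.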
Specifically, I believe the supporting map methods were also decided to just be |
I'd personally prefer the methods to be freestanding in the module so they're private to the outside world but public to the crate rather than having them in the inherent API at all. |
@alexcrichton How does that work? Privacy can only reach up, and not down or sideways. Maps and Sets are defined in sibling modules. |
They could be defined in a crate private trait. That's how I've gotten On Wed, Aug 12, 2015, 8:01 PM Alexis Beingessner notifications@github.com
|
@apasel422 Can you amend the RFC to be minimal per aturon's request? I think we're good to go when that's done. |
I don't understand the motivation to allow item recovery from a I was actually expecting that feature to move items from one |
Same here, the use case that got me here was with a |
@seppo0010 The usecase I hit that needed the feature for |
@apasel422 ping about the RFC update, would love to merge! |
@alexcrichton I haven't updated yet because it seems like there's still some dissent, based on the last few comments. |
As another voice, only having it on sets would be acceptable for me. I am in the same boat as @eddyb — a cache. |
@alexcrichton Updated. |
Ok, thanks @apasel422! The consensus of the libs team is that this is a great step forward for sets and we can continue to explore the problem space for maps as the needs arise, but it seems like the most pressing parts to work with are sets today. And of course, thanks again for the RFC @apasel422! |
I find myself needing this for maps. In my case, I am building a string interner using a The Alternatively, it would be useful for the What is the best route forward here? Should I write up a new RFC? |
…sert-with-iterator-to-last-inserted function ( see rust-lang/rfcs#1194 )
Well, I needed this for a function where I insert into a set, but then I immediately want an iterator to that last inserted element in the set. Since a The application is a scanline algorithm, where the set consists of ordered points. I need to insert a point into a scanline and then know where it has been inserted (the position), so that I can construct an iterator to the next and previous point in the (ordered) scanline. So for now I've forked the |
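One possible workaround for the scanline case, sketched with std's `BTreeSet::range` rather than the forked insert-with-iterator function mentioned above. The function name `insert_with_neighbours` and the `(i32, i32)` point representation are assumptions made for this example. It pays for two extra ordered lookups after the insert, so it is not equivalent in cost to getting an iterator back from the insert itself.

```rust
use std::collections::BTreeSet;
use std::ops::Bound::{Excluded, Unbounded};

/// Inserts `point` into the scanline and returns its (previous, next)
/// neighbours in sorted order, if any.
pub fn insert_with_neighbours(
    scanline: &mut BTreeSet<(i32, i32)>,
    point: (i32, i32),
) -> (Option<(i32, i32)>, Option<(i32, i32)>) {
    scanline.insert(point);
    // Largest element strictly less than `point`.
    let prev = scanline.range(..point).next_back().copied();
    // Smallest element strictly greater than `point`.
    let next = scanline
        .range((Excluded(point), Unbounded))
        .next()
        .copied();
    (prev, next)
}
```

The two `range` calls can also be kept as iterators instead of being collapsed to single neighbours, if the algorithm needs to keep walking in either direction.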