Collection Views #216
```
let mut view = map.view(key);
if view.is_empty() {
```
Presumably this should be `!view.is_empty()`.
d'oh! fixed :)
This looks really nice. Agreed that the current methods on hashmap are unwieldy; the changes proposed here look like a nice improvement from both a readability/usability perspective and a performance one. |
I've been thinking about view-like objects like this; it would allow us to implement this once for each data structure and get all the functions This seems a little like lenses in Haskell; I wonder if we can draw inspiration from them.
I like this proposal overall. Even though I would rather use an enum with two variants for this: one that allows mutation of the entry, and another that provides means of inserting values into an empty spot. That, or adaptors. Also, you are stepping out of the formal style. At some point I started wondering if "we" refers to broader audience or only those people that work on APIs. |
@pczarn: Sorry, I tend to drift into using the academic formal style, in which We is used in odd ways. I am We, you are We, everyone is We, There Has Never Not Been We. Anyway. If we're really interested in adapters/traits I would probably lean towards something like the following:
This loses all key information for maps, but it's very generic and simple, and exactly how useful that information is to a user is unclear. We could also bifurcate this into MapView and uh... NotMapView. However @pczarn raises an interesting idea about the possibility of an empty/full enum. This would cut out a lot of the options and state checks, but would lead to some duplication. |
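To illustrate the empty/full enum idea, here is a minimal sketch using the `Entry` enum from today's `std::collections::hash_map`, which postdates this thread and may differ from what was being proposed here. The `count_words` helper is hypothetical. A single `match` takes the place of an `is_empty` state check:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical helper: count how often each word occurs in `text`.
fn count_words(text: &str) -> HashMap<String, u32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // One lookup; the enum variant says whether the slot is full or empty.
        match counts.entry(word.to_string()) {
            // Full: mutate the value already stored in the map.
            Entry::Occupied(mut occupied) => *occupied.get_mut() += 1,
            // Empty: insert a first value into the vacant slot.
            Entry::Vacant(vacant) => {
                vacant.insert(1);
            }
        }
    }
    counts
}
```

Each variant only exposes the operations that make sense for its state, which is where the duplication mentioned above would come from.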
pub fn is_empty(&self) -> bool;

/// Get a reference to the Entry's key
pub fn key(&self) -> &K;
Why both `key()` and `get_key()`? Does `get_key()` ever return `None`? (What does `key()` return then?)
Edit: To answer my own question: The key presently in the map and the one in this view may be "equivalent", but they might not be the same. `get_key()` is the key (possibly) already there, `key()` is the one we're "searching" with.
But maybe this could be made clearer somehow.
Yeah, this is me just being a maximalist. `key()` yields the guarantor, `get_key()` gets the key in the actual collection.
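A small sketch of how "equivalent but not the same" keys can arise, using today's `HashMap::get_key_value` rather than the `key()`/`get_key()` methods under review. The `CaseInsensitive` type and `stored_spelling` helper are made up for illustration:

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key type: compares and hashes case-insensitively (ASCII).
#[derive(Debug, Clone)]
struct CaseInsensitive(String);

impl PartialEq for CaseInsensitive {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}
impl Eq for CaseInsensitive {}

impl Hash for CaseInsensitive {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the lowercased form so that equal keys hash equally.
        self.0.to_ascii_lowercase().hash(state);
    }
}

/// Returns the spelling of the key actually stored in the map, if any.
fn stored_spelling(map: &HashMap<CaseInsensitive, u32>, search: &str) -> Option<String> {
    // The key we search with compares equal to the stored key, but the two
    // are distinct values and may differ in spelling.
    let search_key = CaseInsensitive(search.to_string());
    map.get_key_value(&search_key).map(|(stored, _)| stored.0.clone())
}
```

Searching with `"HELLO"` in a map that stores `"Hello"` finds the entry, and the stored key keeps its original spelling.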
Because I think it's interesting, I ported my Hashmap design to @pczarn's Enum style. The consequence is that basically all of the complexity gets pushed into the types themselves. The resultant API is much simpler and cleaner. I also took the opportunity to add I also impled insert_or_update_with on the enum as a demo of what that would look like. You can also properly use match instead of an is_empty check to Resultant types (impl details stripped):
impls (impl details stripped)
You mentioned implementing Views for "collections", but the only example provided was for HashMap. Will this only be needed for Map types, or will other collections have Views as well? If so, how would they look? |
Maps are the nicest example, since they often have complex search procedures. However there's nothing preventing this from being implemented on index-based structures. The only API difference (at least with the minimal version posted just above) would be I could also see having a Outside of maps and lists, this functionality seems fairly irrelevant. Sets as boolean maps don't really benefit from this complex behaviour. More exotic things like BitV's and PriorityQueues similarly don't really need this. So really I'd say this is for collections that implement |
On second thought, that needs to be made more extreme: it should also be non-trivial to predetermine if a key will yield a value. If you can know trivially, then |
@gankro Hmm, I'd still like to see this for sets. There's no guarantee that two elements of a set are "exactly" equal from a perspective of what data they contain even if the "More exotic things like BitV's and PriorityQueues similarly don't really need this." Maybe they don't need it, but it again would be nice to have. I was implementing Prim's the other day, and was wishing that the PriorityQueue API let me update the edge weights I was putting in the queue. Currently, you have to add new edges to the graph and just leave the old ones in there, which slows down the algorithm a bit. I think the RFC should specifically state which collections this will be implemented for, just so we're all clear on this. |
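For reference, the workaround described above (push a new entry and leave the stale one in the queue) might look roughly like this on top of `std::collections::BinaryHeap`. The `LazyQueue` type is hypothetical and only a sketch of the "lazy deletion" technique, not a proposal for the PriorityQueue API:

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Min-priority queue that "updates" a weight by pushing a fresh entry and
/// skipping stale ones on pop (hypothetical type, for illustration only).
struct LazyQueue {
    heap: BinaryHeap<Reverse<(u32, usize)>>, // (weight, vertex)
    best: HashMap<usize, u32>,               // current weight per queued vertex
}

impl LazyQueue {
    fn new() -> Self {
        LazyQueue { heap: BinaryHeap::new(), best: HashMap::new() }
    }

    /// Insert a vertex or lower its weight; the old heap entry stays behind.
    fn push_or_decrease(&mut self, vertex: usize, weight: u32) {
        let improved = self.best.get(&vertex).map_or(true, |&old| weight < old);
        if improved {
            self.best.insert(vertex, weight);
            self.heap.push(Reverse((weight, vertex)));
        }
    }

    /// Pop the lightest live vertex, discarding stale duplicates on the way.
    fn pop(&mut self) -> Option<(usize, u32)> {
        while let Some(Reverse((weight, vertex))) = self.heap.pop() {
            // An entry is live only if it still matches the recorded weight.
            if self.best.get(&vertex) == Some(&weight) {
                self.best.remove(&vertex);
                return Some((vertex, weight));
            }
        }
        None
    }
}
```

The stale entries are the cost being complained about: they stay in the heap until they bubble to the top and get skipped.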
@gsingh93 Unfortunately, all of our sets are invariably just a thin wrapper around Further, you can't perform an in-place replacement on a Set because it's structured on its values, unlike a map. To change the value would change where it "goes" in the structure, making the In theory, I could flesh out my minimal enum-style design to regain the notion of "keys", and we could treat a Set as View<T,()>. Then we could provide a For priority queues it sounds like you want |
I've been playing with another related idea using Zippers, Editors, and Contexts, but I'm hitting the issue that |
@reem Why are you doing that? AFAIK even if that worked, there's no benefit to doing that. Since every implementor of In any case, this is covered under rust-lang/rust#12511, any commentary on this stack overflow should go there. |
6357402
to
e0acdf4
Compare
What I'm actually doing involves a slightly more complex recursion, where it's not a bound on |
@reem Sounds like rust-lang/rust#12644 then. |
We replace all the internal mutation methods with a single method on a collection: `view`.
The signature of `view` will depend on the specific collection, but generally it will be similar to
the signature for searching in that structure. `view` will in turn return a `View` object, which
A `View` object? Do you mean an `Entry` object, judging from below?
View is to Iterator as Entry is to Entries. View is the abstract notion, where Entry is the concrete implementation.
Ah, that makes sense!
@gankro This is really great work. I agree that @pczarn's variant is cleaner; can you update the RFC with that as the main proposal? For history, you can keep the original proposal as an alternative. (I think we should use the keyless variant in particular, which is squeaky-clean.) Really, my only hesitation is the name. Would you consider using "Cursor"? I'm imagining the term as generically referring to an object that points into some other collection, allowing inspection and mutation, but not necessarily navigation. As far as adapting to other collections: I see no problem with landing this for our map types to begin with, and then being on the lookout for similar opportunities elsewhere. This API will likely remain as experimental for some time in any case, and my general feeling is that we should only standardize general APIs when we have many good concrete instances in hand. |
@aturon So we're not interested in providing keys for giving some functionality to Sets, then? I suppose key stuff can be bolted on after, anyway. Do we want to leverage ToOwned here, or can it wait and we'll just upgrade it if the collections reform goes through? If we're using ToOwned, what are the semantics we want here? If there's already a key, should we avoid applying that transform on insertion on the assumption that Eq keys are indistinguishable? Or maybe we should provide set as lazy and swap as aggressive wrt keys? Probably too complicated. I've been leaning towards treating keys as indistinguishable. We can add backcompat methods for "when you care" later.
Yeah, that was my thought: we can always add the key manipulation functionality later, if it turns out to be strongly desired. I understand the hypothetical arguments, but I have yet to see a compelling, concrete example where you'd really want that functionality.
I'd like to keep the RFCs separate for now, and this one's likely to land first. So let's revise with As to the ownership semantics in general, the easiest solution is the API you've given: take an owned key up front, even though if the item exists it isn't needed. An alternative that might not be bad is: take a borrowed key for So I think for now, I'd stick with today's ownership story as you have been. |
@aturon sounds good, go for works-right-now back-compat minimalism. I'm not a huge fan of calling these things Cursors though, since the "real" Cursor design we've been working on has radically different semantics. To conflate the two seems unhelpful. Unless you want Cursors to have a different name? |
I think the major worry is that specific designs like this won't be rolled into a future, HKT + Collection trait driven world where you get to write highly generic code instead of highly specific code and we will end up with two different ways to do the same thing. As the language approaches 1.0 and people will expect more stability from libraries, this is an extremely important thing to consider. If we had, as part of this RFC, a "plan forward" for integrating with more general approaches, I think it would be fine to have these specific things. However, moving methods and such is a backwards compatibility hazard, so we have to be really really careful about how we organize this. Just as a proof of concept, this allows you to write a generic // Yes more parameters, but shortened for explanatory purposes
fn replace<T: Editable>(collection: T, dir: Direction, new: Data) -> T {
    match collection.deconstruct().remove(dir) {
        // `Ok` carries a single value, so the pair is matched as a tuple
        Ok((_, ctx)) => ctx.insert(new),
        Err(edtr) => edtr
    }.reconstruct()
}
and this works for mutable data structures like this:
or like this:
and for persistent structures like this:
Methods like these could even go on the Editable or Editor traits like for Iterator to provide a really flexible but highly generic API for interacting with collections. |
I've rewritten the RFC to be based on the enum design. I've also refactored the wording to be more "RFCish" based on @pczarn's comments. |
I'm sympathetic to these concerns, and much of the Collections Reform RFC is geared toward this kind of conservative API design. But as with so many things, there's a balance to be struck. In particular, I feel strongly that generic APIs should not come at the cost of clear and simple concrete APIs. In this particular example, the zipper-style interface makes many more distinctions than the Put another way, a newcomer seeing the Given that maps are ubiquitous, and (I suspect) most programming against maps will be against a concrete version like On the other hand, having multiple ways to do something -- one concrete, tailored API, and one generic one through traits -- is not such a bad thing. We frequently offer convenience methods that are "redundant" in this sense, but aid ergonomics, performance, or understanding. If or when we add zippers, having them sit along side All that said, while I'm hoping to stabilize much of the collections API as part of collections reform, I don't think we need to stabilize this |
I agree that the Entry interface is much simpler and is probably the way to go - for now. For clarity, I think that the Zipper interface would be used behind the curtains of more generic helpers - the same way most new users will never call |
This RFC was discussed during a weekly meeting and accepted as-is. |
@aturon link is broken |
@reem Sorry, the minutes haven't been posted yet, but you can see them here for now: https://etherpad.mozilla.org/Rust-meeting-weekly |
\o/ |
I haven't been following this RFC so this may have been covered already, but what exactly are the costs involved in keeping |
@sfackler For OccupiedEntry it should be free, since it's a swap, and you obviously have another Key afterwards. Destroying it in that case is partially legacy from the old more complicated design, and partially symmetry. I'm amenable to changing it, though I don't think it would be very valuable. Actually, upon reflection, "set" is really a subset of the get_mut behaviour, modulo the Key getting swapped (which is likely uninteresting). |
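A quick sketch of the "set is a subset of get_mut" observation, written against today's `std` `OccupiedEntry` (where the method is called `insert` rather than `set`). Both helper functions are hypothetical:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::mem;

/// Replace the value through the entry's own method; returns the old value.
fn replace_with_insert(
    map: &mut HashMap<&'static str, i32>,
    key: &'static str,
    new: i32,
) -> Option<i32> {
    match map.entry(key) {
        Entry::Occupied(mut e) => Some(e.insert(new)),
        Entry::Vacant(_) => None,
    }
}

/// Same effect on the value, written only in terms of `get_mut`.
fn replace_with_get_mut(
    map: &mut HashMap<&'static str, i32>,
    key: &'static str,
    new: i32,
) -> Option<i32> {
    match map.entry(key) {
        Entry::Occupied(mut e) => Some(mem::replace(e.get_mut(), new)),
        Entry::Vacant(_) => None,
    }
}
```

Neither version touches the stored key, which matches the point above that the key swap is probably uninteresting.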
@sfackler I'm currently working on migrating all the code in Rust to use the new Entry API, so I'll get back to you once I have a better view of how this stuff is used. So far, it seems like The nasty case seems to be when they're just using this to guarantee there's something there (e.g. a collection), and then doing complex logic on it.
which I could only port to
Edit: for the most part though, I've had general code quality improvements, in my subjective opinion! |
It's looking like I could probably eliminate all the troubles here by making VacantEntry.set yield a mutable reference to the inserted values. I know this can be done efficiently with a bit of unsafe code on HashMap (grab a raw ptr to where you're going to insert it), which is the primary use case. BTreeMap will struggle because you can't know where it will be inserted memory-wise until part-way through the operation. I can refactor a bit to expose that information as part of the internal insertion method, though. TreeMap I continue to avoid vehemently, but it shouldn't be too bad since once you've made the node for the element, you know where in memory to find the values. @aturon thoughts? |
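For comparison, `VacantEntry::insert` in today's `std::collections::hash_map` does return a mutable reference to the inserted value. A minimal sketch of what that buys the caller, with a hypothetical `init_and_bump` helper:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical helper: if `key` is absent, insert `start` and keep working
/// on the new value through the returned reference. Returns whether the key
/// was inserted.
fn init_and_bump(map: &mut HashMap<String, u32>, key: &str, start: u32) -> bool {
    if let Entry::Vacant(vacant) = map.entry(key.to_string()) {
        let slot: &mut u32 = vacant.insert(start);
        *slot += 1; // no second lookup needed to reach the new value
        true
    } else {
        false
    }
}
```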
@gankro That sounds reasonable, but I wonder if we could/should go a step further:
impl<'a, K, V> VacantEntry<'a, K, V> {
    /// Set the value stored in this Entry
    pub fn set(self, value: V) -> OccupiedEntry<'a, K, V>;
}
That should make it very easy to deal with the kind of example you gave above, and it gives you the full suite of While this makes the signature slightly more complex, I think it's intuitive and of course you can always ignore the result. You could also imagine doing something similar on the other side:
impl<'a, K, V> OccupiedEntry<'a, K, V> {
    /// Take the value stored in this Entry
    pub fn take(self) -> (V, VacantEntry<'a, K, V>);
}
though I don't think that change is very well-motivated, and it makes it (slightly) harder to use
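To make the state transitions concrete, here is a toy sketch of `set` and `take` converting between entry kinds. It is built on a hypothetical index-based `Slots` container (a `Vec<Option<V>>`), not on `HashMap`, so it says nothing about how hard this is for a real map:

```rust
/// Toy "map" from small indices to values, for illustration only.
struct Slots<V> {
    cells: Vec<Option<V>>,
}

enum Entry<'a, V> {
    Occupied(OccupiedEntry<'a, V>),
    Vacant(VacantEntry<'a, V>),
}

struct OccupiedEntry<'a, V> {
    cell: &'a mut Option<V>,
}

struct VacantEntry<'a, V> {
    cell: &'a mut Option<V>,
}

impl<V> Slots<V> {
    fn with_len(len: usize) -> Self {
        Slots { cells: (0..len).map(|_| None).collect() }
    }

    /// Look up a slot once. Panics if `index` is out of bounds.
    fn entry(&mut self, index: usize) -> Entry<'_, V> {
        let cell = &mut self.cells[index];
        if cell.is_some() {
            Entry::Occupied(OccupiedEntry { cell })
        } else {
            Entry::Vacant(VacantEntry { cell })
        }
    }
}

impl<'a, V> VacantEntry<'a, V> {
    /// Fill the slot and hand back an occupied entry for the same slot.
    fn set(self, value: V) -> OccupiedEntry<'a, V> {
        *self.cell = Some(value);
        OccupiedEntry { cell: self.cell }
    }
}

impl<'a, V> OccupiedEntry<'a, V> {
    fn get_mut(&mut self) -> &mut V {
        self.cell.as_mut().expect("occupied entry always holds a value")
    }

    /// Empty the slot and hand back the value plus a vacant entry.
    fn take(self) -> (V, VacantEntry<'a, V>) {
        let value = self.cell.take().expect("occupied entry always holds a value");
        (value, VacantEntry { cell: self.cell })
    }
}
```

Here the conversion is trivial because an entry is just a reference to one cell; a hash or tree map would have to carry enough search state to stay valid across the insertion or removal.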
@aturon Getting the reference is cheap and easy because we generally know exactly where it will be very soon in the operation, and you only need the address to get the ref. Constructing a full Occupied/VacantEntry afterwards would be much more difficult, and probably the only reasonable way would be to just construct it from scratch, which the user may as well do themselves. For a hashmap you could definitely do it by just remembering the index and/or cloning the hash, but for the tree-based maps, you generally need a full search path to perform an insertion or deletion correctly. And after an insertion or deletion the search path will in general be quite different. |
@sfackler After poking at your idea, I did run into one small problem in hashmap. Its internal API wants the items by-value to do the swap, but you can't take the values out by-value if the Entry's &mut. Of course, that can be bypassed with a bit of unsafe code, but it's still a bummer. |
Oh well. It seems reasonable to implement it as accepted and see if this actually ends up being a pain point. We'll have time later to tweak the API before it stabilizes. |
Ahh, much better.
Obvious oversight: get_mut has to borrow the entry, but that means get_mut can't outlive the entry, which is necessary for this pattern. Need:
Easy to provide. |
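The borrow problem can be sketched with today's `std` API, where the consuming counterpart of `get_mut` is `OccupiedEntry::into_mut`. Whether that is exactly the method meant above is an assumption, and `bucket_for` is a hypothetical helper:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Return the bucket for `key`, creating it if needed. The returned
/// reference has to outlive the entry it was obtained from.
fn bucket_for<'a>(
    groups: &'a mut HashMap<u32, Vec<String>>,
    key: u32,
) -> &'a mut Vec<String> {
    match groups.entry(key) {
        // `occupied.get_mut()` would borrow from the local `occupied`, so the
        // reference could not be returned. `into_mut` consumes the entry and
        // yields a reference tied to the map's lifetime instead.
        Entry::Occupied(occupied) => occupied.into_mut(),
        Entry::Vacant(vacant) => vacant.insert(Vec::new()),
    }
}
```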
I am loving this new API. I just used it to drastically improve (and speed up) my Trie's remove method. There's some other junk that I cleaned up, but the main improvement comes from being able to find hashmap entries and remove them later without re-finding (I used
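This is not the Trie code itself, but the "find now, remove later without re-finding" pattern can be sketched with today's `std` `OccupiedEntry`. The `release` helper is hypothetical and assumes stored counts are always at least 1:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical helper: drop one reference to `name`, removing the record
/// when the count reaches zero. Returns the remaining count, or `None` if
/// `name` was not tracked. Assumes stored counts are always >= 1.
fn release(refs: &mut HashMap<String, u32>, name: &str) -> Option<u32> {
    match refs.entry(name.to_string()) {
        Entry::Occupied(mut entry) => {
            *entry.get_mut() -= 1;
            let remaining = *entry.get();
            if remaining == 0 {
                // Remove through the entry we already hold; no second search.
                entry.remove();
            }
            Some(remaining)
        }
        Entry::Vacant(_) => None,
    }
}
```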
@michaelsproul Awesome! I didn't think anyone would actually have a use for Since you seem to be a Trie wizard, would you be interested in implementing this API on our TrieMap? I'm a bit too swamped with school work and writing RFCs to tackle this on all of our maps myself atm. 😢 |
@gankro: Oooh, I'd love to. I've got a bit of uni work at the moment too, but I'll give it a shot. |
Add additional iterator-like View objects to collections.
Views provide a composable mechanism for in-place observation and mutation of a single element in the collection, without having to "re-find" the element multiple times. This deprecates several "internal mutation" methods like hashmap's `find_or_insert_with`.
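As a rough illustration of the "re-find" cost, the sketch below contrasts repeated lookups with a single entry lookup. It uses `Entry::or_insert_with` from today's `std`, which postdates this RFC; both helper functions are hypothetical:

```rust
use std::collections::HashMap;

/// Repeated searches: one to test for the key, one to insert, one to read.
fn cached_len_repeated(cache: &mut HashMap<String, usize>, word: &str) -> usize {
    if !cache.contains_key(word) {
        cache.insert(word.to_string(), word.chars().count());
    }
    cache[word]
}

/// One search: the entry remembers where the key lives (or would live).
fn cached_len_entry(cache: &mut HashMap<String, usize>, word: &str) -> usize {
    *cache
        .entry(word.to_string())
        .or_insert_with(|| word.chars().count())
}
```

The entry version pays for an owned key up front even when the key is already present, which is the ownership trade-off discussed in the thread.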