From ca618d8ffc2bafbf7b91b5bc7289e02a92fd753a Mon Sep 17 00:00:00 2001 From: Alexis Beingessner Date: Thu, 28 Aug 2014 13:46:37 -0400 Subject: [PATCH 1/6] Create 0000-collection-views.md --- active/0000-collection-views.md | 183 ++++++++++++++++++++++++++++++++ 1 file changed, 183 insertions(+) create mode 100644 active/0000-collection-views.md diff --git a/active/0000-collection-views.md b/active/0000-collection-views.md new file mode 100644 index 00000000000..52e157ec11f --- /dev/null +++ b/active/0000-collection-views.md @@ -0,0 +1,183 @@ +- Start Date: 2014-08-28 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary + +Add additional iterator-like View objects to collections. +Views provide a composable mechanism for in-place observation and mutation of a +single element in the collection, without having to "re-find" the element multiple times. +This deprecates several "internal mutation" methods like hashmap's `find_or_insert_with`. + +# Motivation + +As we approach 1.0, we'd like to normalize the standard APIs to be consistent, composable, +and simple. However, this currently stands in opposition to manipulating the collections in +an *efficient* manner. For instance, if we wish to build an accumulating map on top of one +of our concrete maps, we need to distinguish between the case when the element we're inserting +is *already* in the map, and when it's *not*. One way to do this is the following: + +``` +if map.contains_key(&key) { + let v = map.find_mut(&key).unwrap(); + let new_v = *v + 1; + *v = new_v; +} else { + map.insert(key, 1); +} +``` + +However, this requires us to search for `key` *twice* on every operation. +We might be able to squeeze out the `update` re-do by matching on the result +of `find_mut`, but the `insert` case will always require a re-search. + +To solve this problem, we have an ad-hoc mix of "internal mutation" methods which +take multiple values or closures for the collection to use contextually. Hashmap in particular +has the following methods: + +``` +fn find_or_insert<'a>(&'a mut self, k: K, v: V) -> &'a mut V +fn find_or_insert_with<'a>(&'a mut self, k: K, f: |&K| -> V) -> &'a mut V +fn insert_or_update_with<'a>(&'a mut self, k: K, v: V, f: |&K, &mut V|) -> &'a mut V +fn find_with_or_insert_with<'a, A>(&'a mut self, k: K, a: A, found: |&K, &mut V, A|, not_found: |&K, A| -> V) -> &'a mut V +``` + +Not only are these methods fairly complex to use, but they're over-engineered and +combinatorially explosive. They all seem to return a mutable reference to the region +accessed "just in case", and `find_with_or_insert_with` takes a magic argument `a` to +try to work around the fact that the *two* closures it requires can't both close over +the same value (even though only one will ever be called). `find_with_or_insert_with` +is also actually performing the role of `insert_with_or_update_with`, +suggesting that these aren't well understood. + +Rust has been in this position before: internal iteration. Internal iteration was (I'm told) +confusing and complicated. However the solution was simple: external iteration. You get +all the benefits of internal iteration, but with a much simpler interface, and greater +composability. Thus, we propose the same solution to the internal mutation problem. + +# Detailed design + +A fully tested "proof of concept" draft of this design has been implemented on top of pczarn's +pending hashmap PR, as hashmap seems to be the worst offender, while still being easy +to work with. You can +[read the diff here](https://github.com/Gankro/rust/commit/6d6804a6d16b13d07934f0a217a3562384e55612). + +We replace all the internal mutation methods with a single method on a collection: `view`. +The signature of `view` will depend on the specific collection, but generally it will be similar to +the signature for searching in that structure. `view` will in turn return a `View` object, which +captures the *state* of a completed search, and allows mutation of the area. + +For convenience, we will use the hashmap draft as an example. + +``` +pub fn view<'a>(&'a mut self, key: K) -> Entry<'a, K, V>; +``` + +Of course, the real meat of the API is in the View's interface (impl details removed): + +``` +impl<'a, K, V> Entry<'a, K, V> { + /// Get a reference to the value at the Entry's location + pub fn get(&self) -> Option<&V>; + + /// Get a mutable reference to the value at the Entry's location + pub fn get_mut(&mut self) -> Option<&mut V>; + + /// Get a reference to the key at the Entry's location + pub fn get_key(&self) -> Option<&K>; + + /// Return whether the Entry's location contains anything + pub fn is_empty(&self) -> bool; + + /// Get a reference to the Entry's key + pub fn key(&self) -> &K; + + /// Set the key and value of the location pointed to by the Entry, and return any old + /// key and value that might have been there + pub fn set(self, value: V) -> Option<(K, V)>; + + /// Retrieve the Entry's key + pub fn into_key(self) -> K; +} +``` + +There are definitely some strange things here, so let's discuss the reasoning! + +First, `view` takes a `key` by value, because we observe that this is how all the internal mutation +methods work. Further, taking the `key` up-front allows us to avoid *validating* provided keys if +we require an owned `key` later. This key is effectively a *guarantor* of the view. +To compensate, we provide an `into_key` method which converts the entry back into its guarantor. +We also provide a `key` method for getting an immutable reference to the guarantor, in case its +value effects any computations one wishes to perform. + +Taking the key by-value might change once associated types land, +and we successfully tackle the "equiv" problem. For now, it's an acceptable solution in our mind. +In particular, the primary use case of this functionality is when you're *not sure* if you need to +insert, in which case you should be prepared to insert. Otherwise, `find_mut` is likely sufficient. + +Next, we provide a nice simple suite of "standard" methods: +`get`, `get_mut`, `get_key`, and `is_empty`. +These do exactly what you would expect, and allow you to query the view to see if it is logically +empty, and if not, what it contains. + +Finally, we provide a `set` method which inserts the provided value using the guarantor key, +and yields the old key-value pair if it existed. Note that `set` consumes the View, because +we lose the guarantor, and the collection might have to shift around a lot to compensate. +Maintaining the entry after an insertion would add significant cost and complexity for no +clear gain. + +Let's look at how we now `insert_or_update`: + +``` +let mut view = map.view(key); +if view.is_empty() { + let v = view.get_mut().unwrap(); + let new_v = *v + 1; + *v = new_v; +} else { + view.set(1); +} +``` + +We can now write our "intuitive" inefficient code, but it is now as efficient as the complex +`insert_or_update` methods. In fact, this matches so closely to the inefficient manipulation +that users could reasonable ignore views *until performance becomes an issue*. At which point +it's an almost trivial migration. We also don't need closures to dance around the fact that we +want to avoid generating some values unless we have to, because that falls naturally out of our +normal control flow. + +If you look at the actual patch that does this, you'll see that Entry itself is exceptional +simple to implement. Most of the logic is trivial. The biggest amount of work was just +capturing the search state correctly, and even that was mostly a cut-and-paste job. + +# Drawbacks + +* More structs, and more methods in the short-term + +* More collection manipulation "modes" for the user to think about + +* `insert_or_update_with` is kind of convenient for avoiding the kind of boiler-plate +found in the examples + +# Alternatives + +* We can just put our foot down, say "no efficient complex manipulations", and drop +all the internal mutation stuff without a replacement. + +* We can try to build out saner/standard internal manipulation methods. + +# Unresolved questions + +One thing omitted from the design was a "take" method on the Entry. The reason for this +is primarily that this doesn't seem to be a thing people are interested in having for +internal manipulation. However, it also just would have meant more complexity, especially +if it *didn't* consume the View. Do we want this functionality? + +The internal mutation methods cannot actually be implemented in terms of the View, because +they return a mutable reference at the end, and there's no way to do that with the current +View design. However, it's not clear why this is done by them. We believe it's simply to +validate what the method *actually did*. If this is the case, then Views make this functionality +obsolete. However, if this is *still* desirable, we could tweak `set` to do this as well. +Do we want this functionality? + +Naming bikesheds! From 6ee0ee014022cee2558aa20ab7e95081cb59b197 Mon Sep 17 00:00:00 2001 From: Alexis Beingessner Date: Thu, 28 Aug 2014 13:56:44 -0400 Subject: [PATCH 2/6] Update 0000-collection-views.md --- active/0000-collection-views.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/active/0000-collection-views.md b/active/0000-collection-views.md index 52e157ec11f..b33b20e6f3e 100644 --- a/active/0000-collection-views.md +++ b/active/0000-collection-views.md @@ -148,7 +148,13 @@ normal control flow. If you look at the actual patch that does this, you'll see that Entry itself is exceptional simple to implement. Most of the logic is trivial. The biggest amount of work was just -capturing the search state correctly, and even that was mostly a cut-and-paste job. +capturing the search state correctly, and even that was mostly a cut-and-paste job. + +With Views, we also open up the gate for... *adaptors*! +You really want `insert_or_update`? We can provide that for you! Generically! +However we believe such discussion is out-of-scope for this RFC. Adaptors can +be tackled in a back-compat manner after this has landed, and we have a better sense +of what we need or want. # Drawbacks @@ -180,4 +186,7 @@ validate what the method *actually did*. If this is the case, then Views make th obsolete. However, if this is *still* desirable, we could tweak `set` to do this as well. Do we want this functionality? +Do we want to introduce a proper standard trait, or keep it all concrete and ad-hoc for a while +to figure out what does and doesn't work? + Naming bikesheds! From 873c1ad97a1ce3482a01c88f0e7c92ef4e7395b4 Mon Sep 17 00:00:00 2001 From: Alexis Beingessner Date: Thu, 28 Aug 2014 15:14:36 -0400 Subject: [PATCH 3/6] Update 0000-collection-views.md --- active/0000-collection-views.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/active/0000-collection-views.md b/active/0000-collection-views.md index b33b20e6f3e..df759d997f7 100644 --- a/active/0000-collection-views.md +++ b/active/0000-collection-views.md @@ -172,6 +172,11 @@ all the internal mutation stuff without a replacement. * We can try to build out saner/standard internal manipulation methods. +* Try to make this functionality a subset of *Cursors*, which would be effectively a mut_iter +where the returned references borrow Self, and mutation can be performed at the location of +the cursor. However preventing invalidation would be more expensive, and some things +might be more awkward. + # Unresolved questions One thing omitted from the design was a "take" method on the Entry. The reason for this From 26328ed6b41070cd3f6925e10845e23f44abc859 Mon Sep 17 00:00:00 2001 From: Alexis Beingessner Date: Thu, 28 Aug 2014 15:35:15 -0400 Subject: [PATCH 4/6] Update 0000-collection-views.md --- active/0000-collection-views.md | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/active/0000-collection-views.md b/active/0000-collection-views.md index df759d997f7..96edfa60c62 100644 --- a/active/0000-collection-views.md +++ b/active/0000-collection-views.md @@ -167,15 +167,18 @@ found in the examples # Alternatives -* We can just put our foot down, say "no efficient complex manipulations", and drop +* Just put our foot down, say "no efficient complex manipulations", and drop all the internal mutation stuff without a replacement. -* We can try to build out saner/standard internal manipulation methods. +* Try to build out saner/standard internal manipulation methods. -* Try to make this functionality a subset of *Cursors*, which would be effectively a mut_iter -where the returned references borrow Self, and mutation can be performed at the location of -the cursor. However preventing invalidation would be more expensive, and some things -might be more awkward. +* Try to make this functionality a subset of [Cursors](http://discuss.rust-lang.org/t/pseudo-rfc-cursors-reversible-iterators/386/7), +which would be effectively a bi-directional mut_iter +where the returned references borrow the cursor preventing aliasing/safety issues, +so that mutation can be performed at the location of the cursor. +However, preventing invalidation would be more expensive, and it's not clear that +cursor semantics would make sense on e.g. a HashMap, as you can't insert *any* key +in *any* location. # Unresolved questions From 9a77de8d4f5aadab587be7a4bba7d6551e84f4b0 Mon Sep 17 00:00:00 2001 From: Alexis Beingessner Date: Thu, 28 Aug 2014 17:05:39 -0400 Subject: [PATCH 5/6] Learn 2 code --- active/0000-collection-views.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/active/0000-collection-views.md b/active/0000-collection-views.md index 96edfa60c62..527d842cd3b 100644 --- a/active/0000-collection-views.md +++ b/active/0000-collection-views.md @@ -130,7 +130,7 @@ Let's look at how we now `insert_or_update`: ``` let mut view = map.view(key); -if view.is_empty() { +if !view.is_empty() { let v = view.get_mut().unwrap(); let new_v = *v + 1; *v = new_v; From b175db91ca4d62882ff3e0705a8d37a7198c6257 Mon Sep 17 00:00:00 2001 From: Alexis Beingessner Date: Mon, 15 Sep 2014 20:01:24 -0400 Subject: [PATCH 6/6] Major rewrite based on enum design and comments --- active/0000-collection-views.md | 171 ++++++++++++++++---------------- 1 file changed, 88 insertions(+), 83 deletions(-) diff --git a/active/0000-collection-views.md b/active/0000-collection-views.md index 527d842cd3b..5069ab5f447 100644 --- a/active/0000-collection-views.md +++ b/active/0000-collection-views.md @@ -4,8 +4,8 @@ # Summary -Add additional iterator-like View objects to collections. -Views provide a composable mechanism for in-place observation and mutation of a +Add additional iterator-like Entry objects to collections. +Entries provide a composable mechanism for in-place observation and mutation of a single element in the collection, without having to "re-find" the element multiple times. This deprecates several "internal mutation" methods like hashmap's `find_or_insert_with`. @@ -13,8 +13,8 @@ This deprecates several "internal mutation" methods like hashmap's `find_or_inse As we approach 1.0, we'd like to normalize the standard APIs to be consistent, composable, and simple. However, this currently stands in opposition to manipulating the collections in -an *efficient* manner. For instance, if we wish to build an accumulating map on top of one -of our concrete maps, we need to distinguish between the case when the element we're inserting +an *efficient* manner. For instance, if one wishes to build an accumulating map on top of one +of the concrete maps, they need to distinguish between the case when the element they're inserting is *already* in the map, and when it's *not*. One way to do this is the following: ``` @@ -27,11 +27,11 @@ if map.contains_key(&key) { } ``` -However, this requires us to search for `key` *twice* on every operation. -We might be able to squeeze out the `update` re-do by matching on the result +However, searches for `key` *twice* on every operation. +The second search can be squeezed out the `update` re-do by matching on the result of `find_mut`, but the `insert` case will always require a re-search. -To solve this problem, we have an ad-hoc mix of "internal mutation" methods which +To solve this problem, Rust currently has an ad-hoc mix of "internal mutation" methods which take multiple values or closures for the collection to use contextually. Hashmap in particular has the following methods: @@ -50,111 +50,120 @@ the same value (even though only one will ever be called). `find_with_or_insert_ is also actually performing the role of `insert_with_or_update_with`, suggesting that these aren't well understood. -Rust has been in this position before: internal iteration. Internal iteration was (I'm told) +Rust has been in this position before: internal iteration. Internal iteration was (author's note: I'm told) confusing and complicated. However the solution was simple: external iteration. You get all the benefits of internal iteration, but with a much simpler interface, and greater -composability. Thus, we propose the same solution to the internal mutation problem. +composability. Thus, this RFC proposes the same solution to the internal mutation problem. # Detailed design -A fully tested "proof of concept" draft of this design has been implemented on top of pczarn's -pending hashmap PR, as hashmap seems to be the worst offender, while still being easy -to work with. You can -[read the diff here](https://github.com/Gankro/rust/commit/6d6804a6d16b13d07934f0a217a3562384e55612). +A fully tested "proof of concept" draft of this design has been implemented on top of hashmap, +as it seems to be the worst offender, while still being easy to work with. You can +[read the diff here](https://github.com/Gankro/rust/commit/39a1fa7c7362a3e22e59ab6601ac09475daff39b). -We replace all the internal mutation methods with a single method on a collection: `view`. -The signature of `view` will depend on the specific collection, but generally it will be similar to -the signature for searching in that structure. `view` will in turn return a `View` object, which +All the internal mutation methods are replaced with a single method on a collection: `entry`. +The signature of `entry` will depend on the specific collection, but generally it will be similar to +the signature for searching in that structure. `entry` will in turn return an `Entry` object, which captures the *state* of a completed search, and allows mutation of the area. For convenience, we will use the hashmap draft as an example. ``` -pub fn view<'a>(&'a mut self, key: K) -> Entry<'a, K, V>; +/// Get an Entry for where the given key would be inserted in the map +pub fn entry<'a>(&'a mut self, key: K) -> Entry<'a, K, V>; + +/// A view into a single occupied location in a HashMap +pub struct OccupiedEntry<'a, K, V>{ ... } + +/// A view into a single empty location in a HashMap +pub struct VacantEntry<'a, K, V>{ ... } + +/// A view into a single location in a HashMap +pub enum Entry<'a, K, V> { + /// An occupied Entry + Occupied(OccupiedEntry<'a, K, V>), + /// A vacant Entry + Vacant(VacantEntry<'a, K, V>), +} ``` Of course, the real meat of the API is in the View's interface (impl details removed): ``` -impl<'a, K, V> Entry<'a, K, V> { - /// Get a reference to the value at the Entry's location - pub fn get(&self) -> Option<&V>; +impl<'a, K, V> OccupiedEntry<'a, K, V> { + /// Get a reference to the value of this Entry + pub fn get(&self) -> &V; - /// Get a mutable reference to the value at the Entry's location - pub fn get_mut(&mut self) -> Option<&mut V>; + /// Get a mutable reference to the value of this Entry + pub fn get_mut(&mut self) -> &mut V; - /// Get a reference to the key at the Entry's location - pub fn get_key(&self) -> Option<&K>; + /// Set the value stored in this Entry + pub fn set(mut self, value: V) -> V; - /// Return whether the Entry's location contains anything - pub fn is_empty(&self) -> bool; - - /// Get a reference to the Entry's key - pub fn key(&self) -> &K; - - /// Set the key and value of the location pointed to by the Entry, and return any old - /// key and value that might have been there - pub fn set(self, value: V) -> Option<(K, V)>; + /// Take the value stored in this Entry + pub fn take(self) -> V; +} - /// Retrieve the Entry's key - pub fn into_key(self) -> K; +impl<'a, K, V> VacantEntry<'a, K, V> { + /// Set the value stored in this Entry + pub fn set(self, value: V); } ``` There are definitely some strange things here, so let's discuss the reasoning! -First, `view` takes a `key` by value, because we observe that this is how all the internal mutation -methods work. Further, taking the `key` up-front allows us to avoid *validating* provided keys if -we require an owned `key` later. This key is effectively a *guarantor* of the view. -To compensate, we provide an `into_key` method which converts the entry back into its guarantor. -We also provide a `key` method for getting an immutable reference to the guarantor, in case its -value effects any computations one wishes to perform. +First, `entry` takes a `key` by value, because this is the observed behaviour of the internal mutation +methods. Further, taking the `key` up-front allows implementations to avoid *validating* provided keys if +they require an owned `key` later for insertion. This key is effectively a *guarantor* of the entry. + +Taking the key by-value might change once collections reform lands, and Borrow and ToOwned are available. +For now, it's an acceptable solution, because in particular, the primary use case of this functionality +is when you're *not sure* if you need to insert, in which case you should be prepared to insert. +Otherwise, `find_mut` is likely sufficient. -Taking the key by-value might change once associated types land, -and we successfully tackle the "equiv" problem. For now, it's an acceptable solution in our mind. -In particular, the primary use case of this functionality is when you're *not sure* if you need to -insert, in which case you should be prepared to insert. Otherwise, `find_mut` is likely sufficient. +The result is actually an enum, that will either be Occupied or Vacant. These two variants correspond +to concrete types for when the key matched something in the map, and when the key didn't, repsectively. -Next, we provide a nice simple suite of "standard" methods: -`get`, `get_mut`, `get_key`, and `is_empty`. -These do exactly what you would expect, and allow you to query the view to see if it is logically -empty, and if not, what it contains. +If there isn't a match, the user has exactly one option: insert a value using `set`, which will also insert +the guarantor, and destroy the Entry. This is to avoid the costs of maintaining the structure, which +otherwise isn't particularly interesting anymore. -Finally, we provide a `set` method which inserts the provided value using the guarantor key, -and yields the old key-value pair if it existed. Note that `set` consumes the View, because -we lose the guarantor, and the collection might have to shift around a lot to compensate. -Maintaining the entry after an insertion would add significant cost and complexity for no -clear gain. +If there is a match, a more robust set of options is provided. `get` and `get_mut` provide access to the +value found in the location. `set` behaves as the vacant variant, but also yields the old value. `take` +simply removes the found value, and destroys the entry for similar reasons as `set`. -Let's look at how we now `insert_or_update`: +Let's look at how we one now writes `insert_or_update`: ``` -let mut view = map.view(key); -if !view.is_empty() { - let v = view.get_mut().unwrap(); - let new_v = *v + 1; - *v = new_v; -} else { - view.set(1); +match map.entry(key) { + Occupied(entry) => { + let v = entry.get_mut(); + let new_v = *v + 1; + *v = new_v; + } + Vacant(entry) => { + entry.set(1); + } } ``` -We can now write our "intuitive" inefficient code, but it is now as efficient as the complex +One can now write something equivalent to the "intuitive" inefficient code, but it is now as efficient as the complex `insert_or_update` methods. In fact, this matches so closely to the inefficient manipulation -that users could reasonable ignore views *until performance becomes an issue*. At which point -it's an almost trivial migration. We also don't need closures to dance around the fact that we -want to avoid generating some values unless we have to, because that falls naturally out of our +that users could reasonable ignore Entries *until performance becomes an issue*. At which point +it's an almost trivial migration. Closures also aren't needed to dance around the fact that one may +want to avoid generating some values unless they have to, because that falls naturally out of normal control flow. -If you look at the actual patch that does this, you'll see that Entry itself is exceptional +If you look at the actual patch that does this, you'll see that Entry itself is exceptionally simple to implement. Most of the logic is trivial. The biggest amount of work was just capturing the search state correctly, and even that was mostly a cut-and-paste job. -With Views, we also open up the gate for... *adaptors*! -You really want `insert_or_update`? We can provide that for you! Generically! -However we believe such discussion is out-of-scope for this RFC. Adaptors can -be tackled in a back-compat manner after this has landed, and we have a better sense -of what we need or want. +With Entries, the gate is also opened for... *adaptors*! +Really want `insert_or_update` back? That can be written on top of this generically with ease. +However, such discussion is out-of-scope for this RFC. Adaptors can +be tackled in a back-compat manner after this has landed, and usage is observed. Also, this +proposal does not provide any generic trait for Entries, preferring concrete implementations for +the time-being. # Drawbacks @@ -180,21 +189,17 @@ However, preventing invalidation would be more expensive, and it's not clear tha cursor semantics would make sense on e.g. a HashMap, as you can't insert *any* key in *any* location. -# Unresolved questions - -One thing omitted from the design was a "take" method on the Entry. The reason for this -is primarily that this doesn't seem to be a thing people are interested in having for -internal manipulation. However, it also just would have meant more complexity, especially -if it *didn't* consume the View. Do we want this functionality? +* This RFC originally [proposed a design without enums that was substantially more complex] +(https://github.com/Gankro/rust/commit/6d6804a6d16b13d07934f0a217a3562384e55612). +However it had some interesting ideas about Key manipulation, so we mention it here for +historical purposes. +# Unresolved questions The internal mutation methods cannot actually be implemented in terms of the View, because they return a mutable reference at the end, and there's no way to do that with the current View design. However, it's not clear why this is done by them. We believe it's simply to validate what the method *actually did*. If this is the case, then Views make this functionality -obsolete. However, if this is *still* desirable, we could tweak `set` to do this as well. -Do we want this functionality? - -Do we want to introduce a proper standard trait, or keep it all concrete and ad-hoc for a while -to figure out what does and doesn't work? +obsolete. However, if this is *still* desirable, `set` could be tweaked to do this as well. +However for some structures it may incur additional cost. Is this desirable functionality? Naming bikesheds!