ordered query API #1195
Looks good to me. I assume you went with Should there also be Along those lines, we could also provide something like fn last_entry(&mut self) -> Option<OccupiedEntry<K, V>>;
fn pred_inc_entry<Q: ?Sized>(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>
where K: Borrow<Q>, Q: Ord;
... I've been experimenting with this in my BST library. It's a niche use-case, but allows code to inspect the key and value before deciding whether to remove it. |
|
||
where `pred(Unbounded)` is max, and `succ(Unbounded)` in min by assuming you're getting the | ||
predecessor and successor of positive and negative infinity. This RFC does not propose this | ||
API because it is crazy-pants and would make our users cry. |
I think this is a serious alternative.
Bound
and .range()
have existed for a while, are they not something we want to keep? Can I drag the alternative of using range syntax into this? (Bounded2 = Inclusive | Exclusive)
so std::ops::Range<Bounded2>
etc could be an alternative.
This seems to just shuffle the combinatorics around and make the calling convention more awkward, as far as I can tell. No?
I think it's inconsistent if we want to keep using Bound as it is (or even changed) in some places (.range()), and then have these methods not use it.
I regard Bound as a necessary evil for range
because the combinatorics there seem truly catastrophic (18 iterator methods). That said I've never been super happy with the range design. Someone once suggested a builder pattern to me like:
// unbounded RHS
.range().from(x).into_iter()
// bounded RHS
.range().from(x).to(y).into_iter()
...etc
Might be worth considering that more seriously.
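As a rough illustration of the builder idea, here is a minimal sketch layered on today's `BTreeMap::range`. The names `RangeBuilder` and `range_builder` are hypothetical (nothing like them exists in the standard library), and the choice of an inclusive `from` and an exclusive `to` is an assumption made only for this example.

```rust
use std::collections::btree_map::{BTreeMap, Range};
use std::ops::Bound;

/// Hypothetical builder, not a std type: collects bounds before iterating.
pub struct RangeBuilder<'a, K, V> {
    map: &'a BTreeMap<K, V>,
    lower: Bound<K>,
    upper: Bound<K>,
}

/// Hypothetical stand-in for an argument-less `.range()` method.
pub fn range_builder<K, V>(map: &BTreeMap<K, V>) -> RangeBuilder<'_, K, V> {
    RangeBuilder { map, lower: Bound::Unbounded, upper: Bound::Unbounded }
}

impl<'a, K: Ord, V> RangeBuilder<'a, K, V> {
    /// Sets an inclusive lower bound (assumption: `from` is inclusive).
    pub fn from(mut self, key: K) -> Self {
        self.lower = Bound::Included(key);
        self
    }

    /// Sets an exclusive upper bound (assumption: `to` is exclusive).
    pub fn to(mut self, key: K) -> Self {
        self.upper = Bound::Excluded(key);
        self
    }

    /// Finishes the builder by delegating to the real `BTreeMap::range`.
    pub fn into_iter(self) -> Range<'a, K, V> {
        self.map.range::<K, _>((self.lower, self.upper))
    }
}
```

With this shape, `range_builder(&map).from(x).into_iter()` leaves the right-hand side unbounded, and adding `.to(y)` bounds it, so no `Bound` value appears at the call site.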
@apasel422 Also if your type is actually ordered I actually intended to add I had also concluded that an Entry API was silly since VacantEntry is nonsensical, but I suppose |
@gankro Presumably a dedicated removal method can be (slightly) more efficient than removing through the entry API, due to less bookkeeping. I hate to increase the combinatoric problem even more, but since the map types already have |
It's not clear to me that Only OccupiedEntry would have overhead. Constructing an OccupiedEntry is literally running |
It might be fine to just have the |
* succ_exc | ||
* first | ||
* last | ||
|
Do these names have precedent in other libraries? They seem a bit too succinct to me (although a big plus one to the actual functionality, I've wanted this).
Java: higher/lower/ceil/floor
C++: lower_bound/upper_bound (these names are terrible and I explicitly killed them in collections reform)
Everything else I looked at: chaos or doesn't seem to have this precise collection/functionality.
I briefly pondered before/after and next/prev before letting my theory background take over and demand predecessor/successor.
Another potential naming scheme could involve {lt, le, ge, gt}
, optionally with a prefix or suffix if we're concerned about conflicting with PartialOrd
's methods.
Some ideas:
before, after, before_eq, after_eq
find_{lt, le, ge, gt}
get_{lt, le, ge, gt}
Incidentally the lack of genericity over mutability is killing me. Don't know how I'd do it, but there's so much repetition in APIs these days because of it.
Oh damn right I wanted to avoid that auuuugh.
Another option is {next, next_or_eq, prev, prev_or_eq, first, last}
.
I do really like that lt/leq/etc is an established naming convention that people can bring into understanding.
leq
or le
? The former might be easier to grok, but the latter is consistent with PartialOrd
and has the minor benefit of having the same number of characters as {lt, gt}
.
Oh whoops, I thought that PartialOrd used leq.
Honestly, I would prefer using the builder suggestion from above. |
@benaryorg These are orthogonal API discussions. One is for doing direct queries, one is for iterating ranges. While one can be implemented in terms of the other, this is not necessarily efficient or desirable. |
@gankro So you are planning to build two APIs, one of which might (please) use a builder pattern, with the other being cursor-like? Sorry if I do not quite get the idea behind the second API. |
This RFC is proposing an API just for answering queries of the form "who is the predecessor/successor/minimum/etc". All it does is return The range API that was being discussed above would produce an Cursors are Yet Another thing that are not currently being proposed here, and that the standard library does not currently have a notion of. Cursors and iterators -- particularly &mut ones -- must be implemented as separate types because they have different semantics. Iterators say you can always call |
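To make the "direct query" shape concrete, here is a minimal sketch of predecessor/successor lookups written as free functions over today's `BTreeMap::range`. The function names follow the `lt`/`le`/`ge`/`gt` scheme discussed in this thread; they are hypothetical helpers, not the RFC's implementation, and, as noted above, going through an iterator is not necessarily as efficient as a dedicated descent.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};

/// Greatest entry whose key is strictly less than `key` (predecessor).
pub fn get_lt<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
    map.range::<K, _>(..key).next_back()
}

/// Greatest entry whose key is less than or equal to `key`.
pub fn get_le<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
    map.range::<K, _>(..=key).next_back()
}

/// Smallest entry whose key is greater than or equal to `key`.
pub fn get_ge<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
    map.range::<K, _>(key..).next()
}

/// Smallest entry whose key is strictly greater than `key` (successor).
/// Range syntax has no exclusive lower bound, so explicit `Bound`s are used.
pub fn get_gt<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
    map.range::<K, _>((Excluded(key), Unbounded)).next()
}
```

Each call returns a single `Option<(&K, &V)>` rather than an iterator, which is the difference between this kind of query and the range API.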
Okay, I understand now. I'll leave function naming to you as I am the worst at that. |
@apasel422 Would you be fine with punting on remove/entry APIs until BTreeMap is rewritten to use parent pointers? I believe they can be added afterwards without an RFC based on "natural API holes" logic. |
@gankro Absolutely. |
I've renamed the APIs per discussion. |
@gankro you may want to update your original comment. I read |
Hm, how about combating the combinatoric explosion with type parameters? .get_rel::<LE>(&Q) -> Option<(&K, &V)>; If fn get<Ord=EQ>(&Q) -> Option<(&K, &V)>; |
@Kimundi I've talked about something like that with @gankro in the past: Gankra/collect-rs#120 (comment). |
🔔 HERE YE HERE YE THIS RFC IS ENTERING ITS FINAL COMMENT PERIOD 🔔 |
modulo, but this is a more general problem for the *ordered map* API. There are surely types for | ||
which a straight-up query will be cheaper than iterator initialization. | ||
|
||
It is also siginificantly more ergonomic/discoverable to have `pred_inc_mut(&K)` over |
s/pred_inc_mut/get_le_mut/
s/siginificantly/significantly/
(It would be good for the typos that @apasel422 has noticed to be fixed if/before this is merged.) I find the combinatorics here really really bad. It seems a little crazy to add so many methods for what I suspect are relatively niche use-cases. @gankro, I know you're rabidly against any use of enums in APIs to collapse functionality (i.e. the |
Returning an impl<K, V> Map<K, V> {
fn get_lt<Q: ?Sized>(&self, key: &Q) -> Option<(&K, &V)>;
fn lt_entry<Q: ?Sized>(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>;
// ... `le`, `ge`, `gt`
} or enum Query<T> {
Min,
Lt(T),
Le(T),
Ge(T),
Gt(T),
Max,
}
impl<K, V> Map<K, V> {
fn query<Q: ?Sized = K>(&self, query: Query<&Q>) -> Option<(&K, &V)>;
fn query_entry<Q: ?Sized = K>(&mut self, query: Query<&Q>) -> Option<OccupiedEntry<K, V>>;
} |
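A minimal sketch of how the enum variant could behave, written as a free function over today's `BTreeMap` rather than as a method on the `Map` placeholder above. The `Query` enum mirrors the one in the comment; the `query` function body is an assumption for illustration, and for simplicity it borrows keys as `&K` only, leaving out the `Borrow<Q>` generality.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Included, Unbounded};

/// Mirrors the enum proposed in the comment above.
pub enum Query<T> {
    Min,
    Lt(T),
    Le(T),
    Ge(T),
    Gt(T),
    Max,
}

/// Hypothetical free-function version of `Map::query`.
pub fn query<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, q: Query<&K>) -> Option<(&'a K, &'a V)> {
    match q {
        Query::Min => map.iter().next(),
        Query::Max => map.iter().next_back(),
        // "Less than" queries want the last entry of the lower part of the map.
        Query::Lt(k) => map.range::<K, _>((Unbounded, Excluded(k))).next_back(),
        Query::Le(k) => map.range::<K, _>((Unbounded, Included(k))).next_back(),
        // "Greater than" queries want the first entry of the upper part.
        Query::Ge(k) => map.range::<K, _>((Included(k), Unbounded)).next(),
        Query::Gt(k) => map.range::<K, _>((Excluded(k), Unbounded)).next(),
    }
}
```

This collapses six methods into one, at the cost of a runtime `match` and an extra type that callers must import.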
I disagree that it's niche -- it's one of the primary reasons to use an ordered map. |
Is there also use for a "nearest" version? I e, if the treemap looks for 1000 and can only find 900 and 1010, it will choose 1010 because it is nearest. That seems useful - although maybe that will require some additional trait bound (e g Bikeshed wise, I don't know why |
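For what it's worth, a "nearest" lookup can be sketched on top of two ordinary range queries once the key type has a notion of distance. The sketch below assumes `i64` keys and a hypothetical helper name `get_nearest`; resolving ties toward the smaller key is an arbitrary choice made for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: returns the entry whose key is closest to `key`.
/// Ties are resolved in favor of the smaller key (an arbitrary choice).
pub fn get_nearest<'a, V>(map: &'a BTreeMap<i64, V>, key: i64) -> Option<(&'a i64, &'a V)> {
    // Closest candidate at or below `key`, and closest candidate at or above it.
    let below = map.range(..=key).next_back();
    let above = map.range(key..).next();
    match (below, above) {
        (Some(lo), Some(hi)) => {
            // `abs_diff` returns an unsigned distance and cannot overflow.
            if key.abs_diff(*lo.0) <= hi.0.abs_diff(key) {
                Some(lo)
            } else {
                Some(hi)
            }
        }
        // At most one side exists: return whichever it is, if any.
        (lo, hi) => lo.or(hi),
    }
}
```

With keys 900 and 1010 in the map, `get_nearest(&map, 1000)` returns the entry for 1010, matching the example in the comment.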
This, please! I'm missing this for a while. Otherwise there's very little reason to have an ordered map! It's a shame it requires so much code though, we need those parent pointers. |
@arthurprs parent pointers wouldn't solve the duplication, it's a pure descent algorithm. (I suppose it would reduce duplication with other APIs.) |
I think that the following is missing:
When seeing all the
I see two main use-cases for the query API:
So I basically expected to see two functions @gankro you asked for feedback, I hope this is some constructive one :P |
The time and space complexity of these operations is an implementation Can you provide an example of how you would use the user-provided closure On Thursday, August 6, 2015, gnzlbg notifications@github.com wrote:
|
Basically I just want to know if I can call these in a loop without ending up in N^2 complexity or blowing up the stack. If I'm doing something latency-related I also need to know if they allocate any memory in the heap. If the implementation improves the complexity in the future, that is a non-breaking change, but the current complexity guarantees should be there.
I'm not sure either, but for the |
I don't think this RFC needs to make complexity guarantees. People can decide to use these methods based on the public documentation of their current complexity, not the contents of this RFC. But if you need the predecessor of a key for a certain algorithm, you have to find it somehow, so providing these APIs will be beneficial regardless of their complexity. Even if they are implemented completely naively at first, code that calls the methods instead of doing a manual iterator-based search will be made more efficient automatically when the implementation improves. I don't understand what |
An alternative to the enum would also be a static dispatch version, similar to the way struct Min;
struct Max;
struct Le<Q: ?Sized>(Q);
// ...
trait Query<K, V, Selector: ?Sized> {
fn query(&self, query: &Selector) -> Option<(&K, &V)>;
fn query_mut(&mut self, query: &Selector) -> Option<(&K, &mut V)>;
// maybe query_entry as well
}
impl<K, V> Query<K, V, Min> for Map<K, V> {
/* ... */
}
impl<K, V, Q> Query<K, V, Le<Q>> for Map<K, V>
where K: Borrow<Q> {
/* ... */
}
// ... Cons: you'd have to import |
You actually wouldn't have to import the query trait, because we could add inherent methods to the map that simply call out to the appropriate impl. You would have to import the query structs themselves, though, and there would have to be a different trait for set queries, which won't expose mutable elements. I'm not opposed to the static dispatch approach, because it can be nice to represent the queries themselves as values (e.g. passing around |
To avoid having a separate trait for trait Query<Selector: ?Sized> {
type Output;
fn query(self, query: &Selector) -> Option<Self::Output>;
}
trait QueryMut<Selector: ?Sized> {
type Output;
fn query_mut(self, query: &Selector) -> Option<Self::Output>;
}
impl<'a, K, V> Query<Min> for &'a Map<K, V> {
type Output = (&'a K, &'a V);
fn query(self, query: &Min) -> Option<Self::Output> {
/* ... */
}
}
impl<'a, K, V> QueryMut<Min> for &'a mut Map<K, V> {
type Output = (&'a K, &'a mut V);
fn query_mut(self, query: &Min) -> Option<Self::Output> {
/* ... */
}
}
impl<'a, E> Query<Min> for &'a Set<E> {
type Output = &'a E;
fn query(self, query: &Min) -> Option<Self::Output> {
/* ... */
}
} If you do provide inherent methods though, I don't know if there is much value in using the same trait (which would really only ever show up to bound the argument of the inherent methods). |
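A compilable sketch of the static-dispatch idea, using `BTreeMap` and `BTreeSet` in place of the `Map` and `Set` placeholders. The trait shape follows the comment above; the method bodies, the sized `Le<Q>` wrapper, and the restriction to `&K` keys (no `Borrow<Q>`) are simplifying assumptions for this example only.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Selector types: each kind of query is its own type.
pub struct Min;
pub struct Le<Q>(pub Q);

/// One trait for maps and sets; the output type depends on the receiver.
pub trait Query<Selector: ?Sized> {
    type Output;
    fn query(self, query: &Selector) -> Option<Self::Output>;
}

impl<'a, K: Ord, V> Query<Min> for &'a BTreeMap<K, V> {
    type Output = (&'a K, &'a V);
    fn query(self, _query: &Min) -> Option<Self::Output> {
        self.iter().next()
    }
}

impl<'a, K: Ord, V> Query<Le<K>> for &'a BTreeMap<K, V> {
    type Output = (&'a K, &'a V);
    fn query(self, query: &Le<K>) -> Option<Self::Output> {
        // Last entry at or below the selector's key.
        self.range::<K, _>(..=&query.0).next_back()
    }
}

impl<'a, E: Ord> Query<Min> for &'a BTreeSet<E> {
    type Output = &'a E;
    fn query(self, _query: &Min) -> Option<Self::Output> {
        self.iter().next()
    }
}
```

Because the selector is a type, the compiler picks the impl at compile time: `map.query(&Min)` and `map.query(&Le(4))` resolve to different functions with no runtime `match`.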
@cristicbz I've put a POC implementation of what you are suggesting here: https://github.com/apasel422/bst/tree/query. |
Here's a thought on an API variant to deal with combinatorics while still being friendly: fn max<Q: ?Sized, R>(&self, range: R) -> Option<(&K, &V)>
where K: Borrow<Q>, Q: Ord, R: AnyRange<&Q>;
fn min<Q: ?Sized, R>(&self, range: R) -> Option<(&K, &V)>
where K: Borrow<Q>, Q: Ord, R: AnyRange<&Q>;
fn max_entry<Q: ?Sized, R>(&mut self, range: R) -> Option<OccupiedEntry<K, V>>
where K: Borrow<Q>, Q: Ord, R: AnyRange<&Q>;
fn min_entry<Q: ?Sized, R>(&mut self, range: R) -> Option<OccupiedEntry<K, V>>
where K: Borrow<Q>, Q: Ord, R: AnyRange<&Q>; Given inclusive ranges, you can cover all of the cases you wanted to with your API, without requiring any extra imports or names to be used. UPDATE: in case the above is unclear, here are some examples: // get_le
map.max(...&k)
// get_lt
map.max(..&k)
// get_ge
map.min(&k..)
// get the smallest element:
map.min(..) However, @gankro points out on IRC that since exclusive ranges only exclude on the right, we can't express |
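The range-argument idea can be tried out today with the standard `RangeBounds` trait standing in for the hypothetical `AnyRange`. The free functions `max_in`, `min_in`, and `get_gt` below are illustrative names, not a proposed API; they simply forward to `BTreeMap::range`, which panics if the range start is greater than its end.

```rust
use std::collections::BTreeMap;
use std::ops::{Bound, RangeBounds};

/// Greatest entry whose key lies in `range`.
pub fn max_in<'a, K: Ord, V, R: RangeBounds<K>>(
    map: &'a BTreeMap<K, V>,
    range: R,
) -> Option<(&'a K, &'a V)> {
    map.range::<K, R>(range).next_back()
}

/// Smallest entry whose key lies in `range`.
pub fn min_in<'a, K: Ord, V, R: RangeBounds<K>>(
    map: &'a BTreeMap<K, V>,
    range: R,
) -> Option<(&'a K, &'a V)> {
    map.range::<K, R>(range).next()
}

/// "Strictly greater than" has no range-syntax form, because range syntax
/// cannot exclude its left endpoint; a pair of `Bound`s expresses it instead.
pub fn get_gt<'a, K: Ord, V>(map: &'a BTreeMap<K, V>, key: &K) -> Option<(&'a K, &'a V)> {
    min_in(map, (Bound::Excluded(key), Bound::Unbounded))
}
```

In current Rust the inclusive form is spelled `..=k`, so the `get_le` case reads `max_in(&map, ..=k)`, while `..k`, `k..`, and `..` cover `get_lt`, `get_ge`, and the overall minimum or maximum.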
@aturon The inability to express |
FWIW, it seems C++'s equivalent container |
The libs team has decided to close this RFC pending investigating alternative API solutions. In particular I think there's a promising opportunity with a range builder pattern. For now this functionality could be provided by an external crate -- at least semantically, not necessarily perf-wise -- on top of |
rendered
Add the following to BTreeMap
and to BTreeSet: