libcollections: Add a Multiset data structure. #15623
… take iterator parameter. This allows them to be used for both TreeSet and TreeMultiset.
I don't think MutableMultiset should inherit MutableSet. Given a Set I expect the following to work:
And that's just not how a multiset works.
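The snippet that originally followed "I expect the following to work" is not preserved in this thread. As a purely hypothetical illustration of the kind of expectation that holds for a set but not for a multiset, here is a sketch in present-day Rust, with `BTreeSet` standing in for a Set and a `BTreeMap` of counts standing in for the multiset (both are assumptions, not code from the PR):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// For a set: insert a value twice, remove it once, and it is gone.
fn set_still_contains() -> bool {
    let mut set = BTreeSet::new();
    set.insert(1);
    set.insert(1); // second insert has no effect on a set
    set.remove(&1);
    set.contains(&1)
}

/// For a count-based multiset (hypothetical representation): insert twice,
/// remove one occurrence, and one occurrence is still left.
fn multiset_still_contains() -> bool {
    let mut counts: BTreeMap<i32, usize> = BTreeMap::new();
    *counts.entry(1).or_insert(0) += 1;
    *counts.entry(1).or_insert(0) += 1;
    if let Some(count) = counts.get_mut(&1) {
        *count -= 1; // remove a single occurrence
    }
    counts.get(&1).map_or(false, |&count| count > 0)
}
```

Generic code written against the set contract could rely on the first behaviour, which is why inheriting the mutable set trait looks questionable.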
/// Add one occurrence of `value` to the multiset. Return true if the value
/// was not already present in the multiset.
fn insert_one(&mut self, value: T) -> bool {
Why not also have remove_one?
Inadvertent omission. I've just added it.
This is definitely a minor point, but it seems like insert_one/remove_one are probably the more common methods for someone to want. Would it be better to have insert/remove be insert_one/remove_one, and then have insert_many/remove_many? At the very least, this would bring the multiset interface superficially closer to the set interface.
I think you're right. Most of the inserts/removes will probably be one at a time. I'll change this.
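A minimal sketch of the naming scheme agreed on above, where `insert`/`remove` act on one occurrence and `insert_many`/`remove_many` act on several. It is written in present-day Rust with `BTreeMap` standing in for `TreeMap`; the struct name `Multiset`, the `count` helper, and the return type of `remove_many` are assumptions made for illustration, not the PR's actual signatures:

```rust
use std::collections::BTreeMap;

/// Hypothetical count-based multiset; zero counts are never stored.
pub struct Multiset<T: Ord> {
    counts: BTreeMap<T, usize>,
}

impl<T: Ord> Multiset<T> {
    pub fn new() -> Self {
        Multiset { counts: BTreeMap::new() }
    }

    /// Number of occurrences of `value`.
    pub fn count(&self, value: &T) -> usize {
        self.counts.get(value).copied().unwrap_or(0)
    }

    /// Add `n` occurrences. Returns true if the value was not already present.
    pub fn insert_many(&mut self, value: T, n: usize) -> bool {
        if n == 0 {
            return false; // nothing inserted
        }
        let count = self.counts.entry(value).or_insert(0);
        let was_absent = *count == 0;
        *count += n;
        was_absent
    }

    /// Add one occurrence, mirroring a set's `insert`.
    pub fn insert(&mut self, value: T) -> bool {
        self.insert_many(value, 1)
    }

    /// Remove up to `n` occurrences. Returns how many were actually removed.
    pub fn remove_many(&mut self, value: &T, n: usize) -> usize {
        let current = self.count(value);
        let removed = current.min(n);
        if removed == current {
            self.counts.remove(value); // drop the key once no occurrences remain
        } else if let Some(count) = self.counts.get_mut(value) {
            *count -= removed;
        }
        removed
    }

    /// Remove one occurrence. Returns true if one was present.
    pub fn remove(&mut self, value: &T) -> bool {
        self.remove_many(value, 1) == 1
    }
}
```

With this shape the single-occurrence methods are thin wrappers, so the common case reads the same as it does for a set.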
Add remove_one() method to MutableMultiset. Implement Show and Default for TreeMultiset. Add tests for count() and the Show implementation.
impl<T: Ord> Collection for TreeMultiset<T> {
    #[inline]
    fn len(&self) -> uint { self.map.len() }
Should len just return the number of keys, or the sum of the counts? I imagine it would be fairly easy and desirable to expose methods for both.
Thanks for catching this. I almost certainly copied it unthinkingly from the TreeSet implementation. I'm going to change `len()` to return the sum of the counts. Since the number of distinct values is probably a useful method to have, I'll add a method for that (though I'm currently not sure whether to add it to the trait or just to each implementation).
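The two notions of length discussed here can be sketched as follows, assuming the multiset is backed by a map from value to occurrence count. This uses present-day Rust with `BTreeMap` in place of `TreeMap`; the function names `total_len` and `distinct_len` are hypothetical:

```rust
use std::collections::BTreeMap;

/// Total number of occurrences: the sum of all counts.
fn total_len<T: Ord>(counts: &BTreeMap<T, usize>) -> usize {
    counts.values().sum()
}

/// Number of distinct values: the number of keys in the backing map.
fn distinct_len<T: Ord>(counts: &BTreeMap<T, usize>) -> usize {
    counts.len()
}
```

Summing on every call is linear in the number of distinct values; an implementation could instead keep a running total that is updated on insert and remove, at the cost of one extra field.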
…evious version returned the number of distinct values.
Should the possibility of splitting treemap.rs into multiple files be discussed here? It's getting a bit unwieldy, and could easily be separated into treemap, treeset, and treemultiset.
With this patch, treemap.rs is currently at 2502 lines. By comparison, hashmap.rs is currently at 2520, so there is precedent for files this large. But perhaps it makes the most sense to put TreeSet and TreeMultiset into separate modules. I'm not sure here.
This does seem to play nicely into the For now, though, I'm not sure if adding new collections traits is the best option. The current traits are sort of ad-hoc and seem to be lacking an overall design in terms of interactions with one another and extensions into the future. Perhaps these methods could be inherent methods for now pending a redesign in the future? I don't think anyone is taking a generic
…[insert|remove]_many
The collections crate is still in flux, so we are holding off on deciding the design of this trait.
@alexcrichton Yeah, that makes sense. I've removed the traits.
This should be ready for further review. TreeMultiset now implements all the traits that TreeSet implements and provides all the same methods as well, with the exception of For I should also point out that it's unclear to me how this could support both
    let mut mset = TreeMultiset::new();
    mset.extend(iter);
    mset
}
Any reason that from_iter isn't #[inline] but extend is?
The reason is that I copied and pasted function signatures from `TreeSet`, and its `from_iter` method is not inlined! The same goes for many of the other methods.
Part of me wonders if a "bulk" iterator that yields (item, count) would be desirable, but... so many iterators...
}

/// Return true if the multiset has no elements in common with `other`.
pub fn is_disjoint(&self, other: &TreeMultiset<T>) -> bool {
While this implementation is elegant, would it not be radically more efficient to iterate the underlying TreeMaps and compare counts? As an extreme case, if I make two multisets with 10000000 and 1000001 instances of x, this code will count all the way to 1000001 before terminating, where it could just do a single integer comparison.
Edit: I think it would be more readable, too.
Edit2: This comment was actually directed at is_subset, but there exists a similar very-inefficient input for this function.
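The count-comparison approach suggested above might look roughly like this. It is a sketch in present-day Rust that works directly on the backing maps (`BTreeMap` in place of `TreeMap`) and assumes zero counts are never stored; the free-function form is an assumption for brevity, not the PR's code:

```rust
use std::collections::BTreeMap;

/// `a` is a sub-multiset of `b` if every value in `a` occurs in `b`
/// at least as many times. One integer comparison per distinct value.
fn is_subset<T: Ord>(a: &BTreeMap<T, usize>, b: &BTreeMap<T, usize>) -> bool {
    a.iter()
        .all(|(value, &count)| b.get(value).map_or(false, |&other| count <= other))
}

/// Two multisets are disjoint if they share no distinct value.
/// Counts are irrelevant here, so only the keys are inspected.
fn is_disjoint<T: Ord>(a: &BTreeMap<T, usize>, b: &BTreeMap<T, usize>) -> bool {
    // Walk the map with fewer keys and probe the other one.
    let (small, large) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    small.keys().all(|value| !large.contains_key(value))
}
```

Both functions do work proportional to the number of distinct values rather than the number of occurrences, so a multiset holding a million copies of one value costs a single lookup.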
        self.count -= 1;
        self.current
    }
}
What's the difference between these implementations? Only the underlying iterators?
Yes, if I remember correctly that's the only difference (same as for `SetItems` and `RevSetItems`).
Then you may write a single `impl<'a, T, I: Iterator<(&'a T, &'a uint)>> Iterator<&'a T> for MultisetItems<'a, T, I> {`, correct?
Yes, the only question is whether that makes the item too unwieldy for use. The same could be said for `SetItems` and `RevSetItems` (unifying them and parameterizing the iterator).
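For reference, the single parameterized iterator proposed above could be sketched like this in present-day Rust, where the item type is an associated type rather than a trait parameter and `usize` replaces `uint`. The field names and the `new` constructor are assumptions; only the `MultisetItems<'a, T, I>` shape comes from the thread:

```rust
use std::collections::BTreeMap;

/// Yields each value once per occurrence, driven by any iterator over
/// `(value, count)` pairs (forward or reversed).
pub struct MultisetItems<'a, T, I> {
    inner: I,
    current: Option<&'a T>,
    remaining: usize,
}

impl<'a, T: 'a, I> MultisetItems<'a, T, I>
where
    I: Iterator<Item = (&'a T, &'a usize)>,
{
    pub fn new(inner: I) -> Self {
        MultisetItems { inner, current: None, remaining: 0 }
    }
}

impl<'a, T: 'a, I> Iterator for MultisetItems<'a, T, I>
where
    I: Iterator<Item = (&'a T, &'a usize)>,
{
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        // Advance to the next value once the current one is used up;
        // the loop also skips any entry whose count is zero.
        while self.remaining == 0 {
            let (value, &count) = self.inner.next()?;
            self.current = Some(value);
            self.remaining = count;
        }
        self.remaining -= 1;
        self.current
    }
}

/// Forward and reverse traversal share the one impl above.
fn forward_and_reverse(map: &BTreeMap<char, usize>) -> (Vec<char>, Vec<char>) {
    let forward = MultisetItems::new(map.iter()).copied().collect();
    let reverse = MultisetItems::new(map.iter().rev()).copied().collect();
    (forward, reverse)
}
```

The cost raised in the reply is visible in the signatures: callers who need to name the iterator type have to spell out the inner iterator as well, which is where a type alias per direction would help.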
There are some improvements that can be made to this, as @gankro and @pczarn have pointed out, but I'm no longer convinced that using
@nham presumably you were dissatisfied with all colliding instances taking on the same concrete value?
@gankro Yeah, only storing a count instead of multiple instances means that
@nham: presumably Edit:
@gankro That is one way to go (
@nham: avoiding more nodes seems like it'd be a big performance win in several ways (less frequent heap allocs). Although... the more I think about it the more I'm uncertain if my TreeMultiSet impl makes sense (can you get the ref out of the Vec correctly?). A duplicate key tree would be saner there, for sure. Duplicate keys also have distinct semantics wrt popping nodes. If your keys have destructors, then Not sure which is the desired behaviour!
@gankro: Do I understand correctly that in your TreeMultimap, the first (key, value) gets inserted as (&key, vec!((key, value))). Then subsequent pairs (k, v) where the
@nham: I believe you are correct. My implementation wouldn't work. You'd need a wrapper around a ptr, I guess.
fix: Don't skip closure captures after let-else. As I understand it, that `return` was left there by accident. It caused capture analysis to skip the rest of the block after a let-else, and then missed captures caused incorrect results in borrowck, closure hints, layout calculation, etc. Fixes rust-lang#15623. I didn't understand why using the example from rust-lang#15623 as-is doesn't work: I don't get the warnings unless I remove the `call_me()` call, even on the same commit as my own RA version which does show those warnings.
This is not finished, but I want some feedback before I go too far on my own. This defines two new traits, `Multiset` and `MutableMultiset`. I currently have a partial implementation of these traits, `TreeMultiset`, based on TreeMap, and I plan to add another one based on HashMap. I have tried to make these traits and the `TreeMultiset` implementation closely correspond to the Set traits and `TreeSet`.

The `TreeMultiset` implementation is missing a few things, mostly some trait implementations. I also plan on adding a multiset sum operation, which is distinct from multiset union (see this Wikipedia page for reference).

One thing I want to add to the Multiset trait is a `to_set` method that will return the underlying set of the multiset, which would just be all the distinct values in the multiset with multiple occurrences removed. However, this currently doesn't seem possible, as noted in #8154.

Another thing I've wondered about: the Multiset trait could inherit the Set trait, but the way I've written it, MutableMultiset could not inherit MutableSet. One way this could be changed is to inherit MutableSet and use `insert` as a method to insert one occurrence (and `remove` as a method to remove one occurrence) and then add trait methods for inserting multiple occurrences, perhaps called `insert_multiple`/`remove_multiple` or `insert_many`/`remove_many`. I'm just not sure if this makes sense.
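To make the distinction between multiset union and multiset sum concrete, here is a small sketch over count maps, in present-day Rust with `BTreeMap` standing in for `TreeMap`. Under the usual definitions, union takes the maximum count of each value and sum adds the counts. The function names are hypothetical, and this `to_set` simply clones the keys, which is only one possible reading of "the underlying set":

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Union: each value keeps the larger of its two counts.
fn multiset_union<T: Ord + Clone>(
    a: &BTreeMap<T, usize>,
    b: &BTreeMap<T, usize>,
) -> BTreeMap<T, usize> {
    let mut out = a.clone();
    for (value, &count) in b {
        let slot = out.entry(value.clone()).or_insert(0);
        *slot = (*slot).max(count);
    }
    out
}

/// Sum: the counts of each value are added together.
fn multiset_sum<T: Ord + Clone>(
    a: &BTreeMap<T, usize>,
    b: &BTreeMap<T, usize>,
) -> BTreeMap<T, usize> {
    let mut out = a.clone();
    for (value, &count) in b {
        *out.entry(value.clone()).or_insert(0) += count;
    }
    out
}

/// Underlying set: the distinct values with multiplicity discarded.
fn to_set<T: Ord + Clone>(a: &BTreeMap<T, usize>) -> BTreeSet<T> {
    a.keys().cloned().collect()
}
```

For example, with {x: 2, y: 1} and {x: 3}, the union has three occurrences of x while the sum has five; y occurs once in both results.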