-
Notifications
You must be signed in to change notification settings - Fork 288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Set Difference size_hint lower bound #530
Conversation
This implementation isn't perfect, but it is spec compliant as far as I can tell
My motivation for this was that I'm a perfectionist and get annoyed by this kind of thing, but I still did A LOT of benchmarking. The result of my benchmarking was that while the changes to collecting the various iterators to a The summary of my results is this: Here my exact results(click me):
The tests were run with a "large" 1000 element set and a "small" 100 element set with an overlap of 50 elements. That means that from a performance point of view, whether this change is worth it depends on whether collecting to a |
One way to potentially get rid of most of the I'm not entirely sure how trait based specialization works though, so I can't guarantee that this will work. The advantage is, though, that doing so should allow making collecting into a |
The overhead of a saturating add is tiny compared to memory allocation so that is most likely not the direct cause of the regression. Finding the actual cause would require looking at the exact assembly code generated by rustc and comparing the two versions along with profiling information. However I think it is still better to merge this for now since it does provide a more accurate lower bound for the iterator size. @bors r+ |
☀️ Test successful - checks-actions |
Implement XxxAssign operations on HashSets This PR primarily implements the XxxAssign operation traits for `HashSet`. My primary motivation to do so is for convenience, but depending on the situation they can provide a significant performance improvement as well.\* In my tests, which may not be ideal because I don't have much benchmarking experience, the assigning operations are, with the exception of `Sub`, a minimum of 25% faster.\* Note that when swapping the large and the small set around, some of these are significantly slower than the non-assigning variant. Therefore using them is likely only worth it performance wise, if you already know which set is larger, and the right one of the sets just so happens to be the one you don't need to keep. \* Results may have changed due to #530 being merged Here my exact benchmark results, done with the newly added benchmark suit: <!DOCTYPE html> VER | LSIZE | SSIZE | OP | NS/ITER | DIFF (%) | COMMENT -- | -- | -- | -- | -- | -- | -- 1 | 1000 | 100 | and | 5,682.88 | | 1 | 1000 | 100 | or | 41,427.82 | | 1 | 1000 | 100 | xor | 57,404.27 | | 1 | 1000 | 100 | subls | 56,262.53 | | 1 | 1000 | 100 | subsl | 751.42 | | 1 | 1000 | 2 | and | 100.16 | | 1 | 1000 | 2 | or | 40,435.09 | | 1 | 1000 | 2 | xor | 59,058.05 | | 1 | 1000 | 2 | subls | 58,668.34 | | 1 | 1000 | 2 | subsl | 18.89 | | 1 | 1000 | 100 | or_ass | 32,888.49 | -20.61% | unconditional insert 2 | 1000 | 100 | or_ass | 29,397.04 | -29.04% | !contains insert 3 | 1000 | 100 | or_ass | 32,399.65 | -21.79% | extend iter().cloned() 4 | 1000 | 100 | or_ass | 30,693.33 | -25.91% | get_or_insert_owned 5 | 1000 | 100 | or_ass | 33,722.59 | -18.60% | calc intersection; extend rhs.iter() !intersection contains; Requires S: Clone 1 | 1000 | 100 | add_ass | 30,114.17 | -26.66% | !contains insert 1 | 1000 | 100 | xor_ass | 32,309.85 | -43.72% | contains remove else insert 2 | 1000 | 100 | xor_ass | 40,058.48 | -30.22% | extract_if rhs contains; extend !removed contains 3 | 1000 | 100 | xor_ass | 31,801.04 | -44.60% | raw_entry().from_key() replace_entry_with / insert 4 | 1000 | 100 | xor_ass | 31,935.07 | -44.37% | raw_entry().from_key_hashed_nocheck() replace_entry_with / insert_hashed_nocheck 5 | 1000 | 100 | xor_ass | 31,843.33 | -44.53% | self.map.table.get.is_none self.map.table.insert else self.map.table.remove_entry 1 | 1000 | 100 | subls_ass | 33,366.13 | -40.70% | contains remove 1 | 1000 | 100 | subsl_ass | 10,686.02 | 1322.11% | contains remove 2 | 1000 | 100 | subls_ass | 36,351.69 | -35.39% | retain !contains 2 | 1000 | 100 | subsl_ass | 3,939.67 | 424.30% | retain !contains 3 | 1000 | 100 | subls_ass | 32,012.82 | -43.10% | unconditional remove 3 | 1000 | 100 | subsl_ass | 9,908.76 | 1218.67% | unconditional remove 4 | 1000 | 100 | subls_ass | 36,232.13 | -35.60% | self.map.retain !contains 4 | 1000 | 100 | subsl_ass | 3,939.35 | 424.25% | self.map.retain !contains 5 | 1000 | 100 | subls_ass | 31,879.32 | -43.34% | if rhs smaller self unconditional remove else retain !contains 5 | 1000 | 100 | subsl_ass | 3,946.98 | 425.27% | if rhs smaller self unconditional remove else retain !contains 1 | 1000 | 2 | add_ass | 28,324.95 | -29.27% | 2 | 1000 | 2 | or_ass | 28,322.62 | -29.96% | 1 | 1000 | 2 | xor_ass | 29,107.31 | -50.71% | 3 | 1000 | 2 | xor_ass | 29,026.82 | -50.85% | 1 | 1000 | 2 | subls_ass | 29,310.04 | -50.04% | 1 | 1000 | 2 | subsl_ass | 4,212.56 | 22200.48% | 2 | 1000 | 2 | subls_ass | 34,074.85 | -41.92% | 2 | 1000 | 2 | subsl_ass | 66.43 | 251.67% | 3 | 1000 | 2 | subls_ass | 29,340.86 | -49.99% | 3 | 1000 | 2 | subsl_ass | 5,972.25 | 31515.93% | 5 | 1000 | 2 | subls_ass | 29,460.49 | -49.78% | 5 | 1000 | 2 | subsl_ass | 65.32 | 245.79% | In addition to the Assigning operators this PR changes a few more things: * It changes the allocator bound on the non-assigning set operations to `A: Allocator + Default`. * I also added a benchmark suit for the set operations.
This PR makes the Set
Difference
iterator generate a non-zero lower bound in some situations.Specifically, it now returns a non-zero lower bound if the
difference
method is called on a larger set with a smaller set.That is because in those cases the fact that sets can't really contains duplicates* guarantees that a minimum of
self.len() - other.len()
items will be returned by the iterator.* Well, they can, but that is already documented to be causing a random mess
This implementation has the disadvantage that a single
next()
call may reduce the lower bound by more than one.Every size hint generated, even the first largest one, is guaranteed to be correct, but it may be confusing if one
next()
call causes the lower bound to drop by more than one.This could be avoided by storing the minimum number of resulting elements in the iterator and subtracting one each time
next()
is called, but I don't think its worth the added complexity.