-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast counts #168
Fast counts #168
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, it would be indeed faster to compute the cardinality of operations 😃
dd70be9 greatly simplified the implementation, though it is slower without cached len. This will be much faster once #45 is fixed Cardinality of all ops can all be trivially implemented in terms of the cardinality of the intersection
Intersection cardinality is cheapest to compute, so it's both simpler to implement and has better runtime performance once we have the len cached. |
Benchmarks done now. It's about 0.5x compared to materializing container. This is good enough to ship even without #45 it's still a win |
Curious how much this can improve milli. I would think this would be used a lot for any document distance metrics. |
I'm not sure about that, the most used features of roaring are obviously set operations and multi-ops (unions of a lot of bitmaps). It can take up to 70-80% (on ~35-50ms) of the search time sometimes! |
Without seeing the use site I cant tell if you need to materialize the containers or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you!
Co-authored-by: Clément Renault <renault.cle@gmail.com>
Co-authored-by: Clément Renault <renault.cle@gmail.com>
fn intersection_len_impl(lhs: &RoaringTreemap, rhs: &RoaringTreemap) -> u64 { | ||
let mut len = 0; | ||
for (key, right_rb) in &rhs.map { | ||
if let Some(left_rb) = lhs.map.get(key) { | ||
len += left_rb.intersection_len(right_rb); | ||
} | ||
} | ||
len | ||
} | ||
|
||
// We make sure that we apply the intersection operation on the biggest map. | ||
if self.map.len() < other.map.len() { | ||
intersection_len_impl(self, other) | ||
} else { | ||
intersection_len_impl(other, self) | ||
} | ||
self.pairs(other) | ||
.map(|pair| match pair { | ||
(Some(..), None) => 0, | ||
(None, Some(..)) => 0, | ||
(Some(lhs), Some(rhs)) => lhs.intersection_len(rhs), | ||
(None, None) => 0, | ||
}) | ||
.sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay, was thinking about that but forget a second after, nice change here! ❤️
I assure you that those bitmaps are computed then intersected to reduce the number of candidates 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you again!
bors merge
Build succeeded: |
168: Fast counts r=Kerollmops a=saik0 See RoaringBitmap#167 Add methods to compute the cardinality of two bitmaps without materializing them. Once the naming conventions and overall structure for union is approved I will add tests and the remaining ops TODO: - Bitmap - [x] union - [x] intersection - [x] difference - [x] symmetric difference - [x] tests - [x] benchmarks - Blocked by RoaringBitmap#129 - Treemap - [x] union - [x] intersection - [x] difference - [x] symmetric difference - [x] tests Co-authored-by: saik0 <github@saik0.net> Co-authored-by: Joel Pedraza <github@saik0.net>
See #167
Add methods to compute the cardinality of two bitmaps without materializing them. Once the naming conventions and overall structure for union is approved I will add tests and the remaining ops
TODO: