improve worst-case performance of BTreeSet intersection v3 #59186
r? @KodrAus |
Hi @ssomers 👋 Would you prefer to close #58577 and #59078 and look at this PR instead? I haven't had a chance to look at what's changed since the changeset I approved, could you summarize the differences? It looks like this one does keep the |
Sure. The difference with the approved state is:
PS and yes, I prefer to close the other two PRs. |
} else {
(other, self)
};
if a_set.len() > b_set.len() / 16 {
Might be worth leaving a comment here about what this branch is for and why we use the constant `16` to decide which strategy to use.
Comment left, but it seems you have to find it yourself.
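For readers who cannot find that comment either, here is a minimal sketch of what the branch decides. The `Strategy` enum, `RATIO` and `choose_strategy` are hypothetical names made up for illustration; only the comparison against `len / 16` comes from the diff above.

```rust
/// Which algorithm to use for intersecting two ordered sets.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Walk both sets in lockstep: roughly `small + large` steps.
    Stitch,
    /// Look up each element of the small set in the large one:
    /// roughly `small * log(large)` steps.
    Search,
}

/// Assumption for this sketch: mirrors the `/ 16` in the diff.
const RATIO: usize = 16;

/// Picks a strategy from the two set lengths alone.
fn choose_strategy(len_a: usize, len_b: usize) -> Strategy {
    let (small, large) = if len_a <= len_b {
        (len_a, len_b)
    } else {
        (len_b, len_a)
    };
    if small > large / RATIO {
        Strategy::Stitch
    } else {
        Strategy::Search
    }
}
```

With these definitions, sets of 100 and 1,000 elements would be stitched, while sets of 10 and 1,000 elements would be searched.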
}
}
}
}

fn size_hint(&self) -> (usize, Option<usize>) {
(0, Some(min(self.a.len(), self.b.len())))
let max_size = match &self.inner {
This would actually be `min_size`, right?
Had I taken the time to look up the doc of `size_hint`, I would have called it `upper_bound`. It's the "min" of the input sets, and the "max" of the `size_hint`, so I'm not a fan of `min_size` either. How about `min_len`?
`min_len` is ok with me 👍
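To make the naming question concrete: the value under discussion is the upper bound that `size_hint` reports, and it cannot exceed the length of the smaller input set. A small sketch against the public `BTreeSet::intersection` API follows; the helper name `hint_is_consistent` is made up for illustration.

```rust
use std::cmp::min;
use std::collections::BTreeSet;

/// Checks that the intersection's `size_hint` brackets the real element
/// count and that its upper bound is at most the smaller set's length.
fn hint_is_consistent(a: &BTreeSet<i32>, b: &BTreeSet<i32>) -> bool {
    let (lower, upper) = a.intersection(b).size_hint();
    let actual = a.intersection(b).count();
    let min_len = min(a.len(), b.len());
    lower <= actual && upper.map_or(true, |u| actual <= u && u <= min_len)
}
```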
b_iter: Iter<'a, T>,
},
Search {
a_iter: Iter<'a, T>, // for size_hint, should be the smaller of the sets
Since the implementation depends on `a_iter` being smaller, I think we should choose more descriptive names here for `a` and `b`.
Frankly, I hate the name `a` because it's intermingled with the lifetime identifier of the same name, but I didn't dare to change them. I'll go for `small` and `large` in the `Search` case, and `small` and `other` in the `Stitch` case, unless you object.
That sounds good!
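A rough sketch of how the two variants could look with the agreed names. This is an illustration under assumptions, not the code in the PR: the exact field names (`small_iter`, `other_iter`, `large_set`) and the `next` logic are guesses, and how a variant gets picked is left out.

```rust
use std::cmp::Ordering;
use std::collections::btree_set::{BTreeSet, Iter};

/// Hypothetical stand-in for the iterator returned by `intersection`.
enum Intersection<'a, T> {
    /// Walk both sets in lockstep.
    Stitch { small_iter: Iter<'a, T>, other_iter: Iter<'a, T> },
    /// Iterate the small set and look each element up in the large set.
    Search { small_iter: Iter<'a, T>, large_set: &'a BTreeSet<T> },
}

impl<'a, T: Ord> Iterator for Intersection<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        match self {
            Intersection::Stitch { small_iter, other_iter } => {
                // `?` ends the iteration as soon as either side runs out.
                let mut small_next = small_iter.next()?;
                let mut other_next = other_iter.next()?;
                loop {
                    match small_next.cmp(other_next) {
                        Ordering::Less => small_next = small_iter.next()?,
                        Ordering::Greater => other_next = other_iter.next()?,
                        Ordering::Equal => return Some(small_next),
                    }
                }
            }
            Intersection::Search { small_iter, large_set } => loop {
                let small_next = small_iter.next()?;
                if large_set.contains(small_next) {
                    return Some(small_next);
                }
            },
        }
    }
}
```

Both variants yield the same elements in the same ascending order; they differ only in how much work they do to get there.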
src/liballoc/benches/lib.rs
@@ -1,5 +1,6 @@
#![feature(repr_simd)]
#![feature(test)]
#![feature(benches_btree_set)]
Before merging I think we should remove this feature and make the methods private (or just inline them). It's nice to get an idea of the performance characteristics of each strategy, but once we understand those I think we can move forward with just the general benchmarks.
But we only understand the performance characteristics:
- on 1 platform and 1 machine (I could build it on Linux on another machine though, but nothing modern),
- for one generation of the BTreeSet implementation. I see significant improvement from stable to nightly in my BTreeSet macro-benchmarks, tens of percents, with the same intersection implementation.
What alternatives are there for the feature litter?
- Moving the benchmarks closer to the lib code is terrible: every tweak or comment requires over an hour to build.
- Duplicating the lib code near/in the benchmark file is doable. You should notice if it's not up to date anymore when you check that the actual performance matches that of either implementation.
- Or similarly, duplicating the lib code and benchmarks in some separate repository (like I already did in https://github.com/ssomers/Bron-Kerbosch/blob/master/rust/bron_kerbosch/src/util.rs).
@ssomers I think duplicating the code in a separate repository is the best way to go here 👍 I believe we can run benchmarks in CI here (I don't remember the command off the top of my head) so we can worry about this later.
I made the benchmarks more fine-grained in set size and plotted the results (still on only one machine). I think this strongly suggests that the ideal strategy rule is much more complicated (not just logarithmic in size), and even a rule that doesn't spend the performance it gains on evaluating itself could be way off on a system with a different word size, cache sizes, or architecture. But it also confirms that factor 16 seems quite reasonable. To prevent the <30% performance hit for intersection of particularly crafted large sets, it would have to be 19. From the viewpoint of random sets, factor 16 should be lowered instead.

But if I commit all 146 benchmarks I have now, it's simply annoying for someone casually checking performance in general. So I'm becoming convinced that it's better to move this case study over to a separate repository and leave in this PR only the final rule and the few benchmarks already merged earlier (or fewer), and thus nothing exposed as public unstable. Checking the Rust source code, there are other references to GitHub repositories besides rust-lang, so I can probably just create one myself.

PS: the <30% performance hit is actually closer to 15%, apparently thanks to no longer using Peekable.
Here's the new comment (cause it seems rather difficult to find in the previous comment)
Oh well, this seems worse and can't be deleted.
I configured Travis on the separate repository, and the readings from the Linux build on that (virtual) machine are quite similar (though much less stable, as one would expect). The ideal factor is less than 1 higher across the range of sizes.
Thanks for all your investigation work @ssomers! Just for good measure, let's add another test case to
check_intersection(&[11, 5000, 1, 3, 77, 8924, 103],
&(0..1000).collect::<Vec<_>>(),
&[11, 1, 3, 77, 103]);
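The suggested call relies on a test helper that isn't shown in this thread. A hypothetical stand-in, assuming it takes two input slices plus the expected result, might look like the sketch below. It is not the helper from the Rust test suite: this version compares results as sets, so the order in which the expected values are listed doesn't matter.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: builds two sets and checks that their
/// intersection matches `expected`, in both argument orders.
fn check_intersection(a: &[i32], b: &[i32], expected: &[i32]) {
    let set_a: BTreeSet<i32> = a.iter().copied().collect();
    let set_b: BTreeSet<i32> = b.iter().copied().collect();
    let expected: BTreeSet<i32> = expected.iter().copied().collect();

    let actual: BTreeSet<i32> = set_a.intersection(&set_b).copied().collect();
    assert_eq!(actual, expected);

    // Intersection is symmetric, so swapping the operands must not matter.
    let swapped: BTreeSet<i32> = set_b.intersection(&set_a).copied().collect();
    assert_eq!(swapped, expected);
}
```

The proposed case is interesting because the first set is much smaller than the second and is not a subset of it (5000 and 8924 fall outside `0..1000`).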
I thought the test cases in the first PR were still there, and they didn't even include non-subset samples. Rest assured, the proptest in the separate repository covered everything. So are we there now? Nope, I just realized that is_subset/is_superset is really quite similar. I'll keep that out of this PR, but it might mean that the 16 becomes a named constant.
PS: Change of plan: it wasn't that much work, and it revealed that the comments in intersection weren't accurate and the benchmarks were clunky, so I committed it here anyway.
I cooked up a similar change to the implementation of set difference. It only needs half of the peekables, and it benefits from the same performance boost as with intersection if the right hand set is huge. I cooked up similar code resulting in: before:
after:
Do you want me to commit it here or later? (or not at all...) |
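For illustration, a sketch of how a difference could use the same two strategies. Everything here is an assumption: the free function `difference`, the reuse of factor 16 as the threshold, and collecting into a `Vec` instead of returning a lazy iterator. It does show why only the right-hand iterator needs to be peekable when stitching.

```rust
use std::collections::BTreeSet;

/// Hypothetical eager version of `left - right`, in ascending order.
fn difference<'a, T: Ord>(left: &'a BTreeSet<T>, right: &'a BTreeSet<T>) -> Vec<&'a T> {
    // Assumed threshold: search when the right-hand set is huge by comparison.
    if left.len() <= right.len() / 16 {
        return left.iter().filter(|x| !right.contains(*x)).collect();
    }

    // Stitch: the left side is consumed one by one, only the right side
    // has to be inspected without being consumed.
    let mut right_iter = right.iter().peekable();
    let mut out = Vec::new();
    for x in left {
        // Drop right-hand elements that are smaller than `x`.
        while right_iter.peek().map_or(false, |r| *r < x) {
            right_iter.next();
        }
        match right_iter.peek() {
            Some(r) if **r == *x => {} // present on the right: excluded
            _ => out.push(x),
        }
    }
    out
}
```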
This is looking great! Thanks for giving these methods some TLC @ssomers.
Yeh I think we can roll difference into this PR as well while we're working on these set operations. I'm happy with the implementation of intersection now. |
I meant "push" instead of "commit", but GitHub lists the commits by the time they were committed locally anyway. Confusing... Anyway, it's all here now, and nothing changed in the implementation of intersection.
For future reference: if you wonder why we have to tediously implement `Clone` by hand for Difference and for Intersection, while BTreeSet itself gets away with an easy `derive(Clone)`, it's because of #26925. BTreeSet isn't `Clone` unless T is, and that makes all the sense in the world. It doesn't make much sense for Difference and Intersection, because they only hand out references to T.
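A small sketch of the `derive(Clone)` limitation, using made-up types (`NotClone`, `Refs`) and a slice iterator instead of the real set iterators. Deriving `Clone` on a generic struct adds a `T: Clone` bound even when only references to `T` are stored, and a manual impl avoids that bound.

```rust
use std::slice::Iter;

/// Deliberately not `Clone`.
struct NotClone(u32);

/// Hands out `&T`, never owns a `T`. With `#[derive(Clone)]` the generated
/// impl would require `T: Clone`, so `Refs<NotClone>` could not be cloned.
struct Refs<'a, T> {
    iter: Iter<'a, T>,
}

// Manual impl without a `T: Clone` bound: only the iterator is cloned.
impl<'a, T> Clone for Refs<'a, T> {
    fn clone(&self) -> Self {
        Refs { iter: self.iter.clone() }
    }
}
```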
Alrighty, this PR introduces some additional complexity to @bors r+ |
📌 Commit bb7bf9b8ea66d72a7e29c6c7a37ddaa8924ef62f has been approved by |
I don't mind, but it's going to take a while to figure out how. |
bb7bf9b to f5fee8f
@bors r+ |
📌 Commit f5fee8f has been approved by |
⌛ Testing commit f5fee8f with merge e0e27d75ee5c9618f834391bc06355d848dfc2d7... |
💔 Test failed - checks-travis |
…ited_again, r=KodrAus improve worst-case performance of BTreeSet intersection v3 Variation of [rust-lang#59078](rust-lang#59078) with `Intersection` remaining a struct r? @scottmcm
improve worst-case performance of HashSet.is_subset One more simple optimization opportunity for HashSet that was applied in BTreeSet in rust-lang#59186 (and wasn't in rust-lang#57043). Already covered by the existing unit test. r? @KodrAus
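The linked HashSet change isn't shown in this thread. As an assumption about what such a "simple optimization" could be, here is a sketch with a length check up front; the free function `is_subset` is hypothetical and not the standard library method.

```rust
use std::collections::HashSet;

/// Hypothetical subset test: a set with more elements than `other`
/// can never be a subset of it, so no lookups are needed in that case.
fn is_subset(candidate: &HashSet<i32>, other: &HashSet<i32>) -> bool {
    if candidate.len() > other.len() {
        return false;
    }
    candidate.iter().all(|v| other.contains(v))
}
```

Without the early return, the worst case would probe `other` once for every element of a much larger `candidate` before finding a miss.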
Variation of #59078 with `Intersection` remaining a struct
r? @scottmcm