Implement MergeBy::fold
#920
Force-pushed from ecb2cd2 to 0dfd502.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #920      +/-   ##
==========================================
+ Coverage   94.38%   94.42%   +0.03%
==========================================
  Files          48       48
  Lines        6665     6872     +207
==========================================
+ Hits         6291     6489     +198
- Misses        374      383       +9
-75% is a nice win!
I doubt my change below speeds things up even more, but could you re-run the benchmark?
Side note: for `Combinations::nth`, I read part of the criterion docs for plotting purposes (curiosity), but I also read that:
- To save a baseline, use `cargo bench -- --save-baseline <name>`. To compare against an existing baseline, use `cargo bench -- --baseline <name>`.
I have not used this yet, but I guess we can:
- Once the feature branch is created, run `cargo bench --bench specializations merge_by/fold -- --save-baseline mergeby`.
- Then compare against it as many times as we want without overwriting the baseline: `cargo bench --bench specializations merge_by/fold -- --baseline mergeby`.
Just a trick.
Oh, saving the baseline is so useful, thanks for pointing that out! Compared to master, with the change above it actually gets a little slower.
I've noticed in the past that `match` can often be much faster than comparable expressions, so perhaps the compiler has a better time with the more verbose code. Let me know if you want me to push the patch or leave it as is.
I wondered about this, so I checked out your branch, experimented and benchmarked, but I still wonder. I eventually noticed that the benchmark I wrote only has zeros. Apart from being unrealistic, which can be fine, I guess it repeatedly uses only one arm of the big `match` statement, which is more problematic.
See itertools/benches/specializations.rs, lines 583 to 610 in f7d557d.
The thing that bothers me is that our benchmark is still quite basic: adapted iterators are iterated slices, function `f` simple integer comparison. I would like a more experienced opinion here.
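To make the "only one arm" concern concrete, here is a minimal sketch, not the itertools benchmark itself. `merge_arm_counts` and `ArmCounts` are hypothetical helpers that hand-roll a two-way merge over slices and count how often each comparison arm is taken while both sides still have items.

```rust
/// How often each comparison arm was taken (hypothetical helper type).
#[derive(Default, Debug, PartialEq)]
struct ArmCounts {
    left: usize,
    right: usize,
}

/// Hand-written two-way merge over sorted slices. This is an illustration,
/// not itertools code: it only counts which arm wins while both inputs
/// still have items left.
fn merge_arm_counts(a: &[u32], b: &[u32]) -> ArmCounts {
    let (mut i, mut j) = (0, 0);
    let mut counts = ArmCounts::default();
    while i < a.len() && j < b.len() {
        if a[i] <= b[j] {
            counts.left += 1;
            i += 1;
        } else {
            counts.right += 1;
            j += 1;
        }
    }
    counts
}

fn main() {
    // All-zero inputs: every comparison is a tie, so only the left arm runs.
    let zeros = vec![0u32; 8];
    assert_eq!(
        merge_arm_counts(&zeros, &zeros),
        ArmCounts { left: 8, right: 0 }
    );

    // Interleaved inputs (evens vs. odds): the two arms alternate.
    let evens: Vec<u32> = (0..8).map(|x| 2 * x).collect();
    let odds: Vec<u32> = (0..8).map(|x| 2 * x + 1).collect();
    assert_eq!(
        merge_arm_counts(&evens, &odds),
        ArmCounts { left: 8, right: 7 }
    );
}
```

With all-zero data the branch is perfectly predictable, so a benchmark built on it may flatter whichever code shape happens to favour that arm.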
If we have this, couldn't this eliminate other specializations (e.g. `count`)?
> The thing that bothers me is that our benchmark is still quite basic: adapted iterators are iterated slices, function `f` simple integer comparison. [...]
I think you're right. It's probably hard to get a really good estimate of a function's performance if you do not know the exact use.
But I am not sure how to reliably improve this. Should we construct benchmarks such that they generally execute many different code paths (possibly even so that (almost) each iteration chooses another branch)? Have them operate on `struct`s whose methods are `inline(never)` instead of on integers? Maybe look for usage in the wild, and boil down these real-world cases to their benchmarkable essence?
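As a rough illustration of the `inline(never)` idea, the sketch below merges small structs through a comparison method carrying the `#[inline(never)]` hint. `Item`, `cmp_key`, `make_items` and `merge_items` are hypothetical names made up for this example, and the input shape is just one possible choice.

```rust
use std::cmp::Ordering;

/// Hypothetical benchmark item: a small struct instead of a bare integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Item {
    key: u32,
    payload: u32,
}

impl Item {
    /// `#[inline(never)]` is a hint asking the compiler not to inline this
    /// call, so the comparison stays an opaque function call in the loop.
    #[inline(never)]
    fn cmp_key(&self, other: &Item) -> Ordering {
        self.key.cmp(&other.key)
    }
}

/// Build a sorted input with keys `start, start + step, ...`.
fn make_items(start: u32, step: u32, len: u32) -> Vec<Item> {
    (0..len)
        .map(|i| Item { key: start + i * step, payload: i })
        .collect()
}

/// Plain two-way merge that prefers the left item on ties.
fn merge_items(a: &[Item], b: &[Item]) -> Vec<Item> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i].cmp_key(&b[j]) != Ordering::Greater {
            out.push(a[i]);
            i += 1;
        } else {
            out.push(b[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

fn main() {
    let a = make_items(0, 3, 3); // keys 0, 3, 6
    let b = make_items(1, 2, 3); // keys 1, 3, 5
    let merged = merge_items(&a, &b);
    let keys: Vec<u32> = merged.iter().map(|it| it.key).collect();
    assert_eq!(keys, vec![0, 1, 3, 3, 5, 6]);
    let payload_sum: u32 = merged.iter().map(|it| it.payload).sum();
    println!("keys = {keys:?}, payload sum = {payload_sum}");
}
```

Inputs with different strides make the merge switch sides irregularly, which is closer to the "each iteration chooses another branch" suggestion than constant data.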
@phimuemue Yep, good call. The speed-up is pretty large (n.b. timings are using @Philippe-Cholet's modification). Happy to submit a separate PR removing these specialisations or add it to this one. Whatever y'all prefer
I imagine this PR in 4 commits:
Sounds like a plan, @Philippe-Cholet.
Let's remove it and comment that we could further specialize
Easy, I'll sort it out tomorrow. Thanks both :)
Force-pushed from 0dfd502 to e8b0daa.
It is faster to rely on new `MergeBy::fold` implementation
Return `(Option<Either<L, R>>, MergeResult)` instead of `(Option<L>, Option<R>, MergeResult)`
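The return-type change in the last commit message can be pictured with a small sketch. Everything here is an assumption for illustration: `Either` is a local stand-in enum, and `Emitted` and `merge_step` are hypothetical names, not the crate's internal trait. The point is only that a single merge step leaves at most one item over, so one `Option<Either<L, R>>` can express what two separate `Option`s did.

```rust
use std::cmp::Ordering;

/// Local stand-in for an `Either` type (hypothetical; defined here so the
/// sketch has no dependencies).
#[derive(Debug, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

/// What one merge step emits (hypothetical stand-in for a merge result).
#[derive(Debug, PartialEq)]
enum Emitted {
    Left(i32),
    Right(i32),
    Both(i32, i32),
}

/// One merge step over a pair of items. Returns the leftover item that has
/// to be put back (if any) together with the emitted value.
fn merge_step(left: i32, right: i32) -> (Option<Either<i32, i32>>, Emitted) {
    match left.cmp(&right) {
        // Left is emitted, so the right item is left over.
        Ordering::Less => (Some(Either::Right(right)), Emitted::Left(left)),
        // Right is emitted, so the left item is left over.
        Ordering::Greater => (Some(Either::Left(left)), Emitted::Right(right)),
        // Both are consumed, nothing is left over.
        Ordering::Equal => (None, Emitted::Both(left, right)),
    }
}

fn main() {
    assert_eq!(merge_step(1, 2), (Some(Either::Right(2)), Emitted::Left(1)));
    assert_eq!(merge_step(5, 3), (Some(Either::Left(5)), Emitted::Right(3)));
    assert_eq!(merge_step(4, 4), (None, Emitted::Both(4, 4)));
}
```

With the older `(Option<L>, Option<R>, MergeResult)` shape, the caller would also have to handle a "both left over" combination that a step like this never produces.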
Ok sorted. Simplifying
Nice work!
I'm still surprised that `last` and `count` are that much faster, as I expected their (old) specializations to be roughly as fast as "their default with `fold` specialized".
EDIT: Could not resist running benchmarks. Well, there is no doubt.
| benchmark | with last/count specialized | with only fold specialized | change |
|---|---|---|---|
| merge/count | [1.7842 µs 1.7871 µs 1.7902 µs] | [1.1981 µs 1.2033 µs 1.2103 µs] | [-34.336% -32.623% -30.803%] |
| merge/last | [2.1839 µs 2.1916 µs 2.2010 µs] | [600.16 ns 602.19 ns 604.29 ns] | [-73.024% -72.604% -72.156%] |
| merge_by/count | [1.7115 µs 1.7157 µs 1.7208 µs] | [808.16 ns 812.90 ns 818.16 ns] | [-53.581% -53.304% -53.010%] |
| merge_by/last | [1.5753 µs 1.5793 µs 1.5843 µs] | [277.42 ns 278.06 ns 278.84 ns] | [-82.877% -82.533% -82.194%] |
| merge_join_by_ordering/count | [1.3395 µs 1.3439 µs 1.3485 µs] | [1.0581 µs 1.0762 µs 1.0997 µs] | [-21.419% -20.487% -19.520%] |
| merge_join_by_ordering/last | [1.6069 µs 1.6134 µs 1.6200 µs] | [1.0075 µs 1.0103 µs 1.0135 µs] | [-37.745% -37.080% -36.476%] |
| merge_join_by_bool/count | [1.7556 µs 1.7595 µs 1.7634 µs] | [805.21 ns 807.31 ns 809.63 ns] | [-54.132% -53.966% -53.797%] |
| merge_join_by_bool/last | [1.6153 µs 1.6207 µs 1.6266 µs] | [543.29 ns 544.50 ns 545.80 ns] | [-66.382% -66.234% -66.101%] |
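For readers wondering why specializing only `fold` can also speed up `count` and `last`, here is a simplified sketch. It assumes that std's default `Iterator::count` and `Iterator::last` are written in terms of `fold`, which matches the standard library source I know of but is an implementation detail rather than a documented guarantee. The `Merge` struct and its `fold_calls` counter are hypothetical and far simpler than the real `MergeBy`.

```rust
use std::cell::Cell;
use std::iter::Peekable;

/// Simplified merge of two sorted `u32` iterators (hypothetical sketch, not
/// the itertools implementation). `fold_calls` records how often the
/// specialized `fold` is entered.
struct Merge<'a, I: Iterator<Item = u32>> {
    left: Peekable<I>,
    right: Peekable<I>,
    fold_calls: &'a Cell<usize>,
}

impl<'a, I: Iterator<Item = u32>> Iterator for Merge<'a, I> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        match (self.left.peek().copied(), self.right.peek().copied()) {
            (Some(l), Some(r)) if l <= r => self.left.next(),
            (Some(_), Some(_)) => self.right.next(),
            (Some(_), None) => self.left.next(),
            (None, _) => self.right.next(),
        }
    }

    /// Specialized `fold`: keeps the two heads in locals and, once one side
    /// is exhausted, delegates to the remaining iterator's own `fold`.
    fn fold<B, F>(mut self, init: B, mut f: F) -> B
    where
        F: FnMut(B, u32) -> B,
    {
        self.fold_calls.set(self.fold_calls.get() + 1);
        let mut acc = init;
        let (mut l, mut r) = (self.left.next(), self.right.next());
        loop {
            match (l, r) {
                (Some(a), Some(b)) if a <= b => {
                    acc = f(acc, a);
                    l = self.left.next();
                }
                (Some(_), Some(b)) => {
                    acc = f(acc, b);
                    r = self.right.next();
                }
                (Some(a), None) => {
                    acc = f(acc, a);
                    return self.left.fold(acc, f);
                }
                (None, Some(b)) => {
                    acc = f(acc, b);
                    return self.right.fold(acc, f);
                }
                (None, None) => return acc,
            }
        }
    }
}

fn main() {
    let calls = Cell::new(0);
    let make = || Merge {
        left: vec![1u32, 4, 6].into_iter().peekable(),
        right: vec![2u32, 3, 7].into_iter().peekable(),
        fold_calls: &calls,
    };
    // `count` and `last` are not overridden here; they use std's defaults.
    assert_eq!(make().count(), 6);
    assert_eq!(make().last(), Some(7));
    // Both calls went through the specialized `fold` (std implementation detail).
    assert_eq!(calls.get(), 2);
}
```

If that assumption holds, any method whose default is built on `fold` benefits from the specialization without needing an override of its own.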
Relates to #755