[Benchmarks][stdlib] Adding an extra benchmark for set isDisjoint for disjoint sets of different size #39265
Conversation
@swift-ci Please benchmark
Performance (x86_64): -O / -Osize / -Onone; Code size: -O / -Osize / -swiftlibs

How to read the data: the tables contain differences in performance larger than 8% and differences in code size larger than 1%. If you see any unexpected regressions, you should consider fixing them. Note that the performance results (not code size!) can sometimes contain false alarms due to noise.
Looks good!
One of these days we'll need to take a long, hard look at how an innocent addition like this can trigger such large swings in unchanged code paths within the same module.
Argh, my bad, I missed that this changes those paths. (Reviewing PRs on my phone is not a good idea. 🙈)
We'll need to leave the existing benchmarks as is -- changing them would interfere with performance tracking.
(The new code is great for the new benchmarks, though!)
No worries, thanks for the review!
Branch updated from 46efd31 to c3c1e6d (… disjoint sets of different size)
Just fixed!
@swift-ci Please benchmark
Performance (x86_64): -O / -Osize / -Onone; Code size: -O / -Osize / -swiftlibs
@swift-ci Please smoke test
@lorentey Just out of curiosity, about this part of the "How to read the data" note:
@lorentey, friendly ping :)
It's a heuristic that puts the mark on (almost) all results that are overlapping with the old run. Here is the logic behind it:

```python
# Indication of dubious changes: when result's MIN falls inside the
# (MIN, MAX) interval of result they are being compared with.
self.is_dubious = (old.min < new.min and new.min < old.max) or (
    new.min < old.min and old.min < new.max
)
```

I believe this will tend to shame benchmarks that have a high variability. (I'm not nearly good enough with statistics to tell if it's a good one, but it seems to be working relatively well!)
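To illustrate with made-up numbers: if the old run spans (MIN, MAX) = (100, 120) and the new run's minimum is 110, the new minimum falls inside the old interval, so the result is flagged as dubious and shown with the "(?)" mark.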
Looks good, thank you for working on this.
| Regression | OLD | NEW | DELTA | RATIO |
|---|---|---|---|---|
| Set.isDisjoint.Box25 | 355 | 515 | +45.1% | 0.69x (?) |
| Set.isDisjoint.Int25 | 262 | 344 | +31.3% | 0.76x (?) |
I wonder if the nondeterministic hashing config was lost from the benchmarking environment. This could also be an instruction cache artifact from the functions getting slightly bumped in the emitted binary. Well, it's an investigation for another day. :-/
Ah interesting, thank you for the explanation!
Variability would be across multiple runs with the same input values? I guess that, given that most of the methods being benchmarked are deterministic, variability across multiple runs may come from hardware state, e.g. differences in CPU cache hits and misses between runs...
Yep -- the benchmarks are measured over multiple iterations, scaled so that each measurement takes some fixed amount of time. I believe the time interval is set small to discourage/prevent context switches in the middle of a benchmark, but measuring over multiple iterations also reduces some of the noise by averaging it out. Then, IIRC, these measurements are repeated multiple times to collect a nice sample. The min/max/avg/stddev values reported come from these samples. (Beware: I'm sure I got at least some of these details wrong -- the truth is in the code. 😅)
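As a rough illustration only (this is not the actual benchmark driver; `measure`, `samples`, and `budget` are made-up names and knobs), the sampling scheme described above could be sketched like this:

```swift
import Foundation

// Sketch: each sample runs the workload repeatedly until a small, fixed time
// budget is spent, then the per-iteration time is averaged; summary stats are
// computed over many such samples.
func measure(samples: Int = 10,
             budget: TimeInterval = 0.01,
             _ workload: () -> Void) -> (min: Double, max: Double, mean: Double) {
    var perIteration: [Double] = []
    for _ in 0..<samples {
        var iterations = 0
        let start = Date()
        // Keep iterating until this sample's time budget is used up, so a
        // single sample is long enough to not measure pure noise.
        while Date().timeIntervalSince(start) < budget {
            workload()
            iterations += 1
        }
        let elapsed = Date().timeIntervalSince(start)
        // Average over the iterations in this sample to smooth out jitter.
        perIteration.append(elapsed / Double(iterations))
    }
    let mean = perIteration.reduce(0, +) / Double(perIteration.count)
    return (perIteration.min()!, perIteration.max()!, mean)
}

// Usage example: time Set.isDisjoint(with:) on two disjoint sets.
let a = Set(0..<1_000), b = Set(1_000..<2_000)
print(measure { _ = a.isDisjoint(with: b) })
```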
Interesting to know, thanks :)
As for the improvement in #39263, we can't really see much of it because the sets are the same size. So this adds a new, smaller disjoint set and a benchmark for 0 overlap (disjoint) using those sets. It also makes the benchmark run both a.isDisjoint(b) and b.isDisjoint(a), to give a better overview of benchmarks with different-sized sets.
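For illustration only, a hedged sketch of what such a benchmark could look like (the set sizes, names, and `runIsDisjointDifferentSizes` are hypothetical, not the code actually added by this PR):

```swift
// Two disjoint sets of clearly different sizes; the ranges do not overlap.
let smallDisjointSet = Set(0..<25)
let largeDisjointSet = Set(1_000..<1_400)

@inline(never)
func runIsDisjointDifferentSizes(_ iterations: Int) {
    for _ in 0..<iterations {
        // Exercise both call directions, since small.isDisjoint(with: large)
        // and large.isDisjoint(with: small) can perform differently.
        precondition(smallDisjointSet.isDisjoint(with: largeDisjointSet))
        precondition(largeDisjointSet.isDisjoint(with: smallDisjointSet))
    }
}

runIsDisjointDifferentSizes(1_000)
```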