[benchmark] Janitor Duty: Datalore Legacy #21516
Conversation
arc4random_buf was dominating the creation time. We don’t need truly random data. Anything that quickly fills the buffer will do.
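A minimal sketch of the idea (the helper name `sampleData(ofSize:)` is hypothetical): fill the buffer with an increasing byte sequence instead of calling `arc4random_buf`, so `Data` creation is no longer dominated by the RNG.

```swift
import Foundation

// Hypothetical helper: deterministic, fast fill instead of arc4random_buf.
// Each byte is just the index truncated to UInt8 (0, 1, …, 255, 0, 1, …).
func sampleData(ofSize size: Int) -> Data {
    var data = Data(count: size)
    data.withUnsafeMutableBytes { (buffer: UnsafeMutableRawBufferPointer) in
        for i in 0..<size {
            buffer[i] = UInt8(truncatingIfNeeded: i)
        }
    }
    return data
}
```

The content is repeatable across runs, which also makes the benchmark itself more deterministic.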
@phausler @eeckstein Could you give this a casual look, and flag any deal-breakers? (It's best to review it by individual commits, as it is once again a step-by-step refactoring.)
Those regressions seem like they are hitting the wrong code paths. The ones that improved are worrying too, since they are really fast benchmarks; I’d wager the reduced time will cause more noise, because the sample size isn’t large enough to be significant. Particularly these:
af998a1 to 6702381 (Compare)
For extracting setup overhead. Even the `large` sample Data is only 40 KB; we can afford to make it a static constant.
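A minimal sketch of this setup-overhead extraction (the size and names are hypothetical): the sample bytes live in a global `let`, which Swift initializes lazily and exactly once, so the measured run function never pays for the allocation.

```swift
import Foundation

// Hypothetical global constant: ~40 KB of deterministic bytes, built once.
let sampleData = Data((0..<40_960).map { UInt8(truncatingIfNeeded: $0) })

// The measured run function only touches the precomputed bytes;
// allocation and fill cost never appear in its samples.
func run_Access(_ n: Int) -> Int {
    var checksum = 0
    for _ in 0..<n {
        checksum &+= Int(sampleData[sampleData.count - 1])
    }
    return checksum
}
```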
Removed the disabled Large variants (tagged `.skip`). We might re-introduce proper benchmarks for the `LargeSlice` implementation later, but for **that** the capacity has to be at least Int32.max (i.e. over 2GB). The `sampleData(.veryLarge)` was only 1GB.
Refactored to use shared test method and inlined runFunctions. Extracted setup overhead. `blackHole` is ubiquitous in SBS, no need to state the obvious.
Refactored to use inlined runFunctions. Extracted setup overhead.
Refactored to use shared test method and inlined runFunctions. Extracted setup overhead. Removed `Large` variant, as it was testing the same underlying implementation as `Medium`.
Refactored to use shared test method and inlined runFunctions. Extracted setup overhead. `blackHole` is ubiquitous in SBS, no need to state the obvious.
Refactored to use shared test method and inlined runFunctions. Extracted setup overhead.
Refactored to use shared test method and inlined runFunctions. Extracted setup overhead. Removed `Large` variant, as it was testing the same underlying implementation as `Medium`.
Refactored to use shared test method and inlined runFunctions. Extracted setup overhead.
Refactored to use inlined runFunctions.
Refactored to use inlined runFunctions.
Refactored to use shared test method and inlined runFunctions. Extracted setup overhead. Re-enabled `Large` tests erroneously disabled in swiftlang#20411.
Refactored to use shareable test method and inlined runFunction.
Refactored to use shareable test method and inlined runFunction.
Refactored to use shareable test method and inlined runFunction.
Refactored to use shared test method and inlined runFunctions.
Refactored to use shared test method and inlined runFunctions. Re-enabled `Large` test erroneously disabled in swiftlang#20411. Removed `skip` tags, as this was the last use.
Refactored to use shared test method and inlined runFunctions.
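The "shared test method and inlined runFunctions" pattern mentioned throughout these commits can be sketched as follows (names and sizes are hypothetical, not the PR's actual code): one `@inline(__always)` worker holds the measured loop, and each public run function is a thin per-variant wrapper.

```swift
import Foundation

// Hypothetical sample constants for two variants.
let small = Data(repeating: 7, count: 11)
let medium = Data(repeating: 7, count: 1_033)

// Shared worker: inlined into each wrapper, so variants share source
// without paying for an extra call inside the measured loop.
@inline(__always)
func append(_ n: Int, data: Data) -> Int {
    var total = 0
    for _ in 0..<n {
        var copy = data
        copy.append(0xFF)
        total &+= copy.count
    }
    return total
}

// Thin per-variant run functions.
public func run_AppendSmall(_ n: Int) -> Int { append(n, data: small) }
public func run_AppendMedium(_ n: Int) -> Int { append(n, data: medium) }
```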
@swift-ci please benchmark
Build comment file:
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
How to read the data: the tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing the […]. Noise: sometimes the performance results (not code size!) contain false positives.
Hardware Overview
@phausler The String benchmarks were just an error on my part, I've lost a zero during loop multiplier extraction… 🤷♂️ I've finally realized what you meant by: […]
Initially, I was puzzling over […]. You've mentioned before that: […]
If I understand correctly, these were meant to exercise […]. IMHO there are still multiple issues with benchmarking the initialization of Data, even after replacing […].
I have played with many different variants: […]. I suggest we ignore the runtime changes in […]. It also looks like the "inline everything" strategy is backfiring on […].
@swift-ci please smoke test
@eeckstein @phausler Please review 🙏.
lgtm, but please wait for @phausler to review.
Changes here look good to me too, though I'd want @phausler to get a chance to review. Tiny nits inline; otherwise, do you mind commenting on how you decided on these legacy factors?
I'm also separately working on pulling some of the inlining out where it was excessive, so I'll try to test with the changes here so we're looking at a consistent view of the tests.
The scalars could use a bit of renaming, but functionally this looks reasonable to me.
When you run locally […]
I have a few extra benchmark variants stashed away… would you be interested?
@phausler, @itaiferber Could you also respond to my suggestion about testing […]?
Here are a few […]. I don't think it's very important how fast the inlined […] is.
Also added 2 forgotten legacy factors.
a55ce72 to 7db46b3 (Compare)
@swift-ci please benchmark
@swift-ci please smoke test
Build comment file:
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
@palimondo Regarding the large slice cases — I personally believe that both large slices and large inline slices are worth testing, but perhaps not at the cost of destabilizing other tests, or causing unnecessary delays in benchmarking for everyone. With more extensive, separate benchmarking infrastructure (to make it more reasonable to test this case more sparingly), it might be worth investigating. I think we're okay to drop the tests for now though.
Perhaps I wasn't being clear enough. My point was that I believe the point @phausler was making about the […]
When I played with […]
The perf differential between […]
This PR follows up #20861 and #21413. It's the next batch of benchmark clean-up to enable robust performance measurements by adjusting workloads to run in reasonable time (< 1000 μs), minimizing the accumulated error. To maintain long-term performance tracking, it applies a legacy factor where necessary.
`DataBenchmarks` have recently been extended with a lot of variants in #20396, while some of the tests have regressed with regard to the elimination of setup overhead (previously fixed in ae9f5f1). Many `Large` variants (problematic as well as some perfectly fine ones) have been disabled in #20411. Before applying the `legacyFactor`, it was necessary to clean up the file to ease future maintenance. I have applied the technique of shared test functions and inlined run functions pioneered by @lorentey in SetTests.

The disabled `Large` variants that were using `.veryLarge` `sampleData` were removed. We might re-introduce proper benchmarks for the `LargeSlice` implementation later, but for that the capacity has to be at least Int32.max (i.e. over 2 GB). The `sampleData(.veryLarge)` was "only" 1 GB.

The disabled `Large` tests that were using `.large` `sampleData` were re-enabled to maintain the continuity of long-term performance tracking, since these benchmarks predated the addition of `Large` variants from #20396 (disabled in #20411).

`DataCreate[Small,Medium]` benchmarks were dominated by `arc4random_buf`. We don't need truly random data, so an increasing sequence of bytes is used to fill the buffer instead. This changes their runtime, but I didn't rename them, as they were introduced very recently; I believe this doesn't disrupt any long-term performance tracking yet.
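The legacy-factor idea mentioned above can be sketched like this (the factor, loop count, and function name are hypothetical illustrations, not the PR's actual values): the inner loop shrinks by the factor so the benchmark runs under 1000 μs, and the harness multiplies the measured samples back by the same factor, keeping historical charts comparable.

```swift
// Hypothetical legacy factor: the workload is 10x smaller than it
// historically was.
let legacyFactor = 10

func run_DataCount(_ n: Int) -> Int {
    var checksum = 0
    // Was `n * 5_000` before the cleanup; divided by legacyFactor so a
    // single sample completes quickly and accumulates less error.
    for _ in 0..<(n * 5_000 / legacyFactor) {
        checksum &+= 1
    }
    return checksum
}

// The harness then reports: measured time * legacyFactor,
// so long-term tracking is undisturbed.
```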