-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[benchmark] Data.[init,append].Sequence various sizes #21848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[benchmark] Data.[init,append].Sequence various sizes #21848
Conversation
31f0bd5
to
cca8382
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
2e66d22
to
f4dee7e
Compare
This comment has been minimized.
This comment has been minimized.
f4dee7e
to
61e85ce
Compare
61e85ce
to
8b7a594
Compare
@swift-ci please benchmark |
This comment has been minimized.
This comment has been minimized.
8b7a594
to
626878e
Compare
@swift-ci please benchmark |
@atrick @eeckstein @itaiferber @phausler Please review. Notes about the benchmark results:
I've pushed another update (and triggered a benchmark), where I've added |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me for measuring that particular set of cases. I honestly think more coverage is better so I don’t see a need for paring this down any.
Umm… I mean… at least the |
@swift-ci please benchmark |
@swift-ci please smoke test |
@phausler I'm going to merge this based on your previous approval once the checks pass, if you didn't change your mind about the multitude of these benchmarks. |
Build failed before running benchmark. |
@swift-ci please benchmark |
!!! Couldn't read commit file !!! |
@swift-ci please benchmark |
Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Follow-up to #21766 for benchmarking the refinement of the new
Data
implementation in #21754.This PR adds thorough performance coverage of the
init
andappend
methods forSequence
s. During the discussion in #21754 (comment), the use of stack allocated buffer for sequences shorter than 2kB (_withStackOrHeapBuffer
) has been mentioned as potential reason for the performance differences we were seeing there. This adds benchmarks to validate that hypothesis.During experimentation with these benchmarks I've learned of my previous mistake:
ExactCount
vs.UnderestimatedCount
wasn't actually an apples-to-apples comparison. Though functionally equivalent, therepeatElementSeq
wasn't compiled and optimized to same code asrepeatElement
. The slowdown we were seeing could be caused by the difference in sequence implementation. My bad! 🙇♂️To rectify this, I have first created
Bytes
that produces simple increasing byte sequence and depending on theexact
parameter reports either exact count or0
as itsunderestimatedCount
. This sequence is very simple I have worried it would be inlinable even when other more complex sequences were not. Therefore I've createdCount0
wrapper that always reportsunderestimatedCount = 0
and can be used to erase the exact from other sequences. Using this I've added a secondRE
benchmark variant that usesrepeatElement
, as was originally the case with theDataAppendSequence
.To thoroughly explore the problem space, I have created another 2 variants from all tests: one using shared generic non-inlinable test method and another with sequence created directly in the same block, which theoretically allows for the sequence to be inlined into the
Data
'sinit
orappend
methods — denoted with.I
suffix for inlinable.I've also added sequences that test around the edges of the two possible thresholds (
511B/513B
,2047B/2049B
) for the stack allocated / heap allocated buffer and in my local tests it looks like the complicated implementation is bringing no measurable advantage. I suspect that the complexity of_withStackOrHeapBuffer
is the reason for the slowdown betweenCount
andCount0
.Notes about the benchmark results:
Bytes
-based as wellrepeatElement
-based (RE
) perform similarly.In summary:
809B
group (using*100
multiplier).511B/513B
and2047B/2049B
test the advantage of stack allocated buffer.64kB
sequence (multiplier of1
) were added.Count
is a test group where sequences report their full length inunderestimatedCount
.Count0
groupunderestimatedCount
always returns0
.Bytes
sequence..RE
are usingrepeatElement
sequence..I
, the sequence can be potentially inlined intoData
's methods.The scaling is derived from the presumption that
809B
creates/appends ~80kB of data, so the new64kB
group should be faster but roughly in the same ball park.