[benchmark] Data.[init,append].Sequence various sizes #21848

Merged
merged 3 commits into from
Feb 20, 2019

Conversation

palimondo
Contributor

@palimondo palimondo commented Jan 14, 2019

Follow-up to #21766 for benchmarking the refinement of the new Data implementation in #21754.

This PR adds thorough performance coverage of the init and append methods for Sequences. During the discussion in #21754 (comment), the use of a stack-allocated buffer for sequences shorter than 2kB (_withStackOrHeapBuffer) was mentioned as a potential reason for the performance differences we were seeing there. This PR adds benchmarks to validate that hypothesis.

While experimenting with these benchmarks, I learned of my previous mistake: ExactCount vs. UnderestimatedCount wasn't actually an apples-to-apples comparison. Though functionally equivalent, repeatElementSeq wasn't compiled and optimized to the same code as repeatElement. The slowdown we were seeing could have been caused by the difference in sequence implementation. My bad! 🙇‍♂️

To rectify this, I first created Bytes, which produces a simple increasing byte sequence and, depending on the exact parameter, reports either its exact count or 0 as its underestimatedCount. This sequence is so simple that I worried it would be inlinable even when other, more complex sequences were not. Therefore I created the Count0 wrapper, which always reports underestimatedCount = 0 and can be used to erase the exact count from other sequences. Using this, I added a second RE benchmark variant that uses repeatElement, as was originally the case with DataAppendSequence.
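The two helper types described above could look roughly like the following. This is a minimal sketch reconstructed from the description, not the code in the PR: the stored properties, the initializer labels and the truncating byte conversion are all assumptions.

```swift
// Sketch only: names follow the description above, details are assumed.

/// A simple increasing byte sequence. When `exact` is false it hides its
/// length by reporting 0 as its `underestimatedCount`.
struct Bytes: Sequence, IteratorProtocol {
    let count: Int
    let exact: Bool
    var position = 0

    init(count: Int, exact: Bool) {
        self.count = count
        self.exact = exact
    }

    // Sequence requirement; the default implementation would return 0.
    var underestimatedCount: Int { exact ? count : 0 }

    mutating func next() -> UInt8? {
        guard position < count else { return nil }
        defer { position += 1 }
        // Wraps around after 255 so that any count can be produced.
        return UInt8(truncatingIfNeeded: position)
    }
}

/// Wraps any sequence and always reports `underestimatedCount == 0`,
/// erasing the exact count the base sequence would otherwise report.
struct Count0<S: Sequence>: Sequence {
    let base: S
    init(_ base: S) { self.base = base }
    func makeIterator() -> S.Iterator { base.makeIterator() }
    var underestimatedCount: Int { 0 }
}
```

With these, `Bytes(count: 809, exact: true)` would correspond to a Count variant, `Bytes(count: 809, exact: false)` to a Count0 variant, and `Count0(repeatElement(UInt8(0xA0), count: 809))` to a Count0.RE variant. The element value 0xA0 is arbitrary.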

To thoroughly explore the problem space, I created two more variants of every test: one using a shared, generic, non-inlinable test method, and another with the sequence created directly in the same block, which theoretically allows the sequence to be inlined into Data's init or append methods. The latter is denoted with the .I suffix, for inlinable.
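The difference between the two variants can be sketched as follows. The function names are hypothetical and the use of `@inline(never)` is an assumption about how the shared helper is kept from being inlined; only `Data.init(_:)` and `Data.append(contentsOf:)` over byte sequences are actual Foundation API.

```swift
import Foundation

// Shared, generic, non-inlinable helpers (hypothetical names).
// The optimizer must call these through the generic entry point.
@inline(never)
func initData<S: Sequence>(from sequence: S) -> Data where S.Element == UInt8 {
    return Data(sequence)
}

@inline(never)
func appendData<S: Sequence>(_ sequence: S, to data: inout Data) where S.Element == UInt8 {
    data.append(contentsOf: sequence)
}

// Plain variant: the sequence is handed to the shared helper.
func runShared() -> Data {
    return initData(from: repeatElement(UInt8(0xA0), count: 809))
}

// ".I" variant: the sequence is created in the same block as the call,
// so its concrete type is visible to the optimizer at the call site.
func runInlinable() -> Data {
    return Data(repeatElement(UInt8(0xA0), count: 809))
}
```

Both functions produce the same bytes; only what the optimizer can see at the call site differs.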

I've also added sequences that test around the edges of the two possible thresholds (511B/513B, 2047B/2049B) between the stack-allocated and heap-allocated buffer. In my local tests, the complicated implementation appears to bring no measurable advantage. I suspect that the complexity of _withStackOrHeapBuffer is the reason for the slowdown between Count and Count0.
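For readers unfamiliar with the pattern, a threshold-based buffer helper has roughly the following shape. This is an illustration only and not Foundation's _withStackOrHeapBuffer: the function name, the default threshold of 512 and the use of the standard library's `withUnsafeTemporaryAllocation` (Swift 5.6 and later) are all assumptions.

```swift
// Illustrative sketch, not Foundation's implementation.
func withStackOrHeapBytes<R>(
    capacity: Int,
    stackThreshold: Int = 512,  // assumed value, see the thresholds above
    _ body: (UnsafeMutableBufferPointer<UInt8>) throws -> R
) rethrows -> R {
    if capacity <= stackThreshold {
        // Small request: ask for temporary storage, which the standard
        // library places on the stack where it can.
        return try withUnsafeTemporaryAllocation(
            of: UInt8.self, capacity: capacity, body)
    }
    // Large request: explicit heap allocation, released on exit.
    let heap = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: capacity)
    defer { heap.deallocate() }
    return try body(heap)
}
```

Sizes such as 511B and 513B sit on either side of a threshold like this one, so a benefit from the stack path would show up as a step between the two benchmarks.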

Notes about the benchmark results:

  • The Bytes-based and the repeatElement-based (RE) variants perform similarly.
  • It looks like sequence inlining is not happening.

In summary:

  • Existing sequence benchmarks were collected into the 809B group (using the *100 multiplier).
  • New variants 511B/513B and 2047B/2049B test the advantage of the stack-allocated buffer.
  • Benchmarks for a longer 64kB sequence (multiplier of 1) were added.
  • Count is a test group where sequences report their full length in underestimatedCount.
  • In the Count0 group, underestimatedCount always returns 0.
  • By default, tests use the Bytes sequence.
  • Variants marked with .RE use the repeatElement sequence.
  • In variants marked with .I, the sequence can potentially be inlined into Data's methods.

The scaling is derived from the presumption that the 809B group creates/appends ~80kB of data (809 B × 100), so the new 64kB group should be faster but roughly in the same ballpark.
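The arithmetic behind that presumption, as a small worked example. Taking 64kB to mean 64 × 1024 bytes is an assumption; the multipliers are the ones listed in the summary.

```swift
// Bytes processed per measured iteration in each group.
let smallGroupBytes = 809 * 100        // 809B sequence, multiplier of 100
let largeGroupBytes = 64 * 1024 * 1    // 64kB sequence, multiplier of 1 (assumes 1kB = 1024B)

// The 64kB group handles about 81% of the data of the 809B group,
// so it is expected to run somewhat faster if per-byte cost dominates.
let workRatio = Double(largeGroupBytes) / Double(smallGroupBytes)  // ≈ 0.81
```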

@palimondo palimondo force-pushed the and-dreadfully-distinct branch 2 times, most recently from 2e66d22 to f4dee7e Compare January 15, 2019 08:29
@palimondo palimondo force-pushed the and-dreadfully-distinct branch from f4dee7e to 61e85ce Compare January 15, 2019 17:59
@swiftlang swiftlang deleted a comment from swift-ci Jan 15, 2019
@palimondo palimondo force-pushed the and-dreadfully-distinct branch from 61e85ce to 8b7a594 Compare January 15, 2019 19:26
@palimondo
Contributor Author

@swift-ci please benchmark

@palimondo palimondo force-pushed the and-dreadfully-distinct branch from 8b7a594 to 626878e Compare January 15, 2019 21:05
@palimondo
Contributor Author

@swift-ci please benchmark

@palimondo
Contributor Author

palimondo commented Jan 15, 2019

@atrick @eeckstein @itaiferber @phausler Please review.

Notes about the benchmark results:

  • The Bytes-based and the repeatElement-based (RE) variants perform similarly.
  • It looks like sequence inlining is not happening.

I've pushed another update (and triggered a benchmark), where I've added Array.init/append.Sequence.64kB benchmarks for direct comparison. I've also added sequences that test around the edges of the two possible thresholds (511B/513B, 2047B/2049B) between the stack-allocated and heap-allocated buffer. In my local tests, the complicated implementation appears to bring no measurable advantage. I suspect that the complexity of _withStackOrHeapBuffer is the reason for the slowdown between Count and Count0. Let me know what you think...

Contributor

@phausler phausler left a comment

Looks reasonable to me for measuring that particular set of cases. I honestly think more coverage is better so I don’t see a need for paring this down any.

@palimondo
Contributor Author

palimondo commented Jan 21, 2019

Umm… I mean… at least the Array benchmarks are not really part of this. They're here just to illustrate the performance gap we should probably close or at least reasonably justify before considering the implementation good enough to ship.

@palimondo palimondo changed the title [WIP][benchmark] Data.[init,append].Sequence.[809B,64kB] [benchmark] Data.[init,append].Sequence various sizes Feb 19, 2019
@palimondo
Contributor Author

@swift-ci please benchmark

@palimondo
Contributor Author

@swift-ci please smoke test

@palimondo
Contributor Author

@phausler I'm going to merge this based on your previous approval once the checks pass, if you didn't change your mind about the multitude of these benchmarks.

@swift-ci
Contributor

Build failed before running benchmark.

@palimondo
Contributor Author

@swift-ci please benchmark

@swift-ci
Contributor

!!! Couldn't read commit file !!!

@palimondo
Contributor Author

@swift-ci please benchmark

@swift-ci
Contributor

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
DataCreateEmpty 170 200 +17.6% 0.85x
DataCountSmall 22 25 +13.6% 0.88x
DataCountMedium 28 31 +10.7% 0.90x (?)
Improvement
DataSubscriptSmall 31 28 -9.7% 1.11x
SortLettersInPlace 554 507 -8.5% 1.09x (?)
Added
Data.append.Sequence.64kB.Count 60 61 60
Data.append.Sequence.64kB.Count.I 60 62 61
Data.append.Sequence.64kB.Count.RE 79 82 80
Data.append.Sequence.64kB.Count.RE.I 79 82 80
Data.append.Sequence.64kB.Count0 374 374 374
Data.append.Sequence.64kB.Count0.I 373 374 374
Data.append.Sequence.64kB.Count0.RE 379 379 379
Data.append.Sequence.64kB.Count0.RE.I 380 381 380
Data.append.Sequence.809B.Count 178 188 181
Data.append.Sequence.809B.Count.I 184 188 185
Data.append.Sequence.809B.Count.RE 209 225 215
Data.append.Sequence.809B.Count.RE.I 204 209 206
Data.append.Sequence.809B.Count0 676 676 676
Data.append.Sequence.809B.Count0.I 670 671 671
Data.append.Sequence.809B.Count0.RE 560 561 560
Data.append.Sequence.809B.Count0.RE.I 559 560 559
Data.init.Sequence.2047B.Count.I 136 140 137
Data.init.Sequence.2047B.Count0.I 698 699 698
Data.init.Sequence.2049B.Count.I 135 138 136
Data.init.Sequence.2049B.Count0.I 696 698 697
Data.init.Sequence.511B.Count.I 191 195 192
Data.init.Sequence.511B.Count0.I 687 688 687
Data.init.Sequence.513B.Count.I 192 197 194
Data.init.Sequence.513B.Count0.I 695 762 718
Data.init.Sequence.64kB.Count 59 62 60
Data.init.Sequence.64kB.Count.I 59 62 60
Data.init.Sequence.64kB.Count.RE 78 82 79
Data.init.Sequence.64kB.Count.RE.I 78 81 79
Data.init.Sequence.64kB.Count0 376 377 376
Data.init.Sequence.64kB.Count0.I 375 377 376
Data.init.Sequence.64kB.Count0.RE 381 382 382
Data.init.Sequence.64kB.Count0.RE.I 381 382 381
Data.init.Sequence.809B.Count 154 157 155
Data.init.Sequence.809B.Count.I 154 157 155
Data.init.Sequence.809B.Count.RE 180 185 182
Data.init.Sequence.809B.Count.RE.I 181 185 182
Data.init.Sequence.809B.Count0 642 643 642
Data.init.Sequence.809B.Count0.I 639 639 639
Data.init.Sequence.809B.Count0.RE 637 637 637
Data.init.Sequence.809B.Count0.RE.I 633 633 633
Removed
Data.append.Sequence.ExactCount 199 199 199
Data.append.Sequence.UnderestimatedCount 1203 1302 1236
Data.init.Sequence.ExactCount 174 178 175
Data.init.Sequence.UnderestimatedCount 1277 1396 1317

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
DataBenchmarks.o 52228 84494 +61.8% 0.62x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Improvement
DataSubscriptSmall 31 25 -19.4% 1.24x
DataCreateEmpty 200 170 -15.0% 1.18x
DataCountSmall 28 25 -10.7% 1.12x
DataCopyBytesSmall 134 122 -9.0% 1.10x (?)
DataCountMedium 34 31 -8.8% 1.10x (?)
Data.hash.Empty 74 68 -8.1% 1.09x (?)
Added
Data.append.Sequence.64kB.Count 60 63 61
Data.append.Sequence.64kB.Count.I 60 63 61
Data.append.Sequence.64kB.Count.RE 79 82 80
Data.append.Sequence.64kB.Count.RE.I 79 81 80
Data.append.Sequence.64kB.Count0 339 342 340
Data.append.Sequence.64kB.Count0.I 337 342 339
Data.append.Sequence.64kB.Count0.RE 346 350 347
Data.append.Sequence.64kB.Count0.RE.I 346 350 347
Data.append.Sequence.809B.Count 175 179 176
Data.append.Sequence.809B.Count.I 173 176 174
Data.append.Sequence.809B.Count.RE 206 211 208
Data.append.Sequence.809B.Count.RE.I 207 213 209
Data.append.Sequence.809B.Count0 514 534 521
Data.append.Sequence.809B.Count0.I 511 511 511
Data.append.Sequence.809B.Count0.RE 515 515 515
Data.append.Sequence.809B.Count0.RE.I 522 522 522
Data.init.Sequence.2047B.Count.I 133 137 134
Data.init.Sequence.2047B.Count0.I 636 671 648
Data.init.Sequence.2049B.Count.I 134 137 135
Data.init.Sequence.2049B.Count0.I 629 630 629
Data.init.Sequence.511B.Count.I 188 193 190
Data.init.Sequence.511B.Count0.I 642 643 643
Data.init.Sequence.513B.Count.I 189 192 190
Data.init.Sequence.513B.Count0.I 756 756 756
Data.init.Sequence.64kB.Count 59 62 60
Data.init.Sequence.64kB.Count.I 61 64 62
Data.init.Sequence.64kB.Count.RE 78 83 80
Data.init.Sequence.64kB.Count.RE.I 78 80 79
Data.init.Sequence.64kB.Count0 340 341 341
Data.init.Sequence.64kB.Count0.I 340 341 340
Data.init.Sequence.64kB.Count0.RE 348 352 349
Data.init.Sequence.64kB.Count0.RE.I 348 350 349
Data.init.Sequence.809B.Count 153 156 154
Data.init.Sequence.809B.Count.I 154 157 155
Data.init.Sequence.809B.Count.RE 183 188 185
Data.init.Sequence.809B.Count.RE.I 184 188 185
Data.init.Sequence.809B.Count0 720 742 727
Data.init.Sequence.809B.Count0.I 724 725 725
Data.init.Sequence.809B.Count0.RE 603 603 603
Data.init.Sequence.809B.Count0.RE.I 594 595 594
Removed
Data.append.Sequence.ExactCount 221 223 222
Data.append.Sequence.UnderestimatedCount 1241 1347 1277
Data.init.Sequence.ExactCount 183 188 185
Data.init.Sequence.UnderestimatedCount 1313 1418 1348

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
DataBenchmarks.o 39764 67526 +69.8% 0.59x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
DictionaryBridgeToObjC_Access 1134 1250 +10.2% 0.91x (?)
Improvement
ObjectiveCBridgeStubFromNSString 1019 943 -7.5% 1.08x (?)
Added
Data.append.Sequence.64kB.Count 4710 5214 4880
Data.append.Sequence.64kB.Count.I 4804 4888 4859
Data.append.Sequence.64kB.Count.RE 30625 31861 31365
Data.append.Sequence.64kB.Count.RE.I 30531 31797 31342
Data.append.Sequence.64kB.Count0 4682 4837 4734
Data.append.Sequence.64kB.Count0.I 4739 4899 4827
Data.append.Sequence.64kB.Count0.RE 29571 30351 29844
Data.append.Sequence.64kB.Count0.RE.I 29748 31265 30429
Data.append.Sequence.809B.Count 5897 6337 6047
Data.append.Sequence.809B.Count.I 5970 6107 6059
Data.append.Sequence.809B.Count.RE 37118 40370 39273
Data.append.Sequence.809B.Count.RE.I 37371 37785 37639
Data.append.Sequence.809B.Count0 5957 6125 6068
Data.append.Sequence.809B.Count0.I 6035 6116 6065
Data.append.Sequence.809B.Count0.RE 37454 38960 38410
Data.append.Sequence.809B.Count0.RE.I 36705 37360 36988
Data.init.Sequence.2047B.Count.I 7334 7652 7443
Data.init.Sequence.2047B.Count0.I 7354 7722 7478
Data.init.Sequence.2049B.Count.I 7329 7514 7450
Data.init.Sequence.2049B.Count0.I 7361 7473 7399
Data.init.Sequence.511B.Count.I 5832 6075 5914
Data.init.Sequence.511B.Count0.I 5736 5841 5795
Data.init.Sequence.513B.Count.I 5781 5816 5795
Data.init.Sequence.513B.Count0.I 5782 5959 5889
Data.init.Sequence.64kB.Count 4780 5063 4940
Data.init.Sequence.64kB.Count.I 4765 4854 4798
Data.init.Sequence.64kB.Count.RE 29770 30437 30025
Data.init.Sequence.64kB.Count.RE.I 32028 32674 32305
Data.init.Sequence.64kB.Count0 4710 4729 4722
Data.init.Sequence.64kB.Count0.I 4761 4984 4898
Data.init.Sequence.64kB.Count0.RE 29615 30314 29994
Data.init.Sequence.64kB.Count0.RE.I 29822 29922 29874
Data.init.Sequence.809B.Count 5957 6180 6034
Data.init.Sequence.809B.Count.I 5954 6083 6011
Data.init.Sequence.809B.Count.RE 36867 37306 37026
Data.init.Sequence.809B.Count.RE.I 36820 37426 37082
Data.init.Sequence.809B.Count0 6143 6247 6194
Data.init.Sequence.809B.Count0.I 5977 6094 6055
Data.init.Sequence.809B.Count0.RE 36995 37194 37062
Data.init.Sequence.809B.Count0.RE.I 37057 37373 37171
Removed
Data.append.Sequence.ExactCount 37130 37508 37269
Data.append.Sequence.UnderestimatedCount 4470 4613 4520
Data.init.Sequence.ExactCount 37368 38281 37690
Data.init.Sequence.UnderestimatedCount 4593 4691 4627
Benchmark Check Report
⚠️Ⓜ️ Data.append.Sequence.64kB.Count0.RE.I has very wide range of memory used between independent, repeated measurements.
Data.append.Sequence.64kB.Count0.RE.I mem_pages [i1, i2]: min=[27, 27] 𝚫=0 R=[38, 0]
⚠️Ⓜ️ Data.append.Sequence.64kB.Count0.I has very wide range of memory used between independent, repeated measurements.
Data.append.Sequence.64kB.Count0.I mem_pages [i1, i2]: min=[27, 27] 𝚫=0 R=[0, 38]
⚠️Ⓜ️ Data.init.Sequence.64kB.Count0 has very wide range of memory used between independent, repeated measurements.
Data.init.Sequence.64kB.Count0 mem_pages [i1, i2]: min=[24, 24] 𝚫=0 R=[0, 37]
⚠️Ⓜ️ Data.append.Sequence.64kB.Count0.RE has very wide range of memory used between independent, repeated measurements.
Data.append.Sequence.64kB.Count0.RE mem_pages [i1, i2]: min=[28, 28] 𝚫=0 R=[38, 0]
⚠️Ⓜ️ Data.init.Sequence.64kB.Count.RE.I has very wide range of memory used between independent, repeated measurements.
Data.init.Sequence.64kB.Count.RE.I mem_pages [i1, i2]: min=[19, 19] 𝚫=0 R=[38, 0]
⚠️Ⓜ️ Data.init.Sequence.64kB.Count0.RE has very wide range of memory used between independent, repeated measurements.
Data.init.Sequence.64kB.Count0.RE mem_pages [i1, i2]: min=[24, 25] 𝚫=1 R=[1, 37]
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
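The DELTA and RATIO columns in the Regression and Improvement rows are consistent with a relative change against OLD and with OLD divided by NEW. The sketch below is inferred from the numbers in the report, not taken from the reporting tool's source, and `formatChange` is a hypothetical helper.

```swift
import Foundation

/// Relative change of NEW against OLD, in percent.
func delta(old: Double, new: Double) -> Double {
    return (new - old) / old * 100
}

/// Speed ratio: values below 1 mean NEW is slower (or larger) than OLD.
func ratio(old: Double, new: Double) -> Double {
    return old / new
}

/// Formats one row the way the report prints it, e.g. "+17.6% 0.85x".
func formatChange(old: Double, new: Double) -> String {
    return String(format: "%+.1f%% %.2fx",
                  delta(old: old, new: new),
                  ratio(old: old, new: new))
}
```

For the DataCreateEmpty row under -O (OLD 170, NEW 200), `formatChange(old: 170, new: 200)` gives "+17.6% 0.85x", matching the table.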

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@palimondo palimondo merged commit fc817ba into swiftlang:master Feb 20, 2019
@palimondo palimondo deleted the and-dreadfully-distinct branch May 6, 2019 09:46
3 participants