
[Benchmarks] Add a whole lot more benchmarks for Data #20396


Merged 1 commit into swiftlang:master on Nov 7, 2018

Conversation

phausler
Contributor

@phausler commented Nov 7, 2018

This adds a number of new benchmarks for Data that test specific common cases of empty, small, medium and large Datas. Additionally this adds new benchmarks for creation as well as conversion from String to Data using Sequence/Collection initializers.
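As a rough illustration of the shapes being covered (this standalone sketch only mirrors the kinds of initializers exercised; the real benchmark names appear in the results below):

```swift
import Foundation

// Illustrative only: the kinds of Data initializers these benchmarks exercise.
let empty = Data()                                 // DataCreateEmpty-style
let small = Data([0, 1, 2, 3, 4, 5, 6])            // DataCreateSmallArray-style
let medium = Data(count: 1_024)                    // zero-filled buffer, DataCreateMedium-style
let fromString = Data("The quick brown fox".utf8)  // StringToData-style (Sequence initializer)
```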

@phausler
Contributor Author

phausler commented Nov 7, 2018

@swift-ci please smoke test

@phausler
Contributor Author

phausler commented Nov 7, 2018

@swift-ci please smoke benchmark

blackHole(Data([0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6]))
}
}

Contributor Author


@milseman What other converters do we want to test here? Should we test the Foundation overlay variations as well?

Member


Up to you. Next release, we'll want to unify them as part of some upcoming string spring cleaning.

Contributor

@itaiferber left a comment


LGTM — as long as CI is happy, we should merge.

@swift-ci
Contributor

swift-ci commented Nov 7, 2018

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Improvement
ObjectiveCBridgeStubToNSDate2 1538 1364 -11.3% 1.13x (?)
Added (MIN MAX MEAN)
DataAccessBytesLarge 1409027 1488463 1460729
DataAccessBytesMedium 1140 1352 1211
DataAccessBytesSmall 1137 1139 1138
DataAppendBytesLarge 1810983 1813382 1812004
DataAppendBytesMedium 4890 5360 5201
DataAppendBytesSmall 4477 4744 4580
DataCopyBytesLarge 15268 16127 15639
DataCopyBytesMedium 453 472 459
DataCopyBytesSmall 291 293 292
DataCountLarge 1419208 1489729 1464522
DataCountMedium 37 40 38
DataCountSmall 34 35 34
DataCreateEmpty 23600 23819 23685
DataCreateEmptyArray 26934 27026 26980
DataCreateLarge 1433268 1494583 1473405
DataCreateMedium 17312 18078 17605
DataCreateMediumArray 3102 3157 3134
DataCreateSmall 98144 102382 100847
DataCreateSmallArray 27711 27881 27785
DataMutateBytesLarge 1817912 1891248 1866258
DataMutateBytesMedium 4279 4723 4526
DataMutateBytesSmall 4712 5021 4887
DataSetCountLarge 525 527 526
DataSetCountMedium 517 534 523
DataSetCountSmall 508 509 508
DataSubscriptLarge 1393758 1487858 1449201
DataSubscriptMedium 220 222 221
DataSubscriptSmall 220 224 222
DataToStringEmpty 2985 3416 3130
DataToStringMedium 11626 11816 11690
DataToStringSmall 5140 5363 5221
StringToDataEmpty 2590 2641 2607
StringToDataMedium 2680 2739 2713
StringToDataSmall 2599 2646 2615
Removed (MIN MAX MEAN)
DataAccessBytes 1137 1408 1227
DataAppendBytes 5139 5356 5224
DataCopyBytes 531 531 531
DataCount 34 35 34
DataMutateBytes 4332 4482 4383
DataSetCount 517 517 517
DataSubscript 220 222 221

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
DataBenchmarks.o 26476 56388 +113.0% 0.47x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Improvement
CharacterLiteralsLarge 111 100 -9.9% 1.11x
CharacterLiteralsSmall 345 322 -6.7% 1.07x
Added (MIN MAX MEAN)
DataAccessBytesLarge 1477658 1515304 1496091
DataAccessBytesMedium 1154 1288 1199
DataAccessBytesSmall 1152 1152 1152
DataAppendBytesLarge 1818139 1820459 1819406
DataAppendBytesMedium 4869 4950 4897
DataAppendBytesSmall 5087 5136 5104
DataCopyBytesLarge 15565 16309 15882
DataCopyBytesMedium 484 484 484
DataCopyBytesSmall 291 294 292
DataCountLarge 1402564 1460620 1441213
DataCountMedium 37 40 38
DataCountSmall 34 35 34
DataCreateEmpty 23530 23928 23727
DataCreateEmptyArray 27031 27143 27100
DataCreateLarge 1492492 1502613 1499147
DataCreateMedium 17147 17547 17306
DataCreateMediumArray 3100 3156 3119
DataCreateSmall 100609 103595 102597
DataCreateSmallArray 27718 27998 27844
DataMutateBytesLarge 1850114 1903383 1869186
DataMutateBytesMedium 4287 4411 4342
DataMutateBytesSmall 4363 4463 4425
DataSetCountLarge 529 532 530
DataSetCountMedium 517 517 517
DataSetCountSmall 508 509 508
DataSubscriptLarge 1498272 1509822 1504916
DataSubscriptMedium 220 223 221
DataSubscriptSmall 220 223 221
DataToStringEmpty 3192 3424 3269
DataToStringMedium 12146 12181 12169
DataToStringSmall 5763 5901 5813
StringToDataEmpty 2633 2805 2690
StringToDataMedium 2700 2755 2720
StringToDataSmall 2640 2747 2676
Removed (MIN MAX MEAN)
DataAccessBytes 1142 1316 1200
DataAppendBytes 5877 6070 5942
DataCopyBytes 665 665 665
DataCount 37 38 37
DataMutateBytes 4912 5367 5069
DataSetCount 517 517 517
DataSubscript 220 222 221

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
DataBenchmarks.o 21781 51013 +134.2% 0.43x

Performance: -Onone

TEST MIN MAX MEAN MAX_RSS
Added
DataAccessBytesLarge 1407747 1478167 1434889
DataAccessBytesMedium 2351 2496 2400
DataAccessBytesSmall 2370 2473 2423
DataAppendBytesLarge 1823424 1855672 1844661
DataAppendBytesMedium 5077 5239 5144
DataAppendBytesSmall 4877 4948 4902
DataCopyBytesLarge 15465 16268 15957
DataCopyBytesMedium 503 509 505
DataCopyBytesSmall 363 365 364
DataCountLarge 1411637 1474797 1434953
DataCountMedium 223 227 224
DataCountSmall 225 228 226
DataCreateEmpty 24247 24514 24363
DataCreateEmptyArray 60473 60733 60580
DataCreateLarge 1448701 1554400 1508303
DataCreateMedium 18212 18294 18265
DataCreateMediumArray 9367 9467 9403
DataCreateSmall 97285 100790 98485
DataCreateSmallArray 91824 92012 91935
DataMutateBytesLarge 1813483 1900503 1844943
DataMutateBytesMedium 5775 5977 5855
DataMutateBytesSmall 5820 5954 5888
DataSetCountLarge 572 574 573
DataSetCountMedium 535 535 535
DataSetCountSmall 526 563 544
DataSubscriptLarge 1441155 1457993 1450813
DataSubscriptMedium 443 443 443
DataSubscriptSmall 443 448 446
DataToStringEmpty 3829 4169 3984
DataToStringMedium 12289 12418 12370
DataToStringSmall 5772 5845 5799
StringToDataEmpty 3665 3733 3689
StringToDataMedium 3986 4063 4028
StringToDataSmall 3678 3718 3694
Removed
DataAccessBytes 2283 2430 2332
DataAppendBytes 5601 5672 5639
DataCopyBytes 568 568 568
DataCount 222 226 223
DataMutateBytes 5940 6024 5969
DataSetCount 534 534 534
DataSubscript 442 450 445
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@phausler merged commit a66e769 into swiftlang:master Nov 7, 2018
@palimondo
Contributor

palimondo commented Dec 20, 2018

Among other things, this has also reintroduced setup overhead to some of the benchmarks I previously cleaned up. All the new Large variants have ~100% buffer-creation overhead. I see they were disabled in #20411, but I really don't think benchmarks that allocate over 1 GB of memory belong in the Swift Benchmark Suite at all.

Also, all the DataCreate benchmarks really measure is the performance of arc4random_buf. I don't see why we need truly random data for these tests… pseudorandom would be just as good, but since we are not doing anything with the data, why not just fill it with a repeated sequence of bytes, as we do in IterateData?

I'll clean this up, but in the future, please do run Benchmark_Driver check locally before committing modified benchmarks. We have now hooked this up to CI, but it only validates newly added benchmarks. Here's what it says on my 2008 MBP:

Benchmark Check Report
⛔️⏱ DataAccessBytesLarge has setup overhead of 15980062 μs (102.0%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⚠️Ⓜ️ DataAccessBytesLarge has very wide range of memory used between independent, repeated measurements.
DataAccessBytesLarge mem_pages [i1, i2]: min=[258239, 180882] 𝚫=77357 R=[3927, 81283]
⛔️⏱ DataAccessBytesMedium has setup overhead of 16 μs (7.4%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataAppendArray execution took at least 16619 μs.
Decrease the workload of DataAppendArray by a factor of 32 (100), to be less than 1000 μs.
⛔️⏱ DataAppendBytesLarge has setup overhead of 14701060 μs (92.3%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataAppendBytesLarge execution took at least 1231514 μs (excluding the setup overhead).
Decrease the workload of DataAppendBytesLarge by a factor of 2048 (10000), to be less than 1000 μs.
⛔️⏱ DataAppendBytesMedium execution took at least 15679 μs.
Decrease the workload of DataAppendBytesMedium by a factor of 16 (100), to be less than 1000 μs.
⚠️🔤 DataAppendDataLargeToLarge name is composed of 6 words.
Split DataAppendDataLargeToLarge name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ DataAppendDataLargeToLarge has setup overhead of 27390 μs (13.1%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataAppendDataLargeToLarge execution took at least 181985 μs (excluding the setup overhead).
Decrease the workload of DataAppendDataLargeToLarge by a factor of 256 (1000), to be less than 1000 μs.
⚠️Ⓜ️ DataAppendDataLargeToLarge has very wide range of memory used between independent, repeated measurements.
DataAppendDataLargeToLarge mem_pages [i1, i2]: min=[119, 95] 𝚫=24 R=[18, 29]
⚠️🔤 DataAppendDataLargeToMedium name is composed of 6 words.
Split DataAppendDataLargeToMedium name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ DataAppendDataLargeToMedium execution took at least 80878 μs.
Decrease the workload of DataAppendDataLargeToMedium by a factor of 128 (100), to be less than 1000 μs.
⚠️Ⓜ️ DataAppendDataLargeToMedium has very wide range of memory used between independent, repeated measurements.
DataAppendDataLargeToMedium mem_pages [i1, i2]: min=[54, 51] 𝚫=3 R=[19, 22]
⚠️🔤 DataAppendDataLargeToSmall name is composed of 6 words.
Split DataAppendDataLargeToSmall name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ DataAppendDataLargeToSmall execution took at least 76184 μs.
Decrease the workload of DataAppendDataLargeToSmall by a factor of 128 (100), to be less than 1000 μs.
⚠️🔤 DataAppendDataMediumToLarge name is composed of 6 words.
Split DataAppendDataMediumToLarge name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ DataAppendDataMediumToLarge has setup overhead of 19424 μs (17.8%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataAppendDataMediumToLarge execution took at least 89827 μs (excluding the setup overhead).
Decrease the workload of DataAppendDataMediumToLarge by a factor of 128 (100), to be less than 1000 μs.
⚠️Ⓜ️ DataAppendDataMediumToLarge has very wide range of memory used between independent, repeated measurements.
DataAppendDataMediumToLarge mem_pages [i1, i2]: min=[52, 53] 𝚫=1 R=[24, 23]
⚠️🔤 DataAppendDataMediumToMedium name is composed of 6 words.
Split DataAppendDataMediumToMedium name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ DataAppendDataMediumToMedium execution took at least 14957 μs.
Decrease the workload of DataAppendDataMediumToMedium by a factor of 16 (100), to be less than 1000 μs.
⚠️🔤 DataAppendDataMediumToSmall name is composed of 6 words.
Split DataAppendDataMediumToSmall name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ DataAppendDataMediumToSmall execution took at least 14235 μs.
Decrease the workload of DataAppendDataMediumToSmall by a factor of 16 (100), to be less than 1000 μs.
⚠️🔤 DataAppendDataSmallToLarge name is composed of 6 words.
Split DataAppendDataSmallToLarge name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ DataAppendDataSmallToLarge execution took at least 120114 μs.
Decrease the workload of DataAppendDataSmallToLarge by a factor of 128 (1000), to be less than 1000 μs.
⚠️Ⓜ️ DataAppendDataSmallToLarge has very wide range of memory used between independent, repeated measurements.
DataAppendDataSmallToLarge mem_pages [i1, i2]: min=[74, 61] 𝚫=13 R=[12, 18]
⚠️🔤 DataAppendDataSmallToMedium name is composed of 6 words.
Split DataAppendDataSmallToMedium name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ DataAppendDataSmallToMedium execution took at least 13885 μs.
Decrease the workload of DataAppendDataSmallToMedium by a factor of 16 (100), to be less than 1000 μs.
⚠️🔤 DataAppendDataSmallToSmall name is composed of 6 words.
Split DataAppendDataSmallToSmall name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⛔️⏱ DataAppendDataSmallToSmall has setup overhead of 1210 μs (9.0%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataAppendDataSmallToSmall execution took at least 12171 μs (excluding the setup overhead).
Decrease the workload of DataAppendDataSmallToSmall by a factor of 16 (100), to be less than 1000 μs.
⛔️⏱ DataAppendSequence execution took at least 65873 μs.
Decrease the workload of DataAppendSequence by a factor of 128 (100), to be less than 1000 μs.
⛔️⏱ DataCopyBytesLarge execution took at least 81330 μs.
Decrease the workload of DataCopyBytesLarge by a factor of 128 (100), to be less than 1000 μs.
⚠️Ⓜ️ DataCopyBytesLarge has very wide range of memory used between independent, repeated measurements.
DataCopyBytesLarge mem_pages [i1, i2]: min=[34, 34] 𝚫=0 R=[13, 24]
⚠️ DataCopyBytesMedium execution took at least 1352 μs.
Decrease the workload of DataCopyBytesMedium by a factor of 2 (10), to be less than 1000 μs.
⛔️⏱ DataCountLarge has setup overhead of 15325398 μs (99.0%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataCountLarge execution took at least 160287 μs (excluding the setup overhead).
Decrease the workload of DataCountLarge by a factor of 256 (1000), to be less than 1000 μs.
⛔️⏱ DataCountMedium has setup overhead of 18 μs (22.8%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataCountSmall has setup overhead of 4 μs (7.5%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataCreateEmptyArray execution took at least 14273 μs.
Decrease the workload of DataCreateEmptyArray by a factor of 16 (100), to be less than 1000 μs.
⛔️⏱ DataCreateLarge has setup overhead of 14540010 μs (95.3%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataCreateLarge execution took at least 714406 μs (excluding the setup overhead).
Decrease the workload of DataCreateLarge by a factor of 1024 (1000), to be less than 1000 μs.
⛔️⏱ DataCreateMedium execution took at least 157904 μs.
Decrease the workload of DataCreateMedium by a factor of 256 (1000), to be less than 1000 μs.
⚠️ DataCreateMediumArray execution took at least 9136 μs.
Decrease the workload of DataCreateMediumArray by a factor of 16 (10), to be less than 1000 μs.
⛔️⏱ DataCreateSmall execution took at least 223690 μs.
Decrease the workload of DataCreateSmall by a factor of 256 (1000), to be less than 1000 μs.
⛔️⏱ DataCreateSmallArray execution took at least 16932 μs.
Decrease the workload of DataCreateSmallArray by a factor of 32 (100), to be less than 1000 μs.
⛔️⏱ DataMutateBytesLarge has setup overhead of 18031434 μs (100.5%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataMutateBytesMedium execution took at least 12779 μs.
Decrease the workload of DataMutateBytesMedium by a factor of 16 (100), to be less than 1000 μs.
⚠️ DataMutateBytesSmall execution took at least 3367 μs.
Decrease the workload of DataMutateBytesSmall by a factor of 4 (10), to be less than 1000 μs.
⛔️⏱ DataReplaceLarge execution took at least 84975 μs.
Decrease the workload of DataReplaceLarge by a factor of 128 (100), to be less than 1000 μs.
⛔️⏱ DataReplaceLargeBuffer execution took at least 186047 μs.
Decrease the workload of DataReplaceLargeBuffer by a factor of 256 (1000), to be less than 1000 μs.
⚠️Ⓜ️ DataReplaceLargeBuffer has very wide range of memory used between independent, repeated measurements.
DataReplaceLargeBuffer mem_pages [i1, i2]: min=[68, 61] 𝚫=7 R=[13, 26]
⛔️⏱ DataReplaceMedium execution took at least 16692 μs.
Decrease the workload of DataReplaceMedium by a factor of 32 (100), to be less than 1000 μs.
⛔️⏱ DataReplaceMediumBuffer execution took at least 33516 μs.
Decrease the workload of DataReplaceMediumBuffer by a factor of 64 (100), to be less than 1000 μs.
⛔️⏱ DataReplaceSmall execution took at least 11302 μs.
Decrease the workload of DataReplaceSmall by a factor of 16 (100), to be less than 1000 μs.
⛔️⏱ DataReplaceSmallBuffer execution took at least 27040 μs.
Decrease the workload of DataReplaceSmallBuffer by a factor of 32 (100), to be less than 1000 μs.
⛔️⏱ DataReset execution took at least 10305 μs.
Decrease the workload of DataReset by a factor of 16 (100), to be less than 1000 μs.
⛔️⏱ DataSetCountLarge has setup overhead of 498 μs (17.3%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⚠️ DataSetCountLarge execution took at least 2377 μs (excluding the setup overhead).
Decrease the workload of DataSetCountLarge by a factor of 4 (10), to be less than 1000 μs.
⚠️ DataSetCountMedium execution took at least 2477 μs.
Decrease the workload of DataSetCountMedium by a factor of 4 (10), to be less than 1000 μs.
⛔️⏱ DataSubscriptLarge has setup overhead of 14769664 μs (97.1%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataSubscriptLarge execution took at least 443749 μs (excluding the setup overhead).
Decrease the workload of DataSubscriptLarge by a factor of 512 (1000), to be less than 1000 μs.
⛔️⏱ DataSubscriptMedium has setup overhead of 16 μs (10.0%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
⛔️⏱ DataToStringMedium has setup overhead of 44 μs (5.8%).
Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.
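The recurring fix the report suggests — hoisting expensive data creation out of the measured run function — can be sketched with a minimal stand-in (the names here are illustrative, not the suite's actual TestsUtils/BenchmarkInfo API):

```swift
import Foundation

// Minimal stand-in for the suite's setUpFunction pattern: the expensive
// allocation happens once, before the measured region, so it no longer
// counts as setup overhead in the timed run function.
var sampleBuffer: Data? = nil

func setUp() {
    sampleBuffer = Data(count: 1_024 * 1_024)  // done once, untimed
}

func run(_ iterations: Int) -> Int {
    // Only this loop would be timed by the benchmark harness.
    var total = 0
    for _ in 0..<iterations {
        total &+= sampleBuffer!.count
    }
    return total
}
```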

@phausler
Contributor Author

The Large variants are useful in that they take a completely different code path and have different performance characteristics. I understand we should not do that for every test, but it still should not regress, imho.

arc4random is used to ensure the kernel does not just hand out a non-faulting page of memory. An allocation alone is not enough to properly exercise the use cases; the memory must be written to. And patterns like repeated take a completely different code path in the sequence initialization.

@phausler
Contributor Author

If we want to transition to a different random source, that is fine in my book. It does not need to be arc4random, but it should be something portable to Linux, is all.
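For example, a tiny deterministic generator such as SplitMix64 would run identically on Darwin and Linux without depending on arc4random_buf, while still writing every byte so the kernel cannot hand back an untouched zero page. This is purely a sketch of one option, not what the suite adopted:

```swift
import Foundation

// SplitMix64: a small, portable, deterministic pseudorandom generator.
struct SplitMix64 {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Fill a Data buffer with pseudorandom bytes; same seed, same contents.
func pseudorandomData(size: Int, seed: UInt64 = 42) -> Data {
    var rng = SplitMix64(state: seed)
    var data = Data(count: size)
    for i in 0..<size {
        data[i] = UInt8(truncatingIfNeeded: rng.next())
    }
    return data
}
```

Because the output is deterministic, benchmark contents would also be reproducible across runs and platforms.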

@palimondo
Contributor

palimondo commented Dec 20, 2018

Can you clarify this part?

It must be written to: and patterns like repeated are a completely different code path in the sequence initialization.

To the best of my understanding, even the .veryLarge case just calls sampleData(size: 1024 * 1024 * 1024 + 128), which looks like this:

func sampleData(size: Int) -> Data {
    var data = Data(count: size)
    data.withUnsafeMutableBytes { getRandomBuf(baseAddress: $0, count: size) }
    return data
}

How is that a different initialization pattern from this one in IterateData?

  var data = Data(count: 16 * 1024)
  let n = data.count
  data.withUnsafeMutableBytes { (ptr: UnsafeMutablePointer<UInt8>) -> () in
    for i in 0..<n {
      ptr[i] = UInt8(i % 23)
    }
  }

I think it is fully equivalent from the perspective of Data, which hands an UnsafeMutablePointer to the closure that fills it to capacity. As long as all the content is written, it should look the same from the kernel's perspective, too. Why do we need random data here?

@phausler
Contributor Author

Oh, that differential is fine; I was just saying that Repeated (the sequence type) would behave differently.
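A minimal sketch of the distinction being drawn: Data(count:) allocates a zero-filled buffer directly, while initializing from repeatElement (which yields the Repeated sequence type) goes through the generic Sequence initializer, a different code path.

```swift
import Foundation

// Two ways to build a 1 KiB buffer of uniform bytes, via different code paths.
let zeroFilled = Data(count: 1_024)                         // direct zero-filled allocation
let repeated = Data(repeatElement(UInt8(23), count: 1_024)) // generic Sequence initializer
```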
