Skip to content

[benchmark] Gather more independent samples for changes #22546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 12, 2019
Merged

Conversation

palimondo
Copy link
Contributor

@palimondo palimondo commented Feb 12, 2019

Multimodal benchmarks with significant delta between the modes can report false performance changes when we gather too few independent samples. This increases the minimal number of independent samples for potential performance changes from 5 to 10.

Resolves SR-9907.

Multimodal benchmarks with significant delta between the modes can report false performance changes when we gather too few independent samples. This increases the minimal number of independent samples from 5 to 10.
Fix for https://bugs.swift.org/browse/SR-9907
@palimondo palimondo requested a review from eeckstein February 12, 2019 10:44
@palimondo
Copy link
Contributor Author

@swift-ci please benchmark

@palimondo
Copy link
Contributor Author

@swift-ci please smoke test

@palimondo
Copy link
Contributor Author

@eeckstein Please review 🙏

@swift-ci
Copy link
Contributor

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
DataCreateMediumArray 2780 3040 +9.4% 0.91x (?)

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
DataAppendArray 5100 5700 +11.8% 0.89x (?)
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@palimondo
Copy link
Contributor Author

Oh man! 🤦‍♂️
Let me see how these other fake changes profile in detail...

@palimondo
Copy link
Contributor Author

@swift-ci please benchmark

@palimondo
Copy link
Contributor Author

palimondo commented Feb 12, 2019

DataCreateMediumArray has weakly bi-modal runtime distribution, but nowhere near the DataAppendLargeToLarge severity. I'd say this false change is more of an argument against using minimum as the typical value for a benchmark… and more evidence that somebody should have a hard look at high variance of the new Data implementation's performance even beyond SR-9911.

@swift-ci
Copy link
Contributor

Performance: -O

TEST OLD NEW DELTA RATIO
Improvement
DataAppendDataSmallToSmall 4560 4040 -11.4% 1.13x (?)

Performance: -Osize

TEST OLD NEW DELTA RATIO
Improvement
SortLettersInPlace 554 513 -7.4% 1.08x (?)

Performance: -Onone

TEST OLD NEW DELTA RATIO
Improvement
DataAppendArray 5600 5200 -7.1% 1.08x (?)
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@palimondo
Copy link
Contributor Author

Humph… Increasing the number of independent samples seems to get rid of the false changes from DataAppendLargeToLarge, but it also exposes false changes from the rest of the DataBenchmarks… 🤷‍♂️ I've seen the same two benchmarks jump out before (hidden comment on #21848), so it's hard to say if gathering more independent samples increases the chance of false change reports... let's try again.

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

Performance: -Onone

TEST OLD NEW DELTA RATIO
Improvement
DataAppendArray 5600 5100 -8.9% 1.10x (?)
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

Copy link
Contributor

@eeckstein eeckstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this change.

But if we cannot fix some unstable benchmarks, we should mark them as unstable.

@palimondo
Copy link
Contributor Author

The DataAppendArray is jumpy in -Onone because of setup overhead from the fillBuffer function. I'll fix that one in the upcoming Janitor Duty.

@palimondo palimondo merged commit 61f57a0 into master Feb 12, 2019
@palimondo palimondo deleted the SR-9907 branch February 12, 2019 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants