[benchmark] Gather more independent samples for changes #22546

palimondo · 2019-02-12T10:44:27Z

Multimodal benchmarks with a significant delta between the modes can report false performance changes when we gather too few independent samples. This increases the minimal number of independent samples for potential performance changes from 5 to 10.

Resolves SR-9907: https://bugs.swift.org/browse/SR-9907
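A simplified model shows why more independent samples help when the minimum is used as the typical value. It is an assumption for illustration, not the logic in the benchmark harness. If each independent sample lands in the slow mode with probability `p_slow`, the minimum misses the fast mode only when every sample is slow. The function name `prob_min_misses_fast_mode` is hypothetical.

```python
def prob_min_misses_fast_mode(p_slow: float, n: int) -> float:
    """Probability that all n independent samples land in the slow mode.

    Simplified model (an assumption, not the harness's actual logic): each
    independent sample is slow with probability p_slow, independently of the
    others, and the reported value is the minimum across samples. The
    minimum then reflects the slow mode only if every sample is slow.
    """
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be a probability")
    if n < 1:
        raise ValueError("need at least one sample")
    return p_slow ** n


if __name__ == "__main__":
    # Compare the old minimum (5 samples) with the new one (10 samples).
    for n in (5, 10):
        p = prob_min_misses_fast_mode(0.5, n)
        print(f"n={n:2d}: P(min reports slow mode) = {p:.5f}")
```

With `p_slow = 0.5`, going from 5 to 10 samples drops that probability from 1/32 (about 3%) to 1/1024 (about 0.1%).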

palimondo · 2019-02-12T10:45:40Z

@swift-ci please benchmark

palimondo · 2019-02-12T10:45:50Z

@swift-ci please smoke test

palimondo · 2019-02-12T10:46:17Z

@eeckstein Please review 🙏

swift-ci · 2019-02-12T11:32:26Z

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
DataCreateMediumArray	2780	3040	+9.4%	0.91x (?)

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
DataAppendArray	5100	5700	+11.8%	0.89x (?)

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
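The DELTA and RATIO columns can be reproduced from OLD and NEW with the formulas below. These formulas are inferred from the table values, not taken from the benchmark scripts: DELTA = (NEW - OLD) / OLD and RATIO = OLD / NEW. The helper names are hypothetical.

```python
def delta_percent(old: float, new: float) -> float:
    """Relative change of NEW against OLD, in percent (positive = slower)."""
    return (new - old) / old * 100.0


def ratio(old: float, new: float) -> float:
    """OLD / NEW: values below 1.0 mean the new build is slower."""
    return old / new


if __name__ == "__main__":
    # OLD and NEW values copied from the -O and -Onone tables above.
    rows = [
        ("DataCreateMediumArray", 2780, 3040),
        ("DataAppendArray", 5100, 5700),
    ]
    for name, old, new in rows:
        print(f"{name}\t{old}\t{new}\t"
              f"{delta_percent(old, new):+.1f}%\t{ratio(old, new):.2f}x")
```

The same formulas also reproduce the later rows. For example, DataAppendDataSmallToSmall (4560 → 4040) gives -11.4% and 1.13x.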

palimondo · 2019-02-12T12:44:40Z

Oh man! 🤦‍♂️
Let me see how these other fake changes profile in detail...

palimondo · 2019-02-12T14:07:49Z

@swift-ci please benchmark

palimondo · 2019-02-12T14:31:36Z

DataCreateMediumArray has a weakly bimodal runtime distribution, but nowhere near as severe as DataAppendDataLargeToLarge. I'd say this false change is more of an argument against using the minimum as the typical value for a benchmark… and more evidence that somebody should take a hard look at the high variance of the new Data implementation's performance, even beyond SR-9911.
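A toy example illustrates the point about the minimum. The runtimes below are hypothetical numbers, not measurements from this PR. In a weakly bimodal benchmark, a single sample that happens to hit the rare fast mode moves the minimum by the full gap between the modes, while the median barely changes.

```python
from statistics import median

# Hypothetical runtimes for a weakly bimodal benchmark: most samples fall in
# a slow mode around 3000, a rare fast mode sits around 2800.
# Illustrative values only, not results from the swift-ci runs above.
run_a = [3010, 3040, 2790, 3030, 3020]  # one sample hit the fast mode
run_b = [3020, 3010, 3050, 3040, 3030]  # no sample hit the fast mode


def summarize(samples):
    """Return (minimum, median) of a list of runtimes."""
    return min(samples), median(samples)


if __name__ == "__main__":
    for label, run in (("A", run_a), ("B", run_b)):
        lo, mid = summarize(run)
        print(f"run {label}: min={lo} median={mid}")
```

Here the minimum differs by roughly 8% between the two runs, which is enough to cross the reporting threshold, while the median differs by well under 1%.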

swift-ci · 2019-02-12T14:54:29Z

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Improvement
DataAppendDataSmallToSmall	4560	4040	-11.4%	1.13x (?)

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Improvement
SortLettersInPlace	554	513	-7.4%	1.08x (?)

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Improvement
DataAppendArray	5600	5200	-7.1%	1.08x (?)

palimondo · 2019-02-12T15:20:44Z

Humph… Increasing the number of independent samples seems to get rid of the false changes from DataAppendDataLargeToLarge, but it also exposes false changes in the other DataBenchmarks… 🤷‍♂️ I've seen the same two benchmarks jump out before (hidden comment on #21848), so it's hard to say whether gathering more independent samples increases the chance of false change reports... let's try again.

@swift-ci please benchmark

swift-ci · 2019-02-12T16:07:17Z

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Improvement
DataAppendArray	5600	5100	-8.9%	1.10x (?)

eeckstein

I'm fine with this change.

But if we cannot fix some unstable benchmarks, we should mark them as unstable.

palimondo · 2019-02-12T19:49:05Z

DataAppendArray is jumpy at -Onone because of setup overhead from the fillBuffer function. I'll fix that one in the upcoming Janitor Duty.

palimondo requested a review from eeckstein February 12, 2019 10:44

eeckstein approved these changes Feb 12, 2019


palimondo merged commit 61f57a0 into master Feb 12, 2019

palimondo deleted the SR-9907 branch February 12, 2019 20:01

palimondo mentioned this pull request Feb 12, 2019

[SR-9907] DataAppendDataLargeToLarge benchmark is unstable #52313

Closed
