[benchmark] Data.[init,append].Sequence various sizes #21848

palimondo · 2019-01-14T21:08:34Z

Follow-up to #21766 for benchmarking the refinement of the new Data implementation in #21754.

This PR adds thorough performance coverage of the init and append methods for Sequences. During the discussion in #21754 (comment), the use of a stack-allocated buffer for sequences shorter than 2kB (_withStackOrHeapBuffer) was mentioned as a potential reason for the performance differences we were seeing there. This PR adds benchmarks to validate that hypothesis.

While experimenting with these benchmarks I discovered my previous mistake: ExactCount vs. UnderestimatedCount wasn't actually an apples-to-apples comparison. Though functionally equivalent, repeatElementSeq wasn't compiled and optimized to the same code as repeatElement. The slowdown we were seeing could have been caused by the difference in sequence implementation. My bad! 🙇‍♂️

To rectify this, I first created Bytes, which produces a simple increasing byte sequence and, depending on the exact parameter, reports either the exact count or 0 as its underestimatedCount. This sequence is so simple that I worried it would be inlined even when other, more complex sequences were not. Therefore I created the Count0 wrapper, which always reports underestimatedCount = 0 and can be used to erase the exact count from other sequences. Using this, I added a second RE benchmark variant that uses repeatElement, as was originally the case with DataAppendSequence.
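A minimal sketch of what these two helpers could look like. The names Bytes and Count0 and the exact parameter come from the description above; the bodies, the other parameter names, and the wrap-around at 256 are assumptions for illustration, not the code in this PR.

```swift
/// Illustrative stand-in for `Bytes`: yields 0, 1, 2, ... (wrapping after 255).
struct Bytes: Sequence, IteratorProtocol {
    let count: Int
    let exact: Bool
    var i = 0

    init(count: Int, exact: Bool) {
        self.count = count
        self.exact = exact
    }

    // Reports the true length only when `exact` is set, otherwise 0.
    var underestimatedCount: Int {
        return exact ? count : 0
    }

    mutating func next() -> UInt8? {
        guard i < count else { return nil }
        defer { i += 1 }
        return UInt8(truncatingIfNeeded: i)
    }
}

/// Illustrative stand-in for `Count0`: forwards iteration to the wrapped
/// sequence but always claims an underestimatedCount of 0.
struct Count0<S: Sequence>: Sequence {
    let base: S

    init(_ base: S) {
        self.base = base
    }

    var underestimatedCount: Int {
        return 0
    }

    func makeIterator() -> S.Iterator {
        return base.makeIterator()
    }
}
```

With this shape, `Count0(repeatElement(UInt8(0xA0), count: 809))` would yield the same elements as the bare repeatElement sequence while hiding its length from the consumer.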

To thoroughly explore the problem space, I created two more variants of all tests: one using a shared, generic, non-inlinable test method, and another with the sequence created directly in the same block, which theoretically allows the sequence to be inlined into Data's init or append methods. The latter is denoted with the .I suffix, for inlinable.
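The difference between the two variants can be sketched roughly as follows. The function names are hypothetical and the benchmark loop is omitted; only `Data.init` from a `Sequence` of `UInt8`, `repeatElement`, and `@inline(never)` are existing API.

```swift
import Foundation

// Shared variant: a generic helper that is never inlined into its caller,
// so the sequence is constructed in a different function than the one
// that hands it to Data.
@inline(never)
func makeData<S: Sequence>(from sequence: S) -> Data where S.Element == UInt8 {
    return Data(sequence)
}

func sharedVariant() -> Data {
    return makeData(from: repeatElement(UInt8(0xA0), count: 809))
}

// ".I" variant: the sequence is created in the same block as the call,
// which in theory lets the optimizer see through it.
func inlinableVariant() -> Data {
    return Data(repeatElement(UInt8(0xA0), count: 809))
}
```

Both functions produce identical bytes; the only intended difference is how much the optimizer can see at the call site.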

I've also added sequences that test around the edges of the two possible thresholds (511B/513B, 2047B/2049B) for the stack-allocated vs. heap-allocated buffer. In my local tests it looks like the complicated implementation brings no measurable advantage. I suspect that the complexity of _withStackOrHeapBuffer is the reason for the slowdown between Count and Count0.
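For readers unfamiliar with the pattern being tested, here is a rough sketch of a size-dependent scratch buffer. It is not Foundation's _withStackOrHeapBuffer: the helper name and the stackLimit parameter are hypothetical, and it leans on the standard library's withUnsafeTemporaryAllocation (Swift 5.6 or later), which may or may not actually use the stack.

```swift
// Hypothetical helper illustrating a stack-or-heap scratch buffer.
func withScratchBytes<R>(
    count: Int,
    stackLimit: Int = 512,
    _ body: (UnsafeMutableRawBufferPointer) -> R
) -> R {
    if count <= stackLimit {
        // Small request: let the standard library try a stack allocation.
        return withUnsafeTemporaryAllocation(byteCount: count, alignment: 1, body)
    }
    // Large request: explicit heap allocation, released when we return.
    let heap = UnsafeMutableRawBufferPointer.allocate(byteCount: count, alignment: 1)
    defer { heap.deallocate() }
    return body(heap)
}

// Fills the scratch buffer with ones and sums it back up.
func fillAndSum(_ size: Int) -> Int {
    return withScratchBytes(count: size) { buffer -> Int in
        for i in 0..<buffer.count { buffer[i] = 1 }
        return buffer.reduce(0) { $0 + Int($1) }
    }
}
```

Sizes one byte below and above the limit (511 and 513 for a 512-byte limit) take different branches, which is what the edge-case benchmarks are designed to expose.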

Notes about the benchmark results:

The Bytes-based and the repeatElement-based (RE) variants perform similarly.
It looks like sequence inlining is not happening.

In summary:

Existing sequence benchmarks were collected into the 809B group (using the *100 multiplier).
New variants 511B/513B and 2047B/2049B test the advantage of the stack-allocated buffer.
Benchmarks for a longer 64kB sequence (multiplier of 1) were added.
Count is a test group where sequences report their full length in underestimatedCount.
In the Count0 group, underestimatedCount always returns 0.
By default, tests use the Bytes sequence.
Variants marked with .RE use the repeatElement sequence.
In variants marked with .I, the sequence can potentially be inlined into Data's methods.

The scaling is derived from the presumption that the 809B group creates/appends ~80kB of data per run, so the new 64kB group should be faster but roughly in the same ballpark.
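The arithmetic behind that presumption, as a small sketch. The helper name is hypothetical, and reading 64kB as 64 × 1024 bytes is an assumption; the thread does not say which definition of kB is meant.

```swift
// Bytes processed per benchmark run: sequence size times the multiplier.
func workload(size: Int, multiplier: Int) -> Int {
    return size * multiplier
}

let bytes809B = workload(size: 809, multiplier: 100)        // 80_900 bytes
let bytes64kB = workload(size: 64 * 1024, multiplier: 1)    // 65_536 bytes

// Relative amount of data the 64kB group moves compared to the 809B group.
let relativeWork = Double(bytes64kB) / Double(bytes809B)    // roughly 0.81
```

So if cost were purely proportional to the number of bytes, the 64kB group would be expected to take about four fifths of the 809B group's time.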

palimondo · 2019-01-15T19:36:08Z

@swift-ci please benchmark

palimondo · 2019-01-15T21:05:44Z

@swift-ci please benchmark

palimondo · 2019-01-15T21:16:01Z

@atrick @eeckstein @itaiferber @phausler Please review.

Notes about the benchmark results:

The Bytes-based and the repeatElement-based (RE) variants perform similarly.
It looks like sequence inlining is not happening.

I've pushed another update (and triggered a benchmark), where I've added Array.init/append.Sequence.64kB benchmarks for direct comparison. I've also added sequences that test around the edges of the two possible thresholds (511B/513B, 2047B/2049B) for the stack-allocated vs. heap-allocated buffer. In my local tests it looks like the complicated implementation brings no measurable advantage. I suspect that the complexity of _withStackOrHeapBuffer is the reason for the slowdown between Count and Count0. Let me know what you think...

palimondo · 2019-01-21T20:01:04Z

Ping @atrick @eeckstein @itaiferber @phausler

phausler

Looks reasonable to me for measuring that particular set of cases. I honestly think more coverage is better so I don’t see a need for paring this down any.

palimondo · 2019-01-21T20:21:59Z

Umm… I mean… at least the Array benchmarks are not really part of this. They're here just to illustrate the performance gap we should probably close or at least reasonably justify before considering the implementation good enough to ship.

palimondo · 2019-02-19T19:23:34Z

@swift-ci please benchmark

palimondo · 2019-02-19T19:23:55Z

@swift-ci please smoke test

palimondo · 2019-02-19T19:28:59Z

@phausler I'm going to merge this based on your previous approval once the checks pass, if you didn't change your mind about the multitude of these benchmarks.

swift-ci · 2019-02-19T19:48:56Z

Build failed before running benchmark.

palimondo · 2019-02-19T19:53:21Z

@swift-ci please benchmark

swift-ci · 2019-02-19T20:02:36Z

!!! Couldn't read commit file !!!

palimondo · 2019-02-19T21:42:07Z

@swift-ci please benchmark

swift-ci · 2019-02-19T23:28:07Z

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
DataCreateEmpty	170	200	+17.6%	0.85x
DataCountSmall	22	25	+13.6%	0.88x
DataCountMedium	28	31	+10.7%	0.90x (?)
Improvement
DataSubscriptSmall	31	28	-9.7%	1.11x
SortLettersInPlace	554	507	-8.5%	1.09x (?)
Added
TEST	MIN	MAX	MEAN	MAX_RSS
Data.append.Sequence.64kB.Count	60	61	60	—
Data.append.Sequence.64kB.Count.I	60	62	61	—
Data.append.Sequence.64kB.Count.RE	79	82	80	—
Data.append.Sequence.64kB.Count.RE.I	79	82	80	—
Data.append.Sequence.64kB.Count0	374	374	374	—
Data.append.Sequence.64kB.Count0.I	373	374	374	—
Data.append.Sequence.64kB.Count0.RE	379	379	379	—
Data.append.Sequence.64kB.Count0.RE.I	380	381	380	—
Data.append.Sequence.809B.Count	178	188	181	—
Data.append.Sequence.809B.Count.I	184	188	185	—
Data.append.Sequence.809B.Count.RE	209	225	215	—
Data.append.Sequence.809B.Count.RE.I	204	209	206	—
Data.append.Sequence.809B.Count0	676	676	676	—
Data.append.Sequence.809B.Count0.I	670	671	671	—
Data.append.Sequence.809B.Count0.RE	560	561	560	—
Data.append.Sequence.809B.Count0.RE.I	559	560	559	—
Data.init.Sequence.2047B.Count.I	136	140	137	—
Data.init.Sequence.2047B.Count0.I	698	699	698	—
Data.init.Sequence.2049B.Count.I	135	138	136	—
Data.init.Sequence.2049B.Count0.I	696	698	697	—
Data.init.Sequence.511B.Count.I	191	195	192	—
Data.init.Sequence.511B.Count0.I	687	688	687	—
Data.init.Sequence.513B.Count.I	192	197	194	—
Data.init.Sequence.513B.Count0.I	695	762	718	—
Data.init.Sequence.64kB.Count	59	62	60	—
Data.init.Sequence.64kB.Count.I	59	62	60	—
Data.init.Sequence.64kB.Count.RE	78	82	79	—
Data.init.Sequence.64kB.Count.RE.I	78	81	79	—
Data.init.Sequence.64kB.Count0	376	377	376	—
Data.init.Sequence.64kB.Count0.I	375	377	376	—
Data.init.Sequence.64kB.Count0.RE	381	382	382	—
Data.init.Sequence.64kB.Count0.RE.I	381	382	381	—
Data.init.Sequence.809B.Count	154	157	155	—
Data.init.Sequence.809B.Count.I	154	157	155	—
Data.init.Sequence.809B.Count.RE	180	185	182	—
Data.init.Sequence.809B.Count.RE.I	181	185	182	—
Data.init.Sequence.809B.Count0	642	643	642	—
Data.init.Sequence.809B.Count0.I	639	639	639	—
Data.init.Sequence.809B.Count0.RE	637	637	637	—
Data.init.Sequence.809B.Count0.RE.I	633	633	633	—
Removed
TEST	MIN	MAX	MEAN	MAX_RSS
Data.append.Sequence.ExactCount	199	199	199	—
Data.append.Sequence.UnderestimatedCount	1203	1302	1236	—
Data.init.Sequence.ExactCount	174	178	175	—
Data.init.Sequence.UnderestimatedCount	1277	1396	1317	—

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
DataBenchmarks.o	52228	84494	+61.8%	0.62x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Improvement
DataSubscriptSmall	31	25	-19.4%	1.24x
DataCreateEmpty	200	170	-15.0%	1.18x
DataCountSmall	28	25	-10.7%	1.12x
DataCopyBytesSmall	134	122	-9.0%	1.10x (?)
DataCountMedium	34	31	-8.8%	1.10x (?)
Data.hash.Empty	74	68	-8.1%	1.09x (?)
Added
TEST	MIN	MAX	MEAN	MAX_RSS
Data.append.Sequence.64kB.Count	60	63	61	—
Data.append.Sequence.64kB.Count.I	60	63	61	—
Data.append.Sequence.64kB.Count.RE	79	82	80	—
Data.append.Sequence.64kB.Count.RE.I	79	81	80	—
Data.append.Sequence.64kB.Count0	339	342	340	—
Data.append.Sequence.64kB.Count0.I	337	342	339	—
Data.append.Sequence.64kB.Count0.RE	346	350	347	—
Data.append.Sequence.64kB.Count0.RE.I	346	350	347	—
Data.append.Sequence.809B.Count	175	179	176	—
Data.append.Sequence.809B.Count.I	173	176	174	—
Data.append.Sequence.809B.Count.RE	206	211	208	—
Data.append.Sequence.809B.Count.RE.I	207	213	209	—
Data.append.Sequence.809B.Count0	514	534	521	—
Data.append.Sequence.809B.Count0.I	511	511	511	—
Data.append.Sequence.809B.Count0.RE	515	515	515	—
Data.append.Sequence.809B.Count0.RE.I	522	522	522	—
Data.init.Sequence.2047B.Count.I	133	137	134	—
Data.init.Sequence.2047B.Count0.I	636	671	648	—
Data.init.Sequence.2049B.Count.I	134	137	135	—
Data.init.Sequence.2049B.Count0.I	629	630	629	—
Data.init.Sequence.511B.Count.I	188	193	190	—
Data.init.Sequence.511B.Count0.I	642	643	643	—
Data.init.Sequence.513B.Count.I	189	192	190	—
Data.init.Sequence.513B.Count0.I	756	756	756	—
Data.init.Sequence.64kB.Count	59	62	60	—
Data.init.Sequence.64kB.Count.I	61	64	62	—
Data.init.Sequence.64kB.Count.RE	78	83	80	—
Data.init.Sequence.64kB.Count.RE.I	78	80	79	—
Data.init.Sequence.64kB.Count0	340	341	341	—
Data.init.Sequence.64kB.Count0.I	340	341	340	—
Data.init.Sequence.64kB.Count0.RE	348	352	349	—
Data.init.Sequence.64kB.Count0.RE.I	348	350	349	—
Data.init.Sequence.809B.Count	153	156	154	—
Data.init.Sequence.809B.Count.I	154	157	155	—
Data.init.Sequence.809B.Count.RE	183	188	185	—
Data.init.Sequence.809B.Count.RE.I	184	188	185	—
Data.init.Sequence.809B.Count0	720	742	727	—
Data.init.Sequence.809B.Count0.I	724	725	725	—
Data.init.Sequence.809B.Count0.RE	603	603	603	—
Data.init.Sequence.809B.Count0.RE.I	594	595	594	—
Removed
TEST	MIN	MAX	MEAN	MAX_RSS
Data.append.Sequence.ExactCount	221	223	222	—
Data.append.Sequence.UnderestimatedCount	1241	1347	1277	—
Data.init.Sequence.ExactCount	183	188	185	—
Data.init.Sequence.UnderestimatedCount	1313	1418	1348	—

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
DataBenchmarks.o	39764	67526	+69.8%	0.59x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
DictionaryBridgeToObjC_Access	1134	1250	+10.2%	0.91x (?)
Improvement
ObjectiveCBridgeStubFromNSString	1019	943	-7.5%	1.08x (?)
Added
TEST	MIN	MAX	MEAN	MAX_RSS
Data.append.Sequence.64kB.Count	4710	5214	4880	—
Data.append.Sequence.64kB.Count.I	4804	4888	4859	—
Data.append.Sequence.64kB.Count.RE	30625	31861	31365	—
Data.append.Sequence.64kB.Count.RE.I	30531	31797	31342	—
Data.append.Sequence.64kB.Count0	4682	4837	4734	—
Data.append.Sequence.64kB.Count0.I	4739	4899	4827	—
Data.append.Sequence.64kB.Count0.RE	29571	30351	29844	—
Data.append.Sequence.64kB.Count0.RE.I	29748	31265	30429	—
Data.append.Sequence.809B.Count	5897	6337	6047	—
Data.append.Sequence.809B.Count.I	5970	6107	6059	—
Data.append.Sequence.809B.Count.RE	37118	40370	39273	—
Data.append.Sequence.809B.Count.RE.I	37371	37785	37639	—
Data.append.Sequence.809B.Count0	5957	6125	6068	—
Data.append.Sequence.809B.Count0.I	6035	6116	6065	—
Data.append.Sequence.809B.Count0.RE	37454	38960	38410	—
Data.append.Sequence.809B.Count0.RE.I	36705	37360	36988	—
Data.init.Sequence.2047B.Count.I	7334	7652	7443	—
Data.init.Sequence.2047B.Count0.I	7354	7722	7478	—
Data.init.Sequence.2049B.Count.I	7329	7514	7450	—
Data.init.Sequence.2049B.Count0.I	7361	7473	7399	—
Data.init.Sequence.511B.Count.I	5832	6075	5914	—
Data.init.Sequence.511B.Count0.I	5736	5841	5795	—
Data.init.Sequence.513B.Count.I	5781	5816	5795	—
Data.init.Sequence.513B.Count0.I	5782	5959	5889	—
Data.init.Sequence.64kB.Count	4780	5063	4940	—
Data.init.Sequence.64kB.Count.I	4765	4854	4798	—
Data.init.Sequence.64kB.Count.RE	29770	30437	30025	—
Data.init.Sequence.64kB.Count.RE.I	32028	32674	32305	—
Data.init.Sequence.64kB.Count0	4710	4729	4722	—
Data.init.Sequence.64kB.Count0.I	4761	4984	4898	—
Data.init.Sequence.64kB.Count0.RE	29615	30314	29994	—
Data.init.Sequence.64kB.Count0.RE.I	29822	29922	29874	—
Data.init.Sequence.809B.Count	5957	6180	6034	—
Data.init.Sequence.809B.Count.I	5954	6083	6011	—
Data.init.Sequence.809B.Count.RE	36867	37306	37026	—
Data.init.Sequence.809B.Count.RE.I	36820	37426	37082	—
Data.init.Sequence.809B.Count0	6143	6247	6194	—
Data.init.Sequence.809B.Count0.I	5977	6094	6055	—
Data.init.Sequence.809B.Count0.RE	36995	37194	37062	—
Data.init.Sequence.809B.Count0.RE.I	37057	37373	37171	—
Removed
TEST	MIN	MAX	MEAN	MAX_RSS
Data.append.Sequence.ExactCount	37130	37508	37269	—
Data.append.Sequence.UnderestimatedCount	4470	4613	4520	—
Data.init.Sequence.ExactCount	37368	38281	37690	—
Data.init.Sequence.UnderestimatedCount	4593	4691	4627	—

✅	Benchmark Check Report
⚠️Ⓜ️	`Data.append.Sequence.64kB.Count0.RE.I` has very wide range of memory used between independent, repeated measurements. _{Data.append.Sequence.64kB.Count0.RE.I mem_pages [i1, i2]: min=[27, 27] 𝚫=0 R=[38, 0]}
⚠️Ⓜ️	`Data.append.Sequence.64kB.Count0.I` has very wide range of memory used between independent, repeated measurements. _{Data.append.Sequence.64kB.Count0.I mem_pages [i1, i2]: min=[27, 27] 𝚫=0 R=[0, 38]}
⚠️Ⓜ️	`Data.init.Sequence.64kB.Count0` has very wide range of memory used between independent, repeated measurements. _{Data.init.Sequence.64kB.Count0 mem_pages [i1, i2]: min=[24, 24] 𝚫=0 R=[0, 37]}
⚠️Ⓜ️	`Data.append.Sequence.64kB.Count0.RE` has very wide range of memory used between independent, repeated measurements. _{Data.append.Sequence.64kB.Count0.RE mem_pages [i1, i2]: min=[28, 28] 𝚫=0 R=[38, 0]}
⚠️Ⓜ️	`Data.init.Sequence.64kB.Count.RE.I` has very wide range of memory used between independent, repeated measurements. _{Data.init.Sequence.64kB.Count.RE.I mem_pages [i1, i2]: min=[19, 19] 𝚫=0 R=[38, 0]}
⚠️Ⓜ️	`Data.init.Sequence.64kB.Count0.RE` has very wide range of memory used between independent, repeated measurements. _{Data.init.Sequence.64kB.Count0.RE mem_pages [i1, i2]: min=[24, 25] 𝚫=1 R=[1, 37]}

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
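The DELTA and RATIO columns in the Regression and Improvement rows are consistent with the two formulas sketched below. This is inferred from the reported numbers, not taken from the CI tooling, and the function names are hypothetical.

```swift
// Relative change of NEW against OLD, in percent (positive means slower).
func deltaPercent(old: Double, new: Double) -> Double {
    return (new - old) / old * 100
}

// Speed ratio OLD/NEW (below 1 means slower, above 1 means faster).
func speedRatio(old: Double, new: Double) -> Double {
    return old / new
}

// DataCreateEmpty at -O: OLD 170, NEW 200, reported as +17.6% and 0.85x.
let createEmptyDelta = deltaPercent(old: 170, new: 200)
let createEmptyRatio = speedRatio(old: 170, new: 200)

// DataSubscriptSmall at -O: OLD 31, NEW 28, reported as -9.7% and 1.11x.
let subscriptDelta = deltaPercent(old: 31, new: 28)
let subscriptRatio = speedRatio(old: 31, new: 28)
```

Rows under Added and Removed have no counterpart to compare against, so these formulas do not apply to them.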

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

palimondo requested review from eeckstein, itaiferber, phausler and atrick January 14, 2019 21:10

palimondo force-pushed the and-dreadfully-distinct branch from 31f0bd5 to cca8382 Compare January 14, 2019 21:29

palimondo force-pushed the and-dreadfully-distinct branch 2 times, most recently from 2e66d22 to f4dee7e Compare January 15, 2019 08:29

palimondo mentioned this pull request Jan 15, 2019

Data Inlinability Refinements #21754

Merged

palimondo force-pushed the and-dreadfully-distinct branch from f4dee7e to 61e85ce Compare January 15, 2019 17:59

swiftlang deleted a comment from swift-ci Jan 15, 2019

palimondo force-pushed the and-dreadfully-distinct branch from 61e85ce to 8b7a594 Compare January 15, 2019 19:26

[benchmark] Data.[init,append].Sequence.[809B,64kB]

626878e

palimondo force-pushed the and-dreadfully-distinct branch from 8b7a594 to 626878e Compare January 15, 2019 21:05

phausler approved these changes Jan 21, 2019

palimondo mentioned this pull request Feb 12, 2019

[benchmark] Gather more independent samples for changes #22546

Merged

[benchmark] Remove Array.[init/append] & dead code

9919bc7

palimondo changed the title ~~[WIP][benchmark] Data.[init,append].Sequence.[809B,64kB]~~ [benchmark] Data.[init,append].Sequence various sizes Feb 19, 2019

Merge branch 'master' into and-dreadfully-distinct

d366105

palimondo merged commit fc817ba into swiftlang:master Feb 20, 2019

palimondo deleted the and-dreadfully-distinct branch May 6, 2019 09:46
