Trimming add-across (Neoverse N1) #49

lemire · 2024-06-25T19:47:48Z

By conditionally skipping the 8-bit add-across instruction that is only needed for 4-byte characters, we get better performance on Neoverse N1 (an ARM CPU with weak SIMD performance).
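For readers who want to see the shape of the idea, here is a minimal sketch in C with NEON intrinsics. It is not the code from this PR: the helper name `count_four_byte_leads` is hypothetical, and the guard assumes that a 32-bit max-across is cheaper than an 8-bit add-across on the target core, which should be measured. A scalar fallback keeps the sketch compilable on non-ARM machines.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not taken from the PR): counts the 4-byte lead
 * bytes (0xF0 and above) in one 16-byte block. */
static inline unsigned count_four_byte_leads(const uint8_t block[16]) {
#if defined(__aarch64__)
    uint8x16_t v = vld1q_u8(block);
    /* 0xFF in every lane that holds a 4-byte lead byte, 0x00 elsewhere. */
    uint8x16_t is_lead4 = vcgeq_u8(v, vdupq_n_u8(0xF0));
    /* Guard: a max-across over four 32-bit lanes tells us whether any
     * lane is set. Assumption: this is cheaper than the add-across. */
    if (vmaxvq_u32(vreinterpretq_u32_u8(is_lead4)) == 0) {
        return 0; /* common case: no 4-byte characters, add-across skipped */
    }
    /* Rare case: turn each 0xFF into 1, then add across the 16 lanes. */
    return vaddvq_u8(vshrq_n_u8(is_lead4, 7));
#else
    /* Scalar fallback with the same result, for non-ARM builds. */
    unsigned count = 0;
    for (size_t i = 0; i < 16; i++) {
        count += (block[i] >= 0xF0);
    }
    return count;
#endif
}
```

On text with few or no 4-byte characters the early return is taken for almost every block, so the add-across cost is only paid where such characters actually occur.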

The 16-byte `addv` instruction is one of the slowest on N1.

It gets much better on N2.

And it is already good on V1:

Method	FileName	Mean	Error	StdDev	Speed (GB/s)
SIMDUtf8ValidationRealDataMAIN	data/twitter.json	89.26 us	2.490 us	0.136 us	7.08
SIMDUtf8ValidationRealData	data/twitter.json	90.52 us	1.111 us	0.061 us	6.98
DotnetRuntimeUtf8ValidationRealData	data/twitter.json	110.96 us	3.281 us	0.180 us	5.69

Method	FileName	Mean	Error	StdDev	Speed (GB/s)
SIMDUtf8ValidationRealDataMAIN	data/Arabic-Lipsum.utf8.txt	40.945 us	0.2172 us	0.0119 us	1.99
SIMDUtf8ValidationRealDataMAIN	data/Chinese-Lipsum.utf8.txt	36.299 us	2.9133 us	0.1597 us	1.92
SIMDUtf8ValidationRealDataMAIN	data/Emoji-Lipsum.utf8.txt	33.969 us	1.5396 us	0.0844 us	1.93
SIMDUtf8ValidationRealDataMAIN	data/Hebrew-Lipsum.utf8.txt	33.613 us	0.0480 us	0.0026 us	1.98
SIMDUtf8ValidationRealDataMAIN	data/Hindi-Lipsum.utf8.txt	47.278 us	0.5625 us	0.0308 us	1.86
SIMDUtf8ValidationRealDataMAIN	data/Japanese-Lipsum.utf8.txt	35.142 us	0.0574 us	0.0031 us	1.93
SIMDUtf8ValidationRealDataMAIN	data/Korean-Lipsum.utf8.txt	34.358 us	1.4380 us	0.0788 us	1.94
SIMDUtf8ValidationRealDataMAIN	data/Latin-Lipsum.utf8.txt	3.682 us	0.0114 us	0.0006 us	23.61
SIMDUtf8ValidationRealDataMAIN	data/Russian-Lipsum.utf8.txt	55.565 us	2.5333 us	0.1389 us	1.89
SIMDUtf8ValidationRealData	data/Arabic-Lipsum.utf8.txt	37.712 us	0.1409 us	0.0077 us	2.17
SIMDUtf8ValidationRealData	data/Chinese-Lipsum.utf8.txt	33.545 us	0.9888 us	0.0542 us	2.08
SIMDUtf8ValidationRealData	data/Emoji-Lipsum.utf8.txt	37.238 us	0.1203 us	0.0066 us	1.76
SIMDUtf8ValidationRealData	data/Hebrew-Lipsum.utf8.txt	33.957 us	0.0790 us	0.0043 us	1.96
SIMDUtf8ValidationRealData	data/Hindi-Lipsum.utf8.txt	44.110 us	0.0708 us	0.0039 us	1.99
SIMDUtf8ValidationRealData	data/Japanese-Lipsum.utf8.txt	32.090 us	1.5966 us	0.0875 us	2.11
SIMDUtf8ValidationRealData	data/Korean-Lipsum.utf8.txt	30.836 us	0.3372 us	0.0185 us	2.16
SIMDUtf8ValidationRealData	data/Latin-Lipsum.utf8.txt	3.687 us	0.1805 us	0.0099 us	23.58
SIMDUtf8ValidationRealData	data/Russian-Lipsum.utf8.txt	50.254 us	3.7907 us	0.2078 us	2.08
DotnetRuntimeUtf8ValidationRealData	data/Arabic-Lipsum.utf8.txt	86.886 us	11.8837 us	0.6514 us	.94
DotnetRuntimeUtf8ValidationRealData	data/Chinese-Lipsum.utf8.txt	39.431 us	0.0155 us	0.0008 us	1.77
DotnetRuntimeUtf8ValidationRealData	data/Emoji-Lipsum.utf8.txt	96.482 us	0.8070 us	0.0442 us	.68
DotnetRuntimeUtf8ValidationRealData	data/Hebrew-Lipsum.utf8.txt	72.209 us	2.7361 us	0.1500 us	.92
DotnetRuntimeUtf8ValidationRealData	data/Hindi-Lipsum.utf8.txt	88.090 us	5.1335 us	0.2814 us	1.00
DotnetRuntimeUtf8ValidationRealData	data/Japanese-Lipsum.utf8.txt	41.100 us	0.0615 us	0.0034 us	1.65
DotnetRuntimeUtf8ValidationRealData	data/Korean-Lipsum.utf8.txt	69.868 us	0.1748 us	0.0096 us	.95
DotnetRuntimeUtf8ValidationRealData	data/Latin-Lipsum.utf8.txt	6.929 us	0.0154 us	0.0008 us	12.55
DotnetRuntimeUtf8ValidationRealData	data/Russian-Lipsum.utf8.txt	145.420 us	4.4449 us	0.2436 us	.72

Method	FileName	Mean	Error	StdDev	Speed (GB/s)
SIMDUtf8ValidationRealDataMAIN	data/Bogatov1069.utf8.txt	625.07 ns	74.340 ns	4.075 ns	1.71
SIMDUtf8ValidationRealDataMAIN	data/Bogatov136.utf8.txt	103.47 ns	2.362 ns	0.129 ns	1.31
SIMDUtf8ValidationRealDataMAIN	data/Bogatov286.utf8.txt	205.76 ns	4.560 ns	0.250 ns	1.39
SIMDUtf8ValidationRealDataMAIN	data/Bogatov527.utf8.txt	336.33 ns	123.567 ns	6.773 ns	1.57
SIMDUtf8ValidationRealData	data/Bogatov1069.utf8.txt	576.68 ns	89.757 ns	4.920 ns	1.85
SIMDUtf8ValidationRealData	data/Bogatov136.utf8.txt	98.05 ns	4.872 ns	0.267 ns	1.39
SIMDUtf8ValidationRealData	data/Bogatov286.utf8.txt	193.19 ns	13.463 ns	0.738 ns	1.48
SIMDUtf8ValidationRealData	data/Bogatov527.utf8.txt	318.96 ns	11.377 ns	0.624 ns	1.65
DotnetRuntimeUtf8ValidationRealData	data/Bogatov1069.utf8.txt	665.67 ns	1.828 ns	0.100 ns	1.61
DotnetRuntimeUtf8ValidationRealData	data/Bogatov136.utf8.txt	98.98 ns	11.743 ns	0.644 ns	1.37
DotnetRuntimeUtf8ValidationRealData	data/Bogatov286.utf8.txt	187.39 ns	0.386 ns	0.021 ns	1.53
DotnetRuntimeUtf8ValidationRealData	data/Bogatov527.utf8.txt	338.34 ns	1.897 ns	0.104 ns	1.56

EgorBo

Thanks for the improvement and the cleanup! 🙂 I'll integrate these changes into my test branch to re-run the numbers.

Nick-Nuon

Seen/noted, looks good!

Ubuntu added 2 commits June 25, 2024 19:03

trying to reduce the cost of the 4-byte char

1cce4a6

adding results

7bdac73

lemire requested review from EgorBo and Nick-Nuon June 25, 2024 19:47

lemire changed the title ~~Trimming addacross (Neoverse N1)~~ Trimming add-across (Neoverse N1) Jun 25, 2024

lemire mentioned this pull request Jun 25, 2024

Try SimdUnicode for Utf8 validation dotnet/runtime#103860

Closed

making some of the arm code prettier

906201f

EgorBo approved these changes Jun 25, 2024


lemire merged commit 6f92b06 into main Jun 25, 2024
6 checks passed

Nick-Nuon reviewed Jun 25, 2024
