Trimming add-across (Neoverse N1) #49

Merged 3 commits on Jun 25, 2024

@lemire (Member) commented on Jun 25, 2024

By conditionally skipping the 8-bit add-across instruction that 4-byte characters require, we get better performance on Neoverse N1 (an ARM CPU with weak SIMD performance).
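
One reading of the change, sketched in Python as a stand-in for the SIMD code: the horizontal add that counts 4-byte lead bytes runs only for blocks that actually contain one. The function name, the 16-byte block size, and the `>= 0xF0` lead-byte test are assumptions made for illustration and are not taken from the PR's diff; the sketch also assumes the input has already been validated as UTF-8.

```python
BLOCK = 16  # bytes per 128-bit SIMD register (assumed block size)


def count_four_byte_leads(data):
    """Return (number of 4-byte lead bytes, number of reductions performed).

    Hypothetical helper: plain Python lists stand in for SIMD lanes.
    """
    total = 0
    reductions = 0
    for start in range(0, len(data), BLOCK):
        block = data[start:start + BLOCK]
        # Lane-wise compare: 1 where the byte starts a 4-byte sequence.
        mask = [1 if b >= 0xF0 else 0 for b in block]
        # Cheap "is any lane set?" test. Blocks without 4-byte
        # characters skip the expensive reduction entirely.
        if any(mask):
            total += sum(mask)  # stands in for the add-across (addv)
            reductions += 1
    return total, reductions


if __name__ == "__main__":
    sample = ("a" * 32 + "\U0001F600").encode("utf-8")
    print(count_four_byte_leads(sample))
```

The second element of the result shows how often the reduction would run. For text with few 4-byte characters it stays near zero, which is the case the change targets; text dense in such characters (emoji, for example) still pays for the reduction plus the extra test.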

The 16-byte addv instruction is one of the slowest on N1.

[Screenshot 2024-06-25 at 4:04:06 PM]

It gets much better on N2.

[Screenshot 2024-06-25 at 4:04:57 PM]

And it is already good on V1:

[Screenshot 2024-06-25 at 4:05:36 PM]

| Method | FileName | Mean | Error | StdDev | Speed (GB/s) |
|---|---|---:|---:|---:|---:|
| SIMDUtf8ValidationRealDataMAIN | data/twitter.json | 89.26 us | 2.490 us | 0.136 us | 7.08 |
| SIMDUtf8ValidationRealData | data/twitter.json | 90.52 us | 1.111 us | 0.061 us | 6.98 |
| DotnetRuntimeUtf8ValidationRealData | data/twitter.json | 110.96 us | 3.281 us | 0.180 us | 5.69 |


| Method | FileName | Mean | Error | StdDev | Speed (GB/s) |
|---|---|---:|---:|---:|---:|
| SIMDUtf8ValidationRealDataMAIN | data/Arabic-Lipsum.utf8.txt | 40.945 us | 0.2172 us | 0.0119 us | 1.99 |
| SIMDUtf8ValidationRealDataMAIN | data/Chinese-Lipsum.utf8.txt | 36.299 us | 2.9133 us | 0.1597 us | 1.92 |
| SIMDUtf8ValidationRealDataMAIN | data/Emoji-Lipsum.utf8.txt | 33.969 us | 1.5396 us | 0.0844 us | 1.93 |
| SIMDUtf8ValidationRealDataMAIN | data/Hebrew-Lipsum.utf8.txt | 33.613 us | 0.0480 us | 0.0026 us | 1.98 |
| SIMDUtf8ValidationRealDataMAIN | data/Hindi-Lipsum.utf8.txt | 47.278 us | 0.5625 us | 0.0308 us | 1.86 |
| SIMDUtf8ValidationRealDataMAIN | data/Japanese-Lipsum.utf8.txt | 35.142 us | 0.0574 us | 0.0031 us | 1.93 |
| SIMDUtf8ValidationRealDataMAIN | data/Korean-Lipsum.utf8.txt | 34.358 us | 1.4380 us | 0.0788 us | 1.94 |
| SIMDUtf8ValidationRealDataMAIN | data/Latin-Lipsum.utf8.txt | 3.682 us | 0.0114 us | 0.0006 us | 23.61 |
| SIMDUtf8ValidationRealDataMAIN | data/Russian-Lipsum.utf8.txt | 55.565 us | 2.5333 us | 0.1389 us | 1.89 |
| SIMDUtf8ValidationRealData | data/Arabic-Lipsum.utf8.txt | 37.712 us | 0.1409 us | 0.0077 us | 2.17 |
| SIMDUtf8ValidationRealData | data/Chinese-Lipsum.utf8.txt | 33.545 us | 0.9888 us | 0.0542 us | 2.08 |
| SIMDUtf8ValidationRealData | data/Emoji-Lipsum.utf8.txt | 37.238 us | 0.1203 us | 0.0066 us | 1.76 |
| SIMDUtf8ValidationRealData | data/Hebrew-Lipsum.utf8.txt | 33.957 us | 0.0790 us | 0.0043 us | 1.96 |
| SIMDUtf8ValidationRealData | data/Hindi-Lipsum.utf8.txt | 44.110 us | 0.0708 us | 0.0039 us | 1.99 |
| SIMDUtf8ValidationRealData | data/Japanese-Lipsum.utf8.txt | 32.090 us | 1.5966 us | 0.0875 us | 2.11 |
| SIMDUtf8ValidationRealData | data/Korean-Lipsum.utf8.txt | 30.836 us | 0.3372 us | 0.0185 us | 2.16 |
| SIMDUtf8ValidationRealData | data/Latin-Lipsum.utf8.txt | 3.687 us | 0.1805 us | 0.0099 us | 23.58 |
| SIMDUtf8ValidationRealData | data/Russian-Lipsum.utf8.txt | 50.254 us | 3.7907 us | 0.2078 us | 2.08 |
| DotnetRuntimeUtf8ValidationRealData | data/Arabic-Lipsum.utf8.txt | 86.886 us | 11.8837 us | 0.6514 us | 0.94 |
| DotnetRuntimeUtf8ValidationRealData | data/Chinese-Lipsum.utf8.txt | 39.431 us | 0.0155 us | 0.0008 us | 1.77 |
| DotnetRuntimeUtf8ValidationRealData | data/Emoji-Lipsum.utf8.txt | 96.482 us | 0.8070 us | 0.0442 us | 0.68 |
| DotnetRuntimeUtf8ValidationRealData | data/Hebrew-Lipsum.utf8.txt | 72.209 us | 2.7361 us | 0.1500 us | 0.92 |
| DotnetRuntimeUtf8ValidationRealData | data/Hindi-Lipsum.utf8.txt | 88.090 us | 5.1335 us | 0.2814 us | 1.00 |
| DotnetRuntimeUtf8ValidationRealData | data/Japanese-Lipsum.utf8.txt | 41.100 us | 0.0615 us | 0.0034 us | 1.65 |
| DotnetRuntimeUtf8ValidationRealData | data/Korean-Lipsum.utf8.txt | 69.868 us | 0.1748 us | 0.0096 us | 0.95 |
| DotnetRuntimeUtf8ValidationRealData | data/Latin-Lipsum.utf8.txt | 6.929 us | 0.0154 us | 0.0008 us | 12.55 |
| DotnetRuntimeUtf8ValidationRealData | data/Russian-Lipsum.utf8.txt | 145.420 us | 4.4449 us | 0.2436 us | 0.72 |


| Method | FileName | Mean | Error | StdDev | Speed (GB/s) |
|---|---|---:|---:|---:|---:|
| SIMDUtf8ValidationRealDataMAIN | data/Bogatov1069.utf8.txt | 625.07 ns | 74.340 ns | 4.075 ns | 1.71 |
| SIMDUtf8ValidationRealDataMAIN | data/Bogatov136.utf8.txt | 103.47 ns | 2.362 ns | 0.129 ns | 1.31 |
| SIMDUtf8ValidationRealDataMAIN | data/Bogatov286.utf8.txt | 205.76 ns | 4.560 ns | 0.250 ns | 1.39 |
| SIMDUtf8ValidationRealDataMAIN | data/Bogatov527.utf8.txt | 336.33 ns | 123.567 ns | 6.773 ns | 1.57 |
| SIMDUtf8ValidationRealData | data/Bogatov1069.utf8.txt | 576.68 ns | 89.757 ns | 4.920 ns | 1.85 |
| SIMDUtf8ValidationRealData | data/Bogatov136.utf8.txt | 98.05 ns | 4.872 ns | 0.267 ns | 1.39 |
| SIMDUtf8ValidationRealData | data/Bogatov286.utf8.txt | 193.19 ns | 13.463 ns | 0.738 ns | 1.48 |
| SIMDUtf8ValidationRealData | data/Bogatov527.utf8.txt | 318.96 ns | 11.377 ns | 0.624 ns | 1.65 |
| DotnetRuntimeUtf8ValidationRealData | data/Bogatov1069.utf8.txt | 665.67 ns | 1.828 ns | 0.100 ns | 1.61 |
| DotnetRuntimeUtf8ValidationRealData | data/Bogatov136.utf8.txt | 98.98 ns | 11.743 ns | 0.644 ns | 1.37 |
| DotnetRuntimeUtf8ValidationRealData | data/Bogatov286.utf8.txt | 187.39 ns | 0.386 ns | 0.021 ns | 1.53 |
| DotnetRuntimeUtf8ValidationRealData | data/Bogatov527.utf8.txt | 338.34 ns | 1.897 ns | 0.104 ns | 1.56 |
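
To compare the two SIMD rows directly, the mean times can be turned into relative changes. The sketch below does this for three Lipsum files, using values copied from the results above. It assumes that the `MAIN` suffix marks the main-branch baseline and that the unsuffixed method is this PR's branch; the post does not state this explicitly. The dictionary and function names are made up for the example.

```python
# Mean times in microseconds, copied from the Lipsum results above.
# Assumption: "MAIN" rows are the baseline, unsuffixed rows are this PR.
main_us = {"Arabic": 40.945, "Emoji": 33.969, "Korean": 34.358}
pr_us = {"Arabic": 37.712, "Emoji": 37.238, "Korean": 30.836}


def percent_change(before, after):
    """Relative change in percent; negative means the run got faster."""
    return (after - before) / before * 100.0


changes = {name: percent_change(main_us[name], pr_us[name]) for name in main_us}

if __name__ == "__main__":
    for name, pct in changes.items():
        print(f"{name}: {pct:+.1f}%")
```

Under that assumption, Arabic and Korean come out faster while Emoji comes out slower, which would fit a file dominated by 4-byte characters still executing the add-across on most blocks.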

@lemire requested review from EgorBo and Nick-Nuon on June 25, 2024, 19:47
@lemire changed the title from "Trimming addacross (Neoverse N1)" to "Trimming add-across (Neoverse N1)" on Jun 25, 2024

@EgorBo (Collaborator) left a comment:

Thanks for the improvement and the cleanup! 🙂 I'll integrate these changes into my test branch to re-run the numbers.

@lemire merged commit 6f92b06 into main on Jun 25, 2024 (6 checks passed)

@Nick-Nuon (Collaborator) left a comment:

Seen/noted, looks good!
