arm64,purego: field arithmetic mul for arm64 and cleanup build tags #257
@@ -1,4 +1,4 @@
-// +build !amd64_adx
+// +build !purego
I'm confused by `amd64_adx` being replaced with `purego`. I thought that when ADX was available we definitely used assembly and hence it wouldn't be pure Go.
Is what's happening the following? Previously we had two sets of x64 assembly, one for when ADX is available and one for when it is not. It turns out assembly is not quite worth it unless ADX is available, hence the `purego` flag is set unless ADX is available.
`purego` is a wider Golang community convention for avoiding the use of assembly altogether.
The previous `amd64_adx` tag removed the instructions that check for the presence of the ADX instructions at run time, which can result in a minor 2-4% speedup.
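To make the distinction in this comment concrete, here is a minimal sketch of the two mechanisms: a build constraint that decides at compile time whether a file is built at all, and a run-time branch that picks an implementation based on CPU features. All names (`hasADX`, `mulADX`, `mulGeneric`, `mul`) are hypothetical and are not taken from this PR; the "assembly" routine is a plain Go stand-in so the example stays self-contained.

```go
//go:build !purego

// This file is only compiled when the "purego" tag is NOT set
// (build with `go build -tags purego` to exclude it). In a real package a
// sibling file guarded by `//go:build purego` would define mul without
// any assembly. The older `// +build !purego` form expresses the same thing.
package main

import "fmt"

// hasADX is a hypothetical stand-in for a CPU feature probe done once at
// start-up (a real library might use golang.org/x/sys/cpu for this).
var hasADX = false

// mulADX stands in for an assembly routine that relies on ADX instructions.
func mulADX(x, y uint64) uint64 { return x * y }

// mulGeneric stands in for the fallback used when ADX is unavailable.
func mulGeneric(x, y uint64) uint64 { return x * y }

// mul dispatches at run time. A tag such as amd64_adx would instead
// hard-wire the ADX path and drop this branch from every call.
func mul(x, y uint64) uint64 {
	if hasADX {
		return mulADX(x, y)
	}
	return mulGeneric(x, y)
}

func main() {
	fmt.Println(mul(6, 7))
}
```

The run-time branch is cheap but sits on a very hot path, which is consistent with the small gain quoted above for removing it.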
This PR introduces the following changes:
- `arm64` performance boost for field arithmetic multiplication. No assembly: it generates pure Go code that closely matches what we would hand-write.
- Removes the `amd64_adx` build tag and the duplicate `.s` assembly files generated for the `amd64` target;
- Adds the `purego` build tag, which follows other projects' convention to disable assembly if provided.

Benchmarks (`arm64`)

These are run on an M1 chip, but give ~similar performance improvement on AWS Graviton3 (not on AWS Graviton2!). Some mobile devices may also benefit from these improvements, particularly from the `Square` algorithm (not generated by default, to simplify the codebase).

Mul generation details
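As a rough illustration of what assembly-free field multiplication in Go can look like, the sketch below does a single-limb Montgomery multiplication with `math/bits`. It is not the code generated by this PR. It assumes elements are kept in Montgomery form, uses one 64-bit limb instead of several, and picks the modulus `2^61 - 1` purely for illustration; every name in it is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// q is an illustrative odd modulus below 2^63 (assumption, not from the PR).
const q uint64 = 1<<61 - 1

// qInvNeg is -q^{-1} mod 2^64, the usual Montgomery constant.
var qInvNeg = negInv(q)

// r2 is 2^128 mod q, used to convert into Montgomery form.
var r2 = bits.Rem64(bits.Rem64(1, 0, q), 0, q)

// negInv computes -m^{-1} mod 2^64 for odd m by Newton iteration.
func negInv(m uint64) uint64 {
	inv := m // correct to 3 bits, since m*m = 1 mod 8 for odd m
	for i := 0; i < 5; i++ {
		inv *= 2 - m*inv // each step doubles the number of correct bits
	}
	return -inv
}

// montMul returns x*y*2^{-64} mod q for inputs x, y < q.
func montMul(x, y uint64) uint64 {
	hi, lo := bits.Mul64(x, y)
	m := lo * qInvNeg
	mhi, mlo := bits.Mul64(m, q)
	_, carry := bits.Add64(lo, mlo, 0) // low word cancels to zero
	r, _ := bits.Add64(hi, mhi, carry) // no overflow because q < 2^63
	if r >= q {
		r -= q
	}
	return r
}

// toMont and fromMont convert to and from Montgomery form.
func toMont(x uint64) uint64   { return montMul(x, r2) }
func fromMont(x uint64) uint64 { return montMul(x, 1) }

func main() {
	a, b := toMont(123456789), toMont(987654321)
	fmt.Println(fromMont(montMul(a, b)))
}
```

`bits.Mul64` and `bits.Add64` are treated as compiler intrinsics on common 64-bit targets, which is one reason straight-line Go of this shape can get close to hand-written code.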