Preparation for ARM64 implementation of poly operations for the dilithium package #562
Part of a larger PR: #561.

Once this PR is approved, I will open the other PRs, since they depend on the base files (arm64.s and arm64.go) from this one.

One thing to consider is the alignment of the poly array. The performance difference between unaligned and aligned loads and stores still needs to be measured.
```asm
	// Manual unrolling could also be done; skipped for now.
	MOVW $16, R3 // loop counter: 256 coefficients / 16 per iteration
```
```asm
add:
```
nit: you can just call this label `loop`.
Okay, I will take that into account in the next PRs!
```asm
	VLD1.P (64)(R1), [V0.S4, V1.S4, V2.S4, V3.S4] // load 16 coefficients (64 bytes) via R1, then advance R1 by 64
	VLD1.P (64)(R2), [V4.S4, V5.S4, V6.S4, V7.S4] // same for the second operand via R2
```
```asm
	VADD V4.S4, V0.S4, V8.S4 // V8 = V0 + V4, four 32-bit lanes
```
Note: it's not necessary here, but you could reuse V0 or V4 as the target register.
You are absolutely right!
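As a plain-Go reference for what the reviewed loop computes, here is a hedged sketch. It assumes each iteration issues four VADDs (only the first one, into V8, appears in the excerpt) and that no reduction modulo q follows the addition, since none is visible; `addRef`, `lanesPerIter`, and `iterations` are hypothetical names, not the package's code.

```go
package main

import "fmt"

// N is the number of coefficients in a Dilithium polynomial.
const N = 256

// lanesPerIter is how many uint32 coefficients one iteration handles:
// 4 registers (V0-V3) x 4 lanes (.S4) = 16, i.e. 64 bytes, matching
// the post-increment in VLD1.P (64)(R1).
const lanesPerIter = 4 * 4

// iterations matches the loop counter set by MOVW $16, R3: 256 / 16 = 16.
const iterations = N / lanesPerIter

// addRef sets p[i] = a[i] + b[i] with wrap-around modulo 2^32,
// which is what VADD does on .S4 lanes. No reduction modulo q is applied.
func addRef(p, a, b *[N]uint32) {
	for it := 0; it < iterations; it++ { // one pass per R3 decrement
		base := it * lanesPerIter
		for j := 0; j < lanesPerIter; j++ { // the four VADDs, 4 lanes each
			p[base+j] = a[base+j] + b[base+j]
		}
	}
}

func main() {
	var a, b, p [N]uint32
	for i := range a {
		a[i] = uint32(i)
		b[i] = uint32(2 * i)
	}
	addRef(&p, &a, &b)
	fmt.Println(iterations, p[0], p[1], p[255]) // 16 0 3 765
}
```

A reference like this can double as the expected result in a test that compares the assembly routine against the portable code on random inputs.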
This is ready to be merged, @armfazh.
Thank you, @elementrics, keep 'em coming!
To measure the performance difference on arm64 chips, run: `go test -benchmem -run=^$ ./sign/internal/dilithium -bench=Add`

On my machine (Apple M1 Max), on average:

Keep in mind that these are microbenchmarks!
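The `go test` command above runs the package's own Add benchmarks. For a quick standalone measurement outside `go test`, a sketch like the following could be used. `addGeneric` is a hypothetical portable loop, not the package's code; `testing.Benchmark` grows `b.N` until the run lasts about one second (the default benchtime) and returns the result.

```go
package main

import (
	"fmt"
	"testing"
)

// N is the number of coefficients in a Dilithium polynomial.
const N = 256

// addGeneric is a hypothetical portable fallback:
// lane-wise addition with 32-bit wrap-around.
func addGeneric(p, a, b *[N]uint32) {
	for i := range p {
		p[i] = a[i] + b[i]
	}
}

func main() {
	var a, b, p [N]uint32
	for i := range a {
		a[i], b[i] = uint32(i), uint32(3*i)
	}
	res := testing.Benchmark(func(bm *testing.B) {
		bm.ReportAllocs() // same effect as -benchmem for this benchmark
		for i := 0; i < bm.N; i++ {
			addGeneric(&p, &a, &b)
		}
	})
	// ns/op plus B/op and allocs/op.
	fmt.Println(res.String(), res.MemString())
}
```

As with any microbenchmark, results for an operation this small are sensitive to inlining, alignment, and CPU frequency scaling, so comparing several runs is advisable.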