
@elementrics elementrics commented Aug 14, 2025

To test the performance difference on arm64 chips, run: "go test -benchmem -run=^$ ./sign/internal/dilithium -bench=Add"

On my machine (Apple M1 Max) on average:

BenchmarkAddGeneric-10          12393860                95.46 ns/op            0 B/op          0 allocs/op
BenchmarkAdd-10                 68264402                17.40 ns/op            0 B/op          0 allocs/op

Keep in mind that these are microbenchmarks!
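
For readers who want to see the shape of such a measurement, here is a minimal sketch of a generic add benchmark. The type `poly`, the method `addGeneric`, and the helper `benchAddGeneric` are hypothetical stand-ins written for this illustration; they assume a polynomial of 256 32-bit coefficients and are not copied from the PR.

```go
package main

import (
	"fmt"
	"testing"
)

// poly is a hypothetical stand-in for the package's polynomial type:
// 256 coefficients of 32 bits each (an assumption, not taken from the PR).
type poly [256]uint32

// addGeneric sets p = a + b coefficient-wise, without modular reduction.
func (p *poly) addGeneric(a, b *poly) {
	for i := range p {
		p[i] = a[i] + b[i]
	}
}

// benchAddGeneric has the same shape as a BenchmarkAddGeneric function.
func benchAddGeneric(b *testing.B) {
	var p, x, y poly
	for i := range x {
		x[i], y[i] = uint32(i), uint32(2*i)
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		p.addGeneric(&x, &y)
	}
}

func main() {
	// testing.Benchmark lets the function run outside of "go test".
	res := testing.Benchmark(benchAddGeneric)
	fmt.Println(res.String(), res.MemString())
}
```

In a real package the function would live in a `_test.go` file under the name `BenchmarkAddGeneric` and be selected with `-bench=Add`, as in the command above.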

@elementrics
Contributor Author

This is part of a bigger PR: #561

elementrics commented Aug 14, 2025

Once this PR is approved, I will submit the other PRs, since they depend on the base files (arm64.s and arm64.go).

elementrics commented Aug 14, 2025

One thing to consider is the alignment of the poly array: the difference between unaligned and aligned loads and stores needs to be tested.
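
As a starting point for that test, the sketch below shows one way to check the alignment of a polynomial and to obtain one at a chosen boundary. It assumes a `poly` of 256 32-bit coefficients; `isAligned` and `alignedPoly` are hypothetical helpers for illustration only. Go guarantees only 4-byte alignment for a `[256]uint32`, so any stricter alignment has to be arranged explicitly.

```go
package main

import (
	"fmt"
	"unsafe"
)

// poly is a hypothetical stand-in: 256 coefficients of 32 bits each.
type poly [256]uint32

// isAligned reports whether p starts on an n-byte boundary.
// n must be a power of two.
func isAligned(p *poly, n uintptr) bool {
	return uintptr(unsafe.Pointer(p))&(n-1) == 0
}

// alignedPoly returns a poly that starts on an n-byte boundary by
// carving it out of an over-allocated buffer. n must be a power of
// two and at least 4. Requires Go 1.17+ for the slice-to-array-pointer
// conversion.
func alignedPoly(n uintptr) *poly {
	buf := make([]uint32, 256+n/4)
	off := 0
	for uintptr(unsafe.Pointer(&buf[off]))&(n-1) != 0 {
		off++
	}
	return (*poly)(buf[off : off+256])
}

func main() {
	var p poly
	fmt.Println("stack/heap poly 16-byte aligned:", isAligned(&p, 16))

	q := alignedPoly(64)
	fmt.Println("carved poly 64-byte aligned:", isAligned(q, 64))
}
```

A benchmark could then run the same add routine once on a `poly` from `alignedPoly` and once on one deliberately offset by 4 bytes, and compare the two timings.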

@bwesterb bwesterb self-requested a review August 14, 2025 20:19
// manually unrolling could also be done, for now skipped
MOVW $16, R3

add:
Member


nit: you can just name this label loop

Contributor Author


Okay, I will take that into consideration in the next PRs!

VLD1.P (64)(R1), [V0.S4, V1.S4, V2.S4, V3.S4]
VLD1.P (64)(R2), [V4.S4, V5.S4, V6.S4, V7.S4]

VADD V4.S4, V0.S4, V8.S4
@bwesterb bwesterb Aug 14, 2025


Note: it's not necessary here, but you can reuse V0 or V4 as the target register.

Contributor Author


You are absolutely right!
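
To make the structure of the reviewed loop easier to follow, here is a scalar Go model of it, offered as a sketch rather than the merged code. It assumes a `poly` of 256 32-bit coefficients, so that the loop counter of 16 times the 16 coefficients loaded per iteration (four registers of four 32-bit lanes per operand) covers the whole array. The function `addBlocks` is hypothetical. It also writes the sum back into the first operand's block, which mirrors the suggestion to reuse V0 as the target register.

```go
package main

import "fmt"

// poly is a hypothetical stand-in: 256 coefficients of 32 bits each.
type poly [256]uint32

// addBlocks models the assembly loop in plain Go: 16 iterations,
// each adding 16 coefficients.
func addBlocks(p, a, b *poly) {
	for iter := 0; iter < 16; iter++ {
		base := iter * 16

		var v, w [16]uint32         // stand in for V0..V3 and V4..V7
		copy(v[:], a[base:base+16]) // first VLD1.P
		copy(w[:], b[base:base+16]) // second VLD1.P

		for lane := range v {
			// The sum overwrites the first operand, so no third
			// group of registers is needed for the result.
			v[lane] += w[lane]
		}
		copy(p[base:base+16], v[:]) // store the block to the output
	}
}

func main() {
	var p, a, b poly
	for i := range a {
		a[i], b[i] = uint32(i), 1
	}
	addBlocks(&p, &a, &b)
	fmt.Println(p[0], p[15], p[16], p[255])
}
```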

@bwesterb
Member

This is ready to be merged @armfazh

@armfazh armfazh merged commit e5f5529 into cloudflare:main Aug 14, 2025
11 checks passed
@bwesterb
Member

Thank you @elementrics, keep 'em coming!

@elementrics elementrics deleted the arm64prep branch August 15, 2025 07:25