Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scalar/Neon/Neon hybrid for Keccak-x4 #179

Merged
merged 13 commits into from
Oct 2, 2024

Conversation

hanno-becker
Copy link
Contributor

@hanno-becker hanno-becker commented Oct 1, 2024

This PR adds hybrid scalar/Neon/Neon implementations of Keccak-x4. Those implementations were first described in https://eprint.iacr.org/2022/1243.

We don't consume the implementations of that paper as-is, however, but auto-generate the code from clean de-interleaved assembly. As it stands, only the interleaved versions are added here.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
@hanno-becker hanno-becker force-pushed the keccak_scalar_neon_neon_hybrid branch 3 times, most recently from 28f7d6f to 268be64 Compare October 2, 2024 05:46
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
@hanno-becker hanno-becker marked this pull request as ready for review October 2, 2024 05:51
@hanno-becker hanno-becker requested a review from a team October 2, 2024 05:51
@hanno-becker hanno-becker added the benchmark this PR should be benchmarked in CI label Oct 2, 2024
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
@hanno-becker hanno-becker added benchmark this PR should be benchmarked in CI and removed benchmark this PR should be benchmarked in CI labels Oct 2, 2024
@hanno-becker
Copy link
Contributor Author

hanno-becker commented Oct 2, 2024

I haven't yet amended optimize.sh to reproduce the optimizations from de-interleaved code. As it stands, the optimizations are only implemented in example.py in the draft PR slothy-optimizer/slothy#65.

I'd prefer to do this in a follow-up, but could be convinced otherwise.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
The scope of MLKEM_USE_NTT_ASM_CLEAN already went beyond the NTT,
and will soon be further expanded to cover the choice of Keccak
implementation.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hanno-becker!!
It's getting rather hard to keep track of which implementation is used for each target. #141 should be addressed soon.

Code looks mostly good to me - some comments.
Here is the performance of keccak-f1600-x4 cycles

platform before after
A72 6196 5407
A76 3708 2607
A55 6012 6346
G2 3709 2608
G3 2078 1604

That looks great except for the A55 - there, you just want to use a scalar 1x implementation throughout. That's in line with the paper.
We'll need a special case for the A55 I guess. Probably better to do that as a follow-up PR.

Thanks for introducing the profile for the special case A55.

fips202/asm/aarch64/keccak_f1600_x4_v84a_scalar_hybrid.S Outdated Show resolved Hide resolved
fips202/asm/aarch64/keccak_f1600_x4_v84a_scalar_hybrid.S Outdated Show resolved Hide resolved
fips202/asm/aarch64/keccak_f1600_x4_v8a_scalar_hybrid.S Outdated Show resolved Hide resolved
fips202/keccakf1600.c Show resolved Hide resolved
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
@hanno-becker hanno-becker added benchmark this PR should be benchmarked in CI and removed benchmark this PR should be benchmarked in CI labels Oct 2, 2024
Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
@hanno-becker hanno-becker added benchmark this PR should be benchmarked in CI and removed benchmark this PR should be benchmarked in CI labels Oct 2, 2024
@mkannwischer
Copy link
Contributor

Thanks! Now A55 performance looks good again, too.
LGTM.

@mkannwischer mkannwischer merged commit 5de13cb into main Oct 2, 2024
77 checks passed
@mkannwischer mkannwischer deleted the keccak_scalar_neon_neon_hybrid branch October 2, 2024 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark this PR should be benchmarked in CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants