Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster proofs for arm/proofs/bignum_k{mul,sqr}_*_neon.ml #121

Closed
wants to merge 2 commits into from

Conversation

aqjune-aws
Copy link
Collaborator

(Marked as a draft because (1) this is made on top of #118, (2) the speedup needs to be measured on the CI check machines, and (3) the proofs may fail due to my mistake)

This makes the proofs of arm/proofs/bignum_k{mul,sqr}_{16_32,32_64}_neon.ml
faster by unusing the inlined hoare triple proofs for local 8->16 multiplication
and instead using the corresponding lemmas in bignum_{mul,sqr}_8_16_neon.ml.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

This patch adds `bignum_mont{sqr,mul}_p256_neon` functions.

These are vectorized and instruction-rescheduled versions of `bignum_mont{sqr,mul}_p256`.
They are verified using the equivalence checking tactics.

A new bash script `tools/external/slothy.sh` is added to help reproduce the optimized output.
The 'intermediate' functions of the two functions are written as comments in the two assembly files.

Additionally,
- A new instruction `umull2` is formalized add added to the simulator in order to verify the new functions.
- Old `*_neon` functions' proofs are refactored a bit.
This makes the proofs of `arm/proofs/bignum_k{mul,sqr}_{16_32,32_64}_neon.ml`
faster by unusing the inlined hoare triple proofs for local 8->16 multiplication
and instead using the corresponding lemmas in bignum_{mul,sqr}_8_16_neon.ml.
@aqjune-aws aqjune-aws closed this Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant