asm_macros (both ADX and without ADX version) Compute Montgomery multiplication of a, b MUL produces wrong output for large a or/and b limbs #978

Open
zkbitcoin opened this issue May 13, 2024 · 1 comment

zkbitcoin commented May 13, 2024

While testing with large limbs, I found that the output data is wrong (see the simple test bed at https://github.com/zkbitcoin/nasm-adx).

Running the example produces the output below (the asm(MUL macro is from the project's asm_macros.hpp). This is most likely an overflow case; see limbs_a[0] = 9293073166814171452ULL.

input limbs:

uint64_t limbs_r[4] = {};
uint64_t limbs_a[4] = {9293073166814171452ULL,4158907695144192454,2644031866505052884,3024693275553353487};
uint64_t limbs_b[4] = {2812702673390851119,5479905877917956870,1104182671213310543,818574998703379345};

generates:

asm(MUL

limbs_r[0] is 12178871726809496723
limbs_r[1] is 13840435079915171493
limbs_r[2] is 16771051252808782701
limbs_r[3] is 3578015002697288320

Calculating the Montgomery multiplication by hand gives the following, which is the correct output:

limbs_r[0] is 7846254855529840460
limbs_r[1] is 2923310935437288472
limbs_r[2] is 3489859301534087952
limbs_r[3] is 91016735894317655
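
One way to read these two outputs: subtracting the hand-computed limbs from the asm output, limb by limb with borrow, gives a single 256-bit value. That is what one would expect if the asm result is correct modulo the field but is missing one final subtraction of the modulus. The C sketch below performs that subtraction. `sub_256` and `issue_difference` are hypothetical helper names, not part of asm_macros.hpp, and since the issue does not print the modulus, identifying the difference with the modulus is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Limbs copied from the issue text, least significant limb first. */
static const uint64_t observed[4] = {
    12178871726809496723ULL, 13840435079915171493ULL,
    16771051252808782701ULL, 3578015002697288320ULL};
static const uint64_t expected[4] = {
    7846254855529840460ULL, 2923310935437288472ULL,
    3489859301534087952ULL, 91016735894317655ULL};

/* Hypothetical helper: out = a - b over four little-endian 64-bit limbs.
   Returns the final borrow (1 if a < b as 256-bit integers, else 0). */
static uint64_t sub_256(const uint64_t a[4], const uint64_t b[4], uint64_t out[4])
{
    uint64_t borrow = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t d = a[i] - b[i];        /* wraps modulo 2^64 */
        uint64_t b1 = a[i] < b[i];       /* borrow from the limb subtraction */
        uint64_t b2 = d < borrow;        /* borrow from subtracting the carry-in */
        out[i] = d - borrow;
        borrow = b1 | b2;                /* at most one of the two can be set */
    }
    return borrow;
}

/* Hypothetical helper: diff = observed - expected for the limbs in this issue. */
static void issue_difference(uint64_t diff[4])
{
    uint64_t borrow = sub_256(observed, expected, diff);
    assert(borrow == 0);                 /* observed is the larger value */
}
```

For the limbs above, the subtraction finishes with no borrow and the difference is {4332616871279656263, 10917124144477883021, 13281191951274694749, 3486998266802970665}, least significant limb first. Comparing these four limbs against the modulus constants used by the asm macro would confirm or rule out the missing-reduction reading.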

zkbitcoin changed the title from "asm_macros not using ADX Compute Montgomery multiplication of a, b MUL produces wrong output for large a or/and b limbs" to "asm_macros (both ADX and without ADX version) Compute Montgomery multiplication of a, b MUL produces wrong output for large a or/and b limbs" on May 13, 2024

zkbitcoin commented May 13, 2024

Adding the following to the assembly, right before the result is written out via rdi and the function returns, fixes it. I will open a pull request later.

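; comparison: if the result in r15:r14:r13:r12 is >= modulus, subtract the modulus once
; limbs are compared from most significant (r15) to least significant (r12)
cmp r15,[modulus + 24]
jc done                 ; r15 below modulus limb 3: result < modulus, nothing to do
jnz sq                  ; r15 above modulus limb 3: result > modulus, subtract
cmp r14,[modulus + 16]  ; limb 3 equal, decide on limb 2
jc done
jnz sq
cmp r13,[modulus + 8]   ; limbs 3 and 2 equal, decide on limb 1
jc done
jnz sq
cmp r12,[modulus + 0]   ; limbs 3..1 equal, decide on limb 0
jc done
; r12 >= modulus limb 0 (including full equality): fall through to the subtraction
sq:
sub r12,[modulus + 0]
sbb r13,[modulus + 8]   ; sbb propagates the borrow into the higher limbs
sbb r14,[modulus + 16]
sbb r15,[modulus + 24]
done: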
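
For readers who do not want to trace the flags, the same compare-then-subtract step can be sketched in C. This is a minimal illustration under assumptions, not the project's code: `geq_256` and `reduce_once` are hypothetical names, the modulus is passed in as a parameter rather than read from the `modulus` symbol, and a single subtraction is only enough if the value being reduced is already below twice the modulus.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if a >= b as 256-bit integers.
   Limbs are little-endian, so the comparison starts at index 3. */
static int geq_256(const uint64_t a[4], const uint64_t b[4])
{
    for (int i = 3; i >= 0; i--) {
        if (a[i] > b[i]) return 1;   /* corresponds to "jnz sq" after cmp */
        if (a[i] < b[i]) return 0;   /* corresponds to "jc done" */
    }
    return 1;                        /* all limbs equal: falls through to sq */
}

/* Hypothetical helper: if r >= modulus, replace r with r - modulus. */
static void reduce_once(uint64_t r[4], const uint64_t modulus[4])
{
    if (!geq_256(r, modulus)) {
        return;                      /* already reduced */
    }
    uint64_t borrow = 0;
    for (int i = 0; i < 4; i++) {    /* sub, then sbb for the higher limbs */
        uint64_t d = r[i] - modulus[i];
        uint64_t b1 = r[i] < modulus[i];
        uint64_t b2 = d < borrow;
        r[i] = d - borrow;
        borrow = b1 | b2;
    }
    assert(borrow == 0);             /* holds because r >= modulus on entry */
}
```

Calling `reduce_once(limbs_r, modulus_limbs)` after the multiplication, with `modulus_limbs` holding the four modulus limbs, mirrors what the added assembly does in registers. A value equal to the modulus reduces to zero, matching the fall-through into `sq` above.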
