Clang does not recognize portable add-with-carry patterns #73847
Comments
@EugeneZelenko are you sure
@AaronBallman @erichkeane any advice here?
This is likely something that the opt folks have to take a look at. @topperc was particularly good at bit pattern recognition at one point, so he might be able to help out.
I'm getting tired of having to rederive the best way to convince the compiler to emit addc and subb functions. Do it once and use the Clang builtins when available, because compilers seem to generally be terrible at this. (See llvm/llvm-project#73847.)

The immediate trigger was the FIPS 186-2 PRF, which completely doesn't matter, but reminded me of this mess. As far as naming and calling conventions go, I just mimicked the Clang ones.

In doing so, also use the Clang builtins when available, which helps Clang x86_64 no-asm builds a bit:

Before:
Did 704 ECDH P-384 operations in 1018920us (690.9 ops/sec)
Did 1353 ECDSA P-384 signing operations in 1077927us (1255.2 ops/sec)
Did 1190 ECDSA P-384 verify operations in 1020788us (1165.8 ops/sec)
Did 784 RSA 2048 signing operations in 1058644us (740.6 ops/sec)
Did 34000 RSA 2048 verify (same key) operations in 1011854us (33601.7 ops/sec)
Did 30000 RSA 2048 verify (fresh key) operations in 1005974us (29821.8 ops/sec)
Did 7799 RSA 2048 private key parse operations in 1061203us (7349.2 ops/sec)
Did 130 RSA 4096 signing operations in 1082617us (120.1 ops/sec)
Did 10472 RSA 4096 verify (same key) operations in 1082857us (9670.7 ops/sec)
Did 9086 RSA 4096 verify (fresh key) operations in 1039164us (8743.6 ops/sec)
Did 2574 RSA 4096 private key parse operations in 1078946us (2385.7 ops/sec)

After:
Did 775 ECDH P-384 operations in 1008465us (768.5 ops/sec)
Did 1474 ECDSA P-384 signing operations in 1062096us (1387.8 ops/sec)
Did 1485 ECDSA P-384 verify operations in 1086574us (1366.7 ops/sec)
Did 812 RSA 2048 signing operations in 1043705us (778.0 ops/sec)
Did 36000 RSA 2048 verify (same key) operations in 1005643us (35798.0 ops/sec)
Did 33000 RSA 2048 verify (fresh key) operations in 1028256us (32093.2 ops/sec)
Did 10087 RSA 2048 private key parse operations in 1018067us (9908.0 ops/sec)
Did 132 RSA 4096 signing operations in 1033049us (127.8 ops/sec)
Did 11000 RSA 4096 verify (same key) operations in 1070502us (10275.6 ops/sec)
Did 9812 RSA 4096 verify (fresh key) operations in 1047618us (9366.0 ops/sec)
Did 3245 RSA 4096 private key parse operations in 1083247us (2995.6 ops/sec)

But this is also a no-asm build, so we don't really care. Builds with assembly, broadly, do not use these codepaths. The exception is the generic ECC code on 32-bit Arm, which has a few mod-add functions, and we don't have 32-bit Arm bn_add_words assembly:

Before:
Did 168 ECDH P-384 operations in 1003229us (167.5 ops/sec)
Did 330 ECDSA P-384 signing operations in 1076600us (306.5 ops/sec)
Did 319 ECDSA P-384 verify operations in 1080750us (295.2 ops/sec)

After:
Did 195 ECDH P-384 operations in 1026458us (190.0 ops/sec)
Did 350 ECDSA P-384 signing operations in 1005392us (348.1 ops/sec)
Did 341 ECDSA P-384 verify operations in 1008486us (338.1 ops/sec)

Change-Id: Ia3fa51e59398224b9c39180e1d856bb412aa1246
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/64309
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
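For readers unfamiliar with the calling convention the commit message refers to: Clang's multiprecision builtins take the two operands and a carry-in (or borrow-in), write the carry-out through a pointer, and return the result word. A minimal portable sketch of a subtract-with-borrow helper in that shape might look like the following. The name `subb_u64` is hypothetical, and this is an illustration of the convention, not the code from the BoringSSL change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes x - y - borrow_in, returns the low 64 bits,
 * and stores the outgoing borrow (0 or 1) in *borrow_out. The argument order
 * mirrors Clang's __builtin_subcll. */
static inline uint64_t subb_u64(uint64_t x, uint64_t y, uint64_t borrow_in,
                                uint64_t *borrow_out) {
  assert(borrow_in <= 1);           /* contract: borrow-in is a single bit */
  uint64_t diff = x - y;            /* wraps modulo 2^64 */
  uint64_t b1 = x < y;              /* borrow from the first subtraction */
  uint64_t ret = diff - borrow_in;
  uint64_t b2 = diff < borrow_in;   /* borrow from subtracting borrow_in */
  *borrow_out = b1 | b2;            /* at most one of the two can be set */
  return ret;
}
```

Whether a given compiler lowers this to a single subtract-with-borrow instruction is exactly the kind of thing this issue is about, so it is worth checking the generated assembly rather than assuming.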
FYI gcc does support __builtin_addc so it isn't clang specific anymore. Do other compilers optimize any of your sequences?
Yeah, I realized that after I filed this. We're using that when available, annoying as the ifdef soup is. :-) But it'd be nice if more portable sequences were recognized so not every project needs to discover this on their own.
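To illustrate the "ifdef soup" being described, here is a rough sketch of how a project might select the builtin when the compiler advertises it and fall back to plain C otherwise. The macro `HAVE_BUILTIN_ADDCLL` and the function `add_with_carry_u64` are hypothetical names made up for this example; it assumes the compiler exposes `__has_builtin` and `__builtin_addcll` with Clang's signature, and it is not the actual BoringSSL code.

```c
#include <assert.h>
#include <stdint.h>

/* Only probe __has_builtin when the compiler provides it at all. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_addcll)
#    define HAVE_BUILTIN_ADDCLL 1   /* hypothetical feature macro */
#  endif
#endif

/* Computes x + y + carry_in, returns the low 64 bits, and stores the
 * outgoing carry (0 or 1) in *carry_out. */
static inline uint64_t add_with_carry_u64(uint64_t x, uint64_t y,
                                          uint64_t carry_in,
                                          uint64_t *carry_out) {
  assert(carry_in <= 1);            /* contract: carry-in is a single bit */
#if defined(HAVE_BUILTIN_ADDCLL)
  unsigned long long c;
  uint64_t ret = __builtin_addcll(x, y, carry_in, &c);
  *carry_out = c;
  return ret;
#else
  /* Portable fallback: detect wraparound with unsigned comparisons. */
  uint64_t sum = x + y;
  uint64_t c1 = sum < x;
  uint64_t ret = sum + carry_in;
  uint64_t c2 = ret < sum;
  *carry_out = c1 | c2;
  return ret;
#endif
}
```

Both branches compute the same values, so callers do not need to know which one was selected.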
Not that I'm aware of. The compilers we've used seem to be uniformly pretty bad at handling carry flags, alas. This was filed less as a feature parity thing and more as a missed optimization. (This sort of code, when really perf-sensitive, often needs to dip all the way into assembly to avoid compiler mishaps. Compilers have a long way to go here.)
When writing code for cryptographic primitives, or big integers in general, one often needs to use the ISA's add-with-carry instructions and chain carry flags up a bignum.
Although Clang does provide Clang-specific intrinsics like __builtin_addc, they're not portable across compilers. I made several attempts to write a portable add-with-carry here, and Clang seems unable to recognize any of them. Here's a godbolt link with a bunch of them: https://godbolt.org/z/WTns6M8E6
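For context, the kind of portable code in question tends to look like the sketch below. These are two commonly used formulations plus a loop that chains the carry through a bignum; they are illustrative and not necessarily the exact patterns in the godbolt link. All function names here (`addc_cmp`, `addc_widen`, `bignum_add`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pattern A: 64-bit limbs, carry detected with unsigned comparisons. */
static inline uint64_t addc_cmp(uint64_t x, uint64_t y, uint64_t carry_in,
                                uint64_t *carry_out) {
  assert(carry_in <= 1);            /* contract: carry-in is a single bit */
  uint64_t sum = x + y;             /* wraps modulo 2^64 */
  uint64_t c1 = sum < x;            /* carry from x + y */
  uint64_t ret = sum + carry_in;
  uint64_t c2 = ret < sum;          /* carry from adding carry_in */
  *carry_out = c1 | c2;             /* at most one of the two can be set */
  return ret;
}

/* Pattern B: 32-bit limbs, computed in a wider type so the carry is
 * simply bit 32 of the result. */
static inline uint32_t addc_widen(uint32_t x, uint32_t y, uint32_t carry_in,
                                  uint32_t *carry_out) {
  assert(carry_in <= 1);
  uint64_t wide = (uint64_t)x + y + carry_in;
  *carry_out = (uint32_t)(wide >> 32);
  return (uint32_t)wide;
}

/* Chains the carry across n little-endian limbs: r = a + b.
 * Returns the carry out of the most significant limb. */
static uint64_t bignum_add(uint64_t *r, const uint64_t *a, const uint64_t *b,
                           size_t n) {
  uint64_t carry = 0;
  for (size_t i = 0; i < n; i++) {
    r[i] = addc_cmp(a[i], b[i], carry, &carry);
  }
  return carry;
}
```

Ideally the loop body would compile to one add-with-carry instruction per limb with the carry kept in the flags register; the complaint here is that this does not reliably happen.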
(CC @andres-erbsen, do you remember if there were other patterns we'd tried?)