Investigate integrating pqcrystals common code to OQS common code #973

Closed
baentsch opened this issue Apr 19, 2021 · 5 comments · Fixed by #1221

@baentsch
Member

as per this discussion

@baentsch baentsch mentioned this issue Apr 19, 2021
@dstebila
Member

This may actually be more about making sure the PQCrystals symmetric crypto code is wired to use our liboqs common code (and thus picks up our platform-specific optimizations) than about integrating the PQCrystals symmetric crypto code into our common code. We need to check whether all required functions are available.

@bhess
Member

bhess commented Jun 8, 2022

Integrating the pqcrystals reference implementations (Kyber-90s, Dilithium-AES) with the libOQS common AES code. This also adds some more API functions and a shim API, so the reference code can use it with only minor modifications.
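A minimal sketch of what such a shim could look like, assuming the pqcrystals-style interface aes256ctr_init / aes256ctr_squeezeblocks / aes256ctr_prf and assuming a 12-byte nonce followed by a 32-bit big-endian block counter. This is not the code from the PR. common_enc_block is a hypothetical placeholder for the common library's AES-256 block encryption (for example, the schedule-based ECB routine that speed_common reports as OQS_AES256_ECB_enc_sch); here it is a toy byte mix, not AES, so the sketch builds on its own. The point is that the shim has to keep the counter in its context so consecutive squeeze calls continue the same keystream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL placeholder for the common library's AES-256 block
 * encryption. This toy byte mix is NOT AES and NOT secure; it only lets
 * the sketch build without liboqs. */
static void common_enc_block(const uint8_t key[32], const uint8_t in[16],
                             uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (uint8_t)((in[i] ^ key[i]) + key[i + 16] + 17 * i);
    }
}

/* pqcrystals-style context; the field layout is illustrative only. */
typedef struct {
    uint8_t key[32];
    uint8_t nonce[12];
    uint32_t ctr; /* index of the next keystream block */
} aes256ctr_ctx;

static void aes256ctr_init(aes256ctr_ctx *s, const uint8_t key[32],
                           const uint8_t nonce[12]) {
    memcpy(s->key, key, 32);
    memcpy(s->nonce, nonce, 12);
    s->ctr = 0;
}

/* Writes nblocks * 16 keystream bytes and advances the stored counter,
 * so a later call continues where this one stopped. */
static void aes256ctr_squeezeblocks(uint8_t *out, size_t nblocks,
                                    aes256ctr_ctx *s) {
    uint8_t blk[16];
    /* the 32-bit counter must not wrap around */
    assert((uint64_t)s->ctr + nblocks <= (uint64_t)UINT32_MAX + 1);
    memcpy(blk, s->nonce, 12);
    for (size_t i = 0; i < nblocks; i++) {
        blk[12] = (uint8_t)(s->ctr >> 24); /* big-endian counter */
        blk[13] = (uint8_t)(s->ctr >> 16);
        blk[14] = (uint8_t)(s->ctr >> 8);
        blk[15] = (uint8_t)s->ctr;
        common_enc_block(s->key, blk, out + 16 * i);
        s->ctr++;
    }
}

/* One-shot PRF: full blocks go straight into out, then a partial tail. */
static void aes256ctr_prf(uint8_t *out, size_t outlen, const uint8_t key[32],
                          const uint8_t nonce[12]) {
    aes256ctr_ctx s;
    uint8_t tail[16];
    size_t full = outlen / 16, rest = outlen % 16;
    aes256ctr_init(&s, key, nonce);
    aes256ctr_squeezeblocks(out, full, &s);
    if (rest != 0) {
        aes256ctr_squeezeblocks(tail, 1, &s);
        memcpy(out + 16 * full, tail, rest);
    }
}
```

With this layout, squeezing 2 blocks and then 1 block from one context yields the same 48 bytes as a single aes256ctr_prf(out, 48, key, nonce) call.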

Some measurements on x86_64, in this order for each scheme: (i) libOQS C-AES, (ii) libOQS OpenSSL-AES, (iii) the old version with pqcrystals AES.

Kyber768-90s:

Speed test
==========

(i) libOQS C-AES:

Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
Kyber768-90s                   |            |                |                 |            |                           |           
keygen                         |      22056 |          3.000 |         136.022 |      4.048 |                    352420 |      10386
encaps                         |      19166 |          3.000 |         156.530 |      2.783 |                    405606 |       7094
decaps                         |      18107 |          3.000 |         165.683 |      2.901 |                    429329 |       7423

(ii) libOQS OpenSSL-AES:

Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
Kyber768-90s                   |            |                |                 |            |                           |           
keygen                         |      46729 |          3.000 |          64.201 |     26.929 |                    166288 |      69793
encaps                         |      38671 |          3.000 |          77.579 |      2.714 |                    200951 |       6922
decaps                         |      33322 |          3.000 |          90.033 |      2.862 |                    233252 |       7280

(iii) old version with pqcrystals AES:

Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
Kyber768-90s                   |            |                |                 |            |                           |           
keygen                         |      21841 |          3.000 |         137.358 |      4.661 |                    355860 |      12025
encaps                         |      19581 |          3.000 |         153.212 |      5.660 |                    396936 |      14561
decaps                         |      18192 |          3.000 |         164.909 |      4.597 |                    427291 |      11820

Dilithium3-AES:

(i) libOQS C-AES:

Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
Dilithium3-AES                 |            |                |                 |            |                           |           
keypair                        |       5905 |          3.000 |         508.089 |      9.753 |                   1316758 |      25131
sign                           |       2271 |          3.001 |        1321.471 |    762.397 |                   3425000 |    1976105
verify                         |       6486 |          3.000 |         462.603 |      8.052 |                   1198915 |      20777

(ii) libOQS OpenSSL-AES:

Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
Dilithium3-AES                 |            |                |                 |            |                           |           
keypair                        |      18961 |          3.000 |         158.222 |      5.937 |                    409910 |      15281
sign                           |       3628 |          3.001 |         827.174 |    602.793 |                   2143812 |    1562435
verify                         |      18723 |          3.000 |         160.236 |      4.811 |                    415172 |      12368

(iii) old version with pqcrystals AES:

Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
Dilithium3-AES                 |            |                |                 |            |                           |           
keypair                        |       6223 |          3.000 |         482.118 |      7.959 |                   1249435 |      20504
sign                           |       2292 |          3.001 |        1309.400 |    777.427 |                   3393566 |    2015028
verify                         |       6742 |          3.000 |         445.001 |     30.026 |                   1153234 |      77723

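-> Roughly the same performance with the libOQS C-AES and the old pqcrystals AES. Performance improves when using OpenSSL: the Kyber768-90s operations take about half the time (e.g., keygen 136 µs → 64 µs), Dilithium3-AES keypair and verify take about a third (508 µs → 158 µs and 463 µs → 160 µs), and sign improves from 1321 µs to 827 µs (with a large standard deviation in all three runs).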

The AVX2 implementations are somewhat more tightly integrated with AES(-NI).

@baentsch
Member Author

baentsch commented Jun 8, 2022

Good to know. Any performance changes for speed_common?

@bhess
Member

bhess commented Jun 9, 2022

> Any performance changes for speed_common?

AES-CTR with AES-NI was relatively slow: its throughput was only about half that of ECB mode (OQS_AES256_CTR_sch at 0.157 µs vs. OQS_AES256_ECB_enc_sch at 0.074 µs per call in the results below). The main cause was the code that increments the counter.

Configuration info
==================
Target platform:  x86_64-Darwin-21.4.0
Compiler:         clang (13.1.6 (clang-1316.0.21.2.5))
Compile options:  [-march=native;-Werror;-Wall;-Wextra;-Wpedantic;-Wno-unused-command-line-argument;-O3;-fomit-frame-pointer;-Wbad-function-cast;-Wcast-qual;-Wnarrowing;-Wconversion]
OQS version:      0.7.2-dev
Git commit:       3cb2bd28282002dbfcdb3c1f5b27252d7ca02097 (+ local modifications)
OpenSSL enabled:  No
AES:              NI
SHA-2:            C
SHA-3:            C
OQS build flags:  OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
OQS_AES256_ECB_load+free_sch   |    4822668 |          1.000 |           0.207 |      0.418 |                       442 |        229
OQS_AES256_ECB_enc_sch         |   13478044 |          1.000 |           0.074 |      0.269 |                        57 |        112
OQS_AES256_CTR_load+iv+free    |    4732316 |          1.000 |           0.211 |      0.424 |                       452 |        276
OQS_AES256_CTR_sch             |    6360121 |          1.000 |           0.157 |      0.375 |                        83 |        210
OQS_AES256_CTR_sch_upd_blks    |    6415337 |          1.000 |           0.156 |      0.377 |                        80 |        226
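To illustrate why counter handling can matter, here are two equivalent ways of producing CTR input blocks, assuming a 12-byte nonce followed by a 32-bit big-endian counter. This is not the liboqs code, and the exact cost of the old counter code is not shown here. ctr_inc_bytewise propagates a carry through the block on every step, so each block depends on the previous one; ctr_fill_blocks keeps the counter as a native integer and writes base + i directly, so the counter blocks are independent and can be prepared several at a time, which can help when AES-NI encrypts multiple blocks in parallel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Variant 1: treat the whole 16-byte block as a big-endian integer and
 * propagate the carry byte by byte. Each block depends on the previous one.
 * (Matches variant 2 only while the 32-bit counter does not wrap.) */
static void ctr_inc_bytewise(uint8_t blk[16]) {
    for (int i = 15; i >= 0; i--) {
        blk[i]++;
        if (blk[i] != 0) {
            break; /* no carry into the next byte */
        }
    }
}

/* Writes a 32-bit counter big-endian into the last 4 bytes of the block. */
static void ctr_set_u32(uint8_t blk[16], uint32_t ctr) {
    blk[12] = (uint8_t)(ctr >> 24);
    blk[13] = (uint8_t)(ctr >> 16);
    blk[14] = (uint8_t)(ctr >> 8);
    blk[15] = (uint8_t)ctr;
}

/* Variant 2: keep the counter as a native integer; block i simply gets
 * base + i, so all n counter blocks are independent of each other. */
static void ctr_fill_blocks(uint8_t (*blks)[16], size_t n,
                            const uint8_t nonce[12], uint32_t base) {
    assert((uint64_t)base + n <= (uint64_t)UINT32_MAX + 1); /* no wrap */
    for (size_t i = 0; i < n; i++) {
        memcpy(blks[i], nonce, 12);
        ctr_set_u32(blks[i], base + (uint32_t)i);
    }
}
```

Both variants produce the same sequence, including across multi-byte carries (e.g. from counter 0x00FFFFFF to 0x01000000).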

I've added some optimizations to the CTR-mode code, so its performance is now about the same as ECB mode:

Configuration info
==================
Target platform:  x86_64-Darwin-21.4.0
Compiler:         clang (13.1.6 (clang-1316.0.21.2.5))
Compile options:  [-march=native;-Werror;-Wall;-Wextra;-Wpedantic;-Wno-unused-command-line-argument;-O3;-fomit-frame-pointer;-Wbad-function-cast;-Wcast-qual;-Wnarrowing;-Wconversion]
OQS version:      0.7.2-dev
Git commit:       6f57c035d731ea219059e16d1a8cf718b9d6a90f (+ local modifications)
OpenSSL enabled:  No
AES:              NI
SHA-2:            C
SHA-3:            C
OQS build flags:  OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
OQS_AES256_ECB_load+free_sch   |    4772605 |          1.000 |           0.210 |      0.441 |                       447 |        380
OQS_AES256_ECB_enc_sch         |   13391797 |          1.000 |           0.075 |      0.274 |                        57 |        147
OQS_AES256_CTR_load+iv+free    |    4798098 |          1.000 |           0.208 |      0.420 |                       446 |        234
OQS_AES256_CTR_sch             |   13683774 |          1.000 |           0.073 |      0.268 |                        55 |        132
OQS_AES256_CTR_sch_upd_blks    |   13515225 |          1.000 |           0.074 |      0.268 |                        56 |         99
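A sketch of one common way to batch CTR-mode work, under the same nonce/counter assumptions: counter blocks ctr+0 to ctr+3 are built together and encrypted as a group, any remaining 0 to 3 blocks are handled one at a time, and the counter is stored back so the next call continues the stream. Whether #1221 uses exactly this structure is an assumption. toy_enc_block is a hypothetical non-AES placeholder; with AES-NI, the group of four would instead run the AES rounds of the four blocks interleaved so their latencies overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL placeholder for AES-256 block encryption: a toy byte mix,
 * NOT AES and NOT secure. */
static void toy_enc_block(const uint8_t key[32], const uint8_t in[16],
                          uint8_t out[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (uint8_t)((in[i] ^ key[i]) + key[i + 16] + 17 * i);
    }
}

typedef struct {
    uint8_t key[32];
    uint8_t nonce[12];
    uint32_t ctr; /* counter of the next block to produce */
} ctr_state;

static void put_be32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Reference: one block per iteration. */
static void ctr_stream_1x(ctr_state *s, uint8_t *out, size_t nblocks) {
    uint8_t in[16];
    memcpy(in, s->nonce, 12);
    for (size_t i = 0; i < nblocks; i++) {
        put_be32(in + 12, s->ctr++);
        toy_enc_block(s->key, in, out + 16 * i);
    }
}

/* Batched: prepare 4 independent counter blocks, encrypt them as a group
 * (an AES-NI version would interleave their rounds), then do the 0-3
 * leftover blocks singly. The counter is written back for the next call. */
static void ctr_stream_4x(ctr_state *s, uint8_t *out, size_t nblocks) {
    uint8_t in[4][16];
    size_t i = 0;
    assert((uint64_t)s->ctr + nblocks <= (uint64_t)UINT32_MAX + 1);
    for (int j = 0; j < 4; j++) {
        memcpy(in[j], s->nonce, 12);
    }
    for (; i + 4 <= nblocks; i += 4) {
        for (uint32_t j = 0; j < 4; j++) {
            put_be32(in[j] + 12, s->ctr + j);
        }
        for (size_t j = 0; j < 4; j++) {
            toy_enc_block(s->key, in[j], out + 16 * (i + j));
        }
        s->ctr += 4;
    }
    for (; i < nblocks; i++) {
        put_be32(in[0] + 12, s->ctr++);
        toy_enc_block(s->key, in[0], out + 16 * i);
    }
}
```

Splitting a request across several ctr_stream_4x calls (e.g. 7 blocks, then 4) gives the same keystream as ctr_stream_1x over all 11 blocks, because the counter lives in the state rather than in the caller.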

Other algorithms using CTR-mode may benefit as well. I will add some updated results once everything is integrated.

@baentsch
Member Author

> Other algorithms using CTR-mode may benefit as well. I will add some updated results once everything is integrated.

Cool! Thanks. Looking forward to seeing that in #1221 when ready.
