Possible Performance Degradation of HQC128 through Update to 2023-04-30 Submission #2047

Open
BartBBM opened this issue Jan 21, 2025 · 10 comments
Labels
bug Something isn't working; high priority to fix

Comments

@BartBBM

BartBBM commented Jan 21, 2025

Describe the bug
When running ./speed_kem HQC-128 from the build/tests directory there is a big performance degradation, introduced by #1585 by @SWilson4.

I should mention that this may simply be the effect of no longer having an AVX2 implementation, as noted in PQClean/PQClean#512. If that is the reason, consider this bug report resolved.

To Reproduce
Steps to reproduce the behavior:

  1. Checkout tags/0.10.0
  2. Build liboqs
  3. Run performance test ./speed_kem HQC-128
  4. See slow performance
  5. Checkout tags/0.9.2
  6. Build liboqs
  7. Run performance test ./speed_kem HQC-128
  8. See faster performance
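
The steps above can be sketched as a command plan in Python. The -GNinja generator, the per-tag build directory, and the helper name bench_commands are assumptions made for illustration and do not come from the report. The function only builds the argument lists and does not execute anything.

```python
import shlex

def bench_commands(tag, cmake_flags=(), alg="HQC-128"):
    """Return the commands for one benchmark run as argument lists.

    Assumes it would be run from the root of a liboqs checkout.
    A separate build directory per tag (hypothetical naming) avoids
    reusing a stale CMake cache from the other version.
    """
    build_dir = f"build-{tag}"
    return [
        ["git", "checkout", f"tags/{tag}"],
        ["cmake", "-GNinja", "-S", ".", "-B", build_dir, *cmake_flags],
        ["ninja", "-C", build_dir],
        [f"./{build_dir}/tests/speed_kem", alg],
    ]

# Print the plan for both versions compared in this report.
for tag in ("0.10.0", "0.9.2"):
    for cmd in bench_commands(tag):
        print(shlex.join(cmd))
```

Extra CMake options such as -DOQS_DIST_BUILD=OFF can be passed through cmake_flags so that both versions are built with identical settings.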

Expected behavior
No performance degradation this big.

Logs

0.10.0

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.0
Git commit:       36be57445d8ca53f7095160fde548efe82ace09d
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2025-01-21 16:27:22
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-128                              |            |                |                 |            |                           |           
keygen                               |       1731 |          3.001 |        1733.832 |    184.774 |                   5881185 |     626126
encaps                               |        883 |          3.003 |        3400.411 |    168.300 |                  11526433 |     570244
decaps                               |        546 |          3.004 |        5501.266 |    898.510 |                  18647074 |    3045484
Ended at 2025-01-21 16:27:31

0.9.2

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.9.2
Git commit:       62b58a34fbbcc1cb23f2c090c8a19b090ebf1aa2
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2025-01-21 16:23:16
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-128                              |            |                |                 |            |                           |           
keygen                               |      87515 |          3.000 |          34.280 |     16.798 |                    116235 |      56934
encaps                               |      45523 |          3.000 |          65.902 |     40.549 |                    223264 |     137396
decaps                               |      25801 |          3.000 |         116.275 |     18.582 |                    393633 |      62858
Ended at 2025-01-21 16:23:25
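
The gap between the two logs above can be quantified from the "Time (us): mean" column. The following is a minimal sketch: the helper name parse_mean_us is hypothetical, and the rows are copied from the logs with the column padding trimmed.

```python
ROWS_0_10_0 = """
keygen | 1731 | 3.001 | 1733.832 | 184.774 | 5881185 | 626126
encaps | 883 | 3.003 | 3400.411 | 168.300 | 11526433 | 570244
decaps | 546 | 3.004 | 5501.266 | 898.510 | 18647074 | 3045484
"""

ROWS_0_9_2 = """
keygen | 87515 | 3.000 | 34.280 | 16.798 | 116235 | 56934
encaps | 45523 | 3.000 | 65.902 | 40.549 | 223264 | 137396
decaps | 25801 | 3.000 | 116.275 | 18.582 | 393633 | 62858
"""

def parse_mean_us(table):
    """Map operation name -> mean time in microseconds.

    Only rows with seven cells and a numeric iteration count are
    treated as data, so header and separator rows are skipped.
    """
    means = {}
    for line in table.splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) == 7 and cells[1].isdigit():
            means[cells[0]] = float(cells[3])
    return means

new = parse_mean_us(ROWS_0_10_0)
old = parse_mean_us(ROWS_0_9_2)
slowdown = {op: new[op] / old[op] for op in new}

for op, factor in slowdown.items():
    print(f"{op}: {factor:.1f}x slower in 0.10.0")
```

On these two logs all three operations come out at roughly 47x to 52x slower in 0.10.0.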

Environment (please complete the following information):

  • OS: Ubuntu 24.04.1 LTS
  • OpenSSL version 3.4.0
  • Compiler version used: gcc (13.3.0)
  • Build variables used: none
  • liboqs version: different versions, see 'to reproduce'

Additional context
I used git bisect to find the exact commit introducing this behaviour.

@SWilson4
Member

Thanks for the report! Good to know somebody has an interest in HQC.

It looks like you are building liboqs with OQS_DIST_BUILD=ON, which means the selected "generic" optimization target has no effect (i.e., AVX2 code will be executed if possible). Could you please repeat the test with OQS_DIST_BUILD=OFF and let me know how it goes?

@BartBBM
Author

BartBBM commented Jan 21, 2025

Thank you for your quick and nice response :)

When I add the build variable -DOQS_DIST_BUILD=OFF to the cmake call, the performance test on 0.10.0 runs about twice as fast as before, but it is still far slower than 0.9.2 (see log below).

Is it correct to say that this performance degradation should not be there? The HQC team's publication accompanying the submission gives no indication of a performance decrease; if anything, it suggests the opposite.

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.10.0
Git commit:       36be57445d8ca53f7095160fde548efe82ace09d
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Started at 2025-01-21 21:56:42
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-128                              |            |                |                 |            |                           |           
keygen                               |       3050 |          3.001 |         983.856 |     99.124 |                   3344828 |     336738
encaps                               |       1520 |          3.001 |        1974.575 |     74.938 |                   6717250 |     254772
decaps                               |        993 |          3.003 |        3023.771 |    177.682 |                  10286573 |     604254
Ended at 2025-01-21 21:56:51

@SWilson4
Member

SWilson4 commented Jan 21, 2025

Sorry, I should have been more clear: I think the relevant test to rerun is 0.9.2, with OQS_DIST_BUILD=OFF and OQS_OPT_TARGET=generic. This will test the 0.9.2 generic code. That way we can see whether there was a performance regression in the generic code.

@BartBBM
Author

BartBBM commented Jan 21, 2025

As requested, I built liboqs 0.9.2 with -DOQS_DIST_BUILD=OFF -DOQS_OPT_TARGET=generic. It does not reach the performance of the code optimized for specific hardware (see the initial issue description), but there still seems to be a significant performance degradation from 0.9.2 to 0.10.0.

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.9.2
Git commit:       62b58a34fbbcc1cb23f2c090c8a19b090ebf1aa2
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  SSE SSE2

Speed test
==========
Started at 2025-01-21 23:50:53
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-128                              |            |                |                 |            |                           |           
keygen                               |      44150 |          3.000 |          67.951 |     18.962 |                    228888 |      63786
encaps                               |      27831 |          3.000 |         107.795 |     29.895 |                    362830 |     100099
decaps                               |      18034 |          3.000 |         166.353 |     34.304 |                    559912 |     115364
Ended at 2025-01-21 23:51:02

@SWilson4
Member

Thinking about it, there was one major change between the 2021-06-06 version and 2023-04-30 version that would affect performance: HQC switched from AES to SHA3 for seed expansion. Based on the logs, it looks like your builds are using AES hardware acceleration. This could explain some of the difference.
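
To make the seed-expansion change concrete, here is a minimal sketch of XOF-based expansion using Python's hashlib. It is not HQC's actual construction: the choice of SHAKE256, the seed sizes, and the function names are assumptions for illustration only.

```python
import hashlib
import time

def expand_seed(seed: bytes, out_len: int) -> bytes:
    """Expand a short seed into out_len pseudorandom bytes with SHAKE256."""
    return hashlib.shake_256(seed).digest(out_len)

def time_expansion(out_len: int, reps: int = 1000) -> float:
    """Return the mean wall-clock seconds per expansion on this machine."""
    seed = bytes(32)  # all-zero seed, used for timing only
    start = time.perf_counter()
    for _ in range(reps):
        expand_seed(seed, out_len)
    return (time.perf_counter() - start) / reps

stream = expand_seed(b"example seed", 64)
print(len(stream), stream[:8].hex())
print(f"{time_expansion(4096) * 1e6:.2f} us per 4096-byte expansion")
```

The cost of this step is dominated by the Keccak permutation, so it depends on the SHA-3 backend in use. The logs above report "SHA-3: C" alongside "AES: NI", which is consistent with AES-based expansion having hardware support in these builds while SHA-3 does not.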

If you add the flag OQS_USE_AES_OPENSSL=OFF, I expect you will see a significant performance drop for the 0.9.2 build and effectively no change for the 0.10.0 build. It's also likely that the Barrett reduction routine I patched in is quite a bit slower than the non-constant-time "%" operator from the 2021-06-06 version.
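
For readers unfamiliar with the technique, the following is a generic sketch of Barrett reduction in Python. It is not the routine patched into liboqs: the modulus, the 32-bit input width, and the names are assumptions chosen for illustration. Python integers are not constant-time, so the sketch shows only the arithmetic, not the timing behaviour.

```python
N = 17669          # illustrative modulus (assumed value for this sketch)
K = 32             # inputs are assumed to satisfy 0 <= x < 2**32
M = (1 << K) // N  # precomputed floor(2**32 / N)

def barrett_reduce(x: int) -> int:
    """Compute x mod N with a multiply, a shift, and one masked subtraction."""
    q = (x * M) >> K   # estimate of x // N, too small by at most 1
    r = x - q * N      # hence 0 <= r < 2 * N
    mask = -(r >= N)   # -1 (all bits set) if r >= N, otherwise 0
    return r - (N & mask)

for x in (0, N - 1, N, 123456789, (1 << K) - 1):
    assert barrett_reduce(x) == x % N
print("barrett_reduce agrees with % on the sample inputs")
```

The extra multiply and masked subtraction replace a single division, which is a trade of speed for avoiding an instruction whose latency can depend on its operands.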

@BartBBM
Author

BartBBM commented Jan 23, 2025

I built version 0.9.2 with -DOQS_DIST_BUILD=OFF -DOQS_OPT_TARGET=generic -DOQS_USE_AES_OPENSSL=OFF. The results are slower again, but still not comparable with 0.10.0, which remains slower by a factor of at least 4. But maybe I was misunderstood: I do not want to make 0.9.2 as slow as 0.10.0, but rather the other way round :D Maybe I can take a more focused look at it later, but for now I would just consider using 0.9.2 for comparing HQC to BIKE.

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.9.2
Git commit:       62b58a34fbbcc1cb23f2c090c8a19b090ebf1aa2
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              C
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  SSE SSE2

Speed test
==========
Started at 2025-01-23 12:03:28
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-128                              |            |                |                 |            |                           |           
keygen                               |      11732 |          3.000 |         255.717 |     28.765 |                    849668 |      95511
encaps                               |       9530 |          3.000 |         314.795 |     34.607 |                   1045998 |     114863
decaps                               |       7718 |          3.000 |         388.732 |     26.778 |                   1291776 |      88929
Ended at 2025-01-23 12:03:37
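
The "factor of at least 4" estimate can be checked against the mean times reported so far. The comment does not say which 0.10.0 run it refers to, so this sketch compares the software-AES 0.9.2 build against both 0.10.0 runs. The dictionary labels and the helper name are hypothetical; the numbers are copied from the logs in this thread.

```python
# Mean time per operation in microseconds, taken from the logs above.
MEAN_US = {
    "0.10.0 dist":           {"keygen": 1733.832, "encaps": 3400.411, "decaps": 5501.266},
    "0.10.0 dist-off":       {"keygen": 983.856, "encaps": 1974.575, "decaps": 3023.771},
    "0.9.2 dist":            {"keygen": 34.280, "encaps": 65.902, "decaps": 116.275},
    "0.9.2 generic":         {"keygen": 67.951, "encaps": 107.795, "decaps": 166.353},
    "0.9.2 generic, C AES":  {"keygen": 255.717, "encaps": 314.795, "decaps": 388.732},
}

def factors(slow: str, fast: str) -> dict:
    """Return per-operation ratios of the mean times of two runs."""
    return {op: MEAN_US[slow][op] / MEAN_US[fast][op] for op in MEAN_US[slow]}

for run in ("0.10.0 dist", "0.10.0 dist-off"):
    ratios = factors(run, "0.9.2 generic, C AES")
    summary = ", ".join(f"{op} {r:.1f}x" for op, r in ratios.items())
    print(f"{run} vs 0.9.2 generic, C AES: {summary}")
```

Against the 0.10.0 dist build the factors range from about 6.8x to 14.2x; against the 0.10.0 build with OQS_DIST_BUILD=OFF they range from about 3.8x to 7.8x.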

@SWilson4
Member

> I built version 0.9.2 with -DOQS_DIST_BUILD=OFF -DOQS_OPT_TARGET=generic -DOQS_USE_AES_OPENSSL=OFF. The results are slower again, but still not comparable with 0.10.0, which remains slower by a factor of at least 4. But maybe I was misunderstood: I do not want to make 0.9.2 as slow as 0.10.0, but rather the other way round :D

Of course! That was meant to illustrate how much performance depends on the underlying AES or SHA3 implementation.

We do have an open issue to integrate the HQC AVX2-optimized implementation: #1596. However, I'm not inclined to work on it until the upstream source publishes a fix for the significant correctness/security issue currently present in the reference implementation.

> Maybe I can take a more focused look at it later, but for now I would just consider using 0.9.2 for comparing HQC to BIKE.

Sure—but be advised that the HQC spec has been updated a number of times since that release, including the change to SHA3 from AES, so your results won't be current.

@baentsch
Member

@SWilson4 Looking at the release history for 0.10.0, I assume responsibility for this: I do not recall having executed the section in the liboqs release process that should ensure no performance degradation between releases :-( Or did you or @dstebila do this with the 0.10.0 RCs you each created and did the script show no regression?

In light of this issue, what should be improved to prevent this problem from recurring? Is the script OK? Is it worth reconsidering the "de-emphasizing" of the profiling subproject? Has the "noregress" script been run for the 0.11.0 and 0.12.0 releases? Are there similar problems with algorithms beyond HQC? Is it worth creating a separate issue to investigate?

With this many questions, I have to take a step back and look at it from a more general level: OQS once wanted to report the pros and cons of all PQC algorithms impartially, and that included performance. Would you agree that this looks like another area where project utility has been reduced? Might it be worthwhile to discuss this at the OQS TSC and/or PQCA TAC level? I have the nagging feeling that OQS / PQCA pursue too many, somewhat contradictory goals (and/or have too few contributors for the goals set), so that fewer things get done at an excellent level. Now that a full year has passed since LinuxFoundation/PQCA took control of OQS, is it time to take stock, review, and realign?

@SWilson4
Member

> @SWilson4 Looking at the release history for 0.10.0, I assume responsibility for this: I do not recall having executed the section in the liboqs release process that should ensure no performance degradation between releases :-( Or did you or @dstebila do this with the 0.10.0 RCs you each created and did the script show no regression?

I don't recall if we did execute the performance script for the 0.10.0 release, but if we did I don't think a drop in HQC performance was significant cause for concern:

  • The 0.10.0 release removed the optimized AVX2 implementation, so slower performance was expected.
  • Even with "generic" code, a switch from AES to SHAKE for the underlying PRF will result in performance changes.
  • I patched a number of non-CT bugs, which will generally slow things down (especially in frequently executed code).

> In light of this issue, what should be improved to prevent this problem from recurring? Is the script OK? Is it worth reconsidering the "de-emphasizing" of the profiling subproject? Has the "noregress" script been run for the 0.11.0 and 0.12.0 releases? Are there similar problems with algorithms beyond HQC? Is it worth creating a separate issue to investigate?

I believe the noregress script was run for the latest release. Don't remember for 0.11.0.

> With this many questions, I have to take a step back and look at it from a more general level: OQS once wanted to report the pros and cons of all PQC algorithms impartially, and that included performance. Would you agree that this looks like another area where project utility has been reduced? Might it be worthwhile to discuss this at the OQS TSC and/or PQCA TAC level? I have the nagging feeling that OQS / PQCA pursue too many, somewhat contradictory goals (and/or have too few contributors for the goals set), so that fewer things get done at an excellent level. Now that a full year has passed since LinuxFoundation/PQCA took control of OQS, is it time to take stock, review, and realign?

I think it makes sense to bring up the profiling project at the TSC and/or PQCA level, perhaps to see if we could get an external contributor (like @geedo0 for OpenSSH or @ajbozarth for demos) to revive/redo it.

@baentsch added the bug label (Something isn't working; high priority to fix) on Jan 24, 2025
@baentsch
Member

> Are there similar problems with algorithms beyond HQC?

In the light of #2054 allow me to ask this question again: Is this an HQC-only problem or one affecting more algs? Is there something really wrong with common algorithms, the configs and/or copy_from_upstream?
