Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PQC: ML-KEM #3893

Merged
merged 9 commits into from
Oct 15, 2024
Merged

PQC: ML-KEM #3893

merged 9 commits into from
Oct 15, 2024

Conversation

reneme
Copy link
Collaborator

@reneme reneme commented Jan 18, 2024

Pull Request Dependencies

Descriptions

This builds on the existing Kyber implementation (and the flexibility added in #3387 and #4024) to provide an implementation of ML-KEM as specified in FIPS 203. The OIDs for ML-KEM are also updated to the official definition.

While reviewing the implementation for consistency with the final FIPS 203 we did some minor cleanups/refactorings. Those are currently intertwined in this PR, but could be moved into an independent commit or PR.

Note that each flavor of Kyber/ML-KEM can be disabled at compile time via the modules kyber, kyber_90s, ml_kem.

Integration in FFI as well as into the Python bindings is also included and deprecates some of the kyber-specific FFI functions that are now replaced by algorithm-agnostic key encoding functions.

TODOs

  • Currently, we use the KATs from Kris Kwiatkowski which were verified against the PQShield implementations.
    We should probably also look into the official ACVP tests: #1, #2
  • Use the private seeds rather than expanded structures for storing the ML-KEM private key (cc @FiloSottile)
    We already have private_key_bits() and raw_private_key_bits(), the latter could return the seed (if it is available). We'll somehow have to distinguish serialized from non-serialized keys for loading: simply by size? 🤔
  • Look into the ACVP decaps KAT (currently disabled due to a lack of the d||z-type private key format)
    Either remove the tests entirely, or monitor whether an updated test vector is being released.
  • Go through FIPS 203 again and see if all validations are implemented (add tests)
  • Decide on the API (see below) (as with the other FIPS-PRs, let's not bloat this even more and build an API bridge as a follow up)
  • Squash (or at least clean up) the commit history
  • Remove ML-KEM specific key encoding functions from FFI/Python (once FFI/Python: Expose raw public/private key encoding functions #4368 is merged)

Open Questions re: API

Currently, there's no user-facing API that explicitly states "ML-KEM". Instead, users are expected to use the Kyber-classes with the ML-KEM parameter set. Technically, that should be fine, but I believe this will confuse people in a few years, when they didn't follow the PQC competition at the time and don't associate Kyber with ML-KEM anymore.

I see three options:

  1. Do nothing, apart from guidance in documentation
  2. Define aliases in a dedicated ml_kem.h header (e.g. by flat-out inheriting from the Kyber classes)
  3. Rename all "Kyber" classes to "ML_KEM" (so that ML-KEM is the first-class API) and define aliases for Kyber (as suggested in 2)

Also, the FFI currently provides kyber-specific APIs. They should just work (tm) with ML-KEM key material as well, but like above, the names might become confusing for some users in the near future.

Caveats

This pull request initially implemented the Initial Public Draft version of FIPS 203. It was never merged and eventually got updated to implement ML-KEM as specified in the final standard. As a result, no released version of Botan will implement ML-KEM-IPD.

Note that it is an addition! Kyber-R3 (and 90s mode) continue to work as before.

Example

#include <botan/pk_algs.h>
#include <botan/pubkey.h>
#include <botan/system_rng.h>

#include <cassert>
#include <iostream>

using namespace Botan;

int main() {
   // Alice
   System_RNG rng;
   auto sk = create_private_key("ML-KEM", rng, "ML-KEM-768");
   if(!sk) {
      std::cerr << "Failed to create private key\n";
      return 1;
   }
   auto pk = sk->public_key();

   // Bob
   const auto [ciphertext, shared_key_bob] = KEM_Encapsulation::destructure(PK_KEM_Encryptor(*pk, "Raw").encrypt(rng));

   // Alice
   auto shared_key_alice = PK_KEM_Decryptor(*sk, rng, "Raw").decrypt(ciphertext);

   // Test
   assert(shared_key_alice == shared_key_bob);
   std::cout << "Alice and Bob can now communicate securely!\n";
}

Closes #4346

@reneme reneme added the enhancement Enhancement or new feature label Jan 18, 2024
@reneme reneme added this to the Botan 3.4.0 milestone Jan 18, 2024
@reneme reneme self-assigned this Jan 18, 2024
@reneme reneme mentioned this pull request Jan 18, 2024
3 tasks
@reneme reneme force-pushed the feature/mlkem_ipd branch 2 times, most recently from 20da875 to 8c19232 Compare January 18, 2024 15:14
@coveralls
Copy link

coveralls commented Jan 18, 2024

Coverage Status

coverage: 92.001% (-0.005%) from 92.006%
when pulling 2dcd938 on Rohde-Schwarz:feature/mlkem_ipd
into 13c7e5f on randombit:master.

@falko-strenzke
Copy link
Collaborator

@reneme, that's great news. I highly appreciate that the different versions can be controlled in the build and invoked at run-time independently.
As to whether the ipd version should be merged into the master branch: I clearly tend say it should, for the following reasons:

  • Keeping it in a separate branch leaves the work of keeping the code up to date with the master branch either to a maintainer of the user of the branch.
  • The "kyber" version is already in master and released. So the addition of ML-KEM-idp does not introduce really a new problem regarding the maintenance of pre-final ML-KEM versions.
  • I think the decision whether and when to deprecate them and finally drop the pre-final versions of the PQC algorithms should best be taken after the final versions have been published. Considering the case of "Keccak", where the pre-final version is still contained in Botan – due to real world implementations still using it – it seems there might be a chance that the pre-final versions of the PQC algorithms also remain in the field. If that is not the case, they can be dropped after an adequate grace period.

@reneme

This comment was marked as outdated.

@reneme

This comment was marked as outdated.

@reneme

This comment was marked as outdated.

@coveralls
Copy link

coveralls commented Jul 18, 2024

Coverage Status

coverage: 91.131% (+0.006%) from 91.125%
when pulling 33000ab on Rohde-Schwarz:feature/mlkem_ipd
into 4996790 on randombit:master.

@FAlbertDev FAlbertDev mentioned this pull request Aug 6, 2024
7 tasks
@randombit
Copy link
Owner

https://nvlpubs.nist.gov/nistpubs/fips/nist.fips.203.pdf

@reneme
Copy link
Collaborator Author

reneme commented Aug 13, 2024

Judging from appendix C.2, the only relevant addition is the domain separation in the seed expansion in Algorithm 13 (KeyGen). Basically, they just append a single byte k = {2,3,4} (depending on the parameter set) to the G() hash function. That's a minimal change.

@reneme reneme mentioned this pull request Aug 13, 2024
18 tasks
@falko-strenzke

This comment was marked as resolved.

@reneme

This comment was marked as resolved.

@reneme

This comment was marked as outdated.

@reneme reneme mentioned this pull request Aug 15, 2024
@reneme
Copy link
Collaborator Author

reneme commented Sep 23, 2024

Work on it was finished on Friday, it is now awaiting review. So feature-wise it's ready to go.

Feel free to pull it into a local build if you want to start prototyping with ML-KEM. Any feedback is appreciated.

@mouse07410
Copy link
Contributor

Work on it was finished on Friday, it is now awaiting review. So feature-wise it's ready to go.

Feel free to pull it into a local build if you want to start prototyping with ML-KEM. Any feedback is appreciated.

My short feedback: it works fine. Thank you!

@tobiasbrunner
Copy link
Contributor

While trying to integrate these changes into our Botan plugin I noticed (see strongswan/strongswan#2228) that these changes allocate the seed for the key pair in a different order, first d and then z instead of the other way around, which is what FIPS-203 describes and other implementations do (and e.g. the referenced KAT vectors do as well). So it's quite awkward to test this implementation using the API. Is there a specific reason this order was chosen?

@reneme
Copy link
Collaborator Author

reneme commented Oct 8, 2024

[...] these changes allocate the seed for the key pair in a different order, first d and then z instead of the other way around, which is what FIPS-203 describes and other implementations

For reference: I added my view on this here. TL;DR: In my opinion, downstream users should not pose assumptions on the RNG invocation pattern of any implementation of ML-KEM. For testing with KAT vectors, they should prepare a concatenation of d || z taken from the KAT and pass those 64 bytes into the "private key loading procedure". For Botan that would be: Botan::Kyber_PrivateKey(d_and_z, Botan::KyberMode("ML-KEM-512")).

Is there a specific reason this order was chosen?

Not really. We followed the order of operations as stated in FIPS 203, Algorithm 19.

@tobiasbrunner
Copy link
Contributor

For testing with KAT vectors, they should prepare a concatenation of d || z taken from the KAT and pass those 64 bytes into the "private key loading procedure". For Botan that would be: Botan::Kyber_PrivateKey(d_and_z, Botan::KyberMode("ML-KEM-512")).

OK, thanks. That sounds like a way to handle this. I've successfully used botan_privkey_load_ml_kem() with my original test vectors that assume a 64-byte pull from the DRBG.

Not really. We followed the order of operations as stated in FIPS 203, Algorithm 19.

Sorry about that. I read the order in FIPS-203 wrong for some reason, it's definitely d before z (also for the inputs to Algorithm 16). So it's more the referenced KAT vectors, which allocate z before d from the DRBG, that made this awkward.

src/lib/ffi/ffi_pkey_algs.cpp Outdated Show resolved Hide resolved
@reneme
Copy link
Collaborator Author

reneme commented Oct 14, 2024

Rebased after #4367 and #4372 caused conflicts.

@reneme reneme force-pushed the feature/mlkem_ipd branch 2 times, most recently from adbceba to 22a97fa Compare October 14, 2024 17:01
reneme and others added 7 commits October 15, 2024 09:11
This adapts the existing implementation of Kyber to accomodate the
differences between Kyber 3.01 and the final FIPS 203 standard. Kyber
(and Kyber 90s) remain supported.

All references to the specification were updated to refer into the
final FIPS 203. Also, this includes a number of refactorings and
improvements in the shared code between Kyber and ML-KEM.

Co-Authored-By: Fabian Albert <fabian.albert@rohde-schwarz.com>
Co-Authored-By: Amos Treiber <amos.treiber@rohde-schwarz.com>
This adds 'official' KATs from the ACVP-Server provided by NIST.
Also included are some python scripts to illustrate the pre-processing
necessary to update those KAT files.
The kyber-specific functions remain supported but work slightly
different, so that dedicated functions for ML-KEM became necessary.
@reneme reneme merged commit 7dd5905 into randombit:master Oct 15, 2024
38 checks passed
@reneme reneme deleted the feature/mlkem_ipd branch October 15, 2024 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to access ML-KEM?
9 participants