Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML-KEM: Bring encoding functions in line with the spec #160

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

marsella
Copy link
Contributor

@marsella marsella commented Oct 21, 2024

Addresses part of #144, but does not complete it.

This handles all the encode / decode parts of #144. It was getting a little hefty as-is so I'm breaking it into its own PR.

This was mainly a project in mucking around with types. The spec assumes that all arithmetic operations can be done on all integral types, but that's not true in Cryptol. For example, you can't multiply a 1-bit vector by 2, and you can't mod an integer mod q by 2. These things "just happen" in the spec, but in this implementation we have to manually convert types between bit vectors and integers mod, and sometimes we have to use more width-independent operations (like bit shifting instead of multiplying by 2^j).

The original plan for this PR was to combine the ByteEncode and ByteEncode12 functions into one (and similarly for decoding). However in order to do so I would've had to represent d-bit vectors as integers mod 2^d (e.g. using the Z type), and that was ugly, hard to understand, and slow to prove any properties about. So I kept them separate, as they were previously, and just changed the names to be more in line with the spec.

The other overloaded name in the spec is that the encode/decode functions use the same name to apply to either one or many inputs. I added a _Vec suffix to distinguish these cases.

This feels very cluttered to me with so many properties and conditional functions to handle just 2 algorithms in the spec. I welcome suggestions to reduce that mess.

- adds the bytedecode function
- adds an inversion property about decode + encode
- adds some properties about rewrites that I did for decoding
Also adds the relevant property
Also adds vector versions of encode/decode-12.
@marsella marsella changed the title 144 encoding ML-KEM: Bring encoding functions in line with the spec Oct 21, 2024
@marsella marsella marked this pull request as ready for review October 21, 2024 20:44
Comment on lines 336 to 337
ByteDecode12 : [32 * 12]Byte -> [256](Z q)
ByteDecode12 B = F' where
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this function is never actually used alone, but I don't feel like it's reasonable skip it and only make ByteDecode12_Vec.

*/
EncodeBytes' : {ell, c} (fin ell, ell > 0, fin c) => [c * 8][ell] -> [c * ell]Byte
EncodeBytes' = regroup
* Encode an array of `d`-bit integers into a byte array, for `d < 12`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: I was also thinking of making all these private, since they're not part of the public API of ML-KEM.

deletes two functions that were previously used in encoding, and
relocate one to the one place where it's used.
Copy link
Collaborator

@nlschimanski nlschimanski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EncodeBytes and DecodeBytes implementations look quite a bit different from the spec, but the explanations for why that's the case help build confidence. I especially appreciate proving each of the claims that were made (e.g., multiplying by 2^j is the same as left shift).

* This is used in some places where the `ByteDecode` function is required in
* the spec. It's a 3D version of `DecodeBytes'`.
* The subtract-and-divide algorithm applied to `a` in `ByteEncode` is the
* same as shifting right.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be helpful to add the step in the algorithm (5) where this comes up in the explanation.


/**
* This is used in some places where the `ByteEncode` function is required in
* the spec. It's a 3D version of `EncodeBytes'`.
* Encode a set of `k` vectors of integers mod `q` into a byte array.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is [FIPS-203] Section 2.4.8 relevant here like it is in ByteEncode_Vec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks.

* :prove CorrectnessEncodeBytes
* :prove mod2IsFinalBit`{d_u}
* :prove mod2IsFinalBit`{d_v}
* :prove mod2IsFinalBit`{12}
* ```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to add some text about why we care that mod 2 is the final bit as it relates to the EncodeBytes algorithm.

// Step 1.
b = BytesToBits B
// Steps 2-4. The `mod m` is implicit in the type because the `[d]` type
// always operates `mod 2^d`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I paused for a second here and had to look back at the spec. Maybe be explicit about how m=2^d?


/**
* Proof that the efficient decode function is the same as the spec version.
* Multiplying a value by `2^^j` is the same as bit-shifting it left by `j`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should 2^^j be 2^j in this line?

* ```repl
* :prove ByteEncodeInvertsByteDecode`{1}
* :prove ByteEncodeInvertsByteDecode`{d_u}
* :prove ByteEncodeInvertsByteDecode`{d_v}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does d_u and d_v come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an explanation here and everywhere I use these without comment in the doctests.

Copy link
Contributor

@mccleeary-galois mccleeary-galois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well, note that #166 applies here as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bring encoding and compression functions in ML-KEM up to gold standard
3 participants