-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add x4 batched SHAKE128 and SHAKE256 APIs #2247
Conversation
Batched Keccak APIs would be used by mlkem/mldsa where multiple SHAKE XOF functions are computed
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2247 +/- ##
==========================================
+ Coverage 79.02% 79.04% +0.01%
==========================================
Files 612 612
Lines 106588 106630 +42
Branches 15082 15098 +16
==========================================
+ Hits 84232 84282 +50
+ Misses 21703 21695 -8
Partials 653 653 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @manastasova!
Could we please document the failure conditions for those functions? The ML-KEM call-site does not expect errors from the SHAKE API and hence needs to argue why none of the failure conditions may occurr.
(This is also an issue with the pre-existing x1 API (see #2250) for which no documentation of failure conditions is given, but the description may be easier for the non-incremental case).
Also, do we need squeeze to accept an arbitrary length, or can it be restricted to multiples of the block size? The current API forces the internal maintenance of an offset pointer, which would not be necessary if only full blocks were squeezed.
|
||
int SHAKE128_Init_x4(KECCAK1600_CTX_x4 *ctx) { | ||
|
||
int ok = (SHAKE_Init(&(*ctx)[0], SHAKE128_BLOCKSIZE) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is intentional that later calls are skipped if earlier calls fail (because of lazy evaluation of &&
), this may be worth documenting. Same for the other functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
The batched function succeeds on success of all underlying functions. Thus, it won't be necessary to continue after fail condition is met. I added a comment here:
aws-lc/crypto/fipsmodule/sha/internal.h
Line 463 in 4fca7ae
// It fails on the first |SHAKE_Init| function fail. |
@@ -377,3 +377,48 @@ int SHAKE_Squeeze(uint8_t *md, KECCAK1600_CTX *ctx, size_t len) { | |||
//FIPS_service_indicator_update_state(); | |||
return 1; | |||
} | |||
|
|||
int SHAKE128_Init_x4(KECCAK1600_CTX_x4 *ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this function actually fail? If so, when?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment
aws-lc/crypto/fipsmodule/sha/internal.h
Line 464 in 4fca7ae
// As part of MLKEM PQ algorithm: SHAKE128_Init_x4 always returns 1 since it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the function always succeeds, can we drop the return code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to keep it. It is, indeed, a function designed specifically for mlkem, nevertheless, nothing prevents from calling it in different use case scenario (if some). Wouldn't it be better to document the skipped return code check in mlkem-native, and argument it with the input values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to keep it. It is, indeed, a function designed specifically for mlkem, nevertheless, nothing prevents from calling it in different use case scenario (if some).
Yes, but my understanding is that this function will never fail, regardless of who calls it?
return ok; | ||
} | ||
|
||
int SHAKE128_Absorb_once_x4(KECCAK1600_CTX_x4 *ctx, const void *data0, const void *data1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what condition would this function fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented here
aws-lc/crypto/fipsmodule/sha/internal.h
Line 473 in 4fca7ae
// As part of MLKEM PQ algorithm: SHAKE128_Absorb_once_x4 always returns 1 since it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the function always succeeds, can we drop the return code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above: I think it is better to keep it. It is, indeed, a function designed specifically for mlkem, nevertheless, nothing prevents from calling it in different use case scenario (if some). Wouldn't it be better to document the skipped return code check in mlkem-native, and argument it with the input values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but this is not a question about the caller/use case?
return ok; | ||
} | ||
|
||
int SHAKE128_Squeezeblocks_x4(uint8_t *md0, uint8_t *md1, uint8_t *md2, uint8_t *md3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the generality of an arbitrary len
, or can we stick to a multiple of the block size for now? I seem to remember that ML-KEM only squeezes a block a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the generality of an arbitary length is needed, the function should not be called Squeezeblocks
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHAKE128_Squeezeblocks_x4 aims at defining the new API needed by mlkem-native. Thus, it is defined as mlkem-specific function (only allows squeezes of entire blocks), while currently internally calling existing SHAKE sequential APIs (to split changes into incremental updates).
That is my understanding based on our previous discussions.
The same reasoning is applied to the SHAKE128_Absorb_once_x4 mlkem-specific API which inherits the generic functionality from the underlying existing SHAKE functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that the current implementation does not squeeze a number of blocks, but len
is used as the number of bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want something like (but 4-fold)
aws-lc/crypto/fipsmodule/ml_kem/fips202_glue.h
Lines 36 to 41 in 09ad19e
static MLK_INLINE void mlk_shake128_squeezeblocks(uint8_t *output, size_t nblocks, | |
mlk_shake128ctx *state) { | |
// TODO: Document why this function does not fail in the context | |
// of the calls made by mlkem-native. | |
(void) SHAKE_Squeeze(output, state, nblocks * SHAKE128_RATE); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I just updated the code.
Add detailed description for the return valuesof SHA3 and SHAKE functions. Specify the x4 return values and fail condiditons to ease mlkem transition to aws-lc batched shake implementation (based on mlkem input values). This is needed since mlkem does not check on shake return codes.
crypto/fipsmodule/sha/sha3_test.cc
Outdated
ASSERT_TRUE(SHAKE128_Absorb_once_x4(&ctx, random_in[0], random_in[1], random_in[2], random_in[3], | ||
RAND_BYTES_x4)); | ||
ASSERT_TRUE(SHAKE128_Squeezeblocks_x4(digest_x4[0], digest_x4[1], digest_x4[2], digest_x4[3], | ||
&ctx, RAND_OUT_BYTES_BLCKS * SHAKE128_BLOCKSIZE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Squeezeblocks
should take the number of blocks, not the number of bytes, and the multiplication with SHAKE128_BLOCKSIZE
to get the number of bytes should be in the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Batched block squeeze function takes number of blocks and internally requests the equvalent number of bytes from the underlying SHAKE_Squeeze generic function
crypto/fipsmodule/sha/sha3.c
Outdated
// FIPS202 APIs manage internal input/output buffer on top of Keccak1600 API layer | ||
// FIPS202_Reset zero's |ctx| fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NP: It seems that the "FIPS202 APIs manage ..." comment might be better placed inside a multi-line /* ... */
comment as it relates to several functions below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Just updated.
OPENSSL_EXPORT int SHAKE128_Absorb_once_x4(KECCAK1600_CTX_x4 *ctx, const void *data0, const void *data1, | ||
const void *data2, const void *data3, size_t len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NP: can you expand and clarify this absorbs |len| bytes from data1-4 and they all have to have have the same size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the info.
OPENSSL_EXPORT int SHAKE128_Squeezeblocks_x4(uint8_t *md0, uint8_t *md1, uint8_t *md2, uint8_t *md3, | ||
KECCAK1600_CTX_x4 *ctx, size_t blks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you changing the parameter order from SHAKE128_Absorb_once_x4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the SHAKE/SHA3 {Init, Update/Absorb, Final/Squeeze} pattern:
aws-lc/crypto/fipsmodule/sha/internal.h
Line 441 in 4fca7ae
int SHAKE_Absorb(KECCAK1600_CTX *ctx, const void *data, |
aws-lc/crypto/fipsmodule/sha/internal.h
Line 450 in 4fca7ae
int SHAKE_Squeeze(uint8_t *md, KECCAK1600_CTX *ctx, size_t len); |
But I can change them all if it looks cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, that's fine.
crypto/fipsmodule/sha/internal.h
Outdated
// As part of MLKEM PQ algorithm: SHAKE128_Init_x4 always returns 1 since it | ||
// is called with a valid |ctx|. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to have here, we don't know how the ML-KEM code is going to call this function or if this is even true. This feels like this comment belongs in the ML-KEM code. Instead what we could put here is something like "If called with valid inputs this function never fails" which is true of most AWS-LC functions, the only case where valid inputs would cause AWS-LC to return a failure is if malloc failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Just fixed it. Thanks.
int ok = (SHAKE_Init(&(*ctx)[0], SHAKE128_BLOCKSIZE) && | ||
SHAKE_Init(&(*ctx)[1], SHAKE128_BLOCKSIZE) && | ||
SHAKE_Init(&(*ctx)[2], SHAKE128_BLOCKSIZE) && | ||
SHAKE_Init(&(*ctx)[3], SHAKE128_BLOCKSIZE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to take the address of the de-rerferenced pointer?
SHAKE_Init(&(*ctx)[3], SHAKE128_BLOCKSIZE)); | |
SHAKE_Init(ctx[3], SHAKE128_BLOCKSIZE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ctx
is the pointer to (not actually the) array of structs:
typedef KECCAK1600_CTX KECCAK1600_CTX_x4[4];
...
KECCAK1600_CTX_x4 ctx;
ASSERT_TRUE(SHAKE128_Init_x4(**&ctx**));
We need to dereference to get back to the array of stucts, then to access some index, e.g., struct [3]; later we need the address of that struct for the next function.
int SHAKE_Init(KECCAK1600_CTX *ctx, size_t block_size);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's subtle
Co-authored-by: Andrew Hopkins <andhop@amazon.com>
crypto/fipsmodule/sha/internal.h
Outdated
// SHA3_Init initialises |ctx| fields through |FIPS202_Init| and | ||
// returns 1 on success and 0 on failure. SHA3_Init fails if | ||
// |ctx| is nullptr or |bitlen| is inproper for SHA3 (any security | ||
// level) block size length. Otherwise, it propagates the |FIPS202_Init| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a caller of this function, the details of which functions are called underneath are not important -- but the caller needs to understand failure conditions. Here, one could recursively lookup the documentation of FIPS202_Init
to understand its failure conditions, but it's a bit cumbersome. (When) Does FIPS202_Init
fail?
Is there a way one can express success/failure conditions at this level without resorting to details of which functions are being called?
Similar questions below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the description
aws-lc/crypto/fipsmodule/sha/internal.h
Line 422 in 026f32e
* SHA3 context must go through the flow: (a) Init, (b) Update [multiple times], |
crypto/fipsmodule/sha/internal.h
Outdated
// of equal length of |len| bytes through four consecutive calls to |SHAKE_Absorb| | ||
// and returns 1 on success and 0 on failure. SHAKE128_Absorb_once_x4 succeeds when | ||
// all four |SHAKE_Absorb| functions succeed. It fails on the first |SHAKE_Absorb| | ||
// function fail, however, if called with valid inputs, this function never fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear what 'valid' means here. You want the input to come fresh out of SHAKE128_Init_x4
, or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'Valid input' -> 'valid input pointer'.
However, I updated this following the suggested documentation format. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @manastasova for the further work!
I think there is still need for improvement in the documentation: We should not re-state implementation details of which lower level functions are called (this will change), but provide a high-level contract to the caller indicating when to expect success/failure. This may not always be possible, but in this instance, I think it may.
Concretely, could one say something like: SHAKE context must go through the flow (a) Init, (b) Absorb [many times], (c) Squeeze [many times], (d) Final. If this call-discipline is maintained and the pointers passed to the functions are valid and of respective sizes, there will be no error.
Similarly, for SHAKEx4, it would be: If you go through the flow (a) Init, (b) Absorb_once [once], (c) Squeezeblocks [many times], (d) Final -- and pointers are always valid, then there will be no error.
Can you clarify if the above is true? If so, I think this would be a good high-level documentation that callers such as ML-KEM/ML-DSA can reason against when ignoring return values, without being concerned with which lower level functions are being called, and what their failure conditions may be.
Thanks @hanno-becker! I updated the comments based on your suggestions. |
OPENSSL_EXPORT int SHAKE128_Squeezeblocks_x4(uint8_t *md0, uint8_t *md1, uint8_t *md2, uint8_t *md3, | ||
KECCAK1600_CTX_x4 *ctx, size_t blks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, that's fine.
int ok = (SHAKE_Init(&(*ctx)[0], SHAKE128_BLOCKSIZE) && | ||
SHAKE_Init(&(*ctx)[1], SHAKE128_BLOCKSIZE) && | ||
SHAKE_Init(&(*ctx)[2], SHAKE128_BLOCKSIZE) && | ||
SHAKE_Init(&(*ctx)[3], SHAKE128_BLOCKSIZE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's subtle
crypto/fipsmodule/sha/internal.h
Outdated
* of SHA3/SHAKE API layer | ||
* | ||
* SHA3/SHAKE single-shot functions never fail when the later call-discipline is | ||
* adhered to: (a) the pointers passed to the functions are valid, is satisfied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sentence structure seems off here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
crypto/fipsmodule/sha/internal.h
Outdated
|
||
// SHAKE128_Init_x4 is a batched API that operates on four independent | ||
// Keccak bitstates. It initialises all four |ctx| fields through four | ||
// consecutive calls to |SHAKE_Init| and returns 1 on success and 0 on failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the mention of the implementation details here and elsewhere? Once we start implementing fast batched x4, we won't merely fall back to x1 anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to follow the general pattern for function description in the file/library. I was thinking once we start implementing it, we could update these comments.
I just fixed it. Thank you.
mlkem-native uses batched SHAKE to compute different XOF digests simultaneously I. e., every call to batched SHAKE function performs four independent algorithm execution (x4).
This PR introduces the required new shake SHAKE x4 APIs. Internally, they call 4 independent SHAKE functions (either the entire algorithm or a part of it, such as init, absorb, squeeze).
Issues:
Resolves #CryptoAlg-2959
Description of changes:
Call-outs:
Currently, the APIs are named as mlkem-specific functions since there is no other use-case for them (absorbing once, squeezing entire blocks). This naming will allow to implement them optimally later on, skipping some unnecessary checks. However, since they are internally calling the existing aws-lc shake functions, they inherit all functionalities supported by the aws-lc shake functions. (incremental absorbs, incremental byte-wise squeezes).
The aws-lc integration of the x4 batched SHAKE APIs will allow to comply with the current mkem-native design. These functions will not allow a SIMD assembly keccak implementation to be added to aws-lc without further changes. However, they set the batched APIs and could be inlined in mlkem-native code.
Testing:
New SHAKETest_x4 test:
./crypto/crypto_test --gtest_filter="SHAKETest_x4.RandomMessages"
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.