Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add x4 batched SHAKE128 and SHAKE256 APIs #2247

Merged
merged 16 commits into from
Mar 12, 2025
Merged

Conversation

manastasova
Copy link
Contributor

@manastasova manastasova commented Mar 6, 2025

mlkem-native uses batched SHAKE to compute different XOF digests simultaneously I. e., every call to batched SHAKE function performs four independent algorithm execution (x4).

This PR introduces the required new shake SHAKE x4 APIs. Internally, they call 4 independent SHAKE functions (either the entire algorithm or a part of it, such as init, absorb, squeeze).

Issues:

Resolves #CryptoAlg-2959

Description of changes:

  • SHAKE128_Init_x4
  • SHAKE128_Absorb_once_x4
  • SHAKE128_Squeezeblocks_x4
  • SHAKE256_x4

Call-outs:

Currently, the APIs are named as mlkem-specific functions since there is no other use-case for them (absorbing once, squeezing entire blocks). This naming will allow to implement them optimally later on, skipping some unnecessary checks. However, since they are internally calling the existing aws-lc shake functions, they inherit all functionalities supported by the aws-lc shake functions. (incremental absorbs, incremental byte-wise squeezes).

The aws-lc integration of the x4 batched SHAKE APIs will allow to comply with the current mkem-native design. These functions will not allow a SIMD assembly keccak implementation to be added to aws-lc without further changes. However, they set the batched APIs and could be inlined in mlkem-native code.

Testing:

New SHAKETest_x4 test:
./crypto/crypto_test --gtest_filter="SHAKETest_x4.RandomMessages"

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

Batched Keccak APIs would be used by mlkem/mldsa where multiple SHAKE XOF functions are computed
@manastasova manastasova requested a review from a team as a code owner March 6, 2025 00:30
@codecov-commenter
Copy link

codecov-commenter commented Mar 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.04%. Comparing base (aa43a85) to head (1d0178f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2247      +/-   ##
==========================================
+ Coverage   79.02%   79.04%   +0.01%     
==========================================
  Files         612      612              
  Lines      106588   106630      +42     
  Branches    15082    15098      +16     
==========================================
+ Hits        84232    84282      +50     
+ Misses      21703    21695       -8     
  Partials      653      653              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@hanno-becker hanno-becker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @manastasova!

Could we please document the failure conditions for those functions? The ML-KEM call-site does not expect errors from the SHAKE API and hence needs to argue why none of the failure conditions may occurr.

(This is also an issue with the pre-existing x1 API (see #2250) for which no documentation of failure conditions is given, but the description may be easier for the non-incremental case).

Also, do we need squeeze to accept an arbitrary length, or can it be restricted to multiples of the block size? The current API forces the internal maintenance of an offset pointer, which would not be necessary if only full blocks were squeezed.


int SHAKE128_Init_x4(KECCAK1600_CTX_x4 *ctx) {

int ok = (SHAKE_Init(&(*ctx)[0], SHAKE128_BLOCKSIZE) &&
Copy link
Contributor

@hanno-becker hanno-becker Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is intentional that later calls are skipped if earlier calls fail (because of lazy evaluation of &&), this may be worth documenting. Same for the other functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.
The batched function succeeds on success of all underlying functions. Thus, it won't be necessary to continue after fail condition is met. I added a comment here:

// It fails on the first |SHAKE_Init| function fail.

@@ -377,3 +377,48 @@ int SHAKE_Squeeze(uint8_t *md, KECCAK1600_CTX *ctx, size_t len) {
//FIPS_service_indicator_update_state();
return 1;
}

int SHAKE128_Init_x4(KECCAK1600_CTX_x4 *ctx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this function actually fail? If so, when?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment

// As part of MLKEM PQ algorithm: SHAKE128_Init_x4 always returns 1 since it
.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function always succeeds, can we drop the return code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to keep it. It is, indeed, a function designed specifically for mlkem, nevertheless, nothing prevents from calling it in different use case scenario (if some). Wouldn't it be better to document the skipped return code check in mlkem-native, and argument it with the input values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to keep it. It is, indeed, a function designed specifically for mlkem, nevertheless, nothing prevents from calling it in different use case scenario (if some).

Yes, but my understanding is that this function will never fail, regardless of who calls it?

return ok;
}

int SHAKE128_Absorb_once_x4(KECCAK1600_CTX_x4 *ctx, const void *data0, const void *data1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what condition would this function fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented here

// As part of MLKEM PQ algorithm: SHAKE128_Absorb_once_x4 always returns 1 since it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function always succeeds, can we drop the return code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: I think it is better to keep it. It is, indeed, a function designed specifically for mlkem, nevertheless, nothing prevents from calling it in different use case scenario (if some). Wouldn't it be better to document the skipped return code check in mlkem-native, and argument it with the input values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is not a question about the caller/use case?

return ok;
}

int SHAKE128_Squeezeblocks_x4(uint8_t *md0, uint8_t *md1, uint8_t *md2, uint8_t *md3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the generality of an arbitrary len, or can we stick to a multiple of the block size for now? I seem to remember that ML-KEM only squeezes a block a time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the generality of an arbitary length is needed, the function should not be called Squeezeblocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHAKE128_Squeezeblocks_x4 aims at defining the new API needed by mlkem-native. Thus, it is defined as mlkem-specific function (only allows squeezes of entire blocks), while currently internally calling existing SHAKE sequential APIs (to split changes into incremental updates).
That is my understanding based on our previous discussions.
The same reasoning is applied to the SHAKE128_Absorb_once_x4 mlkem-specific API which inherits the generic functionality from the underlying existing SHAKE functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that the current implementation does not squeeze a number of blocks, but len is used as the number of bytes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want something like (but 4-fold)

static MLK_INLINE void mlk_shake128_squeezeblocks(uint8_t *output, size_t nblocks,
mlk_shake128ctx *state) {
// TODO: Document why this function does not fail in the context
// of the calls made by mlkem-native.
(void) SHAKE_Squeeze(output, state, nblocks * SHAKE128_RATE);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I just updated the code.

manastasova and others added 4 commits March 6, 2025 09:53
Add detailed description for the return valuesof SHA3 and SHAKE functions. Specify the x4 return values and fail condiditons to ease mlkem transition to aws-lc batched shake implementation (based on mlkem input values). This is needed since mlkem does not check on shake return codes.
ASSERT_TRUE(SHAKE128_Absorb_once_x4(&ctx, random_in[0], random_in[1], random_in[2], random_in[3],
RAND_BYTES_x4));
ASSERT_TRUE(SHAKE128_Squeezeblocks_x4(digest_x4[0], digest_x4[1], digest_x4[2], digest_x4[3],
&ctx, RAND_OUT_BYTES_BLCKS * SHAKE128_BLOCKSIZE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squeezeblocks should take the number of blocks, not the number of bytes, and the multiplication with SHAKE128_BLOCKSIZE to get the number of bytes should be in the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

manastasova and others added 2 commits March 10, 2025 09:36
Batched block squeeze function takes number of blocks and internally requests the equvalent number of bytes from the underlying SHAKE_Squeeze generic function
justsmth
justsmth previously approved these changes Mar 10, 2025
Comment on lines 112 to 113
// FIPS202 APIs manage internal input/output buffer on top of Keccak1600 API layer
// FIPS202_Reset zero's |ctx| fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP: It seems that the "FIPS202 APIs manage ..." comment might be better placed inside a multi-line /* ... */ comment as it relates to several functions below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just updated.

Comment on lines +476 to +477
OPENSSL_EXPORT int SHAKE128_Absorb_once_x4(KECCAK1600_CTX_x4 *ctx, const void *data0, const void *data1,
const void *data2, const void *data3, size_t len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP: can you expand and clarify this absorbs |len| bytes from data1-4 and they all have to have have the same size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the info.

Comment on lines +487 to +488
OPENSSL_EXPORT int SHAKE128_Squeezeblocks_x4(uint8_t *md0, uint8_t *md1, uint8_t *md2, uint8_t *md3,
KECCAK1600_CTX_x4 *ctx, size_t blks);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you changing the parameter order from SHAKE128_Absorb_once_x4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the SHAKE/SHA3 {Init, Update/Absorb, Final/Squeeze} pattern:

int SHAKE_Absorb(KECCAK1600_CTX *ctx, const void *data,

int SHAKE_Squeeze(uint8_t *md, KECCAK1600_CTX *ctx, size_t len);

But I can change them all if it looks cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that's fine.

Comment on lines 464 to 465
// As part of MLKEM PQ algorithm: SHAKE128_Init_x4 always returns 1 since it
// is called with a valid |ctx|.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to have here, we don't know how the ML-KEM code is going to call this function or if this is even true. This feels like this comment belongs in the ML-KEM code. Instead what we could put here is something like "If called with valid inputs this function never fails" which is true of most AWS-LC functions, the only case where valid inputs would cause AWS-LC to return a failure is if malloc failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Just fixed it. Thanks.

int ok = (SHAKE_Init(&(*ctx)[0], SHAKE128_BLOCKSIZE) &&
SHAKE_Init(&(*ctx)[1], SHAKE128_BLOCKSIZE) &&
SHAKE_Init(&(*ctx)[2], SHAKE128_BLOCKSIZE) &&
SHAKE_Init(&(*ctx)[3], SHAKE128_BLOCKSIZE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to take the address of the de-rerferenced pointer?

Suggested change
SHAKE_Init(&(*ctx)[3], SHAKE128_BLOCKSIZE));
SHAKE_Init(ctx[3], SHAKE128_BLOCKSIZE));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ctx is the pointer to (not actually the) array of structs:
typedef KECCAK1600_CTX KECCAK1600_CTX_x4[4];
...
KECCAK1600_CTX_x4 ctx;
ASSERT_TRUE(SHAKE128_Init_x4(**&ctx**));
We need to dereference to get back to the array of stucts, then to access some index, e.g., struct [3]; later we need the address of that struct for the next function.
int SHAKE_Init(KECCAK1600_CTX *ctx, size_t block_size);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's subtle

// SHA3_Init initialises |ctx| fields through |FIPS202_Init| and
// returns 1 on success and 0 on failure. SHA3_Init fails if
// |ctx| is nullptr or |bitlen| is inproper for SHA3 (any security
// level) block size length. Otherwise, it propagates the |FIPS202_Init|
Copy link
Contributor

@hanno-becker hanno-becker Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a caller of this function, the details of which functions are called underneath are not important -- but the caller needs to understand failure conditions. Here, one could recursively lookup the documentation of FIPS202_Init to understand its failure conditions, but it's a bit cumbersome. (When) Does FIPS202_Init fail?

Is there a way one can express success/failure conditions at this level without resorting to details of which functions are being called?

Similar questions below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the description

* SHA3 context must go through the flow: (a) Init, (b) Update [multiple times],
following the suggested documentation format. Thank you.

// of equal length of |len| bytes through four consecutive calls to |SHAKE_Absorb|
// and returns 1 on success and 0 on failure. SHAKE128_Absorb_once_x4 succeeds when
// all four |SHAKE_Absorb| functions succeed. It fails on the first |SHAKE_Absorb|
// function fail, however, if called with valid inputs, this function never fails.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear what 'valid' means here. You want the input to come fresh out of SHAKE128_Init_x4, or?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Valid input' -> 'valid input pointer'.
However, I updated this following the suggested documentation format. Thank you.

Copy link
Contributor

@hanno-becker hanno-becker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @manastasova for the further work!

I think there is still need for improvement in the documentation: We should not re-state implementation details of which lower level functions are called (this will change), but provide a high-level contract to the caller indicating when to expect success/failure. This may not always be possible, but in this instance, I think it may.

Concretely, could one say something like: SHAKE context must go through the flow (a) Init, (b) Absorb [many times], (c) Squeeze [many times], (d) Final. If this call-discipline is maintained and the pointers passed to the functions are valid and of respective sizes, there will be no error.

Similarly, for SHAKEx4, it would be: If you go through the flow (a) Init, (b) Absorb_once [once], (c) Squeezeblocks [many times], (d) Final -- and pointers are always valid, then there will be no error.

Can you clarify if the above is true? If so, I think this would be a good high-level documentation that callers such as ML-KEM/ML-DSA can reason against when ignoring return values, without being concerned with which lower level functions are being called, and what their failure conditions may be.

@manastasova
Copy link
Contributor Author

Thank you @manastasova for the further work!

I think there is still need for improvement in the documentation: We should not re-state implementation details of which lower level functions are called (this will change), but provide a high-level contract to the caller indicating when to expect success/failure. This may not always be possible, but in this instance, I think it may.

Concretely, could one say something like: SHAKE context must go through the flow (a) Init, (b) Absorb [many times], (c) Squeeze [many times], (d) Final. If this call-discipline is maintained and the pointers passed to the functions are valid and of respective sizes, there will be no error.

Similarly, for SHAKEx4, it would be: If you go through the flow (a) Init, (b) Absorb_once [once], (c) Squeezeblocks [many times], (d) Final -- and pointers are always valid, then there will be no error.

Can you clarify if the above is true? If so, I think this would be a good high-level documentation that callers such as ML-KEM/ML-DSA can reason against when ignoring return values, without being concerned with which lower level functions are being called, and what their failure conditions may be.

Thanks @hanno-becker! I updated the comments based on your suggestions.

andrewhop
andrewhop previously approved these changes Mar 11, 2025
Comment on lines +487 to +488
OPENSSL_EXPORT int SHAKE128_Squeezeblocks_x4(uint8_t *md0, uint8_t *md1, uint8_t *md2, uint8_t *md3,
KECCAK1600_CTX_x4 *ctx, size_t blks);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that's fine.

int ok = (SHAKE_Init(&(*ctx)[0], SHAKE128_BLOCKSIZE) &&
SHAKE_Init(&(*ctx)[1], SHAKE128_BLOCKSIZE) &&
SHAKE_Init(&(*ctx)[2], SHAKE128_BLOCKSIZE) &&
SHAKE_Init(&(*ctx)[3], SHAKE128_BLOCKSIZE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's subtle

* of SHA3/SHAKE API layer
*
* SHA3/SHAKE single-shot functions never fail when the later call-discipline is
* adhered to: (a) the pointers passed to the functions are valid, is satisfied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence structure seems off here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.


// SHAKE128_Init_x4 is a batched API that operates on four independent
// Keccak bitstates. It initialises all four |ctx| fields through four
// consecutive calls to |SHAKE_Init| and returns 1 on success and 0 on failure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the mention of the implementation details here and elsewhere? Once we start implementing fast batched x4, we won't merely fall back to x1 anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to follow the general pattern for function description in the file/library. I was thinking once we start implementing it, we could update these comments.
I just fixed it. Thank you.

@andrewhop andrewhop merged commit 1cf1ee2 into aws:main Mar 12, 2025
102 of 107 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants