
SHA3 and SHAKE - New API Design #2098

Merged — 78 commits into aws:main, Feb 5, 2025

Conversation

Contributor

@manastasova manastasova commented Jan 7, 2025

Issues:

Resolves #CryptoAlg-2810

Description of changes:

AWS-LC supports the SHA3 and SHAKE algorithms through the low-level SHA3_Init, SHA3_Update, SHA3_Final and SHAKE_Init, SHAKE_Final APIs. Currently, there are two issues with the implementation and usage of SHA3 and SHAKE:

  • There is no SHAKE_Update function. SHAKE is implemented by calling SHAKE_Init, SHA3_Update, and SHAKE_Final.
  • SHAKE_Final allows multiple consecutive calls to enable incremental XOF output generation.

This PR addresses both of them as follows:

  • Introduce new API layers - FIPS202, SHA3 and SHAKE.

    • Keccak1600 layer (Keccak1600_Squeeze/Absorb Layer (rename) #2097) implements KeccakF1600 Absorb and Squeeze functions; Keccak1600 layer does not manage internal input/output buffers.
    • FIPS202 layer implements Reset, Init, Update, and Finalize functionalities; the FIPS202 layer manages the internal input/output buffers, allowing incremental requests (not necessarily a multiple of the block size) to Update (Absorb) and Squeeze for input/output processing. (Other functionality, such as zeroing the bit state, block-size checks, etc., is also handled by the FIPS202 API layer.)
      • FIPS202 layer implements all common behavior between SHA3 and SHAKE algorithms.
      • FIPS202 layer checks/updates the |ctx->state| flag when handling behavior common to the SHA3 and SHAKE algorithms. |ctx->state| is updated in the higher-level SHA3_/SHAKE_ API layers when the behavior of the two algorithms diverges (SHAKE allows incremental squeezes).
    • SHA3 layer implements Init, Update, and Final functionalities; the SHA3 layer only implements the SHA3 algorithm and thus offers a single-call SHA3_Final function. SHA3_Final updates the internal |ctx->state| flag to prevent any subsequent calls.
    • SHAKE layer implements XOF SHAKE algorithm, therefore, offers Init, Absorb, Squeeze, and Final functionalities;
      • SHAKE layer implements Init, Absorb, and Squeeze with incremental call support for absorb (byte-wise) and squeeze (block-wise).
      • SHAKE layer implements a single-call SHAKE_Final function that generates an arbitrary length output and finalizes SHAKE. Incremental XOF output generation is handled by |SHAKE_Squeeze|. |SHAKE_Squeeze| can be called multiple times. SHAKE_Final should be called only once.
  • KECCAK1600_CTX struct updates:

    • Remove |padded| field
    • Introduce |state| field
      • |state| can be |KECCAK1600_STATE_ABSORB|, |KECCAK1600_STATE_SQUEEZE|, |KECCAK1600_STATE_FINAL|
      • |KECCAK1600_STATE_ABSORB| - allows incremental absorbs until the state is changed
      • |KECCAK1600_STATE_SQUEEZE| - allows incremental squeezes for |SHAKE_Squeeze|
      • |KECCAK1600_STATE_FINAL| - prevents incremental squeezes via |SHAKE_Final| and consecutive calls to |SHA3_Final| (Final functions are single-use; see the state-machine sketch below).
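To make the transitions concrete, here is a minimal state-machine sketch. It is illustrative only: the macro values match the header, but the guard logic is a simplification of what the FIPS202/SHA3/SHAKE layers actually do, and the context type is a stand-in rather than the real KECCAK1600_CTX.

```c
#define KECCAK1600_STATE_ABSORB  0
#define KECCAK1600_STATE_SQUEEZE 1
#define KECCAK1600_STATE_FINAL   2

// Stand-in for the real KECCAK1600_CTX; only the |state| field is modeled.
typedef struct {
  int state;
} keccak_ctx_model;

// Absorbing is only allowed before the input has been padded/finalized.
int model_can_absorb(const keccak_ctx_model *ctx) {
  return ctx->state == KECCAK1600_STATE_ABSORB;
}

// SHAKE_Squeeze may be called repeatedly: the first call moves the context
// from ABSORB to SQUEEZE, and calls are rejected once FINAL is reached.
int model_squeeze(keccak_ctx_model *ctx) {
  if (ctx->state == KECCAK1600_STATE_FINAL) {
    return 0;
  }
  ctx->state = KECCAK1600_STATE_SQUEEZE;
  return 1;
}

// SHA3_Final / SHAKE_Final are single-use: they move the context to FINAL,
// after which no further Absorb, Squeeze, or Final calls are accepted.
int model_final(keccak_ctx_model *ctx) {
  if (ctx->state == KECCAK1600_STATE_FINAL) {
    return 0;
  }
  ctx->state = KECCAK1600_STATE_FINAL;
  return 1;
}
```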

SHA3 vs SHAKE algorithms (API usage; a caller-side sketch follows the list):

  • SHA3 digest generation: SHA3_Init; SHA3_Update; SHA3_Final;
  • SHAKE (single-shot-output) output generation: SHAKE_Init; SHAKE_Absorb; SHAKE_Final;
  • SHAKE (incremental) output generation: SHAKE_Init; SHAKE_Absorb; SHAKE_Squeeze+;
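A minimal caller-side sketch of the three sequences above follows. The header path, the SHA3_Update/SHAKE_Final signatures, the `256` bit-length argument, and the squeeze chunk sizes are assumptions based on this PR's discussion (the PR describes squeeze as block-wise incremental), and error handling is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#include "crypto/fipsmodule/sha/internal.h"  // assumed location of the internal SHA3/SHAKE declarations

void sha3_and_shake_examples(const uint8_t *msg, size_t msg_len) {
  KECCAK1600_CTX ctx;
  uint8_t digest[32];  // SHA3-256 digest
  uint8_t out[64];     // XOF output buffer

  // SHA3 digest generation: Init; Update; Final (Final may be called once).
  SHA3_Init(&ctx, 256);             // bit length selects SHA3-256
  SHA3_Update(&ctx, msg, msg_len);
  SHA3_Final(digest, &ctx);

  // SHAKE single-shot output generation: Init; Absorb; Final (called once).
  SHAKE_Init(&ctx, SHAKE128_BLOCKSIZE);
  SHAKE_Absorb(&ctx, msg, msg_len);
  SHAKE_Final(out, &ctx, sizeof(out));   // arbitrary-length output, then FINAL

  // SHAKE incremental output generation: Init; Absorb; Squeeze+.
  SHAKE_Init(&ctx, SHAKE128_BLOCKSIZE);
  SHAKE_Absorb(&ctx, msg, msg_len);
  SHAKE_Squeeze(out, &ctx, 32);          // first 32 bytes of the XOF stream
  SHAKE_Squeeze(out + 32, &ctx, 32);     // next 32 bytes, continuing the stream
}
```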

Call-outs:

Service indicator is updated:

  • Inside SHA3 and SHAKE single-shot APIs (as previously in AWS-LC);
  • Inside SHA3_Final (as previously in AWS-LC);
  • Inside SHAKE_Final (Single-Shot XOF Final output generation as previously in AWS-LC);
  • Inside SHAKE_Squeeze (Streaming XOF Squeezes output generation updates the service indicator after each extendable output update);

All other algorithms that use SHA3/SHAKE APIs are updated:

  • ML-KEM (SHA3/SHAKE calls will be inlined later)
  • ML-DSA (the incremental XOF output path via SHAKE_Squeeze inside ML-DSA is never invoked by the KAT test vectors and gtests)

Testing:

./crypto/crypto_test --gtest_filter="KeccakInternalTest.*"
./crypto/crypto_test --gtest_filter="SHA3Test.*"
./crypto/crypto_test --gtest_filter="SHAKETest.*"

./crypto/crypto_test --gtest_filter="All/PerKEMTest.*"
./crypto/crypto_test --gtest_filter="All/PQDSAParameterTest.*"

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

manastasova and others added 26 commits December 30, 2024 16:57
Define Keccak1600, FIPS202, and SHA3/SHAKE API layers

Keccak1600 implements absorb and squeeze functionalities. It defines the lowest-level APIs for SHA3/SHAKE; Keccak1600 functions only process complete blocks; internal input/output buffers are handled by the higher-layer (FIPS202) APIs.

FIPS202 APIs handle the internal input/output buffers to allow incremental function calls. FIPS202 layer is used by both SHA3 and SHAKE high level APIs. FIPS202 defines Reset, Init, Update, Finalize APIs.

SHA3/SHAKE layer implements the SHA3 and SHAKE algorithms. SHA3 supports Init, Update, and Final APIs since it produces a fixed-length digest and should be finalized in a single Final function call. SHAKE supports Init, Update, Finalize, and Squeeze APIs since it can generate arbitrary-length output in an incremental way. SHAKE_Finalize processes the padding and the absorb of the last input block and generates the first output value; incremental XOF output generation is provided by the SHAKE_Squeeze function, which implements the squeeze phase (it does not finalize the absorb; SHAKE_Squeeze can only be called after SHAKE_Finalize).
Note: symmetric-shake.c will be inlined; therefore, additional checks of the |ctx->padded| flag will be omitted (SHAKE_Finalize should be called once to finalize the absorb and initialize the squeeze phase; SHAKE_Squeeze should be called to generate additional incremental XOF output).
…on may not be completed (incremental XOF output request); however, the SHAKE_Finalize function completes the first requested output, thus, SHAKE has processed the first valid output value
Rename SHAKE_Update to SHAKE_Absorb; define SHAKE_Final as a single-shot API; define SHAKE_Squeeze as an incremental (independent) API. It can be called immediately after SHAKE_Absorb; SHAKE_Squeeze will finalize the absorb phase and initiate the squeeze phase; when called a single time, SHAKE_Squeeze has the same behavior as SHAKE_Final. SHAKE_Final cannot be called consecutive times; SHAKE_Squeeze can be called multiple times; SHAKE_Squeeze can be called after SHAKE_Final (to squeeze more bytes).
Allow KECCAK1600_STATE_ABSORB, KECCAK1600_STATE_SQUEEZE, KECCAK1600_STATE_FINAL state flag values to prevent incremental usage of SHAKE_Final or SHAKE_Squeeze after SHAKE_Final

The change is introduced for consistency reasons.

KECCAK1600_STATE_ABSORB corresponds to |ctx->padded| = 0 (NOT_PADDED), KECCAK1600_STATE_SQUEEZE corresponds to |ctx->padded| = 1 (PADDED), and KECCAK1600_STATE_FINAL blocks incremental Squeeze calls.
Make FIPS202 functions static; Do not export SHA3 and SHAKE internal functions
Clean up the redefinition of the SHAKE blocksize/rate macros; update their use inside MLKEM and MLDSA.
Fix alignment

Co-authored-by: Jake Massimo <jakemas@amazon.com>
Upon Init, |ctx->state| is set to |KECCAK1600_STATE_ABSORB|, allowing absorb calls from both SHA3 and SHAKE; upon Finalize (padding and last absorb), SHA3 and SHAKE (Final or incremental Squeeze) behave in different ways; thus, |ctx->state| is set to |KECCAK1600_STATE_FINAL| when no incremental calls are allowed (|SHA3_Final| and |SHAKE_Final|) and to |KECCAK1600_STATE_SQUEEZE| when incremental squeezes are allowed (SHAKE_Squeeze).
@manastasova manastasova requested a review from a team as a code owner January 7, 2025 21:11

codecov-commenter commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 78.18182% with 24 lines in your changes missing coverage. Please review.

Project coverage is 78.96%. Comparing base (cc9c9f0) to head (873f017).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crypto/fipsmodule/sha/sha3.c | 75.00% | 21 Missing ⚠️ |
| crypto/ml_dsa/ml_dsa_ref/poly.c | 76.92% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2098      +/-   ##
==========================================
- Coverage   78.97%   78.96%   -0.02%     
==========================================
  Files         611      611              
  Lines      105552   105599      +47     
  Branches    14950    14963      +13     
==========================================
+ Hits        83361    83382      +21     
- Misses      21538    21564      +26     
  Partials      653      653              


davidben and others added 14 commits January 30, 2025 17:36
This makes me sad, but strdup may be more trouble than it's worth.
It is not in C (until C23) and is only a recent addition to POSIX (by
POSIX standards), which means a lot of folks seem to make it
unnecessarily hard to use:

- MSVC adds a deprecation warning that we have to suppress

- glibc gates it on feature macros; we just don't notice because we
  already have to work around their bad behavior for pthread_rwlock

- musl gates it on feature macros, which was one of the things that
  tripped cl/583161936

Given we only want to use strdup in one file (err.c, which wants to
avoid OPENSSL_malloc), a small reimplementation is probably not the end
of the world.

While I'm here, we can actually make OPENSSL_strdup's implementation a
little simpler.

Change-Id: I4e6c743b3104a67357d7d527c178c615de6bc844
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/64047
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
(cherry picked from commit 89f097740e6376521926eb56a61b25f639c473ac)
### Description of changes: 
The [GitHub Mac runners occasionally
fail](https://github.com/aws/aws-lc/actions/runs/12839452846/job/35806842991?pr=2127)
the Channel ID tests with an error like:
```
bad error (wanted ':CHANNEL_ID_SIGNATURE_INVALID:' / ''): local error 'channel ID unexpectedly negotiated', child error 'exit status 1', stdout:
```
This change logs more information to help debug these failures.

### Call-outs:
I wasn't able to test this code path because I can't reproduce these
failures locally.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
### Issues:
Resolves: P189934541

### Description of changes: 
* The size of `generated-src/crypto_test_data.cc` is approaching 100MB.
This change generates the file into a compressed form.

### Call-outs:
Updating scripts related to this file's creation and validation.

### Testing:
These changes are thoroughly covered by our CI.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
### Description of changes: 
Prepare AWS-LC v1.43.0

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
Ruby has made a couple of larger refactors that require versions later than
OpenSSL 1.1.1.

These changes require us to make a few tweaks to the patch in
aws#2071 and have exposed a couple minor
symbols that we don't support. Adding support for the ones that aren't
complicated in this commit.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
### Issues:
Resolves #CryptoAlg-2819

### Description of changes: 
Implements:
- https://pages.nist.gov/ACVP/draft-celi-acvp-ml-dsa.html
-
https://github.com/usnistgov/ACVP-Server/tree/master/gen-val/json-files/ML-DSA-keyGen-FIPS204
-
https://github.com/usnistgov/ACVP-Server/tree/master/gen-val/json-files/ML-DSA-sigGen-FIPS204
-
https://github.com/usnistgov/ACVP-Server/tree/master/gen-val/json-files/ML-DSA-sigVer-FIPS204

### Additional Details
Added testing for ML-DSA configured with `deterministic = true`,
`externalmu = true, false`, and `signatureInterface = internal` for
`parameterSet = ML-DSA-44, ML-DSA-65, ML-DSA-87`.

We add the [ML-DSA sigGen Test Case JSON
Schema](https://pages.nist.gov/ACVP/draft-celi-acvp-ml-dsa.html#name-ml-dsa-siggen-test-case-jso):

```
type mlDsaSigGenTestGroup struct {
	ID                 uint64 `json:"tgId"`
	Type               string `json:"testType"`
	ParameterSet       string `json:"parameterSet"`
	Deterministic      bool   `json:"deterministic"`
	SignatureInterface string `json:"signatureInterface"`
	ExternalMu         bool   `json:"externalMu"`
	Tests         []struct {
		ID      uint64               `json:"tcId"`
		Message hexEncodedByteString `json:"message"`
		MU      hexEncodedByteString `json:"mu"`
		SK      hexEncodedByteString `json:"sk"`
		RND     hexEncodedByteString `json:"rnd"`
		Context hexEncodedByteString `json:"context"`
		HashAlg string               `json:"hashAlg"`
	}
}
```
and [ML-DSA sigVer Test Case JSON
Schema](https://pages.nist.gov/ACVP/draft-celi-acvp-ml-dsa.html#name-ml-dsa-sigver-test-case-jso):
```
type mlDsaSigVerTestGroup struct {
	ID                 uint64 `json:"tgId"`
	Type               string `json:"testType"`
	ParameterSet       string `json:"parameterSet"`
	Deterministic      bool   `json:"deterministic"`
	SignatureInterface string `json:"signatureInterface"`
	ExternalMu         bool   `json:"externalMu"`
	Tests        []struct {
		ID        uint64               `json:"tcId"`
		PK        hexEncodedByteString `json:"pk"`
		Message   hexEncodedByteString `json:"message"`
		MU        hexEncodedByteString `json:"mu"`
		Signature hexEncodedByteString `json:"signature"`
		Context   hexEncodedByteString `json:"context"`
		HashAlg   string               `json:"hashAlg"`
	}
}
```




### Testing:
```
Running ACVP tests
2025/01/17 14:09:39 40 ACVP tests matched expectations
2025/01/17 14:09:40 1 ACVP tests matched expectations
2025/01/17 14:09:41 1 ACVP tests matched expectations
2025/01/17 14:09:42 1 ACVP tests matched expectations
2025/01/17 14:09:44 1 ACVP tests matched expectations
2025/01/17 14:09:46 1 ACVP tests matched expectations
2025/01/17 14:09:48 1 ACVP tests matched expectations
2025/01/17 14:09:50 1 ACVP tests matched expectations
2025/01/17 14:09:52 1 ACVP tests matched expectations
2025/01/17 14:10:09 1 ACVP tests matched expectations
2025/01/17 14:10:10 1 ACVP tests matched expectations
2025/01/17 14:10:11 1 ACVP tests matched expectations
2025/01/17 14:10:12 1 ACVP tests matched expectations
2025/01/17 14:10:12 1 ACVP tests matched expectations
```

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
…s#2136)

These operating systems are unused, untested, and not a priority for us. Remove remnants of them.
Bimodal performance was observed in the XTS encrypt AVX512 implementation, with more than an 80% drop in performance in the slow case. This is caused by mixing SSE and AVX instructions in the AVX512 implementation: for a subset of input lengths, the code path contained a single movdqa, an SSE instruction. Use vmovdqa instead.
### Issues:
Resolves #CryptoAlg-2868

### Description of changes: 

It is often useful when serializing asymmetric key pairs to populate
both the public and private elements, given only the private element.
For this to be possible, an algorithm utility function is often provided
to derive key material. ML-DSA does not support this in the reference
implementation.

#### Background: ML-DSA keypairs

An ML-DSA private key is constructed of the following elements: (ref
https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.204.pdf)
```
sk = (
      rho, // public random seed (32-bytes)
      tr,  // public key hash (64-bytes)
      key, // private random seed (32-bytes) (utilized during sign)
      t0,  // polynomial vector: encodes the least significant bits of public-key polynomial t, facilitating certain computational efficiencies.
      s1,  // secret polynomial vectors. These vectors contain polynomials with coefficients in a specified range, 
      s2.  // serving as the secret components in the lattice-based structure of ML-DSA.
)
```

An ML-DSA public key is constructed of the following elements:
```
pk = (
      rho, // public random seed (32-bytes)
      t1.  // compressed representation of the public key polynomial 
)
```

- The vector t is decomposed into two parts:
- `t1`: Represents the higher-order bits of `t`.
- `t0`: Represents the lower-order bits of `t`.

One can see that to reconstruct the public key from the private key, one
must:
1. Extract all elements from `sk`, using the existing function in
`/ml_dsa_ref/packing.c`: `ml_dsa_unpack_sk`
    1. This will provide `sk = (rho, tr, key, t0, s1, s2)`.
2. Reconstruct `A` using `rho` with the existing function in
`/ml_dsa_ref/polyvec.c`: `ml_dsa_polyvec_matrix_expand`
3. Reconstruct `t` from `t = A*s1 + s2`
4. Drop `d` lower bits from `t` to get `t1`
5. Pack `rho`, `t1` into public key.
6. Verify `pk` matches the expected value by comparing SHAKE256(pk) against `tr`
(unpacked from the secret key).

This has been implemented in `ml_dsa_pack_pk_from_sk` -- not tied to the
name, just using what I've seen so far in common nomenclature.

As the values of `d` differ for each parameter set of ML-DSA, we must
create packing functions for each parameter size. As such,
`ml_dsa_44_pack_pk_from_sk`, `ml_dsa_65_pack_pk_from_sk`, and
`ml_dsa_87_pack_pk_from_sk` have been added to `ml_dsa.h` to serve as
utility functions in higher level EVP APIs.

### Call-outs:

The scope of this PR is only the algorithm level, using these functions
for useful tasks such as populating the public key automatically on
private key import -- will be added in subsequent PRs.

### Testing:
A new test has been added to `PQDSAParameterTest`, namely,
`KeyConsistencyTest` that will assert that packing the key is
successful, and that the key produced matches the original public key.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
### Description of changes: 
* Add support for Ed25519ph, including where the digest is
pre-computed and provided externally.
* Add support for Ed25519ctx from RFC 8032.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
### Issues:
Addresses: 
* aws/aws-lc-rs#522

### Description of changes: 
Setup macros for big-endian MIPS in "target.h"

### Call-outs:
A contributor informs us that the build for `mips-unknown-linux-musl`
succeeds with this change:
aws/aws-lc-rs#522 (comment)

### Testing:
No CI testing for it. 

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'x' by recognizing that ((x & z) | (y & ~z)) is equivalent to ((x & z) + (y & ~z)) in this scenario, and we can perform those additions independently, leaving our dependency on x to the final addition. This speeds it up by around 5% on both platforms.
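A tiny stand-alone check of the identity this relies on (not the AWS-LC assembly itself): (x & z) and (y & ~z) select disjoint bit positions, so OR and ADD give the same result, which lets the (y & ~z) term be summed before x is available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// MD5's G function written with OR and with ADD. The two operands select
// disjoint bit positions (where z is 1 vs. where z is 0), so they never
// carry into each other and the results are identical.
static uint32_t g_or(uint32_t x, uint32_t y, uint32_t z) {
  return (x & z) | (y & ~z);
}

static uint32_t g_add(uint32_t x, uint32_t y, uint32_t z) {
  return (x & z) + (y & ~z);
}

int main(void) {
  uint32_t s = 0x12345678u;
  for (int i = 0; i < 1000000; i++) {
    // xorshift32: cheap pseudo-random values for exercising the identity.
    s ^= s << 13; s ^= s >> 17; s ^= s << 5;
    uint32_t x = s;
    s ^= s << 13; s ^= s >> 17; s ^= s << 5;
    uint32_t y = s;
    s ^= s << 13; s ^= s >> 17; s ^= s << 5;
    uint32_t z = s;
    assert(g_or(x, y, z) == g_add(x, y, z));
  }
  printf("G identity holds on all sampled inputs\n");
  return 0;
}
```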
### Issues:
Resolves #CryptoAlg-2868

### Description of changes: 
Following from the first part of this PR to add low level support for
ML-DSA public key generation from private keys. This PR uses the new
`pqdsa` method `pqdsa->pqdsa_pack_pk_from_sk`. We supply
`ml_dsa_44_pack_pk_from_sk`, `ml_dsa_65_pack_pk_from_sk`, and
`ml_dsa_87_pack_pk_from_sk` from `ml_dsa.h` to provide this
functionality for ML-DSA.

As we use `EVP_parse_private_key` to import `PQDSA` keys into AWS-LC, we
need to modify the `PQDSA` asn.1 private key decoding function to
additionally populate the corresponding public key.

The `pqdsa_priv_decode` function in `p_pqdsa_asn1.c` has been modified
to attempt to generate a public key from the provided secret key, should
the method be defined for that `pqdsa` structure. As ML-DSA is the only
current `PQDSA`, the library currently does implement methods for all
`PQDSA` PKEY types.

### Call-outs:
> what happens if a new `PQDSA` algorithm is introduced to aws-lc, and
doesn't support `pk` calculation from `sk`?

We populate the methods as `NULL` and will modify the G.test that
expects the public key to be generated. We can make this change simply
if/when we add to the `PQDSA` capabilities with an algorithm that doesn't
support `pk` from `sk`.

### Testing:
To test functionality, I have adapted the `PQDSAParameterTest` G.test
`MarshalParse` to now verify that the public key is populated after
calling `EVP_parse_private_key`. We then verify that the public key
calculated is equal to the original public key.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
@manastasova manastasova marked this pull request as ready for review January 30, 2025 17:44
@nebeid nebeid self-requested a review January 31, 2025 19:48
@justsmth justsmth self-requested a review February 3, 2025 18:01
Comment on lines 74 to 80
// Define state flag values for Keccak-based functions
#define KECCAK1600_STATE_ABSORB 0
#define KECCAK1600_STATE_SQUEEZE 1
// KECCAK1600_STATE_FINAL restricts the incremental calls to SHAKE_Final .
// KECCAK1600_STATE_FINAL can be called once. SHAKE_Squeeze cannot be called after SHAKE_Final.
// SHAKE_Squeeze should be called for streaming XOF output.
#define KECCAK1600_STATE_FINAL 2
Contributor

Do we need to represent a failure state if one of the operations were to fail (or only partially succeed)? Is it possible to get a KECCAK1600_CTX into a no-longer-usable state?

Contributor Author

I believe this should be covered by the return values of the functions. I don't think an additional value for |state| is needed since the purpose is to restrict the number/sequence of calls: absorb (only before padding is performed), squeeze (incremental squeezes are performed with no limitation), and final (one call to final is allowed). Once a squeeze is called, no further absorb should be allowed; once final is called, no further absorb, squeeze, or final is allowed.

Comment on lines 260 to 267
int SHAKE_Absorb(KECCAK1600_CTX *ctx, const void *data, size_t len) {
if (ctx == NULL) {
return 0;
}

if (len == 0) {
return 1;
}
Contributor

Verify that data != NULL

@@ -113,7 +113,7 @@ uint8_t *SHAKE256(const uint8_t *data, const size_t in_len, uint8_t *out, size_t
static void FIPS202_Reset(KECCAK1600_CTX *ctx) {
memset(ctx->A, 0, sizeof(ctx->A));
Contributor

Should this instead use OPENSSL_memset? (Here and other places...)

Contributor Author

Thanks, Justin. Updated it.

// leaving the rest for later processing.
// There is enough data to fill or overflow the intermediate
// buffer. So we append |rem| bytes and process the block,
// leaving the rest for later processing.
memcpy(ctx->buf + num, data_ptr_copy, rem);
Contributor

Should this instead use OPENSSL_memcpy? (Here and other places...)

Comment on lines 234 to 237
int SHA3_Final(uint8_t *md, KECCAK1600_CTX *ctx) {
if (ctx->md_size == 0) {
return 1;
}
Contributor

Verify that md != NULL and ctx != NULL.

Contributor Author

Added these checks in all functions! Thank you!
They are checked in the higher EVP_ layer functions; however, since the SHA3/SHAKE functions are also called internally, it is worth checking the values here as well.

Contributor

I agree that the checks here should not be needed in this "layer" if we're confident the parameters are checked on all logic paths going into these functions. What I noticed was that there was some inconsistency between the functions, where some were checking certain parameters and others were not, and it wasn't clear to me why they weren't all performing the same checks.

Comment on lines 306 to 308
if (ctx->state == KECCAK1600_STATE_ABSORB) {
// Skip FIPS202_Finalize if the input has been padded and the last block has been processed
if (FIPS202_Finalize(md, ctx) == 0) {
Contributor

This comment feels like it's in the wrong place.

Contributor Author

Thanks! Fixed it.

Comment on lines 78 to 79
// KECCAK1600_STATE_FINAL can be called once. SHAKE_Squeeze cannot be called after SHAKE_Final.
// SHAKE_Squeeze should be called for streaming XOF output.
Contributor

If I understood correctly,

Suggested change
// KECCAK1600_STATE_FINAL can be called once. SHAKE_Squeeze cannot be called after SHAKE_Final.
// SHAKE_Squeeze should be called for streaming XOF output.
// KECCAK1600_STATE_FINAL is set once |SHAKE_Final| is called so that |SHAKE_Squeeze| cannot be called anymore.

Contributor Author

Thanks!

OPENSSL_EXPORT int SHA3_Init(KECCAK1600_CTX *ctx, size_t bitlen);

// SHA3_Update processes all data blocks that don't need pad through
// |Keccak1600_Absorb| and returns 1 and 0 on failure.
// SHA3_Update check |ctx| pointer and |len| value, calls |FIPS202_Update|
Contributor

Does it not do what it used to do: "processes all full blocks of the input data"

Contributor Author

SHA3_Update technically does the same: "processes all data blocks that don't need pad through". However, now it is performed via a call to FIPS202_Update, which internally "processes all data blocks that don't need pad through".

// SHA3_Final pads the last data block and processes it through |Keccak1600_Absorb|.
// It processes the data through |Keccak1600_Squeeze| and returns 1 and 0 on failure.
// SHA3_Final pads the last data block and absorbs it through |FIPS202_Finalize|.
// It processes the data through |Keccak1600_Squeeze| and returns 1 on success
Contributor

Suggested change
// It processes the data through |Keccak1600_Squeeze| and returns 1 on success
// It then calls |Keccak1600_Squeeze| and returns 1 on success

// returns the remaining number of bytes.
// SHAKE_Init initialises |ctx| fields through |FIPS202_Init| and
// returns 1 on success and 0 on failure.
int SHAKE_Init(KECCAK1600_CTX *ctx, size_t block_size);
Contributor

Is there a historical reason why we chose to input bitlength to SHA3_Init and block_size to SHAKE_Init?

Contributor Author

That's a good observation!
I see it as more intuitive to consider the digest length as an identifier for the security level of the hashing algorithms rather than the block size (as it is for SHA2).
The block size for SHAKE serves as the security-level identifier because SHAKE has no fixed digest length.
I can change SHA3_Init to take the block size as a parameter. However, the reverse, changing SHAKE_Init, won't be possible, so I think it may be more confusing.

Contributor

I think it's fine to leave the bitlength in SHA3_Init because there isn't an equivalent function for SHA2 that we need to align with (e.g. using byte length). If we used bitlength with SHAKE_Init, it would identify 128 or 256. It's minor anyway and if it makes more sense to you this way, that's good.

@@ -71,6 +71,14 @@ extern "C" {
// SHAKE128 has the maximum block size among the SHA3/SHAKE algorithms.
#define SHA3_MAX_BLOCKSIZE SHAKE128_BLOCKSIZE

// Define state flag values for Keccak-based functions
#define KECCAK1600_STATE_ABSORB 0
#define KECCAK1600_STATE_SQUEEZE 1
Contributor

Would you like to put a comment here like
'''
// KECCAK1600_STATE_SQUEEZE is set when |SHAKE_Squeeze| is called.
// It remains set while |SHAKE_Squeeze| is called repeatedly to output chunks of the XOF output.
'''

Contributor Author

Thanks, Nevine. Added!

return 0;
}
// SHA3_Final should be called once to process final digest value
// |ctx->state| flag does not need to be updated
Contributor

The state is updated within the function, though.

Contributor Author

Yes, my confusion here!
The flag does not need to be updated inside SHA3_Final because incorrect use will be prevented in the EVP_ high-level functions (i.e., EVP_DigestFinal will cleanse the |ctx->digest| value; later, if EVP_DigestUpdate or EVP_DigestFinal is called incorrectly after EVP_DigestFinal, the check on |ctx->digest| == NULL will prevent executing EVP_DigestUpdate or EVP_DigestFinal).
Therefore, EVP_DigestFinal cannot be called incorrectly.
However, if we want to prevent incorrect use of the low-level internal API SHA3_Final, then the flag update will prevent calling SHA3_Final more than once via the state check in FIPS202_Finalize. The flag update here would also prevent calling SHA3_Update after SHA3_Final via the state check in FIPS202_Update.
I would consider your recommendation here. For now I will leave the flag update and will remove the comment.

return FIPS202_Update(ctx, data, len);
}

// SHAKE_Final is a single-shot API and can be called once to finalize absorb and squeeze phases
Contributor

Suggested change
// SHAKE_Final is a single-shot API and can be called once to finalize absorb and squeeze phases
// SHAKE_Final is to be called once to finalize absorb and squeeze phases

I think "single-shot" here is not the right term because it's used with non-streaming APIs such as SHAKE_256()

Contributor Author

I agree. Thank you!

Comment on lines 293 to 294
// SHAKE_Squeeze can be called multiple times
// SHAKE_Squeeze should be called for incremental XOF output
Contributor

Suggested change
// SHAKE_Squeeze can be called multiple times
// SHAKE_Squeeze should be called for incremental XOF output
// SHAKE_Squeeze can be called multiple time for incremental XOF output

}

if (ctx->state == KECCAK1600_STATE_ABSORB) {
// Skip FIPS202_Finalize if the input has been padded and the last block has been processed
Contributor

Maybe move this comment one line up because it's talking about skipping this if statement (if I understood correctly).


// SHAKE_Squeeze can be called multiple times
// SHAKE_Squeeze should be called for incremental XOF output
int SHAKE_Squeeze(uint8_t *md, KECCAK1600_CTX *ctx, size_t len) {
Contributor

Do we have unit tests for SHAKE_Squeeze, or are we relying on ml_dsa and ml_kem?

Contributor Author

I am adding multiple tests for SHAKE_Squeeze in #2155. Currently, SHAKE_Squeeze is not exposed through the EVP functions, so I have only tested it through ML-KEM.
Actually, I noticed that the ML-DSA test vectors never enter the Squeeze loop (there are always enough mod-q samples to generate the entire matrix of polys). I noted that in the call-outs in the PR description since I thought it was an interesting observation.

manastasova and others added 4 commits February 4, 2025 09:56
Fix comments

Replace memset with OPENSSL_memset

Add NULL pointer checks

Clean (reverse) logic of branches

Update the buf_load inside |FIPS202_Finalize| since the last absorb frees the internal buffer. Later, it will be used to store the intermediate result for the XOF output chunks (when incremental byte squeezes are added).
Fix comments

Replace memset, memcpy with OPENSSL_memset, OPENSSL_memcpy

Add NULL pointer checks

Clean (reverse) logic of branches

Update the buf_load inside |FIPS202_Finalize| since the last absorb frees the internal buffer. Later, it will be used to store the intermediate result for the XOF output chunks (when incremental byte squeezes are added).
@justsmth justsmth self-requested a review February 5, 2025 18:58
@justsmth justsmth merged commit 9660ac3 into aws:main Feb 5, 2025
122 of 126 checks passed