driver-only ECDSA: fix testing disparities in ecp, random, se_driver_hal #6856
Comments
Indeed.
This has my preference. The cost is minimal. Unless we really expect to have no other disparities, which would save us from supporting an exception mechanism? Is it hard to support exceptions?
Indeed. It's testing
That would be fine. It should be something where all-bits-zero (a plausible failure mode) causes a failure. That is the case for ECC blinding, for randomized ECDSA k generation, and for DHM blinding. It's also the case for RSA key generation, but that's very slow. I'm not sure about RSA blinding, and it's not the case for the RSA encryption nonce or for RSA padding.
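As a concrete sketch of the all-bits-zero failure mode for randomized ECDSA, here is what it looks like through the legacy `mbedtls_ecdsa_sign()` API; the RNG callback, the function name, and the degenerate private key below are invented for illustration:

```c
#include <string.h>

#include "mbedtls/ecdsa.h"

/* Hypothetical RNG callback modelling the all-bits-zero failure mode. */
static int zero_rng(void *p_rng, unsigned char *output, size_t output_len)
{
    (void) p_rng;
    memset(output, 0, output_len);
    return 0;
}

/* Expected to return nonzero: randomized k generation can only ever draw
 * k = 0, which is invalid, so signing fails instead of silently succeeding. */
static int all_zero_rng_breaks_ecdsa(void)
{
    mbedtls_ecp_group grp;
    mbedtls_mpi r, s, d;
    unsigned char hash[32] = { 0x2a }; /* arbitrary stand-in for a hash */
    int ret;

    mbedtls_ecp_group_init(&grp);
    mbedtls_mpi_init(&r);
    mbedtls_mpi_init(&s);
    mbedtls_mpi_init(&d);

    ret = mbedtls_ecp_group_load(&grp, MBEDTLS_ECP_DP_SECP256R1);
    if (ret == 0) {
        ret = mbedtls_mpi_lset(&d, 1); /* degenerate but valid key, sketch only */
    }
    if (ret == 0) {
        ret = mbedtls_ecdsa_sign(&grp, &r, &s, &d, hash, sizeof(hash),
                                 zero_rng, NULL);
    }

    mbedtls_ecp_group_free(&grp);
    mbedtls_mpi_free(&r);
    mbedtls_mpi_free(&s);
    mbedtls_mpi_free(&d);
    return ret;
}
```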
We already have support for test-suite-level exceptions, and we need it because we always want to ignore the test suite of the thing that's being accelerated: for example, when ECDSA is accelerated, we clearly expect the tests in `test_suite_ecdsa` to be skipped. What we don't have is test-case-level exceptions, where we could say it's OK for this case in particular to be skipped. But here I think it's OK to have an exception for the full `test_suite_random`.
Good point about RSA being slow (as well as DHM parameter generation).
I disagree. That would be problematic when we start working on PSA random driver support.
Should have made a complete sentence: I think it's OK to have an exception for the full `test_suite_random`. Anyway, I've been thinking, and at some point we'll probably want to expand exception support to target specific test cases anyway, and I don't think it would be that hard. (For example, at some point in PK parse we'll want to specifically ignore test cases that rely on compressed points, but definitely not ignore the full suite.) So, we might as well do that extension now and ignore only that specific test case here too.
I'm not an expert at all, but since I'm currently trying to address this issue, I'll just give my two cents. At first I was more OK with this solution:
However, I see that there are concerns about it, as pointed out here. As a consequence, my question is about this sentence:
So, if I understood correctly, this means that I should extend the exception support in `analyze_outcomes.py` so that it can target specific test cases?
A tentative change to `analyze_outcomes.py`.
Yes.
It'll only make things faster if everyone is happy with the design. I don't think including test case descriptions in the script will work well. Someone might tweak a description (e.g. fix a typo, or make it more distinctive when adding a similar test case), and there's basically no way to know the test case is referenced elsewhere. I'm not even overjoyed at hard-coding test suite names. I'd prefer a mechanism where the test suite/case itself carries an annotation saying that it's fine for it to be skipped. Note: to smooth development, we can have the simple mechanism with a hard-coded name in this PR, and then make a follow-up with a cleaner mechanism.
Yes, of course ;). I didn't want to sound disrespectful here; I just wanted to say that I can take care of this directly in this PR, without opening another one just to fix this part.
You're right, the proposed solution is not super safe.
I agree, this looks like a better approach. I'll give it a try!
As a matter of principle I'm not a huge fan of referring to test cases by their description: as you say, someone might tweak it. But then what would happen? The check in the CI would fail and we'd notice immediately. So, while referring to things by name is not great in general, I think the special case of an ignore list in a script run in the CI is actually quite OK. The worst that can happen is that we waste a push-wait-for-CI-rework cycle on this. I don't think we tune descriptions often enough for the amount of cycles wasted to be unacceptable.
I see two problems with this. First, the annotation needs to be more precise: we don't want to exclude this test from parity checking when testing accelerated hashes, for example. This is of course trivial to fix by appending the name of what is being accelerated.

The second problem is related, and involves a plan I had in my head but hadn't written down so far: in the long run, I think we can save CI resources by having a single "reference" component that would be common to all "accelerated" components. Currently this is not possible: the reference component for accelerated hashes disables the entropy module and uses an external RNG, while the ECDSA reference component disables ECDSA-based key exchanges. But once hashes are improved and ECDSA has progressed enough, both should be able to use the same reference component, which would be just the full config + drivers enabled. This doesn't work as well if we skip test cases in the reference component.

Though that's of course not a problem now, so we could use a simpler approach now and worry about that later.

@valeriosetti quoting from #6946 (comment)
I'm not sure why re-using the existing dependency mechanism wouldn't be appropriate here.
In my opinion, a "dependency" is something that is needed for building or running the library/test code. In this case we would like to skip a test because it can be built/tested in one case but not in the other, so the cross-check does not work properly; each single build, considered independently, is OK. This is just my interpretation, of course, so I'm open to any different approach ;)
Does that really matter? The test case will have a more specific dependency on some non-PSA symbol. The extra symbol to exclude from parity checking is just there to say “it's not a mistake to have a non-PSA dependency”.
I don't understand this part. What wouldn't work with an everything-through-driver component?
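For concreteness, the "extra symbol" idea could look something like this in a test suite's `.data` file; the marker symbol `ALLOW_NON_PSA_DEPENDENCY` and the test-case lines are invented for illustration:

```
PSA RNG via legacy f_rng API: ECDSA
depends_on:MBEDTLS_ECDSA_C:ALLOW_NON_PSA_DEPENDENCY
mbedtls_psa_get_random_ecdsa_sign:...
```

The parity-checking script could then treat the marker as "this non-PSA dependency is intentional" rather than flagging it; as noted above, the marker would probably need to name what is being accelerated to be precise enough.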
I don't think the extra symbol is there just for that. We already have a non-PSA dependency, and indeed it's not a mistake. But when the script is comparing the "hashes in software" build vs the "accelerated hashes" build, if the test is skipped in one of them but not in the other, that disparity is a real problem there, not an acceptable one. Basically, the set of expected & acceptable disparities depends on the pair of builds being compared.

On another front, what do you think about my point that referring to a case by its description would be acceptable in the specific case of an ignore list, since any change in the description would be noticed as a result of the script failing?
The implementation of driver parity testing feels more complex than I'd expect. In particular, I wonder why “no non-PSA dependencies in the PSA domain (except explicit exceptions)” is not a good enough approximation. Is there a design analysis document that lists the requirements and derives the architecture?
Hmmm. A test is skipped only due to a dependency or to a runtime decision to skip. For a test to not run with drivers, one of the following has to happen:
I thought the goal of the test parity check was primarily (2), since (1) could be enforced through simple dependency checks and (3) is rare enough to not really matter. So for (2), is it really important which driver-based build gets the test to run? Especially when we can do all-driver builds (as opposed to separating algorithms due to incomplete support): wouldn't this reduce (2) to an analysis of test case dependencies?
Well, ok, that's how it's currently implemented. But why do we need this level of precision?
That's not great, but it's acceptable.
Context: #6839.
This task is to finish off TL-ecdsa.PSA by achieving testing parity in "basic" test suites (that is, everything except PK, X.509, TLS). Ideally, once this is done, in `analyze_outcomes.py`, the task `analyze_driver_vs_reference_ecdsa` will no longer ignore test suites other than PK, X.509, TLS.

In `test_suite_ecp`, the function `mbedtls_ecp_group_metadata` depends on `ECDSA_C` (and `ECDH_C`). This is because it tests `mbedtls_ecdsa_can_do()` (and `mbedtls_ecdh_can_do()`). But it also tests lots of other things. Instead of having the entire test function depend on ECDSA (and ECDH), only the specific line that depends on it should be guarded by an `#if defined`.
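A sketch of what that guard could look like inside the test function (illustrative only, not the actual test code; the variable names are made up):

```c
/* ... many unconditional checks on the curve's metadata ... */

/* Guard only the capability checks that need ECDSA/ECDH: */
#if defined(MBEDTLS_ECDSA_C)
    TEST_EQUAL(mbedtls_ecdsa_can_do(grp_id), expected_ecdsa_support);
#endif
#if defined(MBEDTLS_ECDH_C)
    TEST_EQUAL(mbedtls_ecdh_can_do(grp_id), expected_ecdh_support);
#endif
```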
In `test_suite_psa_crypto_se_driver_hal`, several test cases using the function `sign_verify` declare a dependency on `MBEDTLS_ECDSA_C:MBEDTLS_ECP_C:PSA_WANT_ECC_SECP_R1_256`, while they actually only use `psa_` APIs. The dependency declaration should probably be `PSA_WANT_ALG_ECDSA:PSA_WANT_ECC_SECP_R1_256`.
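Concretely, the `depends_on` line in the `.data` file would change along these lines (the test description and arguments here are placeholders):

```
SE driver: sign/verify ECDSA
depends_on:PSA_WANT_ALG_ECDSA:PSA_WANT_ECC_SECP_R1_256
sign_verify:...
```

replacing the current `depends_on:MBEDTLS_ECDSA_C:MBEDTLS_ECP_C:PSA_WANT_ECC_SECP_R1_256` line.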
In `test_suite_random`, the function `mbedtls_psa_get_random_ecdsa_sign` depends on `ECDSA_C`. This seems correct, but is annoying because it doesn't play nice with our test disparity analysis. Find what to do.
Random ideas about `mbedtls_psa_get_random_ecdsa_sign`:

- `mbedtls_psa_get_random` and `MBEDTLS_PSA_RANDOM_STATE` can be used with legacy APIs that accept `f_rng, p_rng` parameters (see the sketch at the end of this description). So, we may not want to just remove that test. (Historical note: the test was added by Expose the PSA RNG in mbedtls #4110.)
- Ignore `test_suite_random` in `analyze_outcomes.py`, with a comment explaining why it's harmless.
- Move the test to `test_suite_ecdsa`, which is legitimately ignored already. This is not really satisfying, as the test logically belongs to `test_suite_random`.
- Make the test use another algorithm whose API takes "`f_rng, p_rng`". Changing to ECDH, ECJPAKE or ECP or BIGNUM would only make us run into the same problem in future steps. But we could use RSA or DHM instead. In principle, that's also only moving the problem, but it's unclear at this point whether we're going to do driver-only RSA/DHM the way we're currently doing driver-only ECC. And if it happens, it might be only after 4.0, where we'll remove the `f_rng, p_rng` params from public APIs (always use the PSA RNG), so `mbedtls_psa_get_random` will be gone and this test with it.

Depends on: #6855
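To illustrate the first idea above: `mbedtls_psa_get_random` has the `f_rng` callback signature and `MBEDTLS_PSA_RANDOM_STATE` is the matching `p_rng` argument, so the PSA RNG can be plugged into legacy APIs. A minimal sketch (the wrapper function is hypothetical; error handling kept minimal):

```c
#include "mbedtls/ecdsa.h"
#include "mbedtls/psa_util.h" /* mbedtls_psa_get_random, MBEDTLS_PSA_RANDOM_STATE */
#include "psa/crypto.h"

/* Sign a hash with the legacy ECDSA API, drawing randomness from the
 * PSA RNG instead of a dedicated DRBG instance. */
int sign_with_psa_rng(mbedtls_ecp_group *grp, mbedtls_mpi *r, mbedtls_mpi *s,
                      const mbedtls_mpi *d,
                      const unsigned char *hash, size_t hash_len)
{
    if (psa_crypto_init() != PSA_SUCCESS) {
        return -1;
    }
    return mbedtls_ecdsa_sign(grp, r, s, d, hash, hash_len,
                              mbedtls_psa_get_random, MBEDTLS_PSA_RANDOM_STATE);
}
```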