ECC: use a bit of ROM rather than lots of RAM and improve performance #4128

mpg · 2021-02-09T11:29:40Z

Description

Type: Enhancement
Priority: Minor

Note: was previously raised internally as IOTSSL-1865/IOTCRYPT-614 in Nov. 2017.

The current implementation of MBEDTLS_ECP_FIXED_POINT_OPTIM is supposed to be a RAM-performance trade-off, but in common workflows, most notably TLS, it consumes RAM without improving performance. The only workflow where it actually improves performance is performing repeated ECDSA operations with the same context, and not through the PK layer.

It is possible to rewrite it in order to:

stop using extra RAM
use a much smaller amount of ROM instead
actually speed up operations in common workflows
lift the limitation that ecp_grp must not be shared between threads

(Technical details: instead of storing for each ecp_grp a copy of the full T in RAM, store in ROM only the 2**i multiples of G.)
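To make the idea concrete, here is a minimal sketch of fixed-base multiplication with a read-only table of the 2**i multiples of G. It is a toy, not Mbed TLS code: it assumes the small textbook curve y^2 = x^3 + 2x + 2 over GF(17) with G = (5, 1) of order 19, and every name in it (`point`, `G_POW2`, `mul_fixed_base`, `mul_naive`) is made up for illustration. It uses the plain binary method rather than a comb, and it is not constant-time, which a real implementation would need to be.

```c
#include <assert.h>
#include <stdint.h>

#define FIELD_P 17u   /* toy field prime (assumption, not a real curve) */
#define CURVE_A 2u    /* curve: y^2 = x^3 + 2x + 2 over GF(17) */
#define ORDER   19u   /* order of the base point G = (5, 1) */
#define NBITS   5u    /* 2^5 > ORDER, so 5 table entries suffice */

typedef struct { uint8_t x, y, inf; } point;

/* Precomputed 2^i * G for i = 0..4. Being const, this table can be
 * placed in ROM/flash and shared by all threads without locking. */
static const point G_POW2[NBITS] = {
    {5, 1, 0}, {6, 3, 0}, {3, 1, 0}, {13, 7, 0}, {10, 11, 0}
};

/* Modular inverse via Fermat's little theorem: v^(p-2) mod p. */
static unsigned inv_mod(unsigned v)
{
    unsigned r = 1;
    for (unsigned e = FIELD_P - 2; e > 0; e--)
        r = (r * v) % FIELD_P;
    return r;
}

/* Affine point addition, including doubling and the point at infinity. */
static point pt_add(point p, point q)
{
    unsigned lam, x3, y3;
    if (p.inf) return q;
    if (q.inf) return p;
    if (p.x == q.x) {
        if ((p.y + q.y) % FIELD_P == 0)
            return (point){0, 0, 1};            /* P + (-P) = infinity */
        lam = ((3u * p.x * p.x + CURVE_A) % FIELD_P)
              * inv_mod((2u * p.y) % FIELD_P) % FIELD_P;   /* tangent */
    } else {
        lam = ((q.y + FIELD_P - p.y) % FIELD_P)
              * inv_mod((q.x + FIELD_P - p.x) % FIELD_P) % FIELD_P; /* chord */
    }
    x3 = (lam * lam + 2u * FIELD_P - p.x - q.x) % FIELD_P;
    y3 = (lam * ((p.x + FIELD_P - x3) % FIELD_P) + FIELD_P - p.y) % FIELD_P;
    return (point){(uint8_t)x3, (uint8_t)y3, 0};
}

/* k * G using only additions of table entries: no doublings at run time
 * and no per-context RAM. NOT constant-time. */
point mul_fixed_base(unsigned k)
{
    point acc = {0, 0, 1};
    assert(k < ORDER);
    for (unsigned i = 0; i < NBITS; i++)
        if ((k >> i) & 1u)
            acc = pt_add(acc, G_POW2[i]);
    return acc;
}

/* Reference implementation: add G to itself k times. */
point mul_naive(unsigned k)
{
    point acc = {0, 0, 1};
    for (unsigned i = 0; i < k; i++)
        acc = pt_add(acc, G_POW2[0]);
    return acc;
}

/* Equality that treats all encodings of infinity as equal. */
int pt_eq(point a, point b)
{
    if (a.inf || b.inf) return a.inf && b.inf;
    return a.x == b.x && a.y == b.y;
}
```

Because the table is `const` and depends only on the curve, nothing needs to be recomputed or stored per context. For a real curve the table would hold one point per scalar bit (or fewer with a comb), which is where the "bit of ROM" goes.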

Alternatively (or while waiting for that work to happen), the current non-optimisation should be disabled by default, as in point 3 of #4127, which brings the RAM savings without the performance improvement.

The expected performance improvements range from +120% (ops/sec) for ECDSA signature down to 0% for static ECDH, with +40% for ECDHE and ECDSA verification. The overall expected improvement on a typical TLS handshake is about +40% (hs/time unit).

mpg · 2021-02-09T11:46:51Z

Note: the reasons why the current implementation doesn't deliver the expected performance improvement for common use cases such as a TLS ECDHE-ECDSA handshake are twofold:

for the ECDHE part, each handshake starts with a fresh ECDH context and throws it away once the handshake is completed, so the supposedly cached multiples of G are computed afresh every time and never stand a chance of being re-used
for the ECDSA part, the cause is "ECKEY PK structure duplicates itself on every use" (#2034): again a fresh context is created for each operation, so time (and RAM) is spent filling up the cache, which is then discarded before it can be re-used.
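The effect described in these two points can be reproduced with a small model. The sketch below is not Mbed TLS code: it assumes a toy group (powers of 17 modulo the prime 65521) in place of a curve, and all names (`toy_ctx`, `toy_op`, `toy_one_shot`, `table_builds`) are hypothetical. It only counts how often the per-context table gets rebuilt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_P    65521u  /* toy modulus (prime), stand-in for a curve */
#define TOY_G    17u     /* toy "base point" */
#define TOY_BITS 16

/* Per-context cache, mimicking a table stored inside each group/context. */
typedef struct {
    int table_ready;
    uint32_t table[TOY_BITS];   /* TOY_G^(2^i) mod TOY_P */
} toy_ctx;

unsigned table_builds = 0;      /* how many times a table was filled */

void toy_ctx_init(toy_ctx *ctx)
{
    assert(ctx != NULL);
    memset(ctx, 0, sizeof(*ctx));
}

static void toy_fill_table(toy_ctx *ctx)
{
    uint32_t v = TOY_G;
    for (int i = 0; i < TOY_BITS; i++) {
        ctx->table[i] = v;
        v = (v * v) % TOY_P;    /* fits: operands are below 2^16 */
    }
    ctx->table_ready = 1;
    table_builds++;
}

/* Fixed-base operation: fills the cache on first use of this context. */
uint32_t toy_op(toy_ctx *ctx, uint16_t k)
{
    uint32_t acc = 1;
    assert(ctx != NULL);
    if (!ctx->table_ready)
        toy_fill_table(ctx);
    for (int i = 0; i < TOY_BITS; i++)
        if ((k >> i) & 1u)
            acc = (acc * ctx->table[i]) % TOY_P;
    return acc;
}

/* Models a handshake or a PK-layer call: fresh context, used once. */
uint32_t toy_one_shot(uint16_t k)
{
    toy_ctx ctx;
    toy_ctx_init(&ctx);
    return toy_op(&ctx, k);     /* the cache dies with ctx on return */
}

/* n operations, each with its own throwaway context. */
unsigned builds_with_fresh_contexts(unsigned n)
{
    table_builds = 0;
    for (unsigned i = 0; i < n; i++)
        (void) toy_one_shot((uint16_t)(i + 1));
    return table_builds;
}

/* n operations on a single long-lived context. */
unsigned builds_with_reused_context(unsigned n)
{
    toy_ctx ctx;
    toy_ctx_init(&ctx);
    table_builds = 0;
    for (unsigned i = 0; i < n; i++)
        (void) toy_op(&ctx, (uint16_t)(i + 1));
    return table_builds;
}
```

In this model, `builds_with_fresh_contexts(n)` returns n while `builds_with_reused_context(n)` returns 1 for any n >= 1: the cache only pays off in the second pattern, which is the one the common workflows do not use.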

Note 2: one could also use a central cache (instead of per-context) in RAM but since the values are known at compile-time (multiples of the standard base point) it makes more sense to have it in ROM/flash.

gilles-peskine-arm · 2021-02-09T11:51:39Z

the supposedly-cached multiples of G are computed afresh every time and don't stand a chance of being re-used

I understand that we don't like global data, but at the same time it doesn't really make sense to me that we store and cache a lot of data per key when that data is not specific to a key, but to a curve. Why don't we introduce a global cache? This could be done in 3.x, once the mbedtls_ecp_group structure is opaque.

mpg · 2021-02-09T12:03:07Z

I agree, I'm just not sure there's any reason for the global cache to live in RAM, since all of its content is known at compile time. But yes, the core of the improvement is really to make the cache global - either in flash (filled up at compile time) as the description suggests, or in RAM, filled up at the time of the first access (but this might require some thinking about thread-safety). (We could even have hybrids where only the part that takes the most time to compute is stored in flash and the rest is cached in RAM, if we wanted to be fancy.)
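As a rough illustration of the RAM variant and its thread-safety concern, here is a sketch of a global table filled on first access. It assumes a POSIX platform and uses `pthread_once` purely as an example; Mbed TLS has its own threading abstraction, and nothing here reflects how the library would actually do it. The toy group (powers of 17 modulo 65521) and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define TOY_P    65521u  /* toy modulus (prime), stand-in for a curve */
#define TOY_G    17u     /* toy "base point" */
#define TOY_BITS 16

/* Global mutable cache in RAM: one per curve instead of one per context. */
static uint32_t ram_table[TOY_BITS];
static pthread_once_t ram_table_once = PTHREAD_ONCE_INIT;
unsigned ram_table_builds = 0;  /* instrumentation for this sketch only */

static void ram_table_fill(void)
{
    uint32_t v = TOY_G;
    for (int i = 0; i < TOY_BITS; i++) {
        ram_table[i] = v;           /* TOY_G^(2^i) mod TOY_P */
        v = (v * v) % TOY_P;
    }
    ram_table_builds++;
}

/* Returns the shared table, filling it exactly once even if several
 * threads race on the first access. */
const uint32_t *toy_global_table(void)
{
    int rc = pthread_once(&ram_table_once, ram_table_fill);
    assert(rc == 0);
    (void) rc;
    return ram_table;
}

/* Fixed-base operation using the global table. */
uint32_t toy_pow_g(uint16_t k)
{
    const uint32_t *t = toy_global_table();
    uint32_t acc = 1;
    for (int i = 0; i < TOY_BITS; i++)
        if ((k >> i) & 1u)
            acc = (acc * t[i]) % TOY_P;
    return acc;
}
```

The flash variant removes the once-guard altogether: the table becomes a `static const` array initialised at compile time, so there is no mutable global state and no first-access cost, at the price of a fixed amount of ROM per enabled curve.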

(In case it wasn't clear: I think having a cache per ecp_group structure was a big design mistake, and there was no good reason for it other than that I failed to think about this properly back then.)

mpg · 2021-02-09T12:09:27Z

I understand that we don't like global data

Btw that's one of the reasons I'm suggesting to store it in flash, as I guess it would be more precise to say that "we don't like global mutable data".

mpg mentioned this issue Feb 9, 2021

Improve ECC defaults to decrease RAM usage (for the same speed) #4127

Closed

mpg mentioned this issue Feb 9, 2021

Remove the RSA key mutex #4124

Open

bensze01 added the enhancement and component-crypto (Crypto primitives and low-level interfaces) labels Feb 10, 2021

mpg mentioned this issue Apr 7, 2021

ecdsa: pre-compute grp->T to gain runtime performance #4303

Closed

Kxuan mentioned this issue Apr 8, 2021

Static initialize comb table #4315

Merged

mpg closed this as completed in #4315 Jun 3, 2021

zugzwang mentioned this issue Mar 10, 2023

test_ec_compressed_points is very slow in SGX fortanix/rust-mbedtls#134

Open
