Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ECC: use a bit of ROM rather than lots of RAM and improve performance #4128

Closed
mpg opened this issue Feb 9, 2021 · 4 comments · Fixed by #4315
Closed

ECC: use a bit of ROM rather than lots of RAM and improve performance #4128

mpg opened this issue Feb 9, 2021 · 4 comments · Fixed by #4315
Labels
component-crypto Crypto primitives and low-level interfaces enhancement

Comments

@mpg
Copy link
Contributor

mpg commented Feb 9, 2021

Description

  • Type: Enhancement
  • Priority: Minor

Note: was previously raised internally as IOTSSL-1865/IOTCRYPT-614 in Nov. 2017.

The current implementation of MBEDTLS_FIXED_POINT_OPTIM is supposed to be a RAM-performance trade-off but actually doesn't improve performance (while consuming RAM) in common workflows, most notably not in TLS. The only workflow where it actually improves performance is when you perform repeated ECDSA operations with the same context, but not through the PK layer.

It is possible to rewrite it in order to:

  • stop using extra RAM
  • use a much smaller amount of ROM instead
  • actually speed up operations in common workflows
  • lift the limitation that ecp_grp must not be shared between threads

(Technical details: instead of storing for each ecp_grp a copy of the full T in RAM, store in ROM only the 2**i multiples of G.)

Alternatively (or while waiting for that work to happen), the current non-optimisation should be disabled by default as in point 3 of #4127 which brings the RAM savings without the performance improvement.

The expected performance improvement range from +120% (ops/sec) for ECDSA signature to 0% for static ECDH, and +40% for ECDHE and ECDSA verification. The overall expected improvement on a typical TLS handshake is about +40% (hs/time unit).

@mpg
Copy link
Contributor Author

mpg commented Feb 9, 2021

Note: the reasons why the current implementation doesn't deliver the expected performance improvement for common use cases such as a TLS ECDHE-ECDSA handshake are twofold:

  • for the ECDHE part, it's the fact that each handshake starts with a fresh ECDH context and throws it away once the handshake is completed, so the supposedly-cached mutliples of G are computed afresh every time and don't stand a chance of being re-used
  • for the ECDSA part it's ECKEY PK structure duplicates itself on every use #2034 - again a fresh context is created for each operation, so time (and RAM) is spend of filling up the cache but it's then discarded before it can be re-used.

Note 2: one could also use a central cache (instead of per-context) in RAM but since the values are known at compile-time (multiples of the standard base point) it makes more sense to have it in ROM/flash.

@gilles-peskine-arm
Copy link
Contributor

the supposedly-cached mutliples of G are computed afresh every time and don't stand a chance of being re-used

I understand that we don't like global data, but at the same time it doesn't really make sense to me that we store and cache a lot of data per key when that data is not specific to a key, but to a curve. Why don't we introduce a global cache? This could be done in 3.x, once the mbedtls_ecp_group structure is opaque.

@mpg
Copy link
Contributor Author

mpg commented Feb 9, 2021

I agree, I'm just not sure there's any reason for the global cache to live in RAM, since all of its content is known at compile time. But yes, the core of the improvement is really to make the cache global - either in flash (filled up at compile time) as the description suggests, or in RAM, filled up at the time of the first access (but this might require some thinking about thread-safety). (We could even have hybrids where only the part that takes the most time to compute is stored in flash and the rest is cached in RAM, if we wanted to be fancy.)

(In case it wasn't clear: I think having a cache per ecp_group structure was a big design mistake and there was no good reason for it other than I failed to think about this properly back then.)

@mpg
Copy link
Contributor Author

mpg commented Feb 9, 2021

I understand that we don't like global data

Btw that's one of the reasons I'm suggesting to store it in flash, as I guess it would be more precise to say that "we don't like global mutable data".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component-crypto Crypto primitives and low-level interfaces enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants