ECC: use a bit of ROM rather than lots of RAM and improve performance #4128
Comments
Note: the reasons why the current implementation doesn't deliver the expected performance improvement for common use cases such as a TLS ECDHE-ECDSA handshake are twofold:
Note 2: one could also use a central cache (instead of per-context) in RAM, but since the values are known at compile-time (multiples of the standard base point) it makes more sense to have it in ROM/flash.
I understand that we don't like global data, but at the same time it doesn't really make sense to me that we store and cache a lot of data per key when that data is not specific to a key, but to a curve. Why don't we introduce a global cache? This could be done in 3.x, once the …
I agree, I'm just not sure there's any reason for the global cache to live in RAM, since all of its content is known at compile time. But yes, the core of the improvement is really to make the cache global: either in flash (filled up at compile time) as the description suggests, or in RAM, filled up at the time of the first access (but this might require some thinking about thread-safety). (We could even have hybrids where only the part that takes the most time to compute is stored in flash and the rest is cached in RAM, if we wanted to be fancy.) (In case it wasn't clear: I think having a cache per …
Btw that's one of the reasons I'm suggesting to store it in flash, as I guess it would be more precise to say that "we don't like global mutable data".
Description
Note: was previously raised internally as IOTSSL-1865/IOTCRYPT-614 in Nov. 2017.
The current implementation of `MBEDTLS_FIXED_POINT_OPTIM` is supposed to be a RAM-performance trade-off but actually doesn't improve performance (while consuming RAM) in common workflows, most notably not in TLS. The only workflow where it actually improves performance is when you perform repeated ECDSA operations with the same context, but not through the PK layer.
It is possible to rewrite it in order to:
… `ecp_grp` must not be shared between threads
(Technical details: instead of storing for each `ecp_grp` a copy of the full `T` in RAM, store in ROM only the `2**i` multiples of `G`.)
Alternatively (or while waiting for that work to happen), the current non-optimisation should be disabled by default as in point 3 of #4127, which brings the RAM savings without the performance improvement.
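To make the "store only the `2**i` multiples of `G` in ROM" idea concrete, here is a minimal sketch on a toy curve. Everything in it is an assumption for illustration: the curve (y^2 = x^3 + 2x + 2 over F_17 with G = (5, 1), a textbook example of order 19), the names `pow2_G`, `fixed_base_mul` and `naive_mul`, and the plain binary method, which stands in for whatever algorithm a real implementation would use. It is not Mbed TLS code, and it is neither constant-time nor otherwise secure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy parameters (illustrative only, NOT a real curve):
 * y^2 = x^3 + 2x + 2 over F_17, G = (5, 1), group order 19. */
#define P 17
#define A 2
#define TABLE_BITS 5 /* scalars handled here are < 2^5 */

typedef struct { uint8_t x, y; bool inf; } point;

/* Read-only table of 2^i * G for i = 0..4. Being const, it can live in
 * ROM/flash and be shared by every context and thread. */
static const point pow2_G[TABLE_BITS] = {
    { 5, 1, false }, { 6, 3, false }, { 3, 1, false },
    { 13, 7, false }, { 10, 11, false },
};

static int mod_p(int v) { v %= P; return v < 0 ? v + P : v; }

/* Modular inverse via Fermat's little theorem: v^(P-2) mod P. */
static int inv_p(int v) {
    int r = 1;
    v = mod_p(v);
    for (int e = P - 2; e > 0; e >>= 1) {
        if (e & 1) r = r * v % P;
        v = v * v % P;
    }
    return r;
}

bool point_eq(point a, point b) {
    if (a.inf || b.inf) return a.inf && b.inf;
    return a.x == b.x && a.y == b.y;
}

/* Affine addition, handling infinity, inverses and doubling. */
point point_add(point a, point b) {
    if (a.inf) return b;
    if (b.inf) return a;
    int lambda;
    if (a.x == b.x) {
        if (mod_p(a.y + b.y) == 0) return (point){ 0, 0, true }; /* a = -b */
        lambda = mod_p((3 * a.x * a.x + A) * inv_p(2 * a.y));    /* doubling */
    } else {
        lambda = mod_p((b.y - a.y) * inv_p(b.x - a.x));
    }
    int x = mod_p(lambda * lambda - a.x - b.x);
    int y = mod_p(lambda * (a.x - x) - a.y);
    return (point){ (uint8_t)x, (uint8_t)y, false };
}

/* k * G using only additions of table entries: no doublings at run time
 * and no per-context RAM cache. */
point fixed_base_mul(unsigned k) {
    assert(k < (1u << TABLE_BITS));
    point r = { 0, 0, true };
    for (int i = 0; i < TABLE_BITS; i++)
        if (k & (1u << i)) r = point_add(r, pow2_G[i]);
    return r;
}

/* Reference: k * G by repeated addition of G. */
point naive_mul(unsigned k) {
    point r = { 0, 0, true };
    while (k--) r = point_add(r, pow2_G[0]);
    return r;
}
```

Calling `fixed_base_mul(k)` for any `k` below 32 should give the same point as `naive_mul(k)`, while touching only immutable data. That immutability is what would let a table like this be shared across threads without locking.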
The expected performance improvements range from +120% (ops/sec) for ECDSA signature to 0% for static ECDH, with +40% for ECDHE and ECDSA verification. The overall expected improvement on a typical TLS handshake is about +40% (handshakes per unit of time).