-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
secp256k1_ecmult_gen_context_build allocates 90KB on the stack #251
Comments
There is already a discussion happening in issue #189 about how to make this library work well on microcontrollers. I suspect it would be best to just have a precomputed context that is neither created nor destroyed, and gets stored in flash, but I haven't looked into it. |
Precomputing the tables and storing them in the executable could also help in the libbitcoinconsensus case, where we don't really want a dynamically allocated context. |
Precomputing verify would be very undesirable-- it's 1.3MB with current settings (which are optimal for performance, when enough memory is available). Libbitcoinconsensus really should be really using a context in order to avoid the constant dynamic memory allocations all through its internals which it currently has. |
The context brings a lot of issues for users, and risks to use more memory in total. |
Yes, contexts are sharable between threads.Only function that modify them
need exclusive access (randomize being the only one currently).
|
There are actually two problems here: I am not sure it is worth trying to reduce the temporary RAM usage, because the devices where that is the problem generally also have memory-mapped Flash. |
The STM32F205RE microcontroller on the TREZOR has 512 KB of flash and 128 KB of RAM, and it is capable of doing ECDSA signing. So hopefully a signing-capable context can be made to fit within that amount of flash. Of course, the smaller we can make it, the better. I imagine you would run a utility on your computer that uses libsecp256k1 to generate a (randomized?) context for signing, and then it would save it in the form of C code which can be compiled by the cross-compiler for the microcontroller. |
Yes, the best solution IMHO is to have a compile-time option to have a
pre-built context in the binary.
|
DavidEGrayson: the current signing context is 65536 bytes, and it can be trivially adjusted to whatever size makes sense. The size grows exponentially for linear increases in performance-- I've thought it would be good to support a mode that uses less but since realizing that all the current mid sized microcontrollers have memory mapped flash it hardly seems worth carrying around extra support code for smaller sizes. Tdaede suggested on IRC that he was thinking about making a tool that uses the current precomp code to write out a header, and having a switch to use the prefabricated header; which I said sounded good at least for the signing context. |
Fixed by #254 |
dcb2e3b variable signing precompute table (djb) Pull request description: This pull request gives an option to reduce the precomputed table size for the signing context (`ctx`) by setting `#define ECMULT_GEN_PREC_BITS [N_BITS]`. Motivation: Per #251 and #254, the static table can be reduced to 64kB. However, this is still too big for some of my embedded applications. Setting `#define ECMULT_GEN_PREC_BITS 2` produces a 32kB table at a tradeoff of about 75% of the signing speed. Not defining this value will default to the existing implementation of 4 bits. Statistics: ``` ECMULT_GEN_PREC_BITS = 1 Precomputed table size: 32kB ./bench_sign ecdsa_sign: min 195us / avg 200us / max 212us ECMULT_GEN_PREC_BITS = 2 Precomputed table size: 32kB ./bench_sign ecdsa_sign: min 119us / avg 126us / max 134us ECMULT_GEN_PREC_BITS = 4 (default) Precomputed table size: 64kB ./bench_sign ecdsa_sign: min 83.5us / avg 89.6us / max 95.3us ECMULT_GEN_PREC_BITS = 8 Precomputed table size: 512kB ./bench_sign ecdsa_sign: min 96.4us / avg 99.4us / max 104us ``` Only values of 2 and 4 make sense. 8 bits causes a larger table size with no increase in speed. 1 bit runs, actually, but does not reduce table size and is slower than 2 bits. ACKs for top commit: real-or-random: ACK dcb2e3b verified that all changes to the previous ACKed 1d26b27 were due to the rebase jonasnick: ACK dcb2e3b read the code and tested various configurations with valgrind Tree-SHA512: ed6f68ca23ffdc4b59d51525336b34b25521233537edbc74d32dfb3eafd8196419be17f01cbf10bd8d87ce745ce143085abc6034727f742163f7e5f13f26f56e
Microcontrollers have expensive SRAM and cheap Flash. These should be turned into precomputed tables, at least as an option.
The text was updated successfully, but these errors were encountered: