Backport 2.16: range-based constant-flow base64 #4819
Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
Base64 decoding uses equality comparison tests for characters that don't leak information about the content of the data other than its length, such as whitespace. Do this with '=' as well, since it only reveals information about the length. This way the table lookup can focus on character validity and decoding value. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
Instead of doing constant-flow table lookup, which requires 128 memory loads for each lookup into a 128-entry table, do a range-based calculation, which requires more CPU instructions per range but there are only 5 ranges. Experimentally, this is ~12x faster on my PC (based on programs/x509/load_roots). The code is slightly smaller, too. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
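The range-based decoding idea described above can be sketched as follows. This is a minimal illustration, not the code from the pull request: the names mask_of_range and dec_value_sketch are hypothetical, and the sketch assumes an ASCII character set and an unsigned type wider than 8 bits.

```c
/* Hypothetical helper: returns 0xff if low <= c <= high, 0 otherwise.
 * It uses only arithmetic and shifts, so there is no branch and no
 * memory access whose address depends on c. */
static unsigned char mask_of_range( unsigned char low, unsigned char high,
                                    unsigned char c )
{
    /* If c < low, the unsigned subtraction wraps around and the bits
     * above the low byte are all set; otherwise the result is 0. */
    unsigned low_mask = ( (unsigned) c - low ) >> 8;
    /* Likewise, nonzero only if c > high. */
    unsigned high_mask = ( (unsigned) high - c ) >> 8;
    return( (unsigned char) ( ~( low_mask | high_mask ) & 0xff ) );
}

/* Hypothetical decoder for one base64 digit: returns 0..63 for a valid
 * digit and -1 for anything else (including '=' and whitespace). */
static signed char dec_value_sketch( unsigned char c )
{
    unsigned char val = 0;
    /* c lies in at most one range, so at most one line changes val.
     * Each line stores value + 1 so that 0 keeps meaning "no match". */
    val |= mask_of_range( 'A', 'Z', c ) & ( c - 'A' +  0 + 1 );
    val |= mask_of_range( 'a', 'z', c ) & ( c - 'a' + 26 + 1 );
    val |= mask_of_range( '0', '9', c ) & ( c - '0' + 52 + 1 );
    val |= mask_of_range( '+', '+', c ) & ( c - '+' + 62 + 1 );
    val |= mask_of_range( '/', '/', c ) & ( c - '/' + 63 + 1 );
    return( (signed char) ( val - 1 ) );
}
```

All five range tests always run, whatever the input, which is what replaces the 128 memory loads of the table walk.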
Document what each local variable does when it isn't obvious from the name. Don't reuse a variable for different purposes. This commit has very little impact on the generated code (same code size on a sample Thumb build), although it does fix a theoretical bug that 2^32 spaces inside a line would be ignored instead of treated as an error. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
Instead of doing constant-flow table lookup, which requires 64 memory loads for each lookup into a 64-entry table, do a range-based calculation, which requires more CPU instructions per range but there are only 5 ranges. I expect a significant performance gain (although smaller than for decoding since the encoding table is half the size), but I haven't measured. Code size is slightly smaller. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
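The same technique applied to encoding might look like the sketch below. The function names are hypothetical, the input is assumed to be a 6-bit value (0..63), and ASCII is assumed; it illustrates the approach rather than reproducing the code in this pull request.

```c
/* Hypothetical helper: returns 0xff if low <= c <= high, 0 otherwise,
 * without branching on c. */
static unsigned char enc_mask_of_range( unsigned char low, unsigned char high,
                                        unsigned char c )
{
    unsigned low_mask = ( (unsigned) c - low ) >> 8;   /* nonzero iff c < low  */
    unsigned high_mask = ( (unsigned) high - c ) >> 8; /* nonzero iff c > high */
    return( (unsigned char) ( ~( low_mask | high_mask ) & 0xff ) );
}

/* Hypothetical encoder: maps a 6-bit value to its base64 digit.
 * The value falls in exactly one of the 5 ranges, so exactly one
 * line contributes a nonzero byte to the result. */
static unsigned char enc_char_sketch( unsigned char value )
{
    unsigned char digit = 0;
    digit |= enc_mask_of_range(  0, 25, value ) & ( 'A' + value );
    digit |= enc_mask_of_range( 26, 51, value ) & ( 'a' + value - 26 );
    digit |= enc_mask_of_range( 52, 61, value ) & ( '0' + value - 52 );
    digit |= enc_mask_of_range( 62, 62, value ) & '+';
    digit |= enc_mask_of_range( 63, 63, value ) & '/';
    return( digit );
}
```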
Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
programs/x509/load_roots.c
    struct mbedtls_timing_hr_time timer;
    unsigned long ms;

    if( argc == 0 )
Suggested change: if( argc == 0 ) -> if( argc < 2 )
+1, when testing the program I didn't manage to get the usage string printed.
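To illustrate why the original check never fires in practice: argv[0] conventionally holds the program name, so argc is normally at least 1 even when no arguments are given. A minimal sketch, with hypothetical function names that are not taken from load_roots.c:

```c
/* Hypothetical predicates contrasting the two checks. */

/* Original check: true only when argv is completely empty, which a
 * normal shell invocation does not produce. */
static int usage_needed_old( int argc )
{
    return( argc == 0 );
}

/* Suggested check: true when no argument follows the program name. */
static int usage_needed_new( int argc )
{
    return( argc < 2 );
}
```

With argc equal to 1 (program name only), the old predicate is false and the new one is true, which matches the observation that the usage string could not be triggered.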
I see a couple of other places in the code where the same comparison is made before printing USAGE. They deserve at least a closer look, because some of them may be quite legitimate.
This is looking very good. It's very close to the previous version: the main difference is that the constant-time table-lookup function was replaced with a function that does mostly the same thing but with a different implementation, so I think the risk of introducing issues is pretty low. I've reviewed the other changes carefully and couldn't find any place that would leak secret information. The only information leaked is the position of special characters in the input and the length of the output, neither of which is sensitive.
Regarding testing, considering there's a fairly large amount of information that's acceptable to leak, it probably doesn't make sense to try making the whole decode function testable with valgrind/memsan, but I'm thinking perhaps dec_value could be made STATIC_TESTABLE and tested with an input marked as TEST_CF_SECRET in the test suite? (Obviously that would be for the forward ports, as we don't have those testing facilities in 2.16.)
Other than that, I left a couple of mostly minor comments.
LGTM - for me it looks like constant-flow/time code.
n was used for two different purposes. Give it a different name the second time. This does not seem to change the generated code when compiling with optimization for size or performance. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
To test c <= high, instead of testing the sign of (high + 1) - c, negate the sign of high - c (as we're doing for c - low). This is a little easier to read and shaves 2 instructions off the arm thumb build with arm-none-eabi-gcc 7.3.1. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
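The two ways of testing c <= high can be compared in a small sketch. Both functions below are hypothetical reconstructions of the idea (the first is not the exact code of the earlier commit); each returns 0xff when c <= high and 0 otherwise, assuming an unsigned type wider than 8 bits.

```c
/* "Before" style (a reconstruction): shift the bound by one and read
 * the sign directly. c <= high exactly when c - (high + 1) is negative,
 * i.e. when the unsigned subtraction wraps around. */
static unsigned char le_mask_plus_one( unsigned char c, unsigned char high )
{
    unsigned wrapped = ( (unsigned) c - ( (unsigned) high + 1 ) ) >> 8;
    return( (unsigned char) ( wrapped & 0xff ) );
}

/* "After" style: high - c is negative exactly when c > high, so compute
 * that mask and negate it. This mirrors how c - low is handled for the
 * lower bound. */
static unsigned char le_mask_negated( unsigned char c, unsigned char high )
{
    unsigned gt_mask = ( (unsigned) high - c ) >> 8;
    return( (unsigned char) ( ~gt_mask & 0xff ) );
}
```

Both forms agree for every pair of byte values; the second avoids the extra addition and reads symmetrically with the lower-bound test.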
I had originally thought to support directories with mbedtls_x509_crt_parse_path but it would have complicated the code more than I cared for. Remove a remnant of the original project in the documentation. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
Yes, I think |
Thanks for addressing my feedback, looks all good to me now!
Noticed one typo - sorry I did not get this on the first round.
ChangeLog.d/base64-ranges.txt
    Changes
       * Improve the performance of base64 constant-flow code. The result is still
         slower than the original non-constant-flow implementation, but much faster
         than the previous constant-flow implemenation. Fixes #4814.
Typo nit: "implemenation" -> "implementation"
Hello! This seems to have been stalled for two months due to two issues in some comments. Is there any hope to see this finalized and merged for the next 2.16.x release? Alternatively, is this in a good enough state that we could use our own backport of this PR downstream (specifically the Godot game engine which uses mbedtls 2.16 and is affected by this regression)?
@akien-mga Thanks for the ping! This seems to have gotten lost in people successively taking over then going on vacation and coming back to other tasks. We're bringing it back to the top of the queue. In the meantime, the fix in this pull request is approved except for some documentation issues and has passed our CI, so you can safely go ahead and pick it up.
Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>
8e82c78
LGTM
I checked that all of the CI failures in the pr-head job are instances of #5012 and the pr-merge job passes, so the pr-head failures can be ignored.
LGTM
Fix #4814: in Mbed TLS 2.26.0 and 2.16.10, we made the base64 code constant-flow by changing table lookup into constant-time table lookup (look up every item and OR them together). This had a significant performance cost. This pull request uses a different approach: instead of doing a constant-flow table lookup, do a range-based OR. Since base64 has only 5 ranges, as opposed to 64/128 table entries (encoding/decoding), this turns out to be a significant performance improvement. There is also a slight gain in code size (but still a loss compared to the original non-constant-time table approach).
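For context, the earlier "look up every item" approach can be sketched roughly as below. This is an illustrative reconstruction with hypothetical names, not the code that shipped in 2.26.0 or 2.16.10; it also assumes the compiler does not turn the mask arithmetic back into a branch.

```c
#include <limits.h>
#include <stddef.h>

/* The 64 base64 digits in value order (ASCII assumed). */
static const unsigned char enc_table_sketch[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical constant-flow lookup: reads every entry of the table and
 * keeps only the one at the requested index, so the memory access
 * pattern does not depend on the (secret) index. */
static unsigned char ct_table_lookup_sketch( const unsigned char *table,
                                             size_t table_size,
                                             size_t index )
{
    unsigned char result = 0;
    size_t i;
    for( i = 0; i < table_size; i++ )
    {
        size_t diff = i ^ index;
        /* The top bit of (diff | -diff) is set exactly when diff != 0. */
        size_t nonzero = ( diff | ( (size_t) 0 - diff ) )
                         >> ( sizeof( size_t ) * CHAR_BIT - 1 );
        /* 0xff when i == index, 0 otherwise. */
        unsigned char mask = (unsigned char) ( nonzero - 1 );
        result |= table[i] & mask;
    }
    return( result );
}
```

Each lookup costs one load per table entry (64 for encoding, 128 for decoding), which is the cost that the 5-range calculation avoids.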
[Table: load_roots on 552 certs (x86_64); size base64.o (thumb)]
This PR also introduces a new sample program which I used for benchmarking. Since it's there I propose to keep it, but extra features (including nontrivial amounts of documentation, or CI integration) are out of scope.
I started with a 2.16 patch because this all started from code I'd written in the Mbed Crypto days. This will need to be ported to all branches - 2.2x, development TODO.