Backport 2.16: range-based constant-flow base64 #4819

gilles-peskine-arm · 2021-07-28T12:50:00Z

Fix #4814: in Mbed TLS 2.26.0 and 2.16.10, we made the base64 code constant-flow by changing table lookup into constant-time table lookup (look up every item and or them together). This had a significant performance cost. This pull request uses a different approach: instead of doing a constant-flow table lookup, do a range-based or. Since base64 has only 5 ranges, as opposed to 64/128 table entries (encoding/decoding), this turns out to be a significant performance improvement. There is also a slight gain in code size (but still a loss compared to the original non-constant-time table approach).

Version	2.16.9	2.16.11	this PR
`load_roots` on 552 certs (x86_64)	11ms	428 ms	21 ms
`size base64.o` (thumb)	1147	1287	1187

This PR also introduces a new sample program which I used for benchmarking. Since it's there I propose to keep it, but extra features (including nontrivial amounts of documentation, or CI integration) are out of scope.

I started with a 2.16 patch because this all started from code I'd written in the Mbed Crypto days. This will need to be ported to all branches - 2.2x, development TODO.

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

Base64 decoding uses equality comparison tests for characters that don't leak information about the content of the data other than its length, such as whitespace. Do this with '=' as well, since it only reveals information about the length. This way the table lookup can focus on character validity and decoding value. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

Instead of doing constant-flow table lookup, which requires 128 memory loads for each lookup into a 128-entry table, do a range-based calculation, which requires more CPU instructions per range but there are only 5 ranges. Experimentally, this is ~12x faster on my PC (based on programs/x509/load_roots). The code is slightly smaller, too. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

Document what each local variable does when it isn't obvious from the name. Don't reuse a variable for different purposes. This commit has very little impact on the generated code (same code size on a sample Thumb build), although it does fix a theoretical bug that 2^32 spaces inside a line would be ignored instead of treated as an error. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

Instead of doing constant-flow table lookup, which requires 64 memory loads for each lookup into a 64-entry table, do a range-based calculation, which requires more CPU instructions per range but there are only 5 ranges. I expect a significant performance gain (although smaller than for decoding since the encoding table is half the size), but I haven't measured. Code size is slightly smaller. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

TRodziewicz · 2021-07-28T15:02:52Z

programs/x509/load_roots.c

+    struct mbedtls_timing_hr_time timer;
+    unsigned long ms;
+
+    if( argc == 0 )


Suggested change

if( argc == 0 )

if( argc < 2 )

+1, when testing the program I didn't manage to get the usage string printed.

I see a couple of places in the code where same comparison is made before printing USAGE. They should be at least subject of deeper look because they can be quite legit.

mpg

This is looking very good. It's very close to the previous version, as the main different is the constant-time table-lookup function was replaced with a function that does mostly the same thing but with a different implementation, so I think the risk of introducing issues is pretty low. I've reviewed the other changes carefully and couldn't find any place that would leak secret information, all the information that's leaked is the position of special chars in the input and the length of the output, none of which are sensitive.

Regarding testing, considering there's a fairly large amount of information that's acceptable to leak, it probably doesn't make sense to try making the whole decode function testable with valgrind/memsan, but I'm thinking perhaps dec_value could be made STATIC_TESTABLE and tested with an input marked as TEST_CF_SECRETin the test suite? (Obviously that would be for the forward ports, as we don't have those testing facilities in 2.16.)

Other than that, I left a couple of mostly minor comments.

library/base64.c

TRodziewicz

LGTM - for me it looks like constant-flow/time code.

programs/x509/load_roots.c

n was used for two different purposes. Give it a different name the second time. This does not seem to change the generated code when compiling with optimization for size or performance. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

To test c <= high, instead of testing the sign of (high + 1) - c, negate the sign of high - c (as we're doing for c - low). This is a little easier to read and shaves 2 instructions off the arm thumb build with arm-none-eabi-gcc 7.3.1. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

I had originally thought to support directories with mbedtls_x509_crt_parse_path but it would have complicated the code more than I cared for. Remove a remnant of the original project in the documentation. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

gilles-peskine-arm · 2021-07-30T11:08:12Z

Regarding testing, considering there's a fairly large amount of information that's acceptable to leak, it probably doesn't make sense to try making the whole decode function testable with valgrind/memsan, but I'm thinking perhaps dec_value could be made STATIC_TESTABLE and tested with an input marked as TEST_CF_SECRETin the test suite? (Obviously that would be for the forward ports, as we don't have those testing facilities in 2.16.)

Yes, I think enc_char and dec_value should use this testing. There's some delicate logic in mbedtls_base64_decode which would be hard to test because it relies heavily on the fact that it's ok to test the equality of an input character against a non-digit constant. My earlier version at https://github.com/gilles-peskine-arm/mbedtls/tree/base64-no-table-2.16.9/library/base64.c made this simpler by first doing a “character type” calculation in constant time, and then basing the logic in mbedtls_base64_decode on this non-sensitive partial information, but this had a measurable cost in code size and performance.

mpg

Thanks for addressing my feedback, looks all good to me now!

davidhorstmann-arm

Noticed one typo - sorry I did not get this on the first round.

davidhorstmann-arm · 2021-08-18T09:13:37Z

ChangeLog.d/base64-ranges.txt

+Changes
+   * Improve the performance of base64 constant-flow code. The result is still
+     slower than the original non-constant-flow implementation, but much faster
+     than the previous constant-flow implemenation. Fixes #4814.


Typo nit: "implemenation" -> "implementation"

akien-mga · 2021-10-24T11:43:58Z

Hello! This seems to have been stalled for two months due to two issues in some comments. Is there any hope to see this finalized and merged for the next 2.16.x release? Alternatively, is this in a good enough state that we could use our own backport of this PR downstream (specifically the Godot game engine which uses mbedtls 2.16 and is affected by this regression)?

gilles-peskine-arm · 2021-10-25T15:18:39Z

@akien-mga Thanks for the ping! This seems to have gotten lost in people successively taking over then going on vacation and coming back to other tasks. We're bringing back to the top of the queue. In the meantime, the fix in this pull request is approved except for some documentation issues and has passed our CI, so you can safely go ahead and pick it up.

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

mpg

LGTM

mpg · 2021-10-26T07:46:27Z

I checked that all of the CI failures in the pr-head job are instances of #5012 and the pr-merge job passes, so the pr-head failures can be ignored.

davidhorstmann-arm

LGTM

gilles-peskine-arm added 6 commits July 26, 2021 20:20

New sample program to benchmark certificate loading

7023d6a

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

Base64 range-based constant-flow code: changelog entry

32e9530

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

gilles-peskine-arm requested a review from mpg July 28, 2021 12:50

gilles-peskine-arm mentioned this pull request Jul 28, 2021

Huge certificate parsing speed regression between mbedtls 2.16.9 and 2.16.10 (constant time base64). #4814

Closed

TRodziewicz reviewed Jul 28, 2021

View reviewed changes

mpg requested changes Jul 30, 2021

View reviewed changes

library/base64.c Outdated Show resolved Hide resolved

library/base64.c Outdated Show resolved Hide resolved

library/base64.c Show resolved Hide resolved

library/base64.c Show resolved Hide resolved

TRodziewicz previously approved these changes Jul 30, 2021

View reviewed changes

mpg reviewed Jul 30, 2021

View reviewed changes

programs/x509/load_roots.c Outdated Show resolved Hide resolved

mpg reviewed Jul 30, 2021

View reviewed changes

programs/x509/load_roots.c Show resolved Hide resolved

gilles-peskine-arm added 5 commits July 30, 2021 12:56

Base64 decode: simplify local variables (n)

231b67a

n was used for two different purposes. Give it a different name the second time. This does not seem to change the generated code when compiling with optimization for size or performance. Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

load_roots: fix no-argument detection

051b1da

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

load_roots: properly error out on an invalid option

9f6ab29

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

gilles-peskine-arm dismissed TRodziewicz’s stale review via 9f6ab29 July 30, 2021 11:02

gilles-peskine-arm requested a review from mpg July 30, 2021 13:57

mpg previously approved these changes Aug 2, 2021

View reviewed changes

mpg requested a review from TRodziewicz August 2, 2021 07:42

davidhorstmann-arm reviewed Aug 18, 2021

View reviewed changes

mpg added the needs-work label Aug 19, 2021

krispraws mentioned this pull request Sep 22, 2021

mbedtls upgrade from 2.24.0 to 2.26.0 causes significant performance regression in mbedtls_base64_encode fluent/fluent-bit#4110

Closed

gilles-peskine-arm assigned gilles-peskine-arm and unassigned tom-daubney-arm Oct 25, 2021

akien-mga mentioned this pull request Oct 25, 2021

mbedtls: Backport upstream PR 4819 to fix certs parsing speed godotengine/godot#54232

Merged

gilles-peskine-arm added 2 commits October 25, 2021 21:13

Fix copypasta in comment

cda1281

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

Fix typo in documentation

8e82c78

Signed-off-by: Gilles Peskine <Gilles.Peskine@arm.com>

gilles-peskine-arm dismissed stale reviews from mpg and TRodziewicz via 8e82c78 October 25, 2021 19:14

gilles-peskine-arm requested review from davidhorstmann-arm and mpg October 25, 2021 19:14

gilles-peskine-arm added needs-review Every commit must be reviewed by at least two team members, and removed needs-work labels Oct 25, 2021

gilles-peskine-arm mentioned this pull request Oct 25, 2021

range-based constant-flow base64 #5108

Merged

mpg approved these changes Oct 26, 2021

View reviewed changes

davidhorstmann-arm approved these changes Oct 26, 2021

View reviewed changes

mpg added approved Design and code approved - may be waiting for CI or backports and removed needs-review Every commit must be reviewed by at least two team members, labels Oct 27, 2021

mpg merged commit 70227d2 into Mbed-TLS:mbedtls-2.16 Oct 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backport 2.16: range-based constant-flow base64 #4819

Backport 2.16: range-based constant-flow base64 #4819

gilles-peskine-arm commented Jul 28, 2021 •

edited by mpg

Loading

TRodziewicz Jul 28, 2021

mpg Jul 30, 2021

TRodziewicz Jul 30, 2021

mpg left a comment

TRodziewicz left a comment

gilles-peskine-arm commented Jul 30, 2021

mpg left a comment

davidhorstmann-arm left a comment

davidhorstmann-arm Aug 18, 2021

akien-mga commented Oct 24, 2021 •

edited

Loading

gilles-peskine-arm commented Oct 25, 2021

mpg left a comment

mpg commented Oct 26, 2021

davidhorstmann-arm left a comment

Backport 2.16: range-based constant-flow base64 #4819

Backport 2.16: range-based constant-flow base64 #4819

Conversation

gilles-peskine-arm commented Jul 28, 2021 • edited by mpg Loading

TRodziewicz Jul 28, 2021

Choose a reason for hiding this comment

mpg Jul 30, 2021

Choose a reason for hiding this comment

TRodziewicz Jul 30, 2021

Choose a reason for hiding this comment

mpg left a comment

Choose a reason for hiding this comment

TRodziewicz left a comment

Choose a reason for hiding this comment

gilles-peskine-arm commented Jul 30, 2021

mpg left a comment

Choose a reason for hiding this comment

davidhorstmann-arm left a comment

Choose a reason for hiding this comment

davidhorstmann-arm Aug 18, 2021

Choose a reason for hiding this comment

akien-mga commented Oct 24, 2021 • edited Loading

gilles-peskine-arm commented Oct 25, 2021

mpg left a comment

Choose a reason for hiding this comment

mpg commented Oct 26, 2021

davidhorstmann-arm left a comment

Choose a reason for hiding this comment

gilles-peskine-arm commented Jul 28, 2021 •

edited by mpg

Loading

akien-mga commented Oct 24, 2021 •

edited

Loading