crypto/rsa: new key generation prohibitively slow under race detector #70644

Closed
rsc opened this issue Dec 2, 2024 · 16 comments
Labels: NeedsFix (the path to resolution is known, but the work has not been done), release-blocker


@rsc
Contributor

rsc commented Dec 2, 2024

Although recent changes to crypto/rsa made key generation 3-4X slower and then improved that to only 50% slower, when run under the race detector the code still runs about 10X slower than before. Using go test golang.org/x/crypto/openpgp/clearsign as a test, here are the timings on my M3 MacBook Pro using different crypto commits:

# fa38b41be9 crypto/internal/fips140/rsa: check that e and N are odd
% go test .
ok  	golang.org/x/crypto/openpgp/clearsign	8.572s
% go test -race .
ok  	golang.org/x/crypto/openpgp/clearsign	18.289s
% 

# 7d7192e54f crypto/rsa: move precomputation to crypto/internal/fips140/rsa
% go test .
ok  	golang.org/x/crypto/openpgp/clearsign	8.469s
% go test -race .
ok  	golang.org/x/crypto/openpgp/clearsign	17.484s
% 

# acd54c9985 crypto/rsa: move key generation to crypto/internal/fips140/rsa
% go test .
ok  	golang.org/x/crypto/openpgp/clearsign	33.242s
% go test -race .
ok  	golang.org/x/crypto/openpgp/clearsign	180.334s


# c5c4f3dd5f crypto/x509: keep RSA CRT values in ParsePKCS1PrivateKey
% cd
% cd src/golang.org/x/crypto/openpgp/clearsign
% go test .
ok  	golang.org/x/crypto/openpgp/clearsign	25.387s
% go test -race .
ok  	golang.org/x/crypto/openpgp/clearsign	179.546s

# dd7ab5ec5d crypto/internal/fips140/rsa: do trial divisions in key generation
% go test .
ok  	golang.org/x/crypto/openpgp/clearsign	12.057s
% go test -race .
ok  	golang.org/x/crypto/openpgp/clearsign	212.460s
% 
@rsc added the NeedsFix and release-blocker labels Dec 2, 2024
@rsc added this to the Go1.24 milestone Dec 2, 2024
@rsc
Contributor Author

rsc commented Dec 2, 2024

/cc @rolandshoemaker @FiloSottile

@rsc
Contributor Author

rsc commented Dec 2, 2024

Here is a more direct measurement, using crypto/rsa BenchmarkGenerateKey:

% go test -bench=GenerateKey
goos: darwin
goarch: arm64
pkg: crypto/rsa
cpu: Apple M3 Pro
BenchmarkGenerateKey/2048-12         	      24	 102854269 ns/op
PASS
ok  	crypto/rsa	3.976s
% go test -race -bench=GenerateKey
goos: darwin
goarch: arm64
pkg: crypto/rsa
cpu: Apple M3 Pro
BenchmarkGenerateKey/2048-12         	       8	 166628526 ns/op
PASS
ok  	crypto/rsa	6.051s
% git checkout acd54c9985  # crypto/rsa: move key generation to crypto/internal/fips140/rsa
% go test -bench=GenerateKey
goos: darwin
goarch: arm64
pkg: crypto/rsa
cpu: Apple M3 Pro
BenchmarkGenerateKey/2048-12         	       6	 421387931 ns/op
PASS
ok  	crypto/rsa	7.552s
% go test -race -bench=GenerateKey
goos: darwin
goarch: arm64
pkg: crypto/rsa
cpu: Apple M3 Pro
BenchmarkGenerateKey/2048-12         	       1	1534678542 ns/op
PASS
ok  	crypto/rsa	62.343s
% git checkout master
% go test -bench=GenerateKey
goos: darwin
goarch: arm64
pkg: crypto/rsa
cpu: Apple M3 Pro
BenchmarkGenerateKey/2048-12         	      73	 124260540 ns/op
PASS
ok  	crypto/rsa	11.227s
% go test -race -bench=GenerateKey
goos: darwin
goarch: arm64
pkg: crypto/rsa
cpu: Apple M3 Pro
BenchmarkGenerateKey/2048-12         	       1	6158143333 ns/op
PASS
ok  	crypto/rsa	47.660s
% 

With the improved trial divisions, GenerateKey takes 25% longer than before in non-race mode, but in race mode it takes 300% longer (4X the time).

Is the issue that bigmod has more loops not in assembly, and so more instrumented race loops?

@rolandshoemaker
Member

rolandshoemaker commented Dec 2, 2024

Profiling suggests the addMulVVW loop not being in assembly may be biting us here (for the non-optimized sizes, > 2048).

@rsc
Contributor Author

rsc commented Dec 2, 2024

The GenerateKey benchmark is only 2048-bit, so addMulVVW should not be the issue. But if it is, maybe we should add a //go:norace to it.
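For readers unfamiliar with the directive, here is a minimal sketch of what adding //go:norace to a pure Go multiply-accumulate loop could look like. The function name addMulVVWSketch and its body are illustrative assumptions, not the actual math/big or bigmod code; only the //go:norace compiler directive and the math/bits calls are real.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVWSketch computes z += x*y limb by limb and returns the final carry.
// It is a hypothetical stand-in for a loop like addMulVVW and requires
// len(x) >= len(z).
//
// The directive below tells the compiler not to insert race detector
// instrumentation for memory accesses in this function. It has no effect
// unless the program is built with -race.
//
//go:norace
func addMulVVWSketch(z, x []uint, y uint) (carry uint) {
	for i := range z {
		hi, lo := bits.Mul(x[i], y)
		lo, c := bits.Add(lo, z[i], 0)
		hi += c // cannot overflow: x*y + z + carry always fits in two words
		lo, c = bits.Add(lo, carry, 0)
		hi += c
		z[i] = lo
		carry = hi
	}
	return carry
}

func main() {
	z := []uint{1, 2}
	carry := addMulVVWSketch(z, []uint{3, 4}, 5)
	fmt.Println(z, carry)
}
```

The trade-off is that any genuine data race on the slices passed to such a function would no longer be reported from inside it, so the directive is best kept to small leaf loops.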

@rolandshoemaker
Member

I updated the benchmark to test larger key sizes as well, as it seems like the slowdown is non-linear, but for really large keys maybe we just don't care.

@rolandshoemaker
Member

Regardless, it seems like we're spending most of the new time in InverseVarTime, which makes sense. I cannot see particularly why that would be all that much worse than the old approach, but my understanding of the race detector is rather rudimentary.

Funnily I see significantly worse numbers (with race detector enabled for both runs), but this might just be an example of Apple chips being extremely weird to benchmark on:

goos: darwin
goarch: arm64
pkg: crypto/rsa
cpu: Apple M1 Pro
                    │   rsa-old    │                 rsa-new                 │
                    │    sec/op    │    sec/op      vs base                  │
GenerateKey/2048-10   200.1m ± 12%   2723.4m ± 62%  +1260.83% (p=0.000 n=10)

@rsc
Contributor Author

rsc commented Dec 2, 2024

I poked at this some more and have a pair of CLs,
one adding //go:norace to all the loops that used to be assembly in math/big,
and one removing the use of temporaries in Add, Sub, and maybeSubtractModulus.
Combined they reduce the race detector cost by about 10X.
I will gather benchmarks and then send them out.

@magical
Contributor

magical commented Dec 3, 2024

Just noting that Russ's CLs are CL 632978 (merged) and CL 632979 (on hold). (Neither one mentions this issue, so we didn't get the usual pingback from gopherbot)

@mknyszek
Contributor

mknyszek commented Dec 4, 2024

This is still marked as a release blocker, but as @magical notes, CL 632978 appears to work around the issue. Can we close this issue? Thanks.

@rolandshoemaker
Member

https://go.dev/cl/633995 also addressed this.

@rsc
Contributor Author

rsc commented Dec 6, 2024

I will close this issue once I have an analysis ready to post. It looks like the problem is fixed.

@rsc
Contributor Author

rsc commented Dec 6, 2024

I wrote a short RSA key generation benchmark:

package rsa_test

import (
	"crypto/rand"
	"crypto/rsa"
	"fmt"
	"io"
	"testing"
)

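// reader wraps an io.Reader so that the value passed to GenerateKey is no
// longer crypto/rand.Reader itself (the Rand=NonStd case).
type reader struct{ io.Reader }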

func BenchmarkGenerateKey(b *testing.B) {
	for _, kind := range []string{"Rand=Std", "Rand=NonStd"} {
		r := rand.Reader
		if kind == "Rand=NonStd" {
			r = &reader{r}
		}
		for _, size := range []int{1024, 1536, 2048, 3072, 4096, 6144, 8192} {
			b.Run(fmt.Sprint(kind, "/", size), func(b *testing.B) {
				for range b.N {
					if _, err := rsa.GenerateKey(r, size); err != nil {
						b.Fatal(err)
					}
				}
			})
		}
	}
}

I ran this against Go toolchains built at four versions, using GOEXPERIMENT=boringcrypto. Rand=Std and Rand=NonStd should behave identically, except that Rand=Std/2048, /3072, and /4096 use BoringCrypto, and other key sizes or non-crypto/rand.Reader randomness sources fall back to non-BoringCrypto paths. (I am making sure to test BoringCrypto because I wanted to see how our new FIPS code compared to the old FIPS "solution".)

Here is a sequence of tables showing each version separately, measuring the base time and then the time under each sanitizer (race, asan, msan). I only ran key sizes up to 2048 because the intermediate versions were so slow. The Rand=Std/N and Rand=NonStd/N lines should measure identically, except that (as just noted) Rand=Std/2048 is a special case that dispatches to BoringCrypto in base, race, and asan builds (but not msan). BoringCrypto is invisible to race and asan, so you don't see that line slow down in those modes.

You can see in these tables that those two intermediate versions were extremely slow, especially for asan and msan! It's surprising to me how much slower asan and msan are than race. I'd always thought race had the hardest job of the three, but maybe @dvyukov did a better job on the implementation.

$ benchstat -col 'mode@(base race asan msan)' -table version rsa.log
version: 622238
                               │     base     │                  race                   │                  asan                   │                  msan                   │
                               │    sec/op    │    sec/op      vs base                  │    sec/op      vs base                  │    sec/op      vs base                  │
GenerateKey/Rand=Std/1024-4      18.11m ±  9%    39.54m ± 10%  +118.31% (p=0.000 n=240)    71.68m ±  7%  +295.72% (p=0.000 n=240)    50.04m ± 10%  +176.29% (p=0.000 n=240)
GenerateKey/Rand=Std/1536-4      64.03m ±  9%   108.28m ± 12%   +69.11% (p=0.000 n=240)   201.64m ±  9%  +214.92% (p=0.000 n=240)   143.43m ± 10%  +124.02% (p=0.000 n=240)
GenerateKey/Rand=Std/2048-4      80.26m ± 12%    80.87m ± 11%         ~ (p=0.788 n=240)    88.20m ±  8%         ~ (p=0.227 n=240)   353.62m ± 12%  +340.61% (p=0.000 n=240)
GenerateKey/Rand=NonStd/1024-4   16.02m ± 12%    36.63m ± 10%  +128.64% (p=0.000 n=240)    74.95m ± 10%  +367.84% (p=0.000 n=240)    52.13m ± 10%  +225.37% (p=0.000 n=240)
GenerateKey/Rand=NonStd/1536-4   64.56m ±  8%   107.94m ± 12%   +67.21% (p=0.000 n=240)   205.41m ± 11%  +218.18% (p=0.000 n=240)   158.02m ± 13%  +144.78% (p=0.000 n=240)
GenerateKey/Rand=NonStd/2048-4   139.0m ± 11%    238.7m ± 13%   +71.77% (p=0.000 n=240)    435.6m ± 14%  +213.46% (p=0.000 n=240)    349.6m ± 11%  +151.60% (p=0.000 n=240)
geomean                          48.72m          82.99m         +70.34%                    143.0m        +193.49%                    139.3m        +185.93%

version: 632775
                               │     base     │                   race                   │                     asan                      │                     msan                      │
                               │    sec/op    │    sec/op      vs base                   │     sec/op      vs base                       │     sec/op      vs base                       │
GenerateKey/Rand=Std/1024-4      33.31m ± 13%   554.74m ±  9%  +1565.45% (p=0.000 n=320)   1861.09m ±  9%  +5487.44% (p=0.000 n=320)       1238.93m ± 13%  +3619.55% (p=0.000 n=320+310)
GenerateKey/Rand=Std/1536-4      117.0m ± 11%   1652.0m ± 10%  +1311.84% (p=0.000 n=320)    6959.7m ± 14%  +5847.80% (p=0.000 n=320)        4247.1m ± 11%  +3529.62% (p=0.000 n=320+305)
GenerateKey/Rand=Std/2048-4      81.17m ±  8%    80.00m ±  7%          ~ (p=0.419 n=320)     81.63m ± 10%          ~ (p=0.896 n=320)       6816.73m ± 15%  +8297.89% (p=0.000 n=320+296)
GenerateKey/Rand=NonStd/1024-4   38.38m ± 12%   563.26m ± 13%  +1367.58% (p=0.000 n=320)   1924.62m ± 10%  +4914.62% (p=0.000 n=320)       1269.57m ± 12%  +3207.88% (p=0.000 n=320+295)
GenerateKey/Rand=NonStd/1536-4   123.7m ± 12%   2029.4m ± 10%  +1541.13% (p=0.000 n=320)    6254.0m ± 13%  +4957.43% (p=0.000 n=320)        4120.4m ± 14%  +3232.09% (p=0.000 n=320+290)
GenerateKey/Rand=NonStd/2048-4   192.5m ± 19%   2620.8m ± 15%  +1261.54% (p=0.000 n=320)    9427.0m ± 10%  +4797.47% (p=0.000 n=320+317)    6292.5m ± 15%  +3169.07% (p=0.000 n=320+278)
geomean                          81.31m          776.7m         +855.27%                      2.221        +2631.21%                          3.251        +3898.24%

version: 632978
                               │     base     │                  race                  │                     asan                      │                     msan                      │
                               │    sec/op    │    sec/op     vs base                  │     sec/op      vs base                       │     sec/op      vs base                       │
GenerateKey/Rand=Std/1024-4      35.04m ± 10%   91.53m ± 12%  +161.21% (p=0.000 n=265)   1957.35m ± 11%  +5485.98% (p=0.000 n=265)       1327.60m ± 10%  +3688.78% (p=0.000 n=265+245)
GenerateKey/Rand=Std/1536-4      109.0m ± 15%   238.5m ± 17%  +118.69% (p=0.000 n=265)    5900.2m ± 15%  +5310.95% (p=0.000 n=265+261)    4200.0m ± 13%  +3751.74% (p=0.000 n=265+240)
GenerateKey/Rand=Std/2048-4      82.24m ± 10%   81.28m ±  8%         ~ (p=0.967 n=265)     81.28m ± 10%          ~ (p=0.923 n=265+255)   6534.53m ± 13%  +7845.25% (p=0.000 n=265+240)
GenerateKey/Rand=NonStd/1024-4   39.14m ± 10%   86.05m ± 13%  +119.87% (p=0.000 n=265)   1827.46m ± 11%  +4569.20% (p=0.000 n=265+255)   1246.42m ± 14%  +3084.63% (p=0.000 n=265+240)
GenerateKey/Rand=NonStd/1536-4   105.0m ± 12%   226.1m ±  9%  +115.43% (p=0.000 n=265)    5789.5m ± 13%  +5415.74% (p=0.000 n=265+250)    4041.3m ± 10%  +3750.18% (p=0.000 n=265+240)
GenerateKey/Rand=NonStd/2048-4   197.6m ± 11%   371.2m ± 10%   +87.84% (p=0.000 n=265)    8842.7m ± 14%  +4374.89% (p=0.000 n=265+245)    6741.7m ± 16%  +3311.64% (p=0.000 n=265+240)
geomean                          79.64m         153.0m         +92.08%                      2.108        +2547.36%                          3.277        +4014.29%

version: 633995
                               │     base     │                 race                  │                 asan                  │                  msan                   │
                               │    sec/op    │    sec/op     vs base                 │    sec/op     vs base                 │    sec/op      vs base                  │
GenerateKey/Rand=Std/1024-4      36.52m ± 13%   63.49m ± 15%  +73.84% (p=0.000 n=240)   55.65m ±  9%  +52.38% (p=0.000 n=240)    52.63m ± 10%   +44.11% (p=0.000 n=240)
GenerateKey/Rand=Std/1536-4      117.5m ± 13%   151.0m ± 13%  +28.48% (p=0.000 n=240)   142.2m ± 14%  +21.02% (p=0.001 n=240)    145.2m ± 11%   +23.52% (p=0.001 n=240)
GenerateKey/Rand=Std/2048-4      83.68m ±  9%   78.60m ± 11%        ~ (p=0.128 n=240)   77.54m ± 10%        ~ (p=0.145 n=240)   256.64m ± 14%  +206.70% (p=0.000 n=240)
GenerateKey/Rand=NonStd/1024-4   36.50m ± 14%   55.19m ± 13%  +51.19% (p=0.000 n=240)   44.87m ± 11%  +22.92% (p=0.000 n=240)    50.46m ± 17%   +38.25% (p=0.000 n=240)
GenerateKey/Rand=NonStd/1536-4   117.6m ± 11%   143.0m ± 10%  +21.53% (p=0.001 n=240)   125.3m ± 15%        ~ (p=0.359 n=240)    129.6m ± 11%         ~ (p=0.175 n=240)
GenerateKey/Rand=NonStd/2048-4   224.5m ± 12%   264.9m ± 12%  +17.97% (p=0.013 n=240)   243.3m ±  9%        ~ (p=0.575 n=240)    257.9m ± 14%         ~ (p=0.098 n=240)
geomean                          83.80m         107.9m        +28.72%                   97.12m        +15.90%                    122.1m         +45.66%
$ 

And here it is flipped so that the columns compare versions and the tables show different modes. You can see in this form that although the base RSA operations have gotten slower, as expected because of the use of constant-time bignum routines, the asan and msan modes have actually gotten faster, because we've hidden more from them than we did when using math/big. Again race is somehow an exception, as is the Std/2048 BoringCrypto line (but at least we understand that one).

$ benchstat -col version -table 'mode@(base race asan msan)' rsa.log
mode: base
                               │   622238     │                   632775                    │                   632978                    │                 633995                  │
                               │    sec/op    │    sec/op      vs base                      │    sec/op      vs base                      │    sec/op      vs base                  │
GenerateKey/Rand=Std/1024-4      18.11m ±  9%    33.31m ± 13%   +83.89% (p=0.000 n=240+320)    35.04m ± 10%   +93.45% (p=0.000 n=240+265)    36.52m ± 13%  +101.64% (p=0.000 n=240)
GenerateKey/Rand=Std/1536-4      64.03m ±  9%   117.01m ± 11%   +82.75% (p=0.000 n=240+320)   109.04m ± 15%   +70.30% (p=0.000 n=240+265)   117.51m ± 13%   +83.53% (p=0.000 n=240)
GenerateKey/Rand=Std/2048-4      80.26m ± 12%    81.17m ±  8%         ~ (p=0.647 n=240+320)    82.24m ± 10%         ~ (p=0.665 n=240+265)    83.68m ±  9%         ~ (p=0.659 n=240)
GenerateKey/Rand=NonStd/1024-4   16.02m ± 12%    38.38m ± 12%  +139.56% (p=0.000 n=240+320)    39.14m ± 10%  +144.30% (p=0.000 n=240+265)    36.50m ± 14%  +127.83% (p=0.000 n=240)
GenerateKey/Rand=NonStd/1536-4   64.56m ±  8%   123.66m ± 12%   +91.55% (p=0.000 n=240+320)   104.96m ± 12%   +62.59% (p=0.000 n=240+265)   117.64m ± 11%   +82.22% (p=0.000 n=240)
GenerateKey/Rand=NonStd/2048-4   139.0m ± 11%    192.5m ± 19%   +38.53% (p=0.000 n=240+320)    197.6m ± 11%   +42.21% (p=0.000 n=240+265)    224.5m ± 12%   +61.59% (p=0.000 n=240)
geomean                          48.72m          81.31m         +66.89%                        79.64m         +63.45%                        83.80m         +71.99%

mode: race
                               │   622238     │                   632775                     │                  632978                    │                633995                 │
                               │    sec/op    │    sec/op      vs base                       │    sec/op     vs base                      │    sec/op     vs base                 │
GenerateKey/Rand=Std/1024-4      39.54m ± 10%   554.74m ±  9%  +1302.88% (p=0.000 n=240+320)   91.53m ± 12%  +131.47% (p=0.000 n=240+265)   63.49m ± 15%  +60.57% (p=0.000 n=240)
GenerateKey/Rand=Std/1536-4      108.3m ± 12%   1652.0m ± 10%  +1425.73% (p=0.000 n=240+320)   238.5m ± 17%  +120.23% (p=0.000 n=240+265)   151.0m ± 13%  +39.43% (p=0.000 n=240)
GenerateKey/Rand=Std/2048-4      80.87m ± 11%    80.00m ±  7%          ~ (p=0.156 n=240+320)   81.28m ±  8%         ~ (p=0.478 n=240+265)   78.60m ± 11%        ~ (p=0.151 n=240)
GenerateKey/Rand=NonStd/1024-4   36.63m ± 10%   563.26m ± 13%  +1437.68% (p=0.000 n=240+320)   86.05m ± 13%  +134.92% (p=0.000 n=240+265)   55.19m ± 13%  +50.66% (p=0.000 n=240)
GenerateKey/Rand=NonStd/1536-4   107.9m ± 12%   2029.4m ± 10%  +1780.06% (p=0.000 n=240+320)   226.1m ±  9%  +109.48% (p=0.000 n=240+265)   143.0m ± 10%  +32.44% (p=0.000 n=240)
GenerateKey/Rand=NonStd/2048-4   238.7m ± 13%   2620.8m ± 15%   +998.05% (p=0.000 n=240+320)   371.2m ± 10%   +55.52% (p=0.000 n=240+265)   264.9m ± 12%        ~ (p=0.178 n=240)
geomean                          82.99m          776.7m         +835.92%                       153.0m         +84.32%                       107.9m        +29.96%

mode: asan
                               │   622238     │                    632775                     │                    632978                     │                633995                 │
                               │    sec/op    │     sec/op      vs base                       │     sec/op      vs base                       │    sec/op     vs base                 │
GenerateKey/Rand=Std/1024-4      71.68m ±  7%   1861.09m ±  9%  +2496.51% (p=0.000 n=240+320)   1957.35m ± 11%  +2630.80% (p=0.000 n=240+265)   55.65m ±  9%  -22.36% (p=0.000 n=240)
GenerateKey/Rand=Std/1536-4      201.6m ±  9%    6959.7m ± 14%  +3351.52% (p=0.000 n=240+320)    5900.2m ± 15%  +2826.10% (p=0.000 n=240+261)   142.2m ± 14%  -29.47% (p=0.000 n=240)
GenerateKey/Rand=Std/2048-4      88.20m ±  8%     81.63m ± 10%          ~ (p=0.112 n=240+320)     81.28m ± 10%          ~ (p=0.081 n=240+255)   77.54m ± 10%  -12.09% (p=0.024 n=240)
GenerateKey/Rand=NonStd/1024-4   74.95m ± 10%   1924.62m ± 10%  +2467.79% (p=0.000 n=240+320)   1827.46m ± 11%  +2338.16% (p=0.000 n=240+255)   44.87m ± 11%  -40.14% (p=0.000 n=240)
GenerateKey/Rand=NonStd/1536-4   205.4m ± 11%    6254.0m ± 13%  +2944.63% (p=0.000 n=240+320)    5789.5m ± 13%  +2718.53% (p=0.000 n=240+250)   125.3m ± 15%  -39.00% (p=0.000 n=240)
GenerateKey/Rand=NonStd/2048-4   435.6m ± 14%    9427.0m ± 10%  +2064.35% (p=0.000 n=240+317)    8842.7m ± 14%  +1930.22% (p=0.000 n=240+245)   243.3m ±  9%  -44.14% (n=240)
geomean                          143.0m            2.221        +1453.07%                          2.108        +1374.40%                       97.12m        -32.08%

mode: msan
                               │   622238     │                    632775                     │                    632978                     │                633995                 │
                               │    sec/op    │     sec/op      vs base                       │     sec/op      vs base                       │    sec/op     vs base                 │
GenerateKey/Rand=Std/1024-4      50.04m ± 10%   1238.93m ± 13%  +2375.69% (p=0.000 n=240+310)   1327.60m ± 10%  +2552.88% (p=0.000 n=240+245)   52.63m ± 10%        ~ (p=0.663 n=240)
GenerateKey/Rand=Std/1536-4      143.4m ± 10%    4247.1m ± 11%  +2861.02% (p=0.000 n=240+305)    4200.0m ± 13%  +2828.19% (p=0.000 n=240)       145.2m ± 11%        ~ (p=0.707 n=240)
GenerateKey/Rand=Std/2048-4      353.6m ± 12%    6816.7m ± 15%  +1827.71% (p=0.000 n=240+296)    6534.5m ± 13%  +1747.91% (p=0.000 n=240)       256.6m ± 14%  -27.42% (p=0.000 n=240)
GenerateKey/Rand=NonStd/1024-4   52.13m ± 10%   1269.57m ± 12%  +2335.52% (p=0.000 n=240+295)   1246.42m ± 14%  +2291.11% (p=0.000 n=240)       50.46m ± 17%        ~ (p=0.271 n=240)
GenerateKey/Rand=NonStd/1536-4   158.0m ± 13%    4120.4m ± 14%  +2507.53% (p=0.000 n=240+290)    4041.3m ± 10%  +2457.46% (p=0.000 n=240)       129.6m ± 11%  -17.97% (p=0.003 n=240)
GenerateKey/Rand=NonStd/2048-4   349.6m ± 11%    6292.5m ± 15%  +1699.94% (p=0.000 n=240+278)    6741.7m ± 16%  +1828.41% (p=0.000 n=240)       257.9m ± 14%  -26.24% (p=0.000 n=240)
geomean                          139.3m            3.251        +2233.69%                          3.277        +2251.98%                       122.1m        -12.38%
$

(The two benchstat outputs are very wide. You have to scroll to see the full text, even on a monitor that is plenty wide enough. I don't know why GitHub insists on capping the width at something so narrow!)
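For anyone checking the geomean rows by hand, here is a minimal sketch of the calculation, assuming the row is the plain geometric mean of the values in its column. The function name geomean is illustrative; the sample inputs are the rounded base-mode values for version 633995 from the tables, so the result should land close to, but not exactly on, the 83.80m shown.

```go
package main

import (
	"fmt"
	"math"
)

// geomean returns the geometric mean of xs, computed via logarithms to
// avoid overflow when multiplying many values. All inputs must be > 0.
func geomean(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	sum := 0.0
	for _, x := range xs {
		sum += math.Log(x)
	}
	return math.Exp(sum / float64(len(xs)))
}

func main() {
	// Base-mode sec/op values for version 633995, in milliseconds,
	// copied (already rounded) from the table.
	base := []float64{36.52, 117.5, 83.68, 36.50, 117.6, 224.5}
	fmt.Printf("geomean: %.2fms\n", geomean(base))
}
```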

@rsc closed this as completed Dec 6, 2024
@rsc
Contributor Author

rsc commented Dec 6, 2024

By the way, there are so many data points above because benchmarking RSA key generation is difficult: the time to generate a particular key is highly variable, depending on whether the "guess and check" algorithms get lucky. Here are some CDFs of the time distribution for individual GenerateKey calls. Those tails!

[Screenshot: CDFs of the time distribution for individual GenerateKey calls]

@FiloSottile
Contributor

FiloSottile commented Dec 6, 2024

I was gonna ask why base non-BoringCrypto gets slower after CL 632775, but it looks like it's just within the very wide margins of error.

By the way, probably not worth it, but to make it less noisy we could make key generation report the number of MR rejections, and normalize the benchmark based on that.

Or, try to make the sequence of tested primes fixed. That part doesn't need to change often. (This is yet another reason I like rigid key generation from seed. You can benchmark apples to apples because everyone needs to come to the same output.)

@rsc
Contributor Author

rsc commented Dec 7, 2024

> By the way, probably not worth it, but to make it less noisy we could make key generation report the number of MR rejections, and normalize the benchmark based on that.

Even better, it would help to have a benchmark for a single Miller-Rabin round at important sizes.
