crypto/rsa: new key generation prohibitively slow under race detector #70644
Comments
Here is a more direct measurement, using crypto/rsa BenchmarkGenerateKey:
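One way to produce a comparison like this (the exact invocation isn't spelled out above) is to run the benchmark with and without the race detector and feed the results to benchstat from golang.org/x/perf, along these lines:

    go test -run='^$' -bench=GenerateKey -count=10 crypto/rsa > base.txt
    go test -run='^$' -bench=GenerateKey -count=10 -race crypto/rsa > race.txt
    benchstat base.txt race.txt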
With the improved trial divisions, GenerateKey takes 25% longer than before in non-race mode, but in race mode it takes 300% longer (4X the time). Is the issue that bigmod has more loops not in assembly, and so more instrumented race loops?
Profiling suggests the addMulVVW loop not being in assembly may be biting us here (for the non-optimized sizes, > 2048).
The GenerateKey benchmark is only 2048-bit, so addMulVVW should not be the issue. But if it is, maybe we should add a //go:norace to it.
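For reference, //go:norace asks the compiler not to insert race-detector instrumentation in the function it annotates. A minimal sketch of what that could look like on a generic multiply-accumulate loop follows; the package name, identifiers, and signature here only approximate the shape of the bigmod internals and are not the actual standard-library code:

    package bigmod

    import "math/bits"

    // addMulVVW computes z += x*y (assuming len(x) == len(z)) and returns the
    // final carry. The //go:norace directive suppresses race-detector
    // instrumentation for this function, trading race coverage of these
    // word-level memory accesses for speed under -race.
    //
    //go:norace
    func addMulVVW(z, x []uint, y uint) (carry uint) {
        for i := range z {
            hi, lo := bits.Mul(x[i], y)
            lo, c := bits.Add(lo, z[i], 0)
            hi += c
            lo, c = bits.Add(lo, carry, 0)
            hi += c
            z[i] = lo
            carry = hi
        }
        return carry
    }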
I updated the benchmark to test larger key sizes as well, as it seems like the slowdown is non-linear, but for really large keys maybe we just don't care.
Regardless, it seems like we're spending most of the new time in InverseVarTime, which makes sense. I can't particularly see why that would be so much worse than the old approach, but my understanding of the race detector is rather rudimentary. Funnily enough, I see significantly worse numbers (with the race detector enabled for both runs), but this might just be an example of Apple chips being extremely weird to benchmark on:
I poked at this some more and have a pair of CLs:
https://go.dev/cl/633995 also addressed this. |
I will close this issue once I have an analysis ready to post. It looks like the problem is fixed. |
I wrote a short RSA key generation benchmark:
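The benchmark itself isn't reproduced here; a minimal sketch of the shape described just below (sub-benchmarks over randomness source and key size, with a wrapped reader to force the non-BoringCrypto path) might look like this:

    package rsabench

    import (
        "crypto/rand"
        "crypto/rsa"
        "fmt"
        "io"
        "testing"
    )

    // nonStdReader wraps crypto/rand.Reader in a distinct type so that the
    // BoringCrypto fast path (taken only for crypto/rand.Reader at certain
    // key sizes) is not used.
    type nonStdReader struct{ r io.Reader }

    func (r nonStdReader) Read(p []byte) (int, error) { return r.r.Read(p) }

    func BenchmarkGenerateKey(b *testing.B) {
        sources := []struct {
            name string
            r    io.Reader
        }{
            {"Std", rand.Reader},
            {"NonStd", nonStdReader{rand.Reader}},
        }
        for _, src := range sources {
            for _, bits := range []int{1024, 2048, 3072, 4096} { // illustrative sizes
                b.Run(fmt.Sprintf("Rand=%s/%d", src.name, bits), func(b *testing.B) {
                    for i := 0; i < b.N; i++ {
                        if _, err := rsa.GenerateKey(src.r, bits); err != nil {
                            b.Fatal(err)
                        }
                    }
                })
            }
        }
    }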
I ran this against Go toolchains built at four versions, using GOEXPERIMENT=boringcrypto. Rand=Std and Rand=NonStd should behave identically, except that Rand=Std/2048, /3072, and /4096 use BoringCrypto, while other key sizes or non-crypto/rand.Reader randomness sources fall back to non-BoringCrypto paths. (I made sure to test BoringCrypto because I wanted to see how our new FIPS code compared to the old FIPS "solution".)
Here is a sequence of tables showing each version separately, measuring the base time and then the time under each sanitizer (race, asan, msan). I only ran key sizes up to 2048 because the intermediate versions were so slow. The Rand=Std/N and Rand=NonStd/N lines should measure identically, except that (as just noted) Rand=Std/2048 is a special case that dispatches to BoringCrypto in base, race, and asan builds (but not msan). BoringCrypto is invisible to base, race, and asan, so you don't see that line slow down in those modes. You can see in these tables that those two intermediate versions were extremely slow, especially for asan and msan! It's surprising to me how much slower asan and msan are than race. I'd always thought race had the hardest job of the three, but maybe @dvyukov did a better job on the implementation.
And here it is flipped so that the columns compare versions and the tables show different modes. You can see in this form that although the base RSA operations have gotten slower, as expected because of the use of constant-time bignum routines, the asan and msan modes have actually gotten faster, because we've hidden more from them than we did when using math/big. Again race is somehow an exception, as is the Std/2048 BoringCrypto line (but at least we understand that one).
(The two benchstat outputs are very wide. You have to scroll to see the full text, even on a monitor that is plenty wide enough. I don't know why GitHub insists on capping the width at something so narrow!)
I was going to ask why base non-BoringCrypto gets slower after CL 632775, but it looks like it's just within the very wide margins of error. By the way, probably not worth it, but to make it less noisy we could make key generation report the number of Miller-Rabin rejections and normalize the benchmark based on that. Or try to make the sequence of tested primes fixed; that part doesn't need to change often. (This is yet another reason I like rigid key generation from a seed: you can benchmark apples to apples because everyone needs to arrive at the same output.)
Even better, it would help to have a benchmark for a single Miller-Rabin round at important sizes.
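As a rough illustration, such a benchmark could be written against math/big; this only approximates the work (the real code path uses the unexported constant-time bigmod routines), and the sizes below are the prime sizes behind 2048-, 3072-, and 4096-bit keys:

    package mrbench

    import (
        "crypto/rand"
        "fmt"
        "io"
        "math/big"
        "testing"
    )

    // BenchmarkMillerRabinRound times the work of a single Miller-Rabin round
    // (one modular exponentiation plus the squaring chain) at a given candidate
    // size, using math/big as a stand-in for the internal bigmod routines.
    func BenchmarkMillerRabinRound(b *testing.B) {
        for _, bits := range []int{1024, 1536, 2048} {
            b.Run(fmt.Sprint(bits), func(b *testing.B) {
                // A random odd candidate of the requested size; the per-round
                // cost depends on the size, not on whether it is prime.
                buf := make([]byte, bits/8)
                if _, err := io.ReadFull(rand.Reader, buf); err != nil {
                    b.Fatal(err)
                }
                buf[0] |= 0x80       // full bit length
                buf[len(buf)-1] |= 1 // odd
                w := new(big.Int).SetBytes(buf)

                // Write w-1 = 2^a * m with m odd.
                m := new(big.Int).Sub(w, big.NewInt(1))
                a := 0
                for m.Bit(0) == 0 {
                    m.Rsh(m, 1)
                    a++
                }

                base, err := rand.Int(rand.Reader, w)
                if err != nil {
                    b.Fatal(err)
                }

                z := new(big.Int)
                b.ResetTimer()
                for i := 0; i < b.N; i++ {
                    z.Exp(base, m, w)        // z = base^m mod w
                    for j := 0; j < a; j++ { // worst-case squaring chain
                        z.Mul(z, z)
                        z.Mod(z, w)
                    }
                }
            })
        }
    }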
Although recent changes to crypto/rsa made key generation 3-4X slower and then improved that to only 50% slower, when run under the race detector the code still runs about 10X slower than before. Using
go test golang.org/x/crypto/openpgp/clearsign
as a test, here are the timings on my M3 MacBook Pro using different crypto commits:
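A plain run and a race-detector run of that test would be invoked along these lines:

    go test golang.org/x/crypto/openpgp/clearsign
    go test -race golang.org/x/crypto/openpgp/clearsign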