-
Notifications
You must be signed in to change notification settings - Fork 18k
crypto/rsa: severe VerifyPKCS1v15 performance regression in Go 1.20, Go 1.21 [freeze exception] #63516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
See also #57752. |
Confirming the issue here too. In our system we have a component that executes rsa operations at high rates and the critical part that calls |
CC @golang/security. |
Change https://go.dev/cl/552895 mentions this issue: |
Change https://go.dev/cl/552935 mentions this issue: |
Turns out we were benchmarking the wrong thing: all the benchmark keys had E = 3, which hid the algorithmic slowdown on the exponent size. See https://go.dev/cl/552895. Apologies, I am really not happy this happened, and that it took me so long to catch it. I have a CL out that makes public key operations faster than they were in Go 1.19, closing the loop on all the performance regressions. I'd appreciate if folks could test this with their workloads.
(This will get you something not too dissimilar from Go 1.22rc1, with the CL patched on top. I wouldn't run the whole fleet on it, but it should be stable enough for a lab.) |
So it turns out that the go1.19 -> go.1.20 (real-world) Verify operations were only slower by x4, not by x20 as noted in the release notes, right? https://go.dev/doc/go1.20#cryptorsapkgcryptorsa
|
hi @FiloSottile is this fix being downported to 1.21? |
@golang/release, @rsc suggested getting this freeze exception'd rather than backported. |
We've discussed and agreed to approve this freeze exception for Go 1.22. If this change turns out not to go as anticipated, please update the issue and follow up accordingly. Thanks. |
Now, this is embarrassing. For the whole Go 1.20 and Go 1.21 cycles, we based RSA public key operation (verification and decryption) benchmarks on the keys in rsa_test.go, which had E = 3. Most keys in use, including all those generated by GenerateKey, have E = 65537. This significantly skewed even relative benchmarks, because the new constant-time algorithms would incur a larger slowdown for larger exponents. I noticed this only because I got a production profile for an application that does a lot of RSA verifications, saw ExpShort show up, made ExpShort faster, and the crypto/rsa profiles didn't move. We were measuring the wrong thing, and the slowdown was worse than we thought. My apologies. (If E had not been parametrized, it would have avoided issues like this one, too. Grumble. https://words.filippo.io/parameters/#fn9) goos: darwin goarch: arm64 pkg: crypto/rsa │ g35222eeb78 │ new │ │ sec/op │ sec/op vs base │ DecryptPKCS1v15/2048-8 1.414m ± 2% 1.417m ± 1% ~ (p=0.971 n=10) DecryptPKCS1v15/3072-8 4.107m ± 0% 4.160m ± 1% +1.29% (p=0.000 n=10) DecryptPKCS1v15/4096-8 9.363m ± 1% 9.305m ± 1% ~ (p=0.143 n=10) EncryptPKCS1v15/2048-8 162.8µ ± 2% 212.1µ ± 0% +30.34% (p=0.000 n=10) DecryptOAEP/2048-8 1.460m ± 4% 1.413m ± 1% ~ (p=0.105 n=10) EncryptOAEP/2048-8 161.7µ ± 0% 213.4µ ± 0% +31.99% (p=0.000 n=10) SignPKCS1v15/2048-8 1.419m ± 1% 1.476m ± 1% +4.05% (p=0.000 n=10) VerifyPKCS1v15/2048-8 160.6µ ± 0% 212.6µ ± 3% +32.38% (p=0.000 n=10) SignPSS/2048-8 1.419m ± 0% 1.477m ± 2% +4.07% (p=0.000 n=10) VerifyPSS/2048-8 163.9µ ± 8% 212.3µ ± 0% +29.50% (p=0.000 n=10) geomean 802.5µ 899.1µ +12.04% Updates #63516 Change-Id: Iab4a0684d8101ae07dac8462908d8058fe5e9f3d Reviewed-on: https://go-review.googlesource.com/c/go/+/552895 Reviewed-by: Roland Shoemaker <roland@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com>
Submitted to master. |
Now, this is embarrassing. For the whole Go 1.20 and Go 1.21 cycles, we based RSA public key operation (verification and decryption) benchmarks on the keys in rsa_test.go, which had E = 3. Most keys in use, including all those generated by GenerateKey, have E = 65537. This significantly skewed even relative benchmarks, because the new constant-time algorithms would incur a larger slowdown for larger exponents. I noticed this only because I got a production profile for an application that does a lot of RSA verifications, saw ExpShort show up, made ExpShort faster, and the crypto/rsa profiles didn't move. We were measuring the wrong thing, and the slowdown was worse than we thought. My apologies. (If E had not been parametrized, it would have avoided issues like this one, too. Grumble. https://words.filippo.io/parameters/#fn9) goos: darwin goarch: arm64 pkg: crypto/rsa │ g35222eeb78 │ new │ │ sec/op │ sec/op vs base │ DecryptPKCS1v15/2048-8 1.414m ± 2% 1.417m ± 1% ~ (p=0.971 n=10) DecryptPKCS1v15/3072-8 4.107m ± 0% 4.160m ± 1% +1.29% (p=0.000 n=10) DecryptPKCS1v15/4096-8 9.363m ± 1% 9.305m ± 1% ~ (p=0.143 n=10) EncryptPKCS1v15/2048-8 162.8µ ± 2% 212.1µ ± 0% +30.34% (p=0.000 n=10) DecryptOAEP/2048-8 1.460m ± 4% 1.413m ± 1% ~ (p=0.105 n=10) EncryptOAEP/2048-8 161.7µ ± 0% 213.4µ ± 0% +31.99% (p=0.000 n=10) SignPKCS1v15/2048-8 1.419m ± 1% 1.476m ± 1% +4.05% (p=0.000 n=10) VerifyPKCS1v15/2048-8 160.6µ ± 0% 212.6µ ± 3% +32.38% (p=0.000 n=10) SignPSS/2048-8 1.419m ± 0% 1.477m ± 2% +4.07% (p=0.000 n=10) VerifyPSS/2048-8 163.9µ ± 8% 212.3µ ± 0% +29.50% (p=0.000 n=10) geomean 802.5µ 899.1µ +12.04% Updates golang#63516 Change-Id: Iab4a0684d8101ae07dac8462908d8058fe5e9f3d Reviewed-on: https://go-review.googlesource.com/c/go/+/552895 Reviewed-by: Roland Shoemaker <roland@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com>
Most libraries don't consider N secret, but it's arguably useful for privacy applications. However, E should generally be fixed, and there is a lot of performance to be gained by using variable-time exponentiation. The threshold trick is from BoringSSL. goos: linux goarch: amd64 pkg: crypto/rsa cpu: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz │ old │ new │ │ sec/op │ sec/op vs base │ DecryptPKCS1v15/2048-4 1.398m ± 0% 1.396m ± 4% ~ (p=0.853 n=10) DecryptPKCS1v15/3072-4 3.640m ± 0% 3.652m ± 1% ~ (p=0.063 n=10) DecryptPKCS1v15/4096-4 7.756m ± 0% 7.764m ± 0% ~ (p=0.853 n=10) EncryptPKCS1v15/2048-4 175.50µ ± 0% 39.37µ ± 0% -77.57% (p=0.000 n=10) DecryptOAEP/2048-4 1.375m ± 0% 1.371m ± 1% ~ (p=0.089 n=10) EncryptOAEP/2048-4 177.64µ ± 0% 41.17µ ± 1% -76.82% (p=0.000 n=10) SignPKCS1v15/2048-4 1.419m ± 0% 1.393m ± 1% -1.84% (p=0.000 n=10) VerifyPKCS1v15/2048-4 173.70µ ± 1% 38.28µ ± 2% -77.96% (p=0.000 n=10) SignPSS/2048-4 1.437m ± 1% 1.413m ± 0% -1.64% (p=0.000 n=10) VerifyPSS/2048-4 176.83µ ± 1% 43.08µ ± 5% -75.64% (p=0.000 n=10) This finally makes everything in crypto/rsa faster than it was in Go 1.19. goos: linux goarch: amd64 pkg: crypto/rsa cpu: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz │ go1.19.txt │ go1.20.txt │ go1.21.txt │ new.txt │ │ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ DecryptPKCS1v15/2048-4 1.458m ± 0% 1.597m ± 1% +9.50% (p=0.000 n=10) 1.395m ± 1% -4.30% (p=0.000 n=10) 1.396m ± 4% -4.25% (p=0.002 n=10) DecryptPKCS1v15/3072-4 4.023m ± 1% 5.332m ± 1% +32.53% (p=0.000 n=10) 3.649m ± 1% -9.30% (p=0.000 n=10) 3.652m ± 1% -9.23% (p=0.000 n=10) DecryptPKCS1v15/4096-4 8.710m ± 1% 11.937m ± 1% +37.05% (p=0.000 n=10) 7.564m ± 1% -13.16% (p=0.000 n=10) 7.764m ± 0% -10.86% (p=0.000 n=10) EncryptPKCS1v15/2048-4 51.79µ ± 0% 267.68µ ± 0% +416.90% (p=0.000 n=10) 176.42µ ± 0% +240.67% (p=0.000 n=10) 39.37µ ± 0% -23.98% (p=0.000 n=10) DecryptOAEP/2048-4 1.461m ± 0% 1.613m ± 1% +10.37% (p=0.000 n=10) 1.415m ± 0% -3.13% (p=0.000 n=10) 1.371m ± 1% -6.18% (p=0.000 n=10) EncryptOAEP/2048-4 54.24µ ± 0% 269.19µ ± 0% +396.28% (p=0.000 n=10) 177.31µ ± 0% +226.89% (p=0.000 n=10) 41.17µ ± 1% -24.10% (p=0.000 n=10) SignPKCS1v15/2048-4 1.510m ± 0% 1.705m ± 0% +12.93% (p=0.000 n=10) 1.423m ± 1% -5.78% (p=0.000 n=10) 1.393m ± 1% -7.76% (p=0.000 n=10) VerifyPKCS1v15/2048-4 50.87µ ± 0% 266.41µ ± 1% +423.71% (p=0.000 n=10) 174.38µ ± 0% +242.79% (p=0.000 n=10) 38.28µ ± 2% -24.75% (p=0.000 n=10) SignPSS/2048-4 1.513m ± 1% 1.709m ± 0% +12.97% (p=0.000 n=10) 1.461m ± 0% -3.42% (p=0.000 n=10) 1.413m ± 0% -6.58% (p=0.000 n=10) VerifyPSS/2048-4 53.45µ ± 1% 268.56µ ± 0% +402.48% (p=0.000 n=10) 177.29µ ± 0% +231.72% (p=0.000 n=10) 43.08µ ± 5% -19.39% (p=0.000 n=10) geomean 514.6µ 1.094m +112.65% 801.6µ +55.77% 442.1µ -14.08% Fixes golang#63516 Change-Id: If40e596a2e4b3ab7a202ff34591cf9cffecfcc1b Reviewed-on: https://go-review.googlesource.com/c/go/+/552935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Roland Shoemaker <roland@golang.org>
We're hoping that this will help address the high CPU usage we've observed in production when there's a lot of new connections being created. See: golang/go#63516 (comment)
We're hoping that this will help address the high CPU usage we've observed in production when there's a lot of new connections being created. See: golang/go#63516 (comment)
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes, with the latest releases of Go 1.20 and Go 1.21.
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Some of our services started seeing severe performance degradation after upgrading to Go 1.20. Applications which verify many RS256 signed JWT tokens saw dramatic increases to CPU usage and profiles indicated that
rsa.VerifyPKCS1v15
was the problematic function call.I copied the
BenchmarkVerifyPKCS1v15
benchmark to Go 1.19 and ran benchmarks on Go 1.19.13, Go 1.20.10, Go 1.21.3 and Go 1.21.3+boringcrypto. I've filtered the benchmarks to only include theVerifyPKCS1v15
benchmark since that was the easiest to copy to Go 1.19, however, anything that callsrsa.go
'sencrypt()
is affected.I've been following these performance regressions (see #59442) and assumed they were fixed in Go 1.21. Go 1.21 mostly fixed the performance of RSA Private Key operations.
Our solution to retain performance, avoid massively scaling and remain on a recent version of Go is to compile with
GOEXPERIMENT=boringcrypto
which does not seem like a reasonable long term solution.The encrypt function contains a comment regarding a possible optimization using a
sync.Once
.I have a modification to my local Go 1.21 tree that contains this fix and caches
bigmod.NewModulusFromBig(pub.N)
. It brings performance closer to Go 1.19 (~67% worse on my M1 MacBook Pro). I do not think a patch containing this fix would be accepted. Adding async.Once
torsa.PublicKey
introduces a few issues:rsa.PublicKey
is no longerasn1.Marshal
able. This expectation is documented in a comment withinTestMarshalPKCS1PublicKey
sync.Once
which I don't commonly see in the stdlib.I think a proper change would require exposing some kind of
Precompute()
functionality onrsa.PublicKey
but I think that would still break theasn1.Marshal
expectation.What did you expect to see?
Similar performance to Go 1.19.
What did you see instead?
VerifyPKCS1v15
operations are ~25x slower in Go 1.20 and ~14x slower in Go 1.21.The text was updated successfully, but these errors were encountered: