-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
caddytls: Reuse certificate cache through reloads #5623
Conversation
I wonder if we could parameterize this, i.e. adding some optional flags to reloading to change behaviour. Something for later. Would mean HTTP headers in the API, possibly env vars and/or CLI flags for the reload command, then passing through those flags in context for provision/cleanup/start/stop 🤔 that would let users do a graceful reload with a forced cert reload. |
This actually avoids reloading managed certs from storage when already in the cache, d'oh.
We'll see what the need is first. I'm not sure what the use case would be for that kind of requirement at this stage. PS. Just pushed a commit that updates CertMagic so we don't reload managed certs from storage that are already in the cache 😅 |
certCache.RemoveManaged(noLongerManaged) | ||
certCache.Remove(noLongerLoaded) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These names aren't symmetrical. Call the other one RemoveUnmanaged
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True! I did at first, but realized that technically this would remove any certificate by that hash, whether it's managed or not.
In practice, I suppose you'll only have the hash of unmanaged certificates since the managed ones don't expose that (unless you manually compute that yourself)...
I'll think about it!
Oh, and minor implementation detail: this change uses blake3 to hash certificates, instead of sha256. In my testing, Blake3 was nearly twice as fast, but with 2 allocations instead of 1:
(I think 64 B/op is still acceptable though. Blake3 apparently always computes a 64-byte digest but truncates it to 32 bytes by default, and we just use the default.) |
@francislavoie I think I'm ready to merge this; want to do any final review or does it LGTY? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep LGTM. Obviously there's still the question of the ActiveContext mutex but 🤷♂️ I haven't dug into it either
Yeah. Tests with the race detector don't raise any errors, so I'm good with trying this for now. We can rejigger the locking later if needed. Thanks! |
Previously, Caddy created a new certificate cache for every config load. This ensured that the state of the certificate cache was consistent with the newly-loaded config, no matter what changes were made.
However, this is a bottleneck for some users with large TLS deployments, some of which exceed 100K sites/certificates, as the entire cache was GC'ed and a new one had to be created by loading each certificate from storage and decoding it (not to mention allocating memory for this, which is not cheap -- certificate encodings are messy!) up to the cache capacity, which is configurable to an arbitrarily large size.
On-demand TLS can be used to relieve this pressure, as certificates are only loaded into the cache, when handshakes come in. But for busy servers, that turns out to be almost instantly. And it hammers the 'ask' endpoint which is now required, and we have received complaints about this in the beta version.
So, this is not a very small change, but it's not as big as I thought it'd be, to use only 1 certificate cache, and to persist it through config reloads.
When a config using the TLS app is loaded, the TLS app is provisioned and started; this loads certificates into the cache which are not already loaded or which are manually-loaded. We now keep a list of which subjects/certificates a config loads. This adds slightly more memory overhead but insignificant relative to the actual certificates.
Then, when a TLS app is "cleaned up" (decommissioned in preparation for being garbage-collected), it loads the currently-active Caddy context which has a reference to the new/current TLS app. Any certificates/subjects that are listed only in the "old" TLS app are removed from the cache.
This does have the side-effect that managed certificates can't be forcefully-reloaded from storage unless the config is completely unloaded first (or the server is stopped) and then reloaded. This is because managed certificates don't get evicted from the cache if the next config is also using them; however, when those certificates come due for renewal, the new config will apply when renewing them.
Manually-loaded certificates are still reloaded from storage at every config reload, because the server does not manage them, config reloads are how you tell the server that you updated the certificate. (This is the "classic" method of using TLS certificates with traditional web servers: update the cert on disk, then reload the server config to apply the change.) So if you are manually loading thousands of certificates, this change won't help you much. But then why are you using Caddy...
Anyway, I still need to finish this and do more testing to ensure it actually works. My hunch is this should drastically lighten config reloads when there's lots of certificates.