-
Notifications
You must be signed in to change notification settings - Fork 59
Description
After the release of R15 onto rack 3 (colo) we noticed that external API requests would periodically hang for ~8-10s before responding. While debugging this we realized that our certificate automation (cassette) was incorrectly pilling up certificates as it was failing to identify when it had successfully installed new certificates. This lead to ~10k certificates being stored in the database.
We know that the background task that reads and parses the certificates selects all non-deleted certificates, parses them, and stores them in memory for later use when selecting which certificates to use. We also know that removing all of the extraneous certificates from the database has resulted in API requests to no longer periodically hang for 8-10s.
We do not know the casual link (assuming there is one) between these two, but we suspect based on statemaps and speculative tracing from @bcantrill that we are spending a lot of CPU time on this. It is unclear why this would block serving a request and we will need further research to determine a root cause.
We do not believe R15 had any impact on this, and it was only coincidental in timing. We should be able to debug this on dev systems now with more tooling that we have available at the moment on rack 3