Large number of stored certificates correlated with hanging API requests #8334

@augustuswm

After the release of R15 onto rack 3 (colo), we noticed that external API requests would periodically hang for ~8-10s before responding. While debugging this, we realized that our certificate automation (cassette) was incorrectly piling up certificates because it was failing to identify when it had successfully installed new certificates. This led to ~10k certificates being stored in the database.

We know that the background task that reads and parses the certificates selects all non-deleted certificates, parses them, and stores them in memory for later use when selecting which certificates to use. We also know that after removing all of the extraneous certificates from the database, API requests no longer periodically hang for 8-10s.
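To make the shape of that background task concrete, here is a minimal sketch of one pass, assuming nothing about the real implementation beyond what is described above. `CertRow`, `ParsedCert`, `parse_cert`, and `reload` are hypothetical names, and the "parsing" is a stand-in that only checks the PEM armor lines rather than doing real X.509 parsing.

```rust
use std::collections::HashMap;

/// Hypothetical row shape; the real schema is not shown in this issue.
#[derive(Debug, Clone)]
struct CertRow {
    id: u64,
    deleted: bool,
    pem: String,
}

/// Hypothetical parsed form kept in memory for later certificate selection.
#[derive(Debug, Clone, PartialEq)]
struct ParsedCert {
    id: u64,
    body_len: usize,
}

/// Stand-in for real certificate parsing: only validates the PEM armor
/// lines and records the length of the enclosed body.
fn parse_cert(row: &CertRow) -> Option<ParsedCert> {
    let body = row
        .pem
        .trim()
        .strip_prefix("-----BEGIN CERTIFICATE-----")?
        .strip_suffix("-----END CERTIFICATE-----")?;
    Some(ParsedCert { id: row.id, body_len: body.trim().len() })
}

/// One pass of the task: select every non-deleted row, parse it, and
/// rebuild the in-memory map. Every non-deleted row is parsed on every
/// pass, so the work grows linearly with the number of stored rows.
fn reload(rows: &[CertRow]) -> HashMap<u64, ParsedCert> {
    rows.iter()
        .filter(|row| !row.deleted)
        .filter_map(|row| parse_cert(row))
        .map(|cert| (cert.id, cert))
        .collect()
}
```

If the real task has this shape, a pass over ~10k rows does ~10k parses regardless of how many certificates are actually in use, which is at least consistent with the extra CPU time we observed. It does not by itself explain why request serving would stall.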

We do not know the causal link (assuming there is one) between these two, but we suspect, based on statemaps and speculative tracing from @bcantrill, that we are spending a lot of CPU time in this background task. It is unclear why this would block serving a request, and we will need further research to determine a root cause.

We do not believe R15 had any impact on this, and it was only coincidental in timing. We should be able to debug this on dev systems now with more tooling than we have available at the moment on rack 3.
