Certificate expiry monitoring #103

Closed
neilisfragile opened this issue Apr 15, 2020 · 7 comments · Fixed by #106
@neilisfragile

Sygnal should check the expiry date of its certificates and emit them as a metric which can trigger alerts.

@clokep clokep self-assigned this Apr 22, 2020
@clokep
Member

clokep commented Apr 22, 2020

This should be pretty straightforward; I've done this previously.

@neilisfragile: Were there any discussions about how we want the metric to be exported? An integer number of seconds until expiry is what comes to mind. We could also export the raw time of expiration, but that seems less ideal.

@clokep
Member

clokep commented Apr 22, 2020

Also, just to make sure I'm not missing anything -- it seems this is only applicable when using APNs with a certfile (not a keyfile).

@neilisfragile
Author

Sounds sane but @michaelkaye is the best person to ask.

@michaelkaye
Contributor

Raw time of expiration will be independent of delays / clock skew on the target machine, and is what (e.g.) blackbox exporter currently reports as the metric probe_ssl_earliest_cert_expiry. You can probably also optimize somewhat by setting the value once when the certificate is read (as it won't change), avoiding a background task that needs to wake up and update it every 60s.

We alert on a derived metric like this:

(max(probe_ssl_earliest_cert_expiry{ job="blackbox_http"}) by (instance) - time())/(60*60*24)

to get 'days to expiry' for the certificates we use for HTTPS.
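
For readers less familiar with PromQL, the arithmetic in that expression can be sketched in Python. This is only an illustration of the formula, assuming the expiry is already available as seconds since the Unix epoch; the name `days_to_expiry` is hypothetical and not part of Sygnal or blackbox exporter.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 60 * 60 * 24


def days_to_expiry(expiry_epoch: float, now: Optional[float] = None) -> float:
    """Mirror (expiry - time()) / (60*60*24) from the alert expression."""
    if now is None:
        # Same role as PromQL's time(): the current Unix timestamp.
        now = time.time()
    return (expiry_epoch - now) / SECONDS_PER_DAY
```

A negative result means the certificate has already expired. Because the subtraction happens at query time, the exported value itself never needs updating.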

@michaelkaye
Contributor

I know it's a tiny task, but we do run it about 512,000 times between cert expiries ^^

It will still be useful the other way, but this is just a bit of context on other situations where we have this type of metric.

@richvdh
Member

richvdh commented Apr 23, 2020

So, for clarity: you're suggesting we report the expiry date, in seconds since the Unix epoch?

@michaelkaye
Contributor

Yes, to keep it in sync with the similar things we have reporting.
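
A minimal sketch of the approach agreed above: convert the certificate's expiry to seconds since the Unix epoch once, when the certificate is read. It assumes the notAfter field is already available as a string in the form OpenSSL prints; the function name `cert_expiry_epoch` and the sample date are illustrative only, not Sygnal's actual code.

```python
import ssl


def cert_expiry_epoch(not_after: str) -> int:
    """Convert a certificate notAfter string to seconds since the Unix epoch."""
    # ssl.cert_time_to_seconds expects the "%b %d %H:%M:%S %Y GMT" format
    # and interprets the time as UTC.
    return ssl.cert_time_to_seconds(not_after)


# Hypothetical notAfter value; a real one would come from the loaded certfile.
expiry = cert_expiry_epoch("Jan  5 09:34:43 2018 GMT")
print(expiry)
```

The resulting integer could then be set once on a gauge-style metric, so no background task is needed to keep it current.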


4 participants