Q4 2022 Goal: Monitoring & alerting for our infrastructure #1804
Comments
(I have a clear memory of already commenting here last week, but perhaps I neglected to click 'Comment' and lost my thought! Apologies if I already typed this up somewhere else.) We've received a request to provide more analytics to our community partners. Specifically:
Since this Q4 2022 Goal is tackling 'monitoring', could it be expanded in scope to include analytics as well? I don't want to add extra work for @2i2c-org/tech-team, so my request is to understand more about the scope of work planned here and to offer my assistance in the development work, if appropriate.
I would avoid expanding the scope.
Your help will always be welcome, @jmunroe.
During Q4, health checks were deployed.
I want us to feel confident that our infrastructure is not on fire. This means we need monitoring and alerting in place that gives us evidence to believe that 'everything is ok'.
For this, we need good monitoring and alerting that we trust. We should be able to realistically say 'there are currently no alerts, so we believe there are no outages right now', and have enough trust in the process to believe it.
Here's a bunch of things that will help get us there:
Alerts are of two kinds: an immediate outage (delivered via PagerDuty) and a 'cliff alert' (something will go bad in a few days if you do not deal with it now, delivered via Freshdesk). Each alert should be clearly actionable: it's better to have no alert at all than one we ignore.
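To make the two alert kinds concrete, here is a minimal Python sketch of how they might be modelled and routed. Everything in it is an assumption for illustration, not something that exists in our infrastructure: the `Alert` class, the `route` and `days_until_full` helpers, and the channel strings are hypothetical names. It does not call the PagerDuty or Freshdesk APIs; it only shows the classification, and it enforces the 'clearly actionable' rule by refusing to create an alert without an action.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AlertKind(Enum):
    OUTAGE = "outage"  # something is broken right now
    CLIFF = "cliff"    # something will break in a few days if ignored


# Delivery channel per kind, as described above. The strings are
# placeholders (assumption), not real integration identifiers.
CHANNEL_FOR_KIND = {
    AlertKind.OUTAGE: "pagerduty",
    AlertKind.CLIFF: "freshdesk",
}


@dataclass(frozen=True)
class Alert:
    name: str
    kind: AlertKind
    action: str  # what the responder should do about it

    def __post_init__(self) -> None:
        # An alert nobody can act on is worse than no alert at all.
        if not self.action.strip():
            raise ValueError(f"alert {self.name!r} has no action; drop it instead")


def route(alert: Alert) -> str:
    """Return the name of the channel this alert should be delivered to."""
    return CHANNEL_FOR_KIND[alert.kind]


def days_until_full(
    used_gib: float, capacity_gib: float, growth_gib_per_day: float
) -> Optional[float]:
    """Linear estimate of days until a disk fills; None if usage is not growing."""
    if growth_gib_per_day <= 0:
        return None
    return (capacity_gib - used_gib) / growth_gib_per_day
```

A cliff alert could then be raised when, say, `days_until_full(...)` drops below some threshold of days, while an outage alert would come from a failing health check. The threshold and the choice of disk usage as the example signal are also assumptions.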