Enable disk size alerting for AWS EBS volumes on VEDA staging hub #5062

Open
4 of 6 tasks
Tracked by #5004
sgibson91 opened this issue Nov 13, 2024 · 9 comments

@sgibson91
Member

sgibson91 commented Nov 13, 2024

Context

Task list

Tasks

Definition of Done

  • The feature/service is technically complete
  • The feature/service has been tested with one or more users (if applicable)
  • Deployed to VEDA staging hub
  • Feature has been tested to fire an alert
  • The feature/service is well documented, and the documentation is accessible for the target user base
@GeorgianaElena
Member

@sgibson91, raising here for visibility so you're not caught by surprise in case you haven't seen #4923 (for more context https://2i2c.slack.com/archives/C055A1J1DRP/p1727942694562319)

@sgibson91
Member Author

Thank you @GeorgianaElena, I did see it and did remember it, but hadn't done the work of digging it out yet! Thank you!

@sunu, #4923 will need to be reverted for Grafana alerting :)

@sgibson91
Member Author

sgibson91 commented Nov 27, 2024

@sunu found it difficult to programmatically enable Grafana alerting via grafonnet. I have proposed that we add the enablement of alerts as a one-time manual step in our hub deployment guide for now to close this issue out. I will open an issue to track a spike investigating grafonnet further in the new year, maybe with more 2i2c folks helping him.

@sgibson91
Member Author

sgibson91 commented Nov 27, 2024

Grafana needs access to 2i2c's Freshdesk SMTP server to send emails to support[at]

@yuvipanda
Member

@sgibson91 let's send these alerts to pagerduty rather than use smtp directly. I think the integration will be more straightforward this way, given we already do that in https://github.com/2i2c-org/infrastructure/blob/main/terraform/uptime-checks/pagerduty.tf

@sgibson91
Member Author

Amazing, thank you for the suggestion @yuvipanda!

@yuvipanda
Member

Another simpler suggestion is to try to use prometheus alertmanager. We already have it deployed, but disabled (with an inline comment that is not true anymore).

It should be easier than trying to automatically do this in grafana:

  1. The pagerduty service key can be set in yaml config (https://prometheus.io/docs/alerting/latest/configuration/#pagerduty_config) in our secret config (https://github.com/2i2c-org/infrastructure/blob/main/helm-charts/support/enc-support.secret.values.yaml) as that's the same for all our clusters
  2. Alerts are generated by prometheus itself, using YAML based alerting rules (https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/). This would go in our support chart config
  3. The alert configuration itself is a prometheus expression, which is exactly the same as it is for grafana! So while what does the alerting and where it goes is different, the source of the data and how it's expressed is the same.

This would also allow us to add more alerting in the future without having to directly tie it to a grafana graph.
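
To make the three points above concrete, here is a minimal sketch of what the two pieces of configuration could look like, written in the upstream Prometheus and Alertmanager file formats rather than as they would be nested in the support chart. Everything specific in it is an assumption for illustration: the group and alert names, the `staging` namespace, the 10% threshold, the `for` duration, and the choice of the kubelet volume metrics as the data source. The service key is a placeholder.

```yaml
# Sketch only: names, labels, thresholds and the metric choice are assumptions.

# Prometheus alerting rules (points 2 and 3): the alert condition is an
# ordinary Prometheus expression, the same kind used in a Grafana panel.
groups:
  - name: disk-usage              # hypothetical group name
    rules:
      - alert: VolumeNearlyFull   # hypothetical alert name
        # Fires when less than 10% of a volume's capacity is still available.
        expr: |
          kubelet_volume_stats_available_bytes{namespace="staging"}
            / kubelet_volume_stats_capacity_bytes{namespace="staging"} < 0.10
        for: 5m                   # condition must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "Volume {{ $labels.persistentvolumeclaim }} has less than 10% free space"
---
# Alertmanager configuration (point 1): route every alert to PagerDuty.
route:
  receiver: pagerduty
receivers:
  - name: pagerduty
    pagerduty_configs:
      # Placeholder: the real key would live in the encrypted secret config.
      - service_key: "<pagerduty-service-key>"
```

Since the service key is the same for all clusters, only the rule expressions and thresholds would need to vary per hub, if at all.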

@yuvipanda
Member

I have proposed that we add the enablement of alerts as a one-time manual step in our hub deployment guide for now to close this issue out.

These alerts may get overwritten the next time we deploy the Grafana dashboards, so we will have to check whether they persist.

@sunu
Contributor

sunu commented Nov 28, 2024

Another simpler suggestion is to try to use prometheus alertmanager

Oh, that's a great suggestion! Let's go with Prometheus Alertmanager if that's an option.


4 participants