Automatic deployment of Prometheus alerts #266

Open
jchristgit opened this issue Apr 30, 2024 · 10 comments
Labels
component: monitoring An issue relating to a monitoring component (e.g. Prometheus, Grafana) group: kubernetes Issues and pull requests related to the Kubernetes setup

Comments

@jchristgit
Member

Right now, changes to our Prometheus alerts need to be deployed manually.

We should add a deployment step for this to GitHub Actions on the main
branch, so that any changes are rolled out automatically without requiring
anyone to know the local setup.

@jchristgit jchristgit added component: monitoring An issue relating to a monitoring component (e.g. Prometheus, Grafana) group: kubernetes Issues and pull requests related to the Kubernetes setup labels Apr 30, 2024
@jchristgit
Member Author

@jb3 Do you have an idea for how best to do this? Right now I'm not even sure
how to deploy alerts to Prometheus in Kubernetes in the first place. I think
for the documentation I will make a separate issue though.

@jb3
Member

jb3 commented Apr 30, 2024

Note to self: we can set the ConfigMap preferences to always query the apiserver for the latest changes, which would eliminate the propagation delay of changes.

@jb3
Member

jb3 commented Apr 30, 2024

I lied: this is a kubelet option, so we cannot set it per ConfigMap. We will have to do some smart in-pod detection at Prometheus to confirm that the reload has gone through.

There is, however, always a timestamp in the mounted directory. We just need to keep checking this timestamp (probably with a recurring kubectl exec).

@jb3
Member

jb3 commented Apr 30, 2024

/prometheus $ ls -la /opt/pydis/prometheus/alerts.d/
total 12
drwxrwsrwx    3 root     2000          4096 Apr 30 19:24 .
drwxr-xr-x    3 root     root          4096 Apr 26 21:41 ..
drwxr-sr-x    2 root     2000          4096 Apr 30 19:24 ..2024_04_30_19_24_46.1524242850
lrwxrwxrwx    1 root     2000            32 Apr 30 19:24 ..data -> ..2024_04_30_19_24_46.1524242850
lrwxrwxrwx    1 root     2000            24 Apr 26 21:39 alertmanager.yaml -> ..data/alertmanager.yaml
lrwxrwxrwx    1 root     2000            24 Apr 26 21:39 certificates.yaml -> ..data/certificates.yaml
lrwxrwxrwx    1 root     2000            19 Apr 26 21:39 coredns.yaml -> ..data/coredns.yaml
lrwxrwxrwx    1 root     2000            15 Apr 26 21:39 cpu.yaml -> ..data/cpu.yaml
lrwxrwxrwx    1 root     2000            18 Apr 26 21:39 django.yaml -> ..data/django.yaml
lrwxrwxrwx    1 root     2000            16 Apr 26 21:39 etcd.yaml -> ..data/etcd.yaml
lrwxrwxrwx    1 root     2000            16 Apr 26 21:39 jobs.yaml -> ..data/jobs.yaml
lrwxrwxrwx    1 root     2000            18 Apr 26 21:39 memory.yaml -> ..data/memory.yaml
lrwxrwxrwx    1 root     2000            17 Apr 26 21:39 nginx.yaml -> ..data/nginx.yaml
lrwxrwxrwx    1 root     2000            17 Apr 26 21:39 nodes.yaml -> ..data/nodes.yaml
lrwxrwxrwx    1 root     2000            16 Apr 26 21:39 pods.yaml -> ..data/pods.yaml
lrwxrwxrwx    1 root     2000            20 Apr 26 21:39 postgres.yaml -> ..data/postgres.yaml
lrwxrwxrwx    1 root     2000            22 Apr 26 21:39 prometheus.yaml -> ..data/prometheus.yaml
lrwxrwxrwx    1 root     2000            17 Apr 26 21:39 redis.yaml -> ..data/redis.yaml
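
The timestamp check could be sketched roughly as follows. This is only an illustration: the revision name format is inferred from the listing above rather than from any documented contract, the timestamp is assumed to be UTC, and `has_settled` is a hypothetical helper that would also be thrown off by clock skew between the CI runner and the node, or by an apply that changes nothing.

```python
import os
import re
from datetime import datetime, timezone

# Revision directories in the listing look like "..YYYY_MM_DD_HH_MM_SS.<digits>".
# This pattern is inferred from that listing, not from a documented format.
_REVISION_RE = re.compile(r"^\.\.(\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})\.\d+$")


def parse_revision_time(name):
    """Extract the timestamp embedded in a revision directory name."""
    match = _REVISION_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected revision name: {name!r}")
    parsed = datetime.strptime(match.group(1), "%Y_%m_%d_%H_%M_%S")
    # Assumption: the name is written in UTC.
    return parsed.replace(tzinfo=timezone.utc)


def current_revision(mount_dir):
    """Return the revision directory that the "..data" symlink points at."""
    return os.readlink(os.path.join(mount_dir, "..data"))


def has_settled(mount_dir, applied_at):
    """True once the mounted revision is at least as new as the apply time."""
    return parse_revision_time(current_revision(mount_dir)) >= applied_at
```

A deployment job could record the time just before applying the ConfigMap and then call `has_settled("/opt/pydis/prometheus/alerts.d", applied_at)` in a loop until it returns true or a timeout expires.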

@jb3
Member

jb3 commented Apr 30, 2024

Another related issue, for a potential future feature: kubernetes/kubernetes#22368 (open for 7 years though, yikes!)

@shtlrs
Member

shtlrs commented Jun 7, 2024

Can't we check the git diff when the CI runs and, if we find ConfigMap files (which we would identify by following some rule/logic), apply them?
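
One way to picture the "rule/logic" part is the sketch below. It assumes, purely for illustration, that alert files are the YAML files under a single directory; the `alerts_dir` argument and both function names are hypothetical, and the real repository layout may call for a different rule.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(base, head="HEAD"):
    """List paths that differ between two commits, via `git diff --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def select_alert_files(paths, alerts_dir):
    """Keep only YAML files located somewhere under alerts_dir.

    Assumption: "lives under alerts_dir and ends in .yaml/.yml" is the rule
    that identifies alert ConfigMap sources.
    """
    root = PurePosixPath(alerts_dir)
    selected = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix in (".yaml", ".yml") and root in path.parents:
            selected.append(raw)
    return selected
```

In CI, `select_alert_files(changed_files(base_sha), alerts_dir)` would return the files to re-apply; an empty list means the deployment step can be skipped.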

@jchristgit
Member Author

jchristgit commented Jun 7, 2024 via email

@jb3
Member

jb3 commented Jun 8, 2024

Unfortunately, the settling of ConfigMap updates on live pods cannot be guaranteed during that window; from memory, it's a scheduled job on the kubelet.

The Kubernetes solution is to have a sidecar container running something like inotify (or whatever the modern equivalent is). Upon detecting a change, it can call out to Prometheus via the HTTP management API or (I think) send a signal to the process; I can't remember whether sidecars share the same process namespace.
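
A minimal sketch of such a sidecar follows. It polls the `..data` symlink instead of using inotify, because the Python standard library has no inotify binding, so it is a stand-in for the technique rather than the technique itself. The reload call uses Prometheus's `POST /-/reload` endpoint, which only works when Prometheus is started with `--web.enable-lifecycle`; the `localhost:9090` address, the function names, and the `max_polls`/`sleep` parameters are assumptions made for illustration.

```python
import os
import time
import urllib.request


def reload_prometheus(url="http://localhost:9090/-/reload"):
    """Ask Prometheus to reload its configuration via the management API.

    Assumption: Prometheus listens on localhost:9090 and was started with
    --web.enable-lifecycle; otherwise this endpoint is not available.
    """
    request = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status == 200


def watch(mount_dir, on_change, interval=5.0, max_polls=None, sleep=time.sleep):
    """Call on_change() whenever the "..data" symlink points somewhere new.

    max_polls and sleep exist only so the loop can be bounded and tested;
    a real sidecar would leave them at their defaults and run forever.
    """
    link = os.path.join(mount_dir, "..data")
    last = os.readlink(link)
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        polls += 1
        current = os.readlink(link)
        if current != last:
            last = current
            on_change()
```

The sidecar's entry point would then be something like `watch("/opt/pydis/prometheus/alerts.d", reload_prometheus)`. Going through HTTP avoids the process-namespace question, since containers in the same pod share a network namespace.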

I'll investigate this one later today.

@jchristgit
Member Author

jchristgit commented Jun 9, 2024 via email

@jb3
Member

jb3 commented Jun 9, 2024

> However, with automated reloads like this we should ensure we have an alert in case of config reload failures. We do not have this yet, do we?

Yes, we should be able to add an alert for this; I'll include it when I PR this feature in. prometheus_config_last_reload_successful should handle it.
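
For reference, `prometheus_config_last_reload_successful` is 1 when the last reload succeeded and 0 when it failed, so an alert would typically fire on the value being 0. The same metric can also be read from a deployment job through the HTTP query API, as in the sketch below; the base URL and function names are assumptions, and this is not the alert rule itself.

```python
import json
import urllib.parse
import urllib.request

METRIC = "prometheus_config_last_reload_successful"


def reload_succeeded(payload):
    """Interpret a /api/v1/query response for the reload metric.

    Returns True only if every returned sample has the value 1.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload!r}")
    results = payload["data"]["result"]
    if not results:
        raise ValueError(f"no samples returned for {METRIC}")
    # Instant-vector samples carry [timestamp, "value-as-string"].
    return all(float(sample["value"][1]) == 1.0 for sample in results)


def query_reload_status(base_url="http://localhost:9090"):
    """Fetch the metric from Prometheus (base_url is an assumed address)."""
    query = urllib.parse.urlencode({"query": METRIC})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{query}", timeout=10) as response:
        return reload_succeeded(json.load(response))
```

A deployment job could fail the pipeline when `query_reload_status()` returns false after a reload, which complements the alert rather than replacing it.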

Projects
Status: In Progress