
Filter Prometheus Alerts by regexp is limited, add feature to query by labels #385

Closed
SimeonPoot opened this issue Jun 5, 2021 · 8 comments


@SimeonPoot
Contributor

SimeonPoot commented Jun 5, 2021

Currently we'd like to query Prometheus for certain active alerts to block nodes from being rebooted, and this functionality is already present. It's great to have it in place; however, filtering by regex is limited in a way that makes it unusable for us.

The situation is that we have workloads of different priorities firing the same alerts. If teams are experimenting with their setup and an alert fires, let's say 'PodCrashing', we don't want Kured to be blocked by this alert. But we do want to block Kured when the same alert fires on the workloads of the kubernetes-infra team (as those affect all teams).
Rather than using different alert names per priority, it would be great if we could query Prometheus for active alerts by labels. That way we could use labels like "severity":"critical","team":"kubernetes-infra", or something along those lines.

Because we'd like to block Kured only in specific situations, I think a hit on the label query should block Kured from rebooting, thus doing the opposite of the regex filter.
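To make the intended semantics concrete, here is a minimal Go sketch of the "block on label match" idea. All type and function names here are hypothetical, not kured's actual code: the required labels are ANDed per alert, and a single matching active alert is enough to block the reboot.

```go
package main

import "fmt"

// Alert is a minimal stand-in for an active alert returned by Prometheus;
// the real kured type may differ (this is a sketch, not the actual code).
type Alert struct {
	Labels map[string]string
}

// matchesAll reports whether the alert carries every required label
// with the expected value (AND semantics).
func matchesAll(a Alert, required map[string]string) bool {
	for k, v := range required {
		if a.Labels[k] != v {
			return false
		}
	}
	return true
}

// shouldBlock returns true when any active alert matches all required
// labels -- the opposite of the regex filter, which blocks on a match
// of the alert name.
func shouldBlock(alerts []Alert, required map[string]string) bool {
	for _, a := range alerts {
		if matchesAll(a, required) {
			return true
		}
	}
	return false
}

func main() {
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "PodCrashing", "team": "experiments"}},
		{Labels: map[string]string{"alertname": "PodCrashing", "team": "kubernetes-infra", "severity": "critical"}},
	}
	required := map[string]string{"severity": "critical", "team": "kubernetes-infra"}
	fmt.Println(shouldBlock(alerts, required)) // the infra alert matches, so reboots are blocked
}
```

With this, the experimenting team's 'PodCrashing' alert is ignored, while the same alert on kubernetes-infra workloads blocks the reboot.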

I'm not sure if this would be a good feature to introduce, but I've been working on implementing it (with a bunch of unit tests) and am now starting to test it on kind. I'll link this to PR #386.

@evrardjp
Collaborator

Should we close this? If there is extra work to do, please mention it :)

@SimeonPoot
Contributor Author

Hi @evrardjp, thanks for coming back to this.
The issue is still relevant. The only change made in PR #386 was a restructuring of PrometheusClient, plus added unit tests. I removed all of the 'query by label' code, as it needed more investigation and a better way of implementing it.

  • I saw that there's a Prometheus endpoint, /alerts, that we could make use of rather than /query.
  • On querying by labels: investigate how Prometheus itself does this; it would be great if we query alerts the same way Prometheus does. A uniform approach will reduce complexity.
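For the first point, here is a rough sketch of what consuming /api/v1/alerts could look like. The response shape follows the Prometheus HTTP API's alerts payload; the Go names are made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alertsResponse mirrors the shape of Prometheus' /api/v1/alerts payload
// (only the fields needed here; JSON field names follow the Prometheus HTTP API).
type alertsResponse struct {
	Status string `json:"status"`
	Data   struct {
		Alerts []struct {
			Labels map[string]string `json:"labels"`
			State  string            `json:"state"`
		} `json:"alerts"`
	} `json:"data"`
}

// firingLabels decodes an /api/v1/alerts body and returns the label sets
// of alerts that are currently firing (pending alerts are skipped).
func firingLabels(body []byte) ([]map[string]string, error) {
	var resp alertsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var out []map[string]string
	for _, a := range resp.Data.Alerts {
		if a.State == "firing" {
			out = append(out, a.Labels)
		}
	}
	return out, nil
}

func main() {
	// In real code this body would come from an HTTP GET to <prometheus>/api/v1/alerts.
	body := []byte(`{"status":"success","data":{"alerts":[
	  {"labels":{"alertname":"PodCrashing","team":"kubernetes-infra"},"state":"firing"},
	  {"labels":{"alertname":"HighLatency","team":"experiments"},"state":"pending"}]}}`)
	labels, err := firingLabels(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(labels), labels[0]["team"]) // 1 kubernetes-infra
}
```

The returned label sets could then be fed into whatever label-matching logic ends up in the PR.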

What do you think of this?

@evrardjp
Collaborator

As you can guess with the delay in my answers: I am not really available to work on this.
Hence, if you do the work, you are more likely to get things merged ;)

I am not really familiar with Prometheus internals, but I think it's worth investigating, and documenting WHY we want to change or NOT change. For example: is the /alerts endpoint better? Why? Should we have a sidecar model instead, or should we keep this in code?

Agree on making things uniform, but not sure what you are proposing here.

@SimeonPoot
Contributor Author

I can imagine; I'd like to work on it, but it has to fit in between work and kids.

I have to say I don't know the internals of Prometheus either, I'm learning as I go, and by this I'm seeing things we can use, like the Alerts endpoint. In the end we'd like to filter by labels, regardless of endpoint.

About the uniform way, I meant the way Prometheus itself is queried:
querying by one label, e.g. {"severity":"critical"}, would get all the alerts with that label, and querying by more labels, e.g. {"severity":"critical","team":"kubernetes-infra"}, would get more specific alerts (the matchers are ANDed).
Rather than getting all the alerts with {"severity":"critical"} plus all the alerts with {"team":"kubernetes-infra"}.
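In PromQL terms that's just multiple matchers in one selector against the built-in ALERTS metric, which Prometheus ANDs together. A small hypothetical helper to illustrate (the function name is mine, not from the PR); keys are sorted so the output is deterministic:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alertSelector builds a PromQL selector over Prometheus' built-in ALERTS
// metric. Multiple matchers inside one {...} are ANDed by PromQL, which
// gives the "more labels = more specific" behaviour described above.
func alertSelector(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic matcher order
	matchers := make([]string, 0, len(keys))
	for _, k := range keys {
		matchers = append(matchers, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return "ALERTS{" + strings.Join(matchers, ",") + "}"
}

func main() {
	fmt.Println(alertSelector(map[string]string{"team": "kubernetes-infra", "severity": "critical"}))
	// ALERTS{severity="critical",team="kubernetes-infra"}
}
```

Such a selector could be sent to /query as-is, so either endpoint would end up with the same AND semantics.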

Interesting point, how would you setup a sidecar model for this issue?

@evrardjp
Collaborator

The blocker tools could be custom made (and we can provide a few defaults), which would apply a label on nodes, a label we can watch.

@github-actions

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

@sebastiangaiser

Can this one please be reopened?
It would really be helpful to be able to query with labels like {"severity":"critical"}.


This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

@github-actions github-actions bot closed this as not planned Mar 3, 2024