
Filter Prometheus Alerts by regexp is limited, add feature to query by labels #385

Closed
SimeonPoot opened this issue Jun 5, 2021 · 8 comments


@SimeonPoot
Contributor

SimeonPoot commented Jun 5, 2021

Currently we'd like to query Prometheus for certain active alerts to block nodes from being rebooted, and this functionality is already present. It's great to have it in place; however, filtering by regex is limited in a way that makes it unusable for us.

The situation is that we have workloads of different priorities firing the same alerts. If teams are experimenting with their setup and an alert fires, let's say 'PodCrashing', we don't want Kured to be blocked by this alert. But we do want to block Kured when the same alert fires on the workloads of the kubernetes-infra team (as those affect all teams).
Rather than using different alert names per priority, it would be great if we could query Prometheus for active alerts by labels. That way we could use labels like "severity":"critical","team":"kubernetes-infra", or something along those lines.

Because we'd like to block Kured only in specific situations, I think a hit on the label query should block Kured from rebooting, thus doing the opposite of the regex filter.
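To make the intended semantics concrete, here is a minimal Go sketch of the "block on label match" idea. All type and function names here are hypothetical, not kured's actual code: the required labels are ANDed per alert, and a single matching active alert is enough to block the reboot.

```go
package main

import "fmt"

// Alert is a minimal stand-in for an active alert returned by Prometheus;
// the real kured type may differ (this is a sketch, not the actual code).
type Alert struct {
	Labels map[string]string
}

// matchesAll reports whether the alert carries every required label
// with the expected value (AND semantics).
func matchesAll(a Alert, required map[string]string) bool {
	for k, v := range required {
		if a.Labels[k] != v {
			return false
		}
	}
	return true
}

// shouldBlock returns true when any active alert matches all required
// labels -- the opposite of the regex filter, which blocks on a match
// of the alert name.
func shouldBlock(alerts []Alert, required map[string]string) bool {
	for _, a := range alerts {
		if matchesAll(a, required) {
			return true
		}
	}
	return false
}

func main() {
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "PodCrashing", "team": "experiments"}},
		{Labels: map[string]string{"alertname": "PodCrashing", "team": "kubernetes-infra", "severity": "critical"}},
	}
	required := map[string]string{"severity": "critical", "team": "kubernetes-infra"}
	fmt.Println(shouldBlock(alerts, required)) // the infra alert matches, so reboots are blocked
}
```

With this, the experimenting team's 'PodCrashing' alert is ignored, while the same alert on kubernetes-infra workloads blocks the reboot.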

I'm not sure if this would be a good feature to introduce, but I've been working on implementing it (with a bunch of unit tests) and am now starting to test it on kind. I'll link this to PR #386.

@evrardjp
Collaborator

Should we close this? If there is extra work to do, please mention it :)

@SimeonPoot
Contributor Author

Hi @evrardjp, thanks for coming back to this.
The issue is still relevant. The only change made in PR #386 was a restructuring of PrometheusClient, plus added unit tests. I removed all of the 'query by label' code, as it needed more investigation and a better way of implementing it.

  • I saw that there's a Prometheus endpoint, /alerts, that we could make use of rather than /query.
  • On querying by labels: investigate how Prometheus itself does this; it would be great if we query alerts the same way Prometheus does. A uniform approach will reduce complexity.
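For the first point, here is a rough sketch of what consuming /api/v1/alerts could look like. The response shape follows the Prometheus HTTP API's alerts payload; the Go names are made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alertsResponse mirrors the shape of Prometheus' /api/v1/alerts payload
// (only the fields needed here; JSON field names follow the Prometheus HTTP API).
type alertsResponse struct {
	Status string `json:"status"`
	Data   struct {
		Alerts []struct {
			Labels map[string]string `json:"labels"`
			State  string            `json:"state"`
		} `json:"alerts"`
	} `json:"data"`
}

// firingLabels decodes an /api/v1/alerts body and returns the label sets
// of alerts that are currently firing (pending alerts are skipped).
func firingLabels(body []byte) ([]map[string]string, error) {
	var resp alertsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var out []map[string]string
	for _, a := range resp.Data.Alerts {
		if a.State == "firing" {
			out = append(out, a.Labels)
		}
	}
	return out, nil
}

func main() {
	// In real code this body would come from an HTTP GET to <prometheus>/api/v1/alerts.
	body := []byte(`{"status":"success","data":{"alerts":[
	  {"labels":{"alertname":"PodCrashing","team":"kubernetes-infra"},"state":"firing"},
	  {"labels":{"alertname":"HighLatency","team":"experiments"},"state":"pending"}]}}`)
	labels, err := firingLabels(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(labels), labels[0]["team"]) // 1 kubernetes-infra
}
```

The returned label sets could then be fed into whatever label-matching logic ends up in the PR.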

What do you think of this?

@evrardjp
Collaborator

As you can guess with the delay in my answers: I am not really available to work on this.
Hence, if you do the work, you are more likely to get things merged ;)

I am not really familiar with Prometheus internals, but I think it's worth investigating, and documenting WHY we want to change or NOT change. For example: is the /alerts endpoint better? Why? Should we have a sidecar model instead, or should we keep this in code?

Agree on making things uniform, but not sure what you are proposing here.

@SimeonPoot
Contributor Author

I can imagine; I'd like to work on it, but it has to fit in between work and kids.

I have to say I don't know the internals of Prometheus either, I'm learning as I go, and by this I'm seeing things we can use, like the Alerts endpoint. In the end we'd like to filter by labels, regardless of endpoint.

About the uniform way, I meant the way Prometheus itself is queried:
querying by one label, e.g. {"severity":"critical"}, would get all the alerts with that label, and querying by more labels, e.g. {"severity":"critical","team":"kubernetes-infra"}, would get more specific alerts (the matchers are ANDed).
Rather than getting all the alerts with {"severity":"critical"} plus all the alerts with {"team":"kubernetes-infra"}.
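In PromQL terms that's just multiple matchers in one selector against the built-in ALERTS metric, which Prometheus ANDs together. A small hypothetical helper to illustrate (the function name is mine, not from the PR); keys are sorted so the output is deterministic:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alertSelector builds a PromQL selector over Prometheus' built-in ALERTS
// metric. Multiple matchers inside one {...} are ANDed by PromQL, which
// gives the "more labels = more specific" behaviour described above.
func alertSelector(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic matcher order
	matchers := make([]string, 0, len(keys))
	for _, k := range keys {
		matchers = append(matchers, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return "ALERTS{" + strings.Join(matchers, ",") + "}"
}

func main() {
	fmt.Println(alertSelector(map[string]string{"team": "kubernetes-infra", "severity": "critical"}))
	// ALERTS{severity="critical",team="kubernetes-infra"}
}
```

Such a selector could be sent to /query as-is, so either endpoint would end up with the same AND semantics.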

Interesting point, how would you setup a sidecar model for this issue?

@evrardjp
Collaborator

The blocker tools could be custom made (and we can provide a few defaults), which would apply a label on nodes, a label we can watch.

@github-actions

This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

@sebastiangaiser

Can this one please be reopened?
It would really be helpful to be able to query with labels like {"severity":"critical"}.


This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).

@github-actions github-actions bot closed this as not planned Mar 3, 2024