
enhancements/monitoring: User-defined alerting rules proposal #958

Closed
wants to merge 1 commit

Conversation


@bison bison commented Nov 17, 2021

This adds an enhancement proposal for supporting user-defined alerting
rules and limited alert overrides in the OpenShift monitoring stack.


openshift-ci bot commented Nov 17, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign simonpasquier after the PR has been reviewed.
You can assign the PR to them by writing /assign @simonpasquier in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


bison commented Nov 17, 2021

cc: @openshift/openshift-team-monitoring @openshift/sre-alert-sme

Comment on lines 96 to 100
- key:
    name: "Watchdog" # Alerts are identified by name and severity tuple.
    severity: "none"
Member


The name & severity together are not enough to uniquely identify rules (for example 2 rules in different rule groups with the same name and severity).

In order to uniquely identify rules in the Console UI, we use all of: (group file, group name, rule name, rule duration, rule query, rule labels). I think we will need to do the same or something similar here.

Contributor


To extend on this a bit: I think the enhancement proposal could benefit from explicitly stating whether overrides must uniquely identify a single alert or not. Overriding or dropping multiple alerts with one spec could certainly be useful.
However, this adds significant complexity from the UI perspective. How can a user ensure their override does what they intend?
If we want explicit one-to-one matching, let's call that out and sketch what a user interaction would look like if they provide an insufficient spec (i.e. one that doesn't uniquely identify one alert).

Contributor Author


Yeah, this is a point that absolutely needs clarification, thanks.

On a fresh 4.10-nightly cluster on AWS, I only see two alerts with this problem. One is AlertmanagerClusterFailedToSendAlerts, which is due to cluster-monitoring-operator explicitly bringing the critical version of the alert down to warning, and the other is KubeAPIErrorBudgetBurn, which is a bit more complicated. That one legitimately seems to have two versions each at the warning and critical severities. Of course, there could be other instances shipped by add-ons.

I suppose this is a trade-off between preventing an override matching multiple rules, and requiring the user to specify a bunch of fields to match against -- ones that often aren't the most visible to users. Given the above, I don't think either of these options are unreasonable:

  1. Consider multiple matches of name + severity as an error condition. Can't override that, sorry. We could couple this with amending the style-guide to disallow this, adding e2e tests for it in origin, and fixing the offending alerts.

  2. Just apply the override to all matching alerts. Though, like you said, that may present UX challenges.

I guess I would lean towards the first option personally.

Contributor


Having a clear and concise failure mode definitely beats a murky UI, so +1 for option 1. Though it might be valuable to at least mention both approaches in this document?

Contributor


Another example here is the Kube* rules that are generated with a namespace= field. SRE currently use this to determine what should/could be ignored.

Contributor Author


Another example here is the Kube* rules that are generated with a namespace= field. SRE currently use this to determine what should/could be ignored.

Sorry, not sure I follow. Do you mean that you'd like to be able to match on the namespace label in addition to name and severity when overriding an alert?

Contributor Author


I've added some content about this failure mode.

Contributor


KubeAPIErrorBudgetBurn is interesting because it is a "valid" case and it spans 2 PrometheusRule resources (kube-apiserver-slos-basic and kube-apiserver-slos-extended, both in the openshift-kube-apiserver namespace).
Maybe we should allow an arbitrary set of static labels for defining the alert's key and patch the existing rule only if there's a unique match? The name of the alerting rule would be required at least.
For instance, to patch all KubeAPIErrorBudgetBurn rules, you'd have to write:

    overrides:
    - key:
        name: "KubeAPIErrorBudgetBurn"
        severity: "critical"
        long: "1h"
    - key:
        name: "KubeAPIErrorBudgetBurn"
        severity: "warning"
        long: "6h"
    - key:
        name: "KubeAPIErrorBudgetBurn"
        severity: "warning"
        long: "1d"
    - key:
        name: "KubeAPIErrorBudgetBurn"
        severity: "warning"
        long: "3d"

Contributor Author


This makes sense to me. I've updated the proposal with a CRD structure that allows matching against an arbitrary set of static labels, but requires an exact match.
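
For anyone following along, here is a rough sketch of what such a label-set match could look like, pieced together from the fragments quoted elsewhere in this thread; the exact field names and schema are still the proposal's to define:

    overrides:
    - name: "KubeAPIErrorBudgetBurn"   # alert name is always required
      matchLabels:                     # optional; must resolve to exactly one rule
        severity: "critical"
        long: "1h"
      action: "patch"
      labels:
        severity: "warning"            # example change applied by the patch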

Comment on lines 124 to 131
This leaves the original alerting rules present and untouched. These, as well
as any rules listed with the `drop` action, must be ignored by the platform
Prometheus instance. A `Secret` containing [alert_relabel_configs][1] data for
Prometheus is generated to accomplish this:
Member

@kyoto kyoto Nov 18, 2021


Could you explain the reasons for keeping the overridden rules?

Conceptually it seems simpler to actually remove an overridden rule and replace it with the new rule. That would avoid needing the alert_relabel_configs secret and needing to handle a magic label for dropped and patched rules and should reduce load on the system too. What are the benefits that make it worth keeping the old rules?

Contributor Author

@bison bison Nov 18, 2021


We want to allow limited overrides of existing rules without the operators having to do anything. We can't modify or remove existing alerts shipped by any OpenShift operator, because the operator will just revert the changes. So the only option is to generate a new alert and somehow ignore the original one.

Member


Thanks. I think it would be good to briefly explain that background in the proposal.

Contributor Author


Added a note about this to the proposal section.

Contributor


The other reason is that we want to collect the "product" alerts via Telemetry. Otherwise we would lose the ability to catch regressions and such.

Contributor Author


Added a note about telemetry. Thanks.


jan--f commented Nov 18, 2021

Nice work!
I'm wondering if it makes sense to call out more explicitly the upgrade scenario where an upgrade changes an overridden alert. What happens if the spec no longer matches an alert, or now matches multiple? Even listing this as "can happen, considered out of scope" would be beneficial I think.


TBD

## Alternatives
Contributor


It would be useful to describe here why a ConfigMap is being used, instead of defining a new CRD for a regular API.

Contributor Author


Yeah, to be honest I'm not sure there's a reason other than that's what we prototyped with. It's called out as an open question. If the consensus is to just go with a CRD from the start, we can do that.

Contributor Author


We ended up deciding a CRD is best. I've added some detail about the proposed structure.

Contributor

@dofinn dofinn left a comment


Nice job here @bison! This stuff seemingly gets really complicated quickly. Maybe too many options here become unproductive. I know for OSD we simply do:

  • Silence an alert
  • Create an alert (if the silence needs replacing/customizing)

This has served us well thus far, and we do this via https://github.com/openshift/configure-alertmanager-operator/blob/master/pkg/controller/secret/secret_controller.go#L136-L249. This is not pretty :) but it gives you an example of what SRE is silencing based on X, Y, Z values that are available within alert metadata.

For usability + console, I think two functionalities:

  • silence alert
  • create alert (the create could enable the user to pick a current alert to template from, but this in no way links them together)
  • created alerts could be stored in PrometheusRule called 'custom-admin-alerts'


found, and the alert extracted.

For each override a new alerting rule is generated with the overrides applied on
top of the existing rule, and this new rule is added to the "alert-overrides"
Contributor


I am having a hard time grok'ing this. Could you please clarify?

  • CM exists with identifiers and actions
  • Existing PrometheusRules are identified using the identifiers found in the CM
  • Actions are applied (Patch,Drop?) to these found PrometheusRules
  • Replacement rules are added to a PrometheusRules object called "alert-overrides"
  • label is added to each new replacement rule?

Contributor Author


Yeah, that's basically it. For each entry in the overrides section of the ConfigMap, find the matching rule in the existing PrometheusRule objects, then apply the action -- that is either drop or patch.

If the action is patch, apply the user's changes on top of the existing rule, and add the result to the "alert-overrides" PrometheusRule. These always get a static label identifying them as a generated override rule. That label is what allows us to drop the original rule in the relabel config, because we can match only rules that don't have the label.

So, the last step is to then generate the alert_relabel_configs which drops the original version of any overridden alerts and any alerts the user specified the drop action for.
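
To make that last step concrete, here is a minimal sketch of the kind of alert_relabel_configs entries that could be generated; the override label name and alert names are purely illustrative:

    alert_relabel_configs:
    # Drop the original alert that was patched: match on alert name and
    # severity with the override label empty/absent (source label values are
    # joined with the default ";" separator before the regex is applied).
    - source_labels: [alertname, severity, alert_override]
      regex: "KubeAPIErrorBudgetBurn;critical;"
      action: drop
    # Drop an alert the user marked with the `drop` action outright.
    - source_labels: [alertname]
      regex: "Watchdog"
      action: drop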

Contributor Author


I've added a bit trying to summarize this better in the proposal section.


### User Stories

- As an OpenShift user, I want to raise the severity of an alert from `Warning`
Contributor


Suggested change
- As an OpenShift user, I want to raise the severity of an alert from `Warning`
- As an OpenShift user, I want to manage the severity of an alert choosing between ("Info", "Warning", "Critical")

Contributor Author


Thanks, I made this a bit more general.


### API Extensions

None, but this is an open question. Is there good reason to make this a CRD
Contributor


+1 to CRD. Could we build status reporting for suppressed/changed alerts, so that as a user I have a summary of what I have changed?

Contributor


A CRD also provides better UX since it has built-in validation and documentation mechanisms.

Contributor Author


Agreed. I've added a proposed structure for a CRD now.


bison commented Nov 23, 2021

I'm wondering if it makes sense to call out more explicitly the upgrade scenario where an upgrade changes an overridden alert. What happens if the spec no longer matches an alert, or now matches multiple? Even listing this as "can happen, considered out of scope" would be beneficial I think.

@jan--f, yeah, this is definitely something to think about. I guess in the case of OpenShift simply removing an alert, it's no worse than it would have been. The original alert and the generated alert with the overrides both go away.

I think the UX is a little stranger if OpenShift renames an alert or changes its severity. In that case, my overrides are no longer applied, and I guess I have to go look at the cluster-monitoring-operator (or the CRD if we do that) to know why. In the worst case, that could lead to a critical alert not being routed the way I expect. Though, again, that's maybe not too different from the case without this feature.


bison commented Nov 23, 2021

Nice job here @bison! This stuff seemingly gets really complicated quickly. Maybe too many options here become unproductive. I know for OSD we simply do:

Silence an alert
Create an alert (if the silence needs replacing/customizing)

Thanks for the feedback, @dofinn. So, does this feature sound useful to you? Or are you saying you think silences and copying / creating new alerts is enough, and this is too complicated?


dofinn commented Nov 24, 2021

Nice job here @bison! This stuff seemingly gets really complicated quickly. Maybe too many options here become unproductive. I know for OSD we simply do:
Silence an alert
Create an alert (if the silence needs replacing/customizing)

Thanks for the feedback, @dofinn. So, does this feature sound useful to you? Or are you saying you think silences and copying / creating new alerts is enough, and this is too complicated?

I think the ability to silence and create alerts, combined with ease of management:

  • what alerts have I silenced?
  • what alerts have I created?

has great merit.

As mentioned previously, SRE already does this using workarounds. These workarounds are not necessarily bad, but I'm thinking this enhancement could/would provide a better way to achieve this.
Silences/modifications -> https://github.com/openshift/configure-alertmanager-operator/blob/master/pkg/controller/secret/secret_controller.go#L136
Creates -> https://github.com/openshift/managed-cluster-config/tree/master/deploy/sre-prometheus

Managing "whats replacing what" or "what alert have i modified" (especially across releases) seems like it would introduce large complexities.



Comment on lines 171 to 173
OpenShift platform alerting rules -- there seem to be only two cases. We may
want to add the requirement that these be unique to the [style-guide][2]
however, and enforce this in e2e tests going forward.
Contributor


As I wrote above, the use case of multiple alerting rules with the same name and severity is valid IMHO. In practice there should be at least 1 static label that differentiates between rules with the same name/severity.

Contributor Author


Have changed this to a CRD that allows matching an arbitrary set of static labels, but requires a unique match.

  namespace: openshift-monitoring
spec:
  # List of overrides for alerts in platform namespaces.
  overrides:
Contributor


I think this might be redundant because the resource name is already alertoverrides

Contributor Author


Yeah, it's basically just there because it differentiates the list of overrides from the list of new additional rules. Not sure what else to call it.

Contributor


Ah ok, I forgot about the additional rules. Then this makes sense 👍

  matchLabels: # optional, but multiple matches is an error reported in status.
    severity: "critical"
    long: "1h"
  action: "patch"
Contributor


Instead of having the action as a value, have we considered having it as a key? This would allow us to define different parameters for different actions in the future if such a need arises. E.g.

spec:
  patch:
    ...
  drop:
    ...

Contributor Author


I don't have a super strong opinion on this. The way it is now somehow feels like a better middle ground between being Kubernetes-ish, and looking a bit like the existing Prometheus / Alertmanager configs to me. But if the consensus is to have a separate key for each action, that works too.

Good point on changing the parameters. Though, I guess we could already add new optional parameters in the existing version, and anything else probably needs a bump to the API resource version anyway, right?

Contributor


I think the advantage of separate keys is that we can have different configuration options for each operation. This makes it more visible to the user which option can be used for each action, and also adds some "typing" to action parameters.

But anyway, it's just an option to consider. I don't think it's really a show stopper to have the action as a value.


Agree. This is easier to read when we divide the list into patch, drop, add sections.
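
Purely for illustration, the split-by-action layout being discussed might look something like this; the field names are a sketch, not an agreed schema:

    spec:
      patch:
      - name: "KubeAPIErrorBudgetBurn"
        matchLabels:
          severity: "critical"
          long: "1h"
        labels:
          severity: "warning"
      drop:
      - name: "Watchdog"
      add:
      - alert: "ExampleCustomAlert"    # hypothetical new rule
        expr: "up == 0"
        for: "5m"
        labels:
          severity: "warning"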


bison commented Nov 30, 2021

I also wanted to ping @openshift/team-cnv and @openshift/openshift-team-storage -- I think there were people on those teams that had expressed interest in this.

@jsafrane
Contributor

From the storage team's perspective, we're interested for sure. This proposal looks OK. It would be nice if users could see in some "advanced view":

  • What alerts definitions are patched / dropped.
  • What alerts would be firing right now if they were not patched / dropped.

I think both are really hard to achieve, just leaving here for consideration.

@raptorsun

It would be great if more details could be given on validation of the rule specification, for example:

  • how to validate that the expression is valid and every referenced rule exists
  • when dropping a rule, that other rules are not referencing it

Comment on lines +112 to +115
action: "patch"
labels:
patched: "true"
Contributor


Afaiu any patched rule will have an automatic label added to it, so that we can identify it as a patched alert. If a patch directive attempts to overwrite this generated label, we'll run into issues where users can mess with the ability to discern whether an alert stems from a shipped rule or from an overridden one. For example, assume the generated label name is patched and a patch rule adds patched: "false".
The proposal should specifically call out this (error) scenario.

Contributor Author


This is a good point, but having thought about it again, we add the label identifying the alert as patched to the newly generated rule, and we do it after applying the user-supplied overrides. So in practice, it's not possible to change it.

Also, in the relabel config, we match the original alerts by testing for the absence of the label. We don't actually care what the label's value is. So I don't think this presents a problem in the current design.

Contributor Author


Added a note about the label being added after patches are applied.


bison commented Dec 8, 2021

It would be great if more details could be given on validation of the rule specification, for example:

  • how to validate that the expression is valid and every referenced rule exists
  • when dropping a rule, that other rules are not referencing it

Good points, @raptorsun. I added a note about validation with a link to the code Prometheus uses. Should be easy to reuse and allow validation of each generated rule.

I'm not sure I fully understand what you mean about dropped rules being referenced. Though, keep in mind dropped rules are only prevented from being sent to Alertmanager. They still exist in Prometheus. Can you maybe give an example?

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2022
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 12, 2022
Comment on lines 190 to 201
- Watch for a user-defined list of overrides and new rules.
- For each override, find the existing rule matching the name and labels, and
  apply the action specified by the user, i.e. `patch` or `drop`.
  - If the action is `patch`, generate a new rule with supplied overrides.
  - If the action is `drop`, simply add the alert to the alert relabel config.
  - If multiple alerts match the name and labels, notify the user and skip
    this override.
- Generate new `PrometheusRule` objects for any new user-defined rules.
- Generate the [alert_relabel_configs][1] to drop original alerts that have been
  replaced with a patched version, and any that were explicitly dropped.
Contributor


I like the operator approach. I wonder if a scenario is missing here:
Should the operator not also watch PrometheusRule objects for changes? The original object might change, requiring the generated object to be regenerated. I'm not really sure if the PrometheusRule object that CMO creates is an input to the object generated by this new operator.

In any case it might be worth calling this out either way here.

Contributor Author


Good point, that is already happening in the proof-of-concept branch. I added a note here about it:

Processing of overrides is also triggered by PrometheusRule object changes, and changes to the set of platform namespaces.

@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Jan 20, 2022

openshift-ci bot commented Jan 20, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


bison commented Jan 20, 2022

/reopen

@openshift-ci openshift-ci bot reopened this Jan 20, 2022

openshift-ci bot commented Jan 20, 2022

@bison: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


bison commented Jan 20, 2022

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 20, 2022
This adds an enhancement proposal for supporting user-defined alerting
rules and limited alert overrides in the OpenShift monitoring stack.

bison commented Feb 7, 2022

I've updated openshift/cluster-monitoring-operator#1473 to match the implementation described here -- using a CRD and matching on label sets.


openshift-ci bot commented Feb 7, 2022

@bison: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


bison commented Feb 8, 2022

I started a possible alternative proposal here: #1030 -- Not sure if that's the best way to go about that, but I didn't know what else to do.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 8, 2022
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 15, 2022
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Mar 22, 2022

openshift-ci bot commented Mar 22, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


jan--f commented May 11, 2022

#1030 supersedes this
