-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to override priority #28
Comments
This is very much in line with my thinking on this topic as well. The default values (if needed) are configured for the handler, and thus apply to any/all OpsGenie alerts/interactions. The Sensu check configuration can then supply any configurations that override the handler values. And then the check command itself can also supply the configurations. And indeed, this is what we see happening (intended, even?), here. The "missing" bits, in my mind, are the lookup and use of these various fields, when making the requests to OpsGenie's APIs, via the And indeed, the driving use case for my needs is the fact that some of my development environment checks require high priority treatment, and some of the production environment checks should be using low priority treatment (and currently are not). We can work around this by creating separate "teams" in OpsGenie, each with different escalation rules and separate API keys -- but that is working around limitations in this gem's implementation; I wanted to address that root cause, if possible. |
I think we are mostly in agreement but from your comment I do think one nuance was not conveyed or understood properly. When I mean default and override options I mean a
Does that make anymore sense? Keep in mind that while in many orgs the people writing sensu checks are the same people as the admins that is not the case in all orgs. I know quite a few large places that basically offer sensu as a service and they might install say a deb/rpm to install checks created by someone else. |
Well, I understand that use case, but I am not sure how necessary it is, to address that, right now -- as opposed to iterating on a smaller change. Adding multiple levels of fallbacks and defaults leads to things like this, which I would very very much like to avoid, if possible. |
OK, I was actually thinking of a simplified version of that, we can skip for now as long as we do it in backwards compatible way. Honestly while chef's attribute precedence is somewhat complicated it is the reason they are so powerful, every attribute precedence has a real use. I have used every level of attributes and am grateful of them because it allows me to remain flexible. I do think ours would look a lot simpler but still would be a bit complex but is because alerting is complex. When I have some time tonight I will review the code, thanks for talking this through with me. |
This was created to discuss #27 by @Castaglia at a high level on the methodology before jumping into specific code.
Currently the handler creates incidents/events/alerts (whatever the right terminology for opsgenie is) in the default Priority which is
P3
. Ideally we should be able to change that based on various factors.When I consider an incidents urgency there are usually several factors at play and as such I think we should outline how we would like this to work as there are different schools of thought on this matter. To best illustrate how I think we should proceed I pose the following scenarios and ask for feedback.
We should consider overrides beyond at the check level to include the handler and client level config. When supplied more than one location (such as check + handler) who's config takes precendence. To better illustrate I pose the following scenarios:
Environments
Not all environments are equal I might want to ensure that non production environments have a low default priority and a higher default in production.
Checks
Not all checks have equal priority, a caching service failing and my database service failing are two very different things.
Some clients are more important than others
My database nodes and application nodes very differently in how I respond to incidents. A disk at 90% on an app node that does not write much to disk and the same percentage on my database node are different.
My thoughts
I think given the above I think we want to create 2 types of priorities:
Why default priorities and the precedence proposed
Well to allow an admin to say default in production I want things to be
P4
but allow overriding(higher|lower)
on per client or check basis. In this case the most specific is the check definition so it wins that one.Why override priorities and the precedence proposed
In non production environments you might say nothing is ever above a
P4
as that might mean drop whatever you are doing and this prevents configuration on a check level being overridden as you specified this at the handler level.Feedback
This was just my initial reaction to the feature and look forward to hearing what others have to say on this.
The text was updated successfully, but these errors were encountered: