[Meta] UI ability to assign alert actions per action group #64077
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
One of the questions that was raised when we previously looked at supporting multiple action groups was how the parameters might be changed to deal with action groups. For example, say you have a threshold-style alert type that has two action groups. I'm now thinking that trying to put the parameters near the action groups isn't right. There is only one set of config/parameters for an alert, and they should be grouped together in terms of input controls. For the case of a threshold-style alert with two action groups, the simplest case is just two threshold values, and they should probably be named relevant to the action group. A more complicated example could be a threshold-style alert with two action groups that take different thresholds AND expressions. In that case, the parameters could have paired values for the threshold, the comparator, the field compared, etc. These would still all be in the "parameter" section of the UI, and again named relevant to their actionGroup. This makes the UI a bit wonky, with all the params in one section and the action groups in another, but I'm not sure a pleasing UI is possible where you split these kinds of "paired" parameter values so they are closer to the action group they relate to. I think this is the simplest conceptual model, though (all parameters together), so we should try a design like that to see if we think it's reasonable.
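To make the "all parameters together, named after their action group" idea concrete, here is a minimal TypeScript sketch. The type names, field names (`warningThreshold`, `alertThreshold`, `warningExpression`, `alertExpression`) and the group ids "warning" and "alert" are all hypothetical, not part of any Kibana API; the point is only that both the simple and the paired variant remain a single flat parameter object.

```typescript
// Hypothetical parameter shapes for a threshold-style alert type with two
// action groups, "warning" and "alert". Everything lives in one params object
// (one "parameters" section in the UI); names say which group they drive.
type Comparator = ">" | "<";

// Simple case: one shared comparator, one threshold per action group.
interface TwoLevelThresholdParams {
  comparator: Comparator;
  warningThreshold: number;
  alertThreshold: number;
}

// More complicated case: a full expression (field, comparator, threshold) per group.
interface Expression {
  field: string;
  comparator: Comparator;
  threshold: number;
}

interface PairedExpressionParams {
  warningExpression: Expression;
  alertExpression: Expression;
}

function matches(value: number, comparator: Comparator, threshold: number): boolean {
  return comparator === ">" ? value > threshold : value < threshold;
}

// Returns the action group a value falls into, checking the more severe group first.
function groupFor(value: number, p: TwoLevelThresholdParams): "alert" | "warning" | undefined {
  if (matches(value, p.comparator, p.alertThreshold)) return "alert";
  if (matches(value, p.comparator, p.warningThreshold)) return "warning";
  return undefined;
}

const params: TwoLevelThresholdParams = { comparator: ">", warningThreshold: 80, alertThreshold: 90 };
console.log(groupFor(85, params)); // prints: warning
console.log(groupFor(95, params)); // prints: alert
```

Checking the more severe group first means a value that satisfies both conditions is reported only once, in the "alert" group.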
I'm wondering if we could convert the existing index threshold alert to allow multiple action groups in a "progressive" manner, e.g. add an optional second threshold value and a second action group. Seems doable, though I'm kinda wondering how confusing it will be... There are some interesting UI considerations as well: we will potentially need two threshold lines in the preview graph. In general, we need to figure out the constraints between comparators and thresholds. If comparator is
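One way to think about the comparator/threshold constraint for the "progressive" variant is a validation step on the params. The sketch below assumes a hypothetical, simplified params shape (`comparator`, `threshold`, optional `warningThreshold`), not the real index threshold alert's fields, and covers only the four simple comparators. It also assumes the second threshold should trip *before* the existing one.

```typescript
// Hypothetical, simplified params; the real index threshold alert's field names differ.
type Comparator = ">" | ">=" | "<" | "<=";

interface ProgressiveThresholdParams {
  comparator: Comparator;
  threshold: number;          // existing threshold, drives the existing action group
  warningThreshold?: number;  // optional second threshold for a second action group
}

// Returns a list of validation errors (empty when the params are consistent).
function validateThresholds(p: ProgressiveThresholdParams): string[] {
  const errors: string[] = [];
  if (p.warningThreshold === undefined) return errors; // single threshold: nothing to compare

  const fireWhenAbove = p.comparator === ">" || p.comparator === ">=";
  // With an "above" comparator the warning should trip before the main threshold,
  // so it must be lower; with a "below" comparator it must be higher.
  if (fireWhenAbove && p.warningThreshold >= p.threshold) {
    errors.push("warningThreshold must be lower than threshold for > and >=");
  }
  if (!fireWhenAbove && p.warningThreshold <= p.threshold) {
    errors.push("warningThreshold must be higher than threshold for < and <=");
  }
  return errors;
}

console.log(validateThresholds({ comparator: ">", threshold: 90, warningThreshold: 80 })); // prints: []
console.log(validateThresholds({ comparator: "<", threshold: 10, warningThreshold: 5 }).length); // prints: 1
```

The same rule would also decide the order of the two threshold lines in the preview graph.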
Below are some questions I keep running into as I try to create concepts for this.
I know some of these overlap with Patrick's comments, but I was just outlining the ones I had.
This would be defined by the alert type, but it seems like it would be hard to allow the UI of an alert type to specify a set of parameters applicable to a subset of action groups. So it feels like for now, we shouldn't consider "grouping" the params around certain action groups - let's just leave them all in the same visual location.
Again, up to the alert type, but could certainly be multiple.
Yes, looking at it from the alerting point of view, the actions in each group are separate. However, it seems likely that folks would want to define the same connector to run in multiple groups. Maybe in one condition (for one action group) you do X, but as the condition worsens you want to do an additional thing (for another action group). It would be nice to "copy" the connector from one group to another. Another way of looking at this is that you could have a list of all the connectors you wanted to run, across all action groups, and then for each connector indicate which action group it runs in. Except that doesn't quite work, as the connectors in the action groups are ordered, so there wouldn't be a way to indicate the order. I think for now the simplest thing is to force the customer to create the same connector in all the action groups. We can optimize the UX for this later when we find out how this stuff is used in practice.
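A minimal sketch of the "copy the connector to another group" idea, assuming actions are stored as one ordered list where each entry carries its group, connector id and params (a simplified shape; treat the exact field names as an assumption). The helper `copyActionToGroup` is hypothetical. Order within a group is the relative order of that group's entries in the list, so appending a copy puts it last in the target group.

```typescript
interface AlertAction {
  group: string;                   // action group the action runs in, e.g. "warning"
  id: string;                      // connector id
  params: Record<string, unknown>; // per-action params, e.g. the message
}

// Returns the actions of one group, in the order they run.
function actionsInGroup(actions: AlertAction[], group: string): AlertAction[] {
  return actions.filter((a) => a.group === group);
}

// Hypothetical helper: duplicates the action at `index` into `targetGroup`.
// The copy gets shallow-cloned params so it can be edited independently,
// and it runs after the target group's existing actions.
function copyActionToGroup(actions: AlertAction[], index: number, targetGroup: string): AlertAction[] {
  const source = actions[index];
  if (source === undefined) throw new RangeError(`no action at index ${index}`);
  const copy: AlertAction = { ...source, group: targetGroup, params: { ...source.params } };
  return [...actions, copy];
}

const actions: AlertAction[] = [
  { group: "warning", id: "email-connector", params: { message: "CPU is high" } },
  { group: "alert", id: "slack-connector", params: { message: "CPU is critical" } },
];
const updated = copyActionToGroup(actions, 0, "alert");
console.log(actionsInGroup(updated, "alert").map((a) => a.id).join(", "));
// prints: slack-connector, email-connector
```

Because the copy is a new, independent entry, this matches "force the customer to create the same connector in all the action groups", just with less typing.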
Per my notes in 3. ^^^, the customer will have to add a new connector for that group, and they can choose whatever connectors they want, including connectors not in the main group. Every connector would be under a single action group, and an action group can have multiple connectors. Also, in case it wasn't clear, the alert type defines how many action groups it supports (it's static), and also indicates the "default" one to use, if there are multiple.
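The static, alert-type-owned nature of action groups can be sketched as a type definition plus a check that every action points at a declared group. The field names `actionGroups` and `defaultActionGroupId` mirror the terms used in this issue; the rest of the interface, the example alert type id and the validation helper are hypothetical simplifications.

```typescript
interface ActionGroup {
  id: string;
  name: string;
}

interface AlertTypeDefinition {
  id: string;
  actionGroups: ActionGroup[];  // static: declared by the alert type, users cannot add or rename
  defaultActionGroupId: string; // the "default" group when there are several
}

interface AlertAction {
  group: string; // each action belongs to exactly one group
  id: string;    // connector id; a group may hold any number of actions
}

// Hypothetical check: returns errors for groups the alert type does not declare.
function validateAlertActions(type: AlertTypeDefinition, actions: AlertAction[]): string[] {
  const known = new Set(type.actionGroups.map((g) => g.id));
  const errors: string[] = [];
  if (!known.has(type.defaultActionGroupId)) {
    errors.push(`default action group "${type.defaultActionGroupId}" is not declared`);
  }
  actions.forEach((a, i) => {
    if (!known.has(a.group)) errors.push(`action ${i} uses unknown action group "${a.group}"`);
  });
  return errors;
}

const thresholdType: AlertTypeDefinition = {
  id: "example.threshold",
  actionGroups: [
    { id: "alert", name: "Alert" },
    { id: "warning", name: "Warning" },
  ],
  defaultActionGroupId: "alert",
};

console.log(validateAlertActions(thresholdType, [{ group: "warning", id: "email" }]).length); // prints: 0
```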
It's not clear how common it will be for alerts to have multiple action groups, but as action groups are defined by the alert type, some will have one and some will have multiple. We could optimize for the case of an alert with a single action group and not show any grouping, just as we do today. If the alert does support multiple action groups, then the area under the graph where you list the actions would have a separate section for each action group (the default one first, followed by the remaining ones, which I believe are ordered), where actions can be added to each action group just like today.
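The layout described above (no grouping for single-group alert types; otherwise one section per group with the default first) could be derived roughly like this. `buildActionSections` and the `ActionSection` shape are hypothetical UI helpers; the sketch assumes the remaining groups keep the order the alert type declares them in.

```typescript
interface ActionGroup {
  id: string;
  name: string;
}

interface AlertAction {
  group: string;
  id: string;
}

interface ActionSection {
  title: string | undefined; // undefined: single-group alert, render without a heading
  groupId: string;
  actions: AlertAction[];
}

// Hypothetical helper: one section per action group, default group first,
// remaining groups in declaration order.
function buildActionSections(
  groups: ActionGroup[],
  defaultGroupId: string,
  actions: AlertAction[],
): ActionSection[] {
  const ordered = [
    ...groups.filter((g) => g.id === defaultGroupId),
    ...groups.filter((g) => g.id !== defaultGroupId),
  ];
  const showTitles = groups.length > 1;
  return ordered.map((g) => ({
    title: showTitles ? g.name : undefined,
    groupId: g.id,
    actions: actions.filter((a) => a.group === g.id),
  }));
}

const sections = buildActionSections(
  [{ id: "warning", name: "Warning" }, { id: "alert", name: "Alert" }],
  "alert",
  [{ group: "warning", id: "email" }, { group: "alert", id: "slack" }],
);
console.log(sections.map((s) => s.title).join(", ")); // prints: Alert, Warning
```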
Critical/Alert/Warning was the original reason for creating action groups. The user cannot add/change/rename them; they're defined by the alert type. Ack/unack is, I think, out of scope here. I can see how it could potentially fit in here, but there's a lot of other stuff to think about with acks. I've also been thinking about having "built in" action groups, i.e. action groups every alert would have; "resolved" and "no data" are the two I've been thinking of. They're basically common behaviours/states an alert / alert instance could be in, which we'd like to handle generically, if possible. The jury is still out, though, and it's not coming in 7.8 anyway. As a somewhat contrived example, but one based on a real need, there's a new SSL certificate check alert. It will fire when a cert is going to expire within 30 days. The thought is to allow for additional actions once it's going to expire within 7 days. So, think two action groups: "early warning" and "about to expire". As alert parameters, you'd be able to set the 30 and 7 values for those, in the usual parameter section. In the action groups, you add an email action to "early warning", and an email AND a Slack action to "about to expire", the idea being to bug them a little bit more the closer you get to the expiration date. You could also just decide you don't want the 30-day warning at all, and not have any actions associated with it. If you had email connectors in both groups, they're independent: they can have different messages.
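The SSL certificate example maps onto code fairly directly. In this sketch the group ids (`early_warning`, `about_to_expire`), the parameter names, the connector ids and the messages are all hypothetical; it only illustrates that the 30/7 values are ordinary alert params, that the executor picks one group, and that each group's email action is independent and can carry its own message.

```typescript
type CertGroup = "early_warning" | "about_to_expire";

interface CertExpiryParams {
  earlyWarningDays: number;  // e.g. 30
  aboutToExpireDays: number; // e.g. 7
}

interface AlertAction {
  group: CertGroup;
  id: string;                  // hypothetical connector id
  params: { message: string }; // independent per action, even for the same connector
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Picks the action group for a certificate, most urgent group first.
// Already-expired certs (negative days left) also land in "about_to_expire".
function certActionGroup(expiresAt: Date, now: Date, p: CertExpiryParams): CertGroup | undefined {
  const daysLeft = (expiresAt.getTime() - now.getTime()) / DAY_MS;
  if (daysLeft <= p.aboutToExpireDays) return "about_to_expire";
  if (daysLeft <= p.earlyWarningDays) return "early_warning";
  return undefined;
}

const actions: AlertAction[] = [
  { group: "early_warning", id: "email", params: { message: "Certificate expires within 30 days." } },
  { group: "about_to_expire", id: "email", params: { message: "Certificate expires within 7 days, renew now!" } },
  { group: "about_to_expire", id: "slack", params: { message: "Certificate expires within 7 days, renew now!" } },
];

function actionsToRun(group: CertGroup | undefined): AlertAction[] {
  return group === undefined ? [] : actions.filter((a) => a.group === group);
}

const params: CertExpiryParams = { earlyWarningDays: 30, aboutToExpireDays: 7 };
const now = new Date("2020-06-01T00:00:00Z");
const expiresAt = new Date(now.getTime() + 20 * DAY_MS);
console.log(actionsToRun(certActionGroup(expiresAt, now, params)).map((a) => a.id).join(", "));
// prints: email
```

Dropping the 30-day warning would simply mean removing the `early_warning` entry from `actions`; the group still exists, it just has nothing to run.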
Here is the current iteration of the wireframes we've been discussing. Please let me know if anyone sees misalignments with what we've agreed upon. @arisonl
@mdefazio Some early feedback from Observability: Metrics expect to need two or three levels in most cases, with two (alert and warning) being the most usual use case and three (e.g. major, minor and warning) also possible, yet less usual. They would also like the option to receive an alert for each transition, hence not just "resolved" but also when the value falls from warning to normal, from major to minor, etc.
Some initial thoughts on this: I'm assuming that 'Warning --> Resolved' or 'Alerting --> Resolved' are the same as simply 'Resolved'. Is this correct? If an alert goes from Warning to Alert, will it simply run the actions that are associated with 'Alert'? Or do we also need to provide transitional actions here as well?
Technically, these are "action groups" ([alert, warning], [major, minor]), and I'm not sure "Run actions when resolved" makes sense as a toggle if resolved is just another action group.
That was my understanding. There is only "resolved"; we won't have "minor->resolved" and "major->resolved" as separate things. Technically we could, but I'm not sure we need it, and it makes things more complicated, so I'd say at best we defer that (and open a new issue if we think we need it).
Likewise, my understanding is that we won't have transitional actions like that, so you'd only see the 'Alert' actions run in that case. And as before, we could, but it's just more stuff, so it's worthy of a new issue (probably one issue for this and the previous note ^^^).
Say 1 is the threshold for warning and 2 is the threshold for alert. The current thinking is that you will get:
What I am hearing might be needed is a resolved notification when you go from 1.5 to 0.5, i.e. a drop from warning to normal, or from any other level to normal (e.g. from minor to normal).
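A small worked example with warning = 1 and alert = 2 may help pin down the requested behavior. This is a sketch of the behavior being asked for, not of what the alerting framework currently does: it assumes the current level's actions are scheduled on every check where a threshold is crossed (throttling ignored), that moving between levels does not trigger transitional actions, and that any drop back to normal, including warning to normal, emits a single "resolved" notification. `levelOf` and `notificationFor` are hypothetical helpers.

```typescript
type Level = "normal" | "warning" | "alert";

const WARNING_THRESHOLD = 1;
const ALERT_THRESHOLD = 2;

function levelOf(value: number): Level {
  if (value >= ALERT_THRESHOLD) return "alert";
  if (value >= WARNING_THRESHOLD) return "warning";
  return "normal";
}

type Notification =
  | { kind: "group"; group: "warning" | "alert" } // run that action group's actions
  | { kind: "resolved"; from: Level };            // single "resolved", whatever the previous level

function notificationFor(prev: Level, curr: Level): Notification | undefined {
  if (curr !== "normal") return { kind: "group", group: curr };
  if (prev !== "normal") return { kind: "resolved", from: prev };
  return undefined; // normal -> normal: nothing to send
}

const values = [0.5, 1.5, 2.5, 1.5, 0.5];
let prev: Level = "normal";
for (const v of values) {
  const curr = levelOf(v);
  const n = notificationFor(prev, curr);
  const label = n === undefined ? "-" : n.kind === "resolved" ? `resolved (from ${n.from})` : `${n.group} actions`;
  console.log(v, curr, label);
  prev = curr;
}
```

The last step (1.5 to 0.5) is exactly the warning-to-normal case above: it produces `resolved (from warning)`.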
The dropdown would then show the following. I'm showing 'Minor' as disabled, with the thought that if they have not set up the condition in the trigger section, then they cannot choose it from the dropdown, but they would still see all the available groups. And to re-state what @pmuellr was saying (so I understand correctly): […] The dropdown options could probably use some better ordering than what I'm showing in the screenshot.
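The dropdown rule (list every group, disable the ones whose condition is not configured) plus one possible ordering can be expressed as a small mapping. The `severity` field and the most-severe-first ordering are assumptions for illustration; neither comes from the alert type API, and the option shape `{ value, text, disabled }` is just a plain object a select component could consume.

```typescript
interface ActionGroupOption {
  id: string;
  name: string;
  severity: number; // hypothetical: higher means more severe
}

interface DropdownOption {
  value: string;
  text: string;
  disabled: boolean;
}

// Lists every action group, most severe first, disabling groups without a configured condition.
function actionGroupOptions(groups: ActionGroupOption[], configuredGroupIds: Set<string>): DropdownOption[] {
  return [...groups]
    .sort((a, b) => b.severity - a.severity)
    .map((g) => ({ value: g.id, text: g.name, disabled: !configuredGroupIds.has(g.id) }));
}

const groups: ActionGroupOption[] = [
  { id: "warning", name: "Warning", severity: 1 },
  { id: "major", name: "Major", severity: 3 },
  { id: "minor", name: "Minor", severity: 2 },
];
// Only "major" and "warning" have a condition set in the trigger section.
const options = actionGroupOptions(groups, new Set(["major", "warning"]));
console.log(options.map((o) => `${o.text}${o.disabled ? " (disabled)" : ""}`).join(", "));
// prints: Major, Minor (disabled), Warning
```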
Only […] Or maybe I'm misunderstanding what the group […]
Moving the discussion about single vs. multi-select for action groups into #67863. It was discussed during the last iteration that we would start with single select when choosing the group and implement multi-select capability in the future.
@mdefazio @gmmorris @mikecote @pmuellr a couple of questions on this design:
I understand that there are other factors that come into the design, but I would like to understand the answers to these questions better too. Maybe I am also not reading this design correctly, so please correct my parsing as described above.
I get that; it is way easier. Unless you wanted to reuse an email action across action groups AND customize who it's sent to, e.g. send to a small number of people on alert, more on warning, and even more on error. But I assume you'd still have the opportunity to create a new action per action group anyway, so you could decide to reuse or create new, whichever you want. I know we've talked about this action "reuse" before, and there is one little wrinkle I haven't given much thought to recently (or maybe ever). We don't have a concept of "reusing" actions at the API level, so we would HAVE to make a copy, or add a new way in the API to refer to other actions within an alert definition. If we don't add some kind of reference, then we'd HAVE to copy, and the UI would then have to figure out the "reuse" itself. Probably not hard: just look for equivalent action definitions and treat those as "shared". But it also means that if you create two actions that are the same, the UI would end up redisplaying them as "shared". That might be OK, but of course it could also be very confusing.
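The "look for equivalent action definitions, treat those as shared" approach could look like the sketch below, assuming actions are plain `{ group, id, params }` objects (a simplified shape). Equivalence here means same connector id and deep-equal params, compared via a key-sorted serialization; `stableStringify` and `findSharedActions` are hypothetical helpers, not existing UI code.

```typescript
interface AlertAction {
  group: string;
  id: string; // connector id
  params: Record<string, unknown>;
}

// Serializes a JSON-like value with object keys sorted, so property order does not matter.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Maps each equivalent (connector + params) definition to the groups it appears in,
// keeping only definitions found in more than one group, i.e. the ones shown as "shared".
function findSharedActions(actions: AlertAction[]): Map<string, string[]> {
  const groupsByKey = new Map<string, string[]>();
  for (const a of actions) {
    const key = stableStringify({ id: a.id, params: a.params }); // group deliberately excluded
    const groups = groupsByKey.get(key) ?? [];
    if (groups.indexOf(a.group) === -1) groups.push(a.group);
    groupsByKey.set(key, groups);
  }
  return new Map(Array.from(groupsByKey.entries()).filter(([, groups]) => groups.length > 1));
}

const actions: AlertAction[] = [
  { group: "alert", id: "email-1", params: { to: ["oncall@example.com"], message: "CPU high" } },
  { group: "warning", id: "email-1", params: { message: "CPU high", to: ["oncall@example.com"] } },
  { group: "alert", id: "slack-1", params: { message: "CPU high" } },
];
const shared = findSharedActions(actions);
console.log(Array.from(shared.values()).map((g) => g.join(" + ")).join("; ")); // prints: alert + warning
```

This also demonstrates the wrinkle mentioned above: the two email actions could have been created separately, yet they are indistinguishable from one "shared" action, so the UI would redisplay them as shared.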
By "state change" do you mean state as in ok | active | error, or action group? Since we recently discussed action groups, I'm guessing you mean ok | active | error. Seems interesting, because it would be way easier for a customer to add this kind of notification if they are just interested in the change: you wouldn't need to add actions for action groups / resolved etc. Just one thing to create. One potential issue with this is that the mustache variables available for ok and error are going to be wildly different from those for active; we already know 'ok' (aka 'resolved') won't get ANY of the "context" variables. Error likely won't have any either, which would make creating a "nice" message to handle all these situations difficult.
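To see why a single state-change message is hard to write, the sketch below renders one template for an "active" and an "ok" state. It uses a tiny hypothetical stand-in for Mustache (it only substitutes `{{a.b}}` paths, with missing values rendering as empty strings, as Mustache does for missing keys), and it assumes `context` is present only for "active", per the comment above. The variable names are illustrative, not the actual action variables.

```typescript
type AlertState = "ok" | "active" | "error";

interface TemplateVars {
  alertName: string;
  state: AlertState;
  context?: Record<string, unknown>; // assumed present only for "active"
}

// Follows a dotted path such as "context.value"; returns undefined if any step is missing.
function lookup(vars: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (obj, key) => (obj !== null && typeof obj === "object" ? (obj as Record<string, unknown>)[key] : undefined),
    vars,
  );
}

// Toy stand-in for Mustache: replaces {{a.b}} with the value at that path, or "" if missing.
function render(template: string, vars: TemplateVars): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, path: string) => {
    const v = lookup(vars, path);
    return v === undefined || v === null ? "" : String(v);
  });
}

const template = "{{alertName}} is {{state}}: value {{context.value}} crossed {{context.threshold}}";
console.log(render(template, { alertName: "cpu", state: "active", context: { value: 95, threshold: 90 } }));
// prints: cpu is active: value 95 crossed 90
console.log(render(template, { alertName: "cpu", state: "ok" }));
// prints: "cpu is ok: value  crossed " (the context variables render as nothing)
```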
@arisonl We raised some of these questions prior to the implemented PR - I'd suggest catching up on the recording. :) I personally think we should be grouping the actions by their action group in the UI as well, but that would clash with the concern around the fact that I need to duplicate an action for each action group... which isn't great either. We need to figure out how to balance these two problems and then we'll probably have a follow up issue.
@gmmorris I am aware that some of the above has been raised in the past, ref: "I understand that there are other factors that come into the design". However my understanding is that a big part might be driven by technical decisions, and that's absolutely fine but I am discussing the UX aspects here. Duplicating actions could be relatively easy with some type of cloning functionality. The cognitive overhead of resolving what happens might be more important from a UX perspective. Absolutely fine to be a followup issue.
@arisonl Sorry, sounds like I gave you the impression I'm brushing your concerns off - if so, I apologise, that wasn't my intention. Quite the opposite. I'm confirming we are also worried about this and have been discussing it, just haven't done a good job of documenting it here.
Perhaps @mikecote & @mdefazio can weigh in here, but my understanding was that it's the UX challenges we're currently stuck on, not technical ones.
Not at all, we are all good and I am ++ with all you are writing.
It's not to me either, that's why I posted these questions.
+1000 on that too, that's the right approach, getting to the right place incrementally and I trust the team 1000% on that.
@pmuellr I know that you've done a lot of thinking around this, and I also cannot agree more with the [release it as fast as possible in order to add value and then iterate to optimise it] approach. And btw, thank you for taking the time every time to get into the details and specifics :) My point is that consolidating things does not necessarily make it easier, e.g. if you need to unpack them in order to make sense of what happens under a number of conditions. That's a question for me; I don't have an answer. I would love to revisit this in one of the next design meetings, and I am sure we will get feedback and learn more as we move forward. I would think that there should be UX options to easily reuse and customise, in order to counter, to a certain extent, the disadvantage of the alternative approach.
Sorry for not being clear: I meant "action groups", and some of the recent discussions triggered the questions I posted above. But that's a very good point too: I think that the term "action group" is not very descriptive from a user perspective in the context of alerts with multiple "levels" or severities (e.g. warning, alert etc.). Compare "As a user, I want to be notified when an alert changes level/severity" with "As a user, I want to be notified when an alert changes action groups". I feel that the latter is less intuitive, and it might be inspired by how it is implemented technically. If that's so (please correct me if I am wrong), we shouldn't require users to be familiar with it, and we shouldn't bias our UX based on that either. Again, these are all questions.
Closing now that each individual issue is closed and merged.
@arisonl I read through the description and comments and wasn't sure whether the use case below is met, hence the explicit question: For a given rule definition, I need to specify two (or sometimes three) separate thresholds, with a separate severity label and action associated with each threshold. Does this issue enable that? If it does, what changes does the Observability UI need to make to incorporate this feature?
Yes, it does. Broadly speaking, the changes need to be made in both the UI (you can spin your own or use our UI Components) and the AlertType itself (at the moment the alert, presumably, schedules all actions on the […]). The dev docs detail how to do this, but we're always happy to guide y'all through it.
@mukeshelastic The need for supporting two and sometimes three levels and attaching different actions has been discussed with @sorantis on a couple of occasions or more. This functionality supports multiple levels as per these discussions. It is a framework-level capability, and solutions can build on top of it in order to provide the specific types that they need, similarly to how the existing solution-specific types have been created. Please ping us if you need more guidance at this stage.
An alert can use one or many action groups to fire actions.
Currently the UI is limited to assigning alert actions to the defaultActionGroupId only. This is a meta issue to support different action groups, as the API already does.
Individual Issues:
TBD