[Alerting] Add a required, programmatic message to actions #64349

Open
Zacqary opened this issue Apr 23, 2020 · 3 comments
Assignees
Labels
enhancement New value added to drive a business result estimate:needs-research Estimated as too large and requires research to break down into workable issues Feature:Alerting/RuleActions Issues related to the Actions attached to Rules on the Alerting Framework Feature:Alerting NeededFor:logs-metrics-ui Project:ImproveAlertingManagementUX Alerting team project for improving the management experience of alerting. Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@Zacqary
Contributor

Zacqary commented Apr 23, 2020

Summary

Alert executors should be able to send whatever message they want when firing an action. The user-defined message should be appended to the executor's programmatic message, and the user can use this to provide additional context. This is because the information that we need to convey in an alert is often complex, dynamic, and requires product design in order to be effective.

Context

From discussions on implementing #64080, the Metrics team has realized we need to be able to have more control over what messages get sent to users. Right now the message field relies entirely on the user to configure a useful message with all relevant information, and not to delete anything that's required.

This becomes especially precarious in a case like the Logs alerts (#62806), which have a default message of:

{{context.matchingDocuments}} log entries have matched the following conditions: {{context.conditions}}

which becomes something like:

24 log entries have matched the following conditions: message matches ASL Sender Statistics

context.conditions is a highly dynamic value, and deleting it would make the alert message effectively useless.
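To make the risk concrete, here is a minimal sketch of what happens when a user trims the template. `renderTemplate` is a hypothetical stand-in for the Mustache rendering step, not the alerting plugin's actual code, and the edited template is an invented example of a user change.

```typescript
// Hypothetical stand-in for the template rendering step (assumption, not Kibana code).
type Context = Record<string, string | number>;

function renderTemplate(template: string, context: Context): string {
  // Replace each {{context.name}} with its value; unknown variables render as "".
  return template.replace(
    /\{\{\s*context\.(\w+)\s*\}\}/g,
    (_match: string, name: string) => {
      const value = context[name];
      return value === undefined ? "" : String(value);
    }
  );
}

const context: Context = {
  matchingDocuments: 24,
  conditions: "message matches ASL Sender Statistics",
};

// The default message quoted above.
const defaultTemplate =
  "{{context.matchingDocuments}} log entries have matched the following conditions: {{context.conditions}}";
// An invented user edit that drops {{context.conditions}}.
const editedTemplate = "{{context.matchingDocuments}} log entries have matched";

const defaultMessage = renderTemplate(defaultTemplate, context);
const editedMessage = renderTemplate(editedTemplate, context);

console.log(defaultMessage);
console.log(editedMessage);
```

The edited message still renders without error, which is the problem: nothing signals that the information identifying the alert is gone.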

Because of the complexity of potential alert states, conditions, and configurations, we're exploring using something even more dynamic than context.conditions in metric alerts. One option is to remove all other context variables and write a single context.message that formats all relevant information:
[Screenshot not preserved: Screen Shot 2020-04-22 at 2 46 16 PM]

The alternative would quickly get too advanced and out of hand:
[Screenshot not preserved: Screen Shot 2020-04-22 at 2 59 02 PM]
(Note the condition0 naming convention, which we already use in the 7.7 release. Users have to manually add references to condition1, condition2, etc. every time they add additional conditions, and that's aggravating and error-prone. And you may notice I already made a syntax error in my pseudocode)
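As a rough sketch of the single context.message idea, an executor could format every condition itself, so that adding a condition never requires the user to touch the template. The `ConditionResult` shape, the metric names, and `buildContextMessage` are all hypothetical and only illustrate the approach.

```typescript
// Hypothetical shape of one evaluated condition (assumption, not the Metrics alert types).
interface ConditionResult {
  metric: string;
  comparator: string;
  threshold: number;
  currentValue: number;
}

// Formats any number of conditions into one string, so the template only
// needs a single {{context.message}} instead of condition0, condition1, ...
function buildContextMessage(conditions: ConditionResult[]): string {
  const lines = conditions.map(
    (c) =>
      `- ${c.metric} is ${c.comparator} ${c.threshold} (current value: ${c.currentValue})`
  );
  return ["The following conditions were met:", ...lines].join("\n");
}

// Illustrative values only.
const contextMessage = buildContextMessage([
  { metric: "cpu.usage", comparator: ">", threshold: 0.8, currentValue: 0.93 },
  { metric: "memory.usage", comparator: ">", threshold: 0.9, currentValue: 0.95 },
]);

console.log(contextMessage);
```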

We can implement the context.message approach with the alerting plugin today. The problem is, what happens if the user deletes context.message from their alert?

We don't want to rely on the user just realizing that they shouldn't do that.

Under this change, the purpose of the user-defined message would no longer be to manually format and present the data coming from the alert. Instead, it would provide additional context relevant to whatever the user is using alerting for: e.g. instructions for the on-call person who's getting this alert about how to respond to it.

@Zacqary Zacqary added enhancement New value added to drive a business result Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Apr 23, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr
Member

pmuellr commented Apr 27, 2020

Trying to boil down the requirements here - seems like there's a desire for two messages - one coming from the alert, which may be non-trivial (contain lists of things) - and one that could be set in the action params when editing the alert, specific to the usage of that alert. The customer would see both - presumably the one from the alert, followed by the one set in the action params - in an email/slack message, separated by a blank line.
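A minimal sketch of that two-message requirement might look like the following. `composeActionMessage` is a hypothetical helper, not an existing alerting API; it treats the alert's message as required and the action-params message as optional.

```typescript
// Hypothetical helper (assumption): joins the alert's required message with the
// optional user message from the action params, separated by a blank line.
function composeActionMessage(
  programmaticMessage: string,
  userMessage?: string
): string {
  if (programmaticMessage.trim() === "") {
    // The alert's own message is the part that must never be missing.
    throw new Error("The alert executor must supply a message");
  }
  const extra = (userMessage ?? "").trim();
  return extra === "" ? programmaticMessage : `${programmaticMessage}\n\n${extra}`;
}

const composed = composeActionMessage(
  "24 log entries have matched the following conditions: message matches ASL Sender Statistics",
  "Check the on-call runbook before escalating."
);

console.log(composed);
```

Because the user message is only ever appended, deleting or emptying it cannot remove the information produced by the alert.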

I've been kind of thinking about something like this in reference to figuring out how to have a notification that would include the result of another action. E.g., a theoretical GitHub issue action that would create an issue. You'd like to get the issue number / url from that action, and add it as another part of the message. Maybe at the bottom?

At some point we need to look into better Slack messaging, which means using their "blocks" stuff. Perhaps we can settle on a generic shape that looks similar, and for messaging systems that don't have "blocks" like this, we just do the best we can - eg, join the blocks with a blank line between them.
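One way to picture the generic shape with a plain-text fallback is sketched below. The `MessageBlock` type is hypothetical and deliberately much simpler than Slack's actual block format; it only shows the "join the blocks with a blank line" fallback.

```typescript
// Hypothetical generic block shape (assumption, not Slack's block schema).
type MessageBlock =
  | { kind: "text"; text: string }
  | { kind: "list"; items: string[] };

function renderBlockAsPlainText(block: MessageBlock): string {
  switch (block.kind) {
    case "text":
      return block.text;
    case "list":
      return block.items.map((item) => `- ${item}`).join("\n");
  }
}

// Fallback for messaging systems without blocks: one blank line between blocks.
function renderBlocksAsPlainText(blocks: MessageBlock[]): string {
  return blocks.map(renderBlockAsPlainText).join("\n\n");
}

const plainText = renderBlocksAsPlainText([
  { kind: "text", text: "2 conditions were met:" },
  { kind: "list", items: ["CPU usage > 80%", "Memory usage > 90%"] },
  { kind: "text", text: "See the on-call runbook before restarting anything." },
]);

console.log(plainText);
```

Under this shape, the result of another action (such as a created issue's number) would simply be one more block appended at the end.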

The other thing to think about, as these messages get more complex, is the formatting supported by the various actions. Today we have plain text for most services, but Slack messages can use THEIR version of markdown-like markup, and for email we expect the message to be a more typical version of markdown - and there are differences. How should an alert render a message so that it can be consumed by either? Should it create a "slack" version of a message, and a "markdown" version? Is markdown good enough to use in plain text situations as well? A simple hack is to allow context variables like message_slack and message_markdown, and then let the action executor figure out which of the message* variables to use. Or expose all of them, let the customer decide.
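The "simple hack" could be sketched roughly as follows. Only the names message_slack and message_markdown come from the comment above; the plain `message` fallback, the action type keys, and `pickMessage` are assumptions made for illustration.

```typescript
type MessageFormat = "slack" | "markdown" | "plain";

// Context an alert might provide: a plain fallback plus optional variants.
interface AlertMessageContext {
  message: string; // assumed plain-text fallback, always present
  message_slack?: string;
  message_markdown?: string;
}

// Hypothetical mapping from action type to its preferred markup.
const preferredFormat: Record<string, MessageFormat> = {
  slack: "slack",
  email: "markdown",
  serverLog: "plain",
};

// The action executor picks the best available variant and falls back to plain text.
function pickMessage(actionType: string, ctx: AlertMessageContext): string {
  const format = preferredFormat[actionType] ?? "plain";
  if (format === "slack" && ctx.message_slack !== undefined) {
    return ctx.message_slack;
  }
  if (format === "markdown" && ctx.message_markdown !== undefined) {
    return ctx.message_markdown;
  }
  return ctx.message;
}

const ctx: AlertMessageContext = {
  message: "CPU usage is above 80%",
  message_slack: "*CPU usage* is above 80%", // Slack bold uses single asterisks
  message_markdown: "**CPU usage** is above 80%", // Markdown bold uses double asterisks
};

console.log(pickMessage("slack", ctx));
console.log(pickMessage("email", ctx));
console.log(pickMessage("serverLog", ctx));
```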

@Zacqary
Contributor Author

Zacqary commented Apr 27, 2020

seems like there's a desire for two messages - one coming from the alert, which may be non-trivial (contain lists of things) - and one that could be set in the action params when editing the alert, specific to the usage of that alert. The customer would see both - presumably the one from the alert, followed by the one set in the action params - in an email/slack message, separated by a blank line.

Yep, that's about what I was thinking.

As for action types that are more complex than plain text, I feel like that makes having an opinionated message from the alert even more important. Slack blocks, especially, feel like they could benefit from specific product design choices. For Metrics, just basing off what Datadog does (which is admittedly where I'm basing most of my alerting opinions), we might want to include a thumbnail of a graph, a different color depending on how far the metric has crossed over the threshold, links to the metric explorer, several other things that would be difficult to build a user-facing UI to customize.

That level of complexity could benefit emails too, if we want to start sending rich HTML.

IMO there's a large subset of action types which are basically, in some way, shape, or form, "send an alert message." Whether it's a server log, an email, a Slack message, a PagerDuty message, we can cover most bases with:

  1. Let the user edit a plain text Title and a Message with reasonable defaults and enable some {{context variables}}
    • Title covers email subject, Slack block heading, etc.
  2. Have the alert type handle styling, formatting, rich features, and non-trivial information.
    • For server logs, this just means generating a text string explaining what happened in the alert
    • For Slack messages and emails, we can decide to convey some of this information with graphics instead of the same text string
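The split described in the list above could be sketched like this. All type and function names are hypothetical, and the rich Slack and email rendering is reduced to plain strings; the point is only that the user supplies Title and Message while the alert type supplies the details.

```typescript
// What the user edits (with defaults supplied elsewhere).
interface UserEditableParams {
  title: string;
  message: string;
}

// Hypothetical per-action output shapes (assumption, not Kibana's action params).
type RenderedAction =
  | { actionType: "email"; subject: string; body: string }
  | { actionType: "slack"; heading: string; text: string }
  | { actionType: "serverLog"; line: string };

// alertDetails is the non-trivial information generated by the alert type.
function renderAction(
  actionType: RenderedAction["actionType"],
  user: UserEditableParams,
  alertDetails: string
): RenderedAction {
  const body = `${alertDetails}\n\n${user.message}`;
  switch (actionType) {
    case "email":
      // Title covers the email subject.
      return { actionType: "email", subject: user.title, body };
    case "slack":
      // Title covers the Slack block heading.
      return { actionType: "slack", heading: user.title, text: body };
    case "serverLog":
      // Server logs get a single text string.
      return {
        actionType: "serverLog",
        line: `${user.title}: ${alertDetails} ${user.message}`,
      };
  }
}

const userParams: UserEditableParams = {
  title: "CPU alert",
  message: "Page the infra on-call.",
};

const emailAction = renderAction("email", userParams, "CPU usage is above 80%");
const logAction = renderAction("serverLog", userParams, "CPU usage is above 80%");

console.log(emailAction);
console.log(logAction);
```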

On the other hand, there are some action types that don't fit the bill of "send an alert message," like creating a GitHub issue in response to an alert. That's something a little more complicated that I don't have a frame of reference for.

@gmmorris gmmorris added NeededFor:logs-metrics-ui Project:ImproveAlertingManagementUX Alerting team project for improving the management experience of alerting. Feature:Alerting/RuleActions Issues related to the Actions attached to Rules on the Alerting Framework labels Jun 30, 2021
@gmmorris gmmorris added the loe:needs-research This issue requires some research before it can be worked on or estimated label Jul 14, 2021
@gmmorris gmmorris added the estimate:needs-research Estimated as too large and requires research to break down into workable issues label Aug 18, 2021
@gmmorris gmmorris removed the loe:needs-research This issue requires some research before it can be worked on or estimated label Sep 2, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022

5 participants