What activity from alerts and actions should be tracked in the event log? #62303

Closed
mikecote opened this issue Apr 2, 2020 · 5 comments
Labels
discuss Feature:Actions Feature:Alerting Feature:EventLog Meta Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@mikecote
Contributor

mikecote commented Apr 2, 2020

Let's use this issue to decide which data users would care about and should therefore be captured in the event log. This will give us a better idea of how to design the UIs that display this information and how to handle relationships, if any.

Basic work is already underway to support the alert details page in #55636. Still, understanding the bigger picture of what information we want to gather and how to display it will help us structure the event log data (#55640).

Examples:

  • Should the user be able to see their alert actions that failed to execute?
  • Should the user be able to see how many attempts an alert action took before executing successfully?
  • Should the user be able to see when the alert executor failed to run?

There's a fine line between gathering information for the user and gathering it for debugging. If it's mostly for debugging, we can use the server logs.

@mikecote mikecote added discuss Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Apr 2, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@peterschretlen
Contributor

There was an internal discussion around diagnostic APIs recently, and this came up:

"Actions" Alerting statistics breakdown by type: Email/Pagerduty/Slack/Webhook success & error counts at the connector level, if possible per error type (i.e. webhook error response codes and counts).

I feel like we are probably pretty close to being able to track this already. The only additional ask here was that we expose it as metrics/diagnostics through an API. A basic aggregation API might be a useful addition to the event log API.

cc @kostasb

@kostasb

kostasb commented Dec 11, 2020

On top of logging these errors in server logs, an aggregation API for this activity would help the user compare outputs from different points in time to quickly identify failing actions. As noted above, a breakdown by error type per connector would be really useful for this.
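The connector-level breakdown discussed above could be sketched roughly as follows. This is only an illustration in plain TypeScript: the `ActionEvent` shape, its field names, and `aggregateByConnector` are hypothetical and do not reflect the actual event log schema or any existing Kibana API. It counts successes and failures per connector type within a time window and tallies failures per error type.

```typescript
// Hypothetical event shape, assumed for illustration only.
interface ActionEvent {
  timestamp: string; // ISO 8601 timestamp of the action execution
  connectorType: string; // e.g. "email", "pagerduty", "slack", "webhook"
  outcome: "success" | "failure";
  errorType?: string; // e.g. an HTTP response code for webhook errors
}

interface ConnectorStats {
  success: number;
  failure: number;
  errorsByType: Record<string, number>;
}

// Aggregate events with from <= timestamp < to, grouped by connector type.
function aggregateByConnector(
  events: ActionEvent[],
  from: string,
  to: string
): Record<string, ConnectorStats> {
  const start = Date.parse(from);
  const end = Date.parse(to);
  const stats: Record<string, ConnectorStats> = {};

  for (const event of events) {
    const time = Date.parse(event.timestamp);
    if (time < start || time >= end) {
      continue; // outside the requested window
    }
    let entry = stats[event.connectorType];
    if (entry === undefined) {
      entry = { success: 0, failure: 0, errorsByType: {} };
      stats[event.connectorType] = entry;
    }
    if (event.outcome === "success") {
      entry.success += 1;
    } else {
      entry.failure += 1;
      // Failures without a recorded error type are grouped under "unknown".
      const key = event.errorType ?? "unknown";
      entry.errorsByType[key] = (entry.errorsByType[key] ?? 0) + 1;
    }
  }
  return stats;
}
```

Calling the function twice with different windows gives two snapshots that can be compared to spot a connector whose failure count is growing. A real implementation would more likely push this into an Elasticsearch aggregation rather than aggregating in memory.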

@pmuellr
Member

pmuellr commented Dec 17, 2020

Great examples, thx all!

We also need to do searches based on "visible alerts/actions", where visible means "RBAC indicates you can read this", which complicates things. The event log only knows about saved object ids, not RBAC, so you need to run an alert/action query first to get the ids of the alerts/actions that are "visible"; those ids then become a filter in the event log query.

But ideally, at an API level, it would be nice to allow some arbitrary DSL queries (or perhaps ES SQL would be good enough and easier) where we apply some "RBAC filter" internally, which would allow a pretty open-ended programmatic API over the event log, across "visible" alerts/actions/future-event-log-objects.
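A minimal sketch of that two-step approach, assuming the visible ids have already been fetched through an RBAC-aware alert/action query. Everything here is hypothetical: `withVisibilityFilter`, `buildEventLogQuery`, the `FindVisibleIds` callback, and the `saved_object.type` / `saved_object.id` field names are illustrative and are not taken from the event log mappings or the Kibana codebase. Only the `bool`, `term`, `terms`, and `match_none` query clauses are standard Elasticsearch query DSL.

```typescript
// Loosely typed stand-in for an Elasticsearch query DSL object.
type QueryDsl = Record<string, unknown>;

// Hypothetical callback for step 1: resolves the ids of saved objects of the
// given type that the current user is allowed to read.
type FindVisibleIds = (type: string) => Promise<string[]>;

// Step 2: wrap the caller's query so it can only match events that belong to
// visible saved objects. Field names are assumptions, not the real mappings.
function withVisibilityFilter(
  userQuery: QueryDsl,
  type: string,
  visibleIds: string[]
): QueryDsl {
  if (visibleIds.length === 0) {
    // Nothing is visible, so the query must not match any event.
    return { match_none: {} };
  }
  return {
    bool: {
      must: [userQuery],
      filter: [
        { term: { "saved_object.type": type } },
        { terms: { "saved_object.id": visibleIds } },
      ],
    },
  };
}

// Combines both steps: look up visible ids, then constrain the user's query.
async function buildEventLogQuery(
  userQuery: QueryDsl,
  type: string,
  findVisibleIds: FindVisibleIds
): Promise<QueryDsl> {
  const visibleIds = await findVisibleIds(type);
  return withVisibilityFilter(userQuery, type, visibleIds);
}
```

Because the filter is applied around the caller's query rather than merged into it, the caller's DSL stays arbitrary while the visibility constraint cannot be bypassed. One open question this sketch ignores is how well a `terms` filter scales when a user can see a very large number of alerts or actions.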

@gmmorris gmmorris added the Meta label Jul 14, 2021
@ymao1
Contributor

ymao1 commented Jul 29, 2021

Closing as we'd prefer to receive requirements rather than speculating what might be needed.

@ymao1 ymao1 closed this as completed Jul 29, 2021
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022

8 participants