Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting][Event log] Persisting duration information for active alerts in event log #101387

Merged
merged 17 commits into from
Jun 9, 2021

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Jun 4, 2021

Resolves #93704

Summary

This PR calculates and persists duration information for alert (eg instance) events in the event log. When a new alert is triggered, a start time is recorded in the new-instance event, with an initial duration of 0. A uuid is also generated. While this alert remains active, each active-instance event log document will be written with the start time, the uuid and an updated duration in nanoseconds. When the alert recovers, the end time will be recorded in the recovered-instance document in the event log. The uuid will remain the same for the active span of an alert but will change if the alert recovers and then becomes active again.

Checklist

Delete any items that are not applicable to this PR.

@ymao1 ymao1 changed the title Event log/alert duration [Alerting][Event log] Persisting duration information for active alerts in event log Jun 7, 2021
@ymao1 ymao1 self-assigned this Jun 7, 2021
@ymao1 ymao1 added Feature:Alerting Feature:EventLog release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.14.0 v8.0.0 labels Jun 7, 2021
@ymao1 ymao1 marked this pull request as ready for review June 7, 2021 17:01
@ymao1 ymao1 requested a review from a team as a code owner June 7, 2021 17:01
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, posted a few questions though.

I think we should probably open a new issue, given we now have this capability, that we can make the alert details page a little better by showing the REAL alert duration (for alerts that started after the 7.14 migration), rather than trying to calculate it based on previous records. I suspect that means the API we use in the event log will change as well, but we'd need to see.

x-pack/plugins/alerting/server/task_runner/task_runner.ts Outdated Show resolved Hide resolved
// Inject start time into instance state of new instances
for (const id of newAlertIds) {
const state = currentAlerts[id].getState();
currentAlerts[id].replaceState({ ...state, start: currentTime, uuid: uuidv4() });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side thought, for @YulNaumenko - does this new UUID value for the alert id belong in one of the new rule. fields we're writing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense to me. Need to check with @elastic/security-detections-response

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to write alert information into the rule object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good philosophical question. An alert is rule-related, so it seems fine to me to. As an example, the process. fields have a thread.id - kinda seems like the same kind of relationship. We might be able to get some historic context from the RFC that introduced this.

I guess the RAC field for this value is kibana.rac.alert.uuid right now? Per previous comment ^^^, I'm hesitant to add kibana.rac fields right now, especially if it's something that should be in rule. (or some other existing field). Maybe we need to get a new ECS RFC together to add such a field to rule.? Let's bring this up with the RAC alerts-as-data folk ...

@ymao1
Copy link
Contributor Author

ymao1 commented Jun 8, 2021

I think we should probably open a new issue, given we now have this capability, that we can make the alert details page a little better by showing the REAL alert duration (for alerts that started after the 7.14 migration), rather than trying to calculate it based on previous records. I suspect that means the API we use in the event log will change as well, but we'd need to see.

Created this issue: #101662

Copy link
Contributor

@YulNaumenko YulNaumenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ymao1 ymao1 added the auto-backport Deprecated - use backport:version if exact versions are needed label Jun 9, 2021
@ymao1
Copy link
Contributor Author

ymao1 commented Jun 9, 2021

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit a12ff5d into elastic:master Jun 9, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jun 9, 2021
…ts in event log (elastic#101387)

* WIP

* Storing start, duration and end in alert state

* Writing to event log

* Updating unit tests

* Adding unit tests

* Fixing uuid in tests

* Updating functional test

* Adding functional test

* Removing console logs

* Fixing unit tests

* PR fixes

* Removing uuid from alert information

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Jun 9, 2021
…ts in event log (#101387) (#101784)

* WIP

* Storing start, duration and end in alert state

* Writing to event log

* Updating unit tests

* Adding unit tests

* Fixing uuid in tests

* Updating functional test

* Adding functional test

* Removing console logs

* Fixing unit tests

* PR fixes

* Removing uuid from alert information

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: ymao1 <ying.mao@elastic.co>
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jun 9, 2021
* master:
  clarify which parts of TM are experimental (elastic#101757)
  Add sh scripts with _bulk_action route usage examples (elastic#101736)
  [Uptime] Only register route in side nav if uptime show capability is true (elastic#101709)
  Use KIBANA_DOCS in doc link service (elastic#101667)
  [Alerting][Event log] Persisting duration information for active alerts in event log (elastic#101387)
  Address design issues in Discover/Graph (elastic#101584)
  Optimize performance for document table (elastic#101715)
  Change file data visualizer links to point to new location in home application (elastic#101393)
  [Fleet] Tighten policy permissions, take II (elastic#97366)
  [ML] Add debounce to the severity control update  (elastic#101581)
  [Fleet] Fix routing issues with `getPath` and `history.push` (elastic#101658)
  [APM] Add link-to/transaction route (elastic#101731)
  [Index Patterns] Runtime fields CRUD REST API  (elastic#101164)
  [ILM] Refactor types and fix missing aria labels (elastic#101518)
  [Lens] New summary row feature for datatable (elastic#101075)
  Blocks save event filter with a white space name (elastic#101599)
  Improve security server types (elastic#101661)
  [APM] Replace side nav with tabs on Settings page (elastic#101460)
  [APM] Only register items in side nav if user has permissions to see app (elastic#101707)
  [Security solution][Endpoint] Add back button when to the event filters list (elastic#101280)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Alerting Feature:EventLog release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.14.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[discuss] extending event log for faster/easier access to active instance date information
5 participants