-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[alerts] add executionStatus to event log doc for action execute #82401
Conversation
c356721
to
3ca7c23
Compare
eb34d38
to
52bb9c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted a couple of anomolies in the current tests
05759b9
to
1822d4d
Compare
resolves elastic#79785 Until now, the execution status was available in the the event log document for the execute action. In this PR we add it. The event log is extended to add the following fields: - `kibana.alerting.status` - from executionStatus.status - `event.reason` - from executionStatus.error.reason The date from the executionStatus and start date in the event log will be set to the same value. Previously, errors encountered while trying to execute an alert executor, eg decrypting the alert, would not end up with an event doc generated. Now they will. In addition, there were a few places where events that could have had the action group in them did not, and one where the instance id was undefined - those were fixed up.
1822d4d
to
a4599f9
Compare
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
Wanted to point out some additional issues to open after working with this code; unrelated to this PR directly, but straight-forward and we should clean these up.
|
@@ -184,11 +184,15 @@ describe('Task Runner', () => { | |||
expect(eventLogger.logEvent).toHaveBeenCalledTimes(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changes at the top of this file are a dog's breakfast, sorry. The order the events get logged has changed, where the execute
event is written AFTER all the other events now, but post-dated to the start of the execution process (and will be <= the other events generated), since we now have to write it AFTER all the execution processing. So, the diff is not very pleasing ...
@@ -153,7 +155,8 @@ export class TaskRunner { | |||
alert: SanitizedAlert, | |||
params: AlertExecutorOptions['params'], | |||
executionHandler: ReturnType<typeof createExecutionHandler>, | |||
spaceId: string | |||
spaceId: string, | |||
event: Event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now pass a partially populated event in to these "inner" functions of the execution logic, as we write the event at the "top level" of the execution logic, instead of in this "inner" one.
@@ -166,24 +169,10 @@ export class TaskRunner { | |||
alertRawInstances, | |||
(rawAlertInstance) => new AlertInstance(rawAlertInstance) | |||
); | |||
const originalAlertInstances = cloneDeep(alertInstances); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need the last set of alert instances (from the previous execution), so we can get the action group for "resolved" events, not just the original instance ids. Also, need to clone this, as alertInstances
is mutated during the processing.
@@ -45,6 +45,10 @@ | |||
"outcome": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are the two fields from the alert execution status, that weren't already populated somehow in the event, so added as new fields, in what I think are appropriate places - reason
is a pre-existing. ECS field, but status
is a new field in the custom ECS kibana.alerting
field
@@ -102,16 +102,16 @@ describe('EventLogger', () => { | |||
event: { provider: 'test-provider', action: 'a' }, | |||
}); | |||
|
|||
const ignoredTimestamp = '1999-01-01T00:00:00Z'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are now "post-dating" the execute event log docs, we need to allow the timestamp to be overridable when we merge fixed and initial event fields - previously you could NOT override it. That logic is in the file below, x-pack/plugins/event_log/server/event_logger.ts
@@ -0,0 +1,84 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to test a decrypt error status in the event log, we need to run with security on - all the other eventLog-specific tests are in the spaces_only tests, so we reference some of the utilities used there via import.
} | ||
|
||
function validateEvent(event: IValidatedEvent, params: ValidateEventLogParams): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this diff section, best viewed with ignore whitespace, as I lifted some functions out of a function scope, to make them top-level, so they could be exported, so they could be reused by the security_and_spaces tests
I've added a number of comments to the PR as guide posts - lots of stuff that seems a bit obscure in this PR :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Verified that for event log where action = execute
, there is now a status field with executionStatus
and that if the status is error, a reason is populated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This is awesome thanks very much! Tested it out locally I'm hoping to switch over from using our rule status saved object to just reading errors through the getAlertInstanceSummary
on the alertsClient. Looks like that will work. This will really help detections clean things up in our own codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: |
…stic#82401) resolves elastic#79785 Until now, the execution status was available in the the event log document for the execute action. In this PR we add it. The event log is extended to add the following fields: - `kibana.alerting.status` - from executionStatus.status - `event.reason` - from executionStatus.error.reason The date from the executionStatus and start date in the event log will be set to the same value. Previously, errors encountered while trying to execute an alert executor, eg decrypting the alert, would not end up with an event doc generated. Now they will. In addition, there were a few places where events that could have had the action group in them did not, and one where the instance id was undefined - those were fixed up.
* master: [Advanced Settings] Introducing telemetry (elastic#82860) [alerts] add executionStatus to event log doc for action execute (elastic#82401) Add additional sources routes (elastic#83227) [ML] Persisted URL state for the "Anomaly detection jobs" page (elastic#83149) [Logs UI] Add pagination to the log stream shared component (elastic#81193) [Index Management] Add an index template link to data stream details (elastic#82592) Add maps_oss folder to code_owners (elastic#83204) fix truncation issue (elastic#83000) [Ingest Manger] Move asset getters out of registry (elastic#83214)
) (#83289) resolves #79785 Until now, the execution status was available in the the event log document for the execute action. In this PR we add it. The event log is extended to add the following fields: - `kibana.alerting.status` - from executionStatus.status - `event.reason` - from executionStatus.error.reason The date from the executionStatus and start date in the event log will be set to the same value. Previously, errors encountered while trying to execute an alert executor, eg decrypting the alert, would not end up with an event doc generated. Now they will. In addition, there were a few places where events that could have had the action group in them did not, and one where the instance id was undefined - those were fixed up.
… alerts/action-groups-as-conditions * origin/alerts/stack-alerts-public: (91 commits) removed import from plugin code as it causes FTR to fail [Advanced Settings] Introducing telemetry (elastic#82860) [alerts] add executionStatus to event log doc for action execute (elastic#82401) Add additional sources routes (elastic#83227) [ML] Persisted URL state for the "Anomaly detection jobs" page (elastic#83149) [Logs UI] Add pagination to the log stream shared component (elastic#81193) [Index Management] Add an index template link to data stream details (elastic#82592) Add maps_oss folder to code_owners (elastic#83204) fix truncation issue (elastic#83000) [Ingest Manger] Move asset getters out of registry (elastic#83214) make defaulted field non maybe Remove unused asciidoc file (elastic#83228) [Lens] Remove background from lens embeddable (elastic#83061) [Discover] Unskip flaky tests based on discover fixture index pattern (elastic#82991) Removing unnecessary trailing slash in CODEOWNERS Trying to fix CODEOWNERS again, where was a non-existent team prior (elastic#83236) Trying to fix CODEOWERS, missing a starting slash (elastic#83233) skip flaky suite (elastic#83231) Add enzyme rerender test helper (elastic#83208) Move Elasticsearch type definitions out of APM (elastic#83081) ...
Summary
resolves #79785
Until now, the execution status was not available in the the event
log document for the execute action. In this PR we add it.
The event log is extended to add the following fields:
kibana.alerting.status
- from executionStatus.statusevent.reason
- from executionStatus.error.reasonThe date from the executionStatus and start date in the event
log will be set to the same value.
Previously, errors encountered before trying to execute an
alert executor, eg decrypting the alert, would not end up
with an event doc generated. Now they will.
Checklist