Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting] the alert updated_at field is no representative of the last update by a user #82701

Closed
gmmorris opened this issue Nov 5, 2020 · 12 comments · Fixed by #83578
Closed
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@gmmorris
Copy link
Contributor

gmmorris commented Nov 5, 2020

In this issue we made the assumption that the SavedObject's updated_at field is a correct representation of when an Alert has last been updated.
As of the introduction of Execution Status this is no longer correct.

@gmmorris gmmorris added bug Fixes for quality problems that affect the customer experience Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Nov 5, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr
Copy link
Member

pmuellr commented Nov 5, 2020

The assumption being that updated_at is the last time a human updated the SO, compared to the code that now updates the execution status.

The possible solutions mentioned in #82658 are:

  1. Don't mutate updated_at on rule run, only on user edit
  2. Add a secondary field that specifies timestamp of user edits

I'll add

  1. Stop updating the alert with the alert status, get it from the event log instead, dependent on [alerts] add executionStatus to event log doc for action execute #82401

Option 1 doesn't sound good - not clear what downstream effects might be if we do actually update the SO, but the updated_at doesn't reflect that. Maybe nothing today, perhaps something in the future?

Option 2 sounds like the most straight-forward thing to do. Add a new field in the alert SO to track this date, and update it on all the existing "update" call sites, but obviously don't update it when we update just the execution status.

Option 3 sounds intriguing. Since we're basically adding the execution status bits to the event log for the execute event action, presumably we could pull these out of a query. There's likely a lot more code involved in doing this, including multi-SO queries against the event log, which we don't currently support. Perhaps this could be a long-term goal, with whatever we do sooner being a tactical solution.

@pmuellr
Copy link
Member

pmuellr commented Nov 5, 2020

I'm interested to understand how the update_at value is being used today. For informative purposes in UIs to show customers when the alert was last changed? Or in the alert executor? Sounds like the first.

Keep in mind that the alert will also be updated for enable/disable/muteAll/unmuteAll/updateApiKey. Also, updating the throttle can only be done via update() anyway, so that's perhaps another cases of "not really updating the alert", but instead modifying some "meta data" about the alert.

If the intent is to show customers the last time a "non meta data" aspect of the alert was updated, that I think we'd certainly want something like a new secondary field that only dealt with the "non meta data" updates. Eg, say the new field is lastEditedAt (name to be debated, obviously), we could arrange to not update this value for cases like muteAll. Trying to not update it for throttling changes would involve diffing the update to see if it's JUST a throttle change, might not be worth it, or might be wrong to not update it anyway, given the throttle can only be updated via update() anyway.

@mikecote
Copy link
Contributor

mikecote commented Nov 5, 2020

Option 3 sounds intriguing. Since we're basically adding the execution status bits to the event log for the execute event action, presumably we could pull these out of a query. There's likely a lot more code involved in doing this, including multi-SO queries against the event log, which we don't currently support. Perhaps this could be a long-term goal, with whatever we do sooner being a tactical solution.

One downside with this approach is that we'd lose the ability to filter alerts by status or by error reason. Some teams are depending on this capability for UX purposes

@pmuellr
Copy link
Member

pmuellr commented Nov 5, 2020

One downside with this approach is that we'd lose the ability to filter alerts by status or by error reason. Some teams are depending on this capability for UX purposes

We could still do those, but it would require adding filtering on the event log query (not supported right now), and presumably this would be a secondary query; eg, in the alert list we'd need to do a query on the SOs themselves for filters associated with SO things, and a second one against the EL for filters associated with execution status, and "join" the results. Sounds complicated. :-)

I did want to mention the option of "woops, we shouldn't have added the execution status in the alert SO, let's get the data from somewhere else", since if there was a good answer there, we could go back to the way things used to be. Sounds like that's not going to be feasible short-term anyway.

@gmmorris
Copy link
Contributor Author

gmmorris commented Nov 5, 2020

The assumption being that updated_at is the last time a human updated the SO, compared to the code that now updates the execution status.

This was the assumption when we delivered #52738 (comment), which is why I marked this as a bug. When we delivered that original issue, the behaviour that we promised siem was that it would represent the last user update.

@pmuellr
Copy link
Member

pmuellr commented Nov 6, 2020

That PR #53793 is confusing to me, as it mentions adding an updatedAt field, which I assumed meant to the SO, but it looks like it's just a RESTy API field copied from the SO's own updated_at. Also mention in that PR that we can't sort on that field like other alert SO fields, since it's not really a field (but you should be able to sort on the SO's updated_at some how, rite?)

Anyhoo, feels like we're going to need to add a real updatedAt field, which the execution status updates would specifically not update (they're already partial updates, so guessing no change to that code anyway).

@gmmorris
Copy link
Contributor Author

It's worth keeping in mind that this fix will probably require a Saved Object migration.
As we're being cautious with migrations, it might be worth validating whether we need to backport this into the 7.10.1 patch version or not.
If we can avoid a migration at this point we probably want to, but if this causes broken behaviour in 7.10, then we might not have a choice.

@pmuellr
Copy link
Member

pmuellr commented Nov 10, 2020

Yeah, I'm curious whether we need a backport to 7.10, given the referenced issue #82658 doesn't seem to indicate this really "breaks" anything. Eg, if SIEM rules depended on this value in their alert type executors, I'd imagine this could be breaking things. It sounds like it's only visible in the UI though, so may be survivable through 7.10.x releases.

@ymao1 ymao1 self-assigned this Nov 16, 2020
@ymao1
Copy link
Contributor

ymao1 commented Nov 16, 2020

So for this issue it sounds like:

  • we want a field to represent the last time a user actively edited an alert
  • it cannot be the existing updated_at field because that belongs to the Saved Object and we can't control when that is updated (so we can't prevent it from being updated when the execution status is updated)
  • we will need a migration to add this new field (default to either the createdAt date or the lastExecutionDate date)
  • we will not need to backport this to 7.10 (gleaned from the "thumbs up" on @pmuellr's comment about maybe not needing to backport)

Will it be too confusing to use updatedAt for the field name? We would then have an updated_at field and an alert.updatedAt field, which might be a little confusing but would be consistent with existing the alert.createdAt, alert.createdBy and alert.updatedBy fields

@mikecote
Copy link
Contributor

The points above make sense to me 👍 I am indifferent on the naming, updatedAt makes sense to me for consistency 👍

@pmuellr
Copy link
Member

pmuellr commented Nov 16, 2020

we will need a migration to add this new field (default to either the createdAt date or the lastExecutionDate date)

Sadly, createdAt or lastExecutionDate aren't great, but we don't really have anything else to go on, do we? But lastExecutionDate seems better than createdAt, if both are set. For 7.10, I think lastExecutionDate is the same as the SO updated_at because of the new update on the execution status field. So we can probably use that existing date updated_at as the date to populate the new field. So, default value could be updated_at || createdAt?

Other than that, 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants