-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement wanted error handling for alerting and actions #41917
Conversation
Pinging @elastic/kibana-stack-services |
This comment has been minimized.
This comment has been minimized.
88012ae
to
d145f25
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
cad9add
to
13ae86a
Compare
This comment has been minimized.
This comment has been minimized.
x-pack/legacy/plugins/actions/server/action_type_registry.test.ts
Outdated
Show resolved
Hide resolved
return error.retry == null ? true : error.retry; | ||
} | ||
// Retry other kinds of errors | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to have this return false by default. Idea being that an action will know when something is retry-able - setting retry to true
or a Date - and any other case, should not be retryable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess by looking here https://github.com/elastic/kibana/blob/master/x-pack/legacy/plugins/actions/server/lib/execute.ts#L29-L74 any code that could throw here is already being handled and wrapped into a proper ExecutionError
. Some may need to return retry: false
ex validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and made it false
for other kinds of errors. I can't think of a reason to retry totally unhandled errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a few comments - just one in the review, a few "single" comments.
My only real concern is defaulting getRetry()
to true
in x-pack/legacy/plugins/actions/server/action_type_registry.ts
; perhaps I'm not thinking about that right. And we could always tweak it later ...
@@ -85,23 +94,22 @@ test('successfully executes the task', async () => { | |||
const runnerResult = await runner.run(); | |||
expect(runnerResult).toMatchInlineSnapshot(` | |||
Object { | |||
"runAt": 2019-06-03T18:55:30.982Z, | |||
"runAt": 1970-01-01T00:00:10.000Z, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting the date is set to such an old epoch-y date. My only thought is that at some point in the future, if we ended up needing to calculate a previous date from this, somehow, the epoch millis might go negative. Which node should handle, but I wonder if some other part of the system might choke on negative numbers.
$ node -p 'new Date(-1)'
1969-12-31T23:59:59.999Z
OTOH, I guess would actually be a good test, so ... leave it in!?!?!?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this seems to be what happens when you use fake timers, everything becomes new Date(0)
by default 🤔
…rmat is returned with status: error
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, just wondering about defaulting to retrying.
💚 Build Succeeded |
) * Implement wanted error handling * Cleanup * Add retry logic to actions * Leverage startedAt from task manager * Fix broken jest tests * Add missing unit test * Add unit tests for getRetry * Add test for rate limit * Remove fake timers * Don't retry errors by default for actions unless the proper result format is returned with status: error * Don't retry unless attribute specified * Fix tests
… (#42255) * Implement wanted error handling for alerting and actions (#41917) * Implement wanted error handling * Cleanup * Add retry logic to actions * Leverage startedAt from task manager * Fix broken jest tests * Add missing unit test * Add unit tests for getRetry * Add test for rate limit * Remove fake timers * Don't retry errors by default for actions unless the proper result format is returned with status: error * Don't retry unless attribute specified * Fix tests * Increase retry timeout to prevent flaky tests (#42291)
This PR implements what is wanted for error handling in alerting and actions as outlined in #39349.
The following changes have been made:
maxAttempts
attribute that is optionalretry
attributescheduledRunAt
andpreviousScheduledRunAt
. Those are replaced withstartedAt
andpreviousStartedAt
.