Task manager enhancements for error handling in alerting and actions #39829
Pinging @elastic/kibana-stack-services
I didn't do a full review, but @mikecote asked me to have a look at the migration changes, and those look peachy. 💯
Hi, sorry, I need a little more time to look at this. I pulled it down and I'm getting a server crash, but there's a chance it's because my ES is messed up, or for other reasons. Also, there are 2 conflicts that need to be resolved on this PR.
I think I keep hitting stuck migrations, even when I clear all my documents out.
I can fix the problems with missing shards in the indices (which is why they are red), but for now this is causing Kibana to crash completely. I'll need to look a little more to see if this is just a distraction.
LGTM
Looked at the changes from the latest commits to clear my "requested changes."
I'd feel way more comfortable if the SavedObjects/Migration changes were split into a separate PR from this one.
@joshdover I have moved it to another PR #41815, will update this one once merged. 👍
@joshdover The migration changes have been removed from this PR now that they are in master. Let me know if this looks good to you now.
💚 Build Succeeded
…lastic#39829)
* Allow task definitions to overwrite default setting maxAttempts
* Leverage scheduledAt from task manager
* Treat maxAttempts like attempts and not retries
* Add support for second intervals
* Min 1 attempt
* Reverse relying on scheduledAt
* Add new startedAt attribute in task manager that keeps track when task started running
* Don't extend runAt when claiming a task
* Remove startedAt from state
* Attempt trying to define custom getBackpressureDelay function
* Pass error object to getBackpressureDelay
* Cleanup processResultForRecurringTask code
* Add backpressure to timed out tasks
* Change default timeout backpressure calculation
* getBackpressureDelay to return seconds instead of milliseconds
* Add comment for task store query
* Compress query
* Revert alert / actions specific code
* Add more interval tests
* Fix failing jest tests
* Fix test
* Add more unit tests
* Fix integration tests
* Fix sorting of tasks to process
* WIP
* Always provide error when getBackpressureDelay is called
* Rename getBackpressureDelay to getRetryDelay
* retryAt to be calculated from timeout time by default
* Remove invalid test
* Add unit tests
* Consider timeout before scheduling a retryAt
* Remove backpressure terminology
* Remove support for 0 based intervals and timeouts
* Apply PR feedback
* Fix last place using Math.abs
* Modify migrations to allow running a script when converting an index to an alias
* Convert task manager to use saved objects
* Fix broken test
* Fix broken tests pt1
* Remove index from task manager config schema
* Accept platform changes
* PR feedback
* Apply PR feedback
* Apply PR feedback pt2
* Apply PR feedback pt3
* Apply PR feedback pt4
* Fix feedback pt3
* Rename RawSavedObjectDoc to SavedObjectsRawDoc
…39829) (#42004)
In this PR, we're making the following changes to the task manager to support better error handling for alerting and actions:
* Support for intervals in seconds (e.g. `10s`)
* Support for timeouts in seconds (e.g. `10s`)
* Task definitions can overwrite the default `maxAttempts` configuration
* `maxAttempts` has a minimum of 1
* Task definitions can provide a `getRetryDelay` function to return the delay in seconds to wait before attempting the task again
* The `maxAttempts` configuration now behaves like the maximum number of total attempts, not the number of retries after the first failure
* `runAt` is no longer extended by the timeout when the task runs

Fixes #40872
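As a rough illustration of the retry semantics described above, here is a minimal TypeScript sketch. It is not the task manager's actual code: the `TaskDefinitionSketch` interface, the `intervalToSeconds` and `nextRetryAt` helpers, the default of 3 attempts, the fallback delay of 5 minutes per attempt, and the acceptance of an `m` suffix are all assumptions made for the example. Only `maxAttempts` (total attempts, minimum 1), `getRetryDelay` (returns seconds), the `10s` style values, and the rejection of zero-based values come from this PR.

```typescript
// Hypothetical shape for illustration; not Kibana's real task definition type.
interface TaskDefinitionSketch {
  // Total attempts allowed, including the first run (not "retries after failure").
  maxAttempts?: number;
  // Optional hook returning the delay, in seconds, before the next attempt.
  getRetryDelay?: (attempts: number, error: Error) => number;
}

// Assumed default for this sketch only.
const DEFAULT_MAX_ATTEMPTS = 3;

// Parses values such as "10s" or "5m" into seconds.
// Zero-based and malformed values are rejected.
function intervalToSeconds(interval: string): number {
  const match = /^([1-9][0-9]*)(s|m)$/.exec(interval);
  if (!match) {
    throw new Error(`Invalid interval "${interval}"; expected e.g. "10s" or "5m"`);
  }
  const value = parseInt(match[1], 10);
  return match[2] === 'm' ? value * 60 : value;
}

// Returns when the task should be attempted again, or null once the
// total number of attempts has been used up.
function nextRetryAt(
  definition: TaskDefinitionSketch,
  attempts: number, // attempts made so far, including the one that just failed
  error: Error,
  now: Date
): Date | null {
  // maxAttempts has a minimum of 1.
  const maxAttempts = Math.max(1, definition.maxAttempts ?? DEFAULT_MAX_ATTEMPTS);
  if (attempts >= maxAttempts) {
    return null;
  }
  // Fallback delay is an assumption of this sketch: 5 minutes per attempt so far.
  const delaySeconds = definition.getRetryDelay
    ? definition.getRetryDelay(attempts, error)
    : attempts * 5 * 60;
  return new Date(now.getTime() + delaySeconds * 1000);
}
```

With `maxAttempts: 1`, the first failure already exhausts the budget and `nextRetryAt` returns `null`, which is the "total attempts, not retries" behaviour the description calls out.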
Dev-Docs
Task manager now uses saved objects
Starting up Kibana will convert `.kibana_task_manager` to an alias, and indices will follow the `.kibana_task_manager_1` syntax. A migration will execute to prefix the ids with `task:` as it converts to saved objects.
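
To make the id change concrete, the following TypeScript sketch shows what prefixing ids with `task:` could look like for a single document. It is an illustration only, not the migration script from this PR: `RawDocSketch`, `toSavedObjectId`, and `migrateDoc` are hypothetical names, the idempotency check is a choice made for the example, and nesting the original fields under a `task` key with a `type` field is an assumption based on the usual raw layout of saved objects.

```typescript
// Hypothetical raw document shape, for illustration only.
interface RawDocSketch {
  _id: string;
  _source: Record<string, unknown>;
}

const TASK_TYPE = 'task';

// Prefixes an id with "task:" unless it already carries the prefix,
// so running the conversion twice does not produce "task:task:...".
function toSavedObjectId(rawId: string): string {
  const prefix = `${TASK_TYPE}:`;
  return rawId.slice(0, prefix.length) === prefix ? rawId : `${prefix}${rawId}`;
}

// Converts one pre-migration document into the assumed saved-object layout:
// the id gains the type prefix and the original fields move under the type key.
function migrateDoc(doc: RawDocSketch): RawDocSketch {
  return {
    _id: toSavedObjectId(doc._id),
    _source: {
      type: TASK_TYPE,
      [TASK_TYPE]: doc._source,
    },
  };
}
```

For example, a document with the id `abc-123` would come out with the id `task:abc-123`, while a document that already has the prefix keeps its id unchanged.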