-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Task Manager] Changed alerts schedule logic to use Task Manager internals #80149
[Task Manager] Changed alerts schedule logic to use Task Manager internals #80149
Conversation
…-tm-intervals # Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.
…-tm-intervals # Conflicts: # x-pack/plugins/task_manager/server/task_runner.ts
…ed right after task was complete previous execution
…-tm-intervals # Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.
…kibana into alerts/convert-to-tm-intervals * 'alerts/convert-to-tm-intervals' of github.com:gmmorris/kibana: (88 commits) fixed jest APM Experiments settings (elastic#81554) [Resolver] Enable resolver test plugin tests (elastic#81339) Add TS project references for inspector (elastic#81792) Add uri decode to es_ui_shared and fix navigation issues with special characters (elastic#80835) [Fleet] Rename ingestManager translations fleet (elastic#81837) [Logs UI] Transmit and render array field values in log entries (elastic#81385) Audit Logging: use the original url (elastic#81282) [User experience] Fix JS error rate (elastic#81512) [UX] Add median/percentile info in titles (elastic#79824) Support export for SO with circular refs (elastic#81582) Get rid of global types (elastic#81739) [APM] Fix precommit script (elastic#81594) skips overview tests (elastic#81877) [Security Solution][Case] Fix connector's labeling (elastic#81824) Added simple test, which only covers successful case when edit happened right after task was complete previous execution [Maps] Fix EMS test (elastic#81856) [Security Solutions][Detections] - Fix bug, last response not showing for disabled rules (elastic#81783) skip flaky suite (elastic#81853) Fixed type checks and unit tests ...
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
* master: (71 commits) [Chrome] Extension to append an element to the last breadcrumb (elastic#82015) [Monitoring] Thread pool rejections alert (elastic#79433) [Actions] Fix actionType type on registerType function (elastic#82125) [Security Solution] Modal for saving timeline (elastic#81802) add tests for index pattern switching (elastic#81987) TS project references for share plugin (elastic#82051) [Graph] Fix problem with duplicate ids (elastic#82109) skip 'returns a single bucket if array has 1'. related elastic#81460 Add a link to documentation in the alerts and actions management UI (elastic#81909) [Fleet] fix duplicate ingest pipeline refs (elastic#82078) Context menu trigger for URL Drilldown (elastic#81158) SO management: fix legacy import index pattern selection being reset when switching page (elastic#81621) Fixed dead links (elastic#78696) [Search] Add "restore" to session service (elastic#81924) fix Lens heading structure (elastic#81752) [ML] Data Frame Analytics: Fix feature importance cell value and decision path chart (elastic#82011) Remove legacy app arch items from codeowners. (elastic#82084) [TSVB] Renamed 'positive rate' to 'counter rate' (elastic#80939) Expressions/migrations2 (elastic#81281) [Telemetry] [Schema] remove number type and support all es number types (elastic#81774) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I might have found an issue that would come back with this PR (see comments).
I want to do some testing on Monday around error handling (similar to what I did here: #53650). I will finalize my review based on the results.
runAt: resolveErr<Date | undefined, Error>(runAt, (err) => { | ||
return isAlertSavedObjectNotFoundError(err, alertId) | ||
schedule: resolveErr<IntervalSchedule | undefined, Error>(schedule, (error) => { | ||
return isAlertSavedObjectNotFoundError(error, alertId) | ||
? undefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some local testing, it seems to break the following fix we did a while ago #63093.
Previously returning undefined
as the next runAt
would cause task manager to delete the task. When returning undefined
to the schedule, I believe task manager no longer deletes these tasks and re-uses previous schedule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch... that's because we relied on an implicit behaviour there... because the task was not a scheduled task.
Considering we decided that scheduled tasks should keep running forever (from tTM's perspective) even if they fail and get deleted... so, arguably this is still correct.
What is missing, I guess, is a clean up process in the Alerting framework itself. I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed here: 16729cd
(cc @YulNaumenko )
…alerting where needed
* master: Add derivative function (elastic#81178) [Discover] Deangularize context_app.html, part 3 (elastic#81838) [Visualize] Vis listing page breaks on unknown vis type (elastic#82018) Rename `batchSize` parameter to `batch_size` to be consisten with the API namings guidelines. (elastic#82123) Minor edits in Single Metric Viewer (elastic#82159) [Actions] Fix type contract (elastic#82168) Upgrade EUI to v30.1.1 (elastic#81499) Skip failing ES snapshot test (elastic#82207) Skip ES snapshot failing suite (elastic#82206) [Alerting UI] Grouped list of alert types using producers in Types filter of Alerts tab (elastic#81876) [Maps] convert vector style component to typescript round 1 (elastic#81961) Fix link to upgrade assistant (elastic#82138) Rename "service overview" to "service inventory" (elastic#81933) adjust policy test to drop test for server addresses (elastic#82120) Cleanup/codeowners (elastic#82146) [DOCS] Updates add data content (elastic#81093) [DOCS] Remove index mgmt docs (elastic#82099) [Search] fix cancelation related memory leaks (elastic#81996)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM!
…rnals (elastic#80149) * spiked intervals in alerts * ensure scheduled tasks dont get wiped * Fixed type checks and unit tests * Added simple test, which only covers successful case when edit happened right after task was complete previous execution * fixed jest * fallback to existing task schedule when possible * added missing test * Added support for day and hour schedule interval values * added docs for new schedule run result * fixed doc * added UnrecoverableError support for task runners nad pluged it into alerting where needed * typo Co-authored-by: Yuliia Naumenko <yuliia.naumenko@elastic.com>
…rnals (#80149) (#82330) * spiked intervals in alerts * ensure scheduled tasks dont get wiped * Fixed type checks and unit tests * Added simple test, which only covers successful case when edit happened right after task was complete previous execution * fixed jest * fallback to existing task schedule when possible * added missing test * Added support for day and hour schedule interval values * added docs for new schedule run result * fixed doc * added UnrecoverableError support for task runners nad pluged it into alerting where needed * typo Co-authored-by: Yuliia Naumenko <yuliia.naumenko@elastic.com> Co-authored-by: Gidi Meir Morris <github@gidi.io>
💔 Build Failed
Failed CI Steps
Test FailuresChrome X-Pack UI Functional Tests.x-pack/test/functional/apps/discover/feature_controls/discover_security·ts.discover feature controls discover feature controls security "before all" hook in "discover feature controls security"Standard Out
Stack Trace
X-Pack Accessibility Tests.x-pack/test/accessibility/apps/dashboard_edit_panel·ts.Dashboard Edit Panel "before all" hook for " A11y test on dashboard edit panel menu options"Standard Out
Stack Trace
Firefox XPack UI Functional Tests.x-pack/test/functional/apps/canvas.Canvas app "before all" hook in "Canvas app"Standard Out
Stack Trace
and 9 more failures, only showing the first 3. Metrics [docs]Distributable file count
Saved Objects .kibana field count
History
To update your PR or re-run it, just comment with: |
Decided to remove the |
Summary
Current PR should resolve #46001 and include thew following changes:
throwUnrecoverableError
which allows a Task runner to indicate to TM that it would like to fail the task and avoid rescheduling the task in the future.Addresses the issue identified in Tested the #53650 issue:
and as a result timeout between executions is 5 minutes + alerts schedule interval (2 sec)
Note for docs
Document task manager API changes. Explain how returning a schedule from the runner will update the task's schedule.
Checklist
Delete any items that are not applicable to this PR.
For maintainers