Switch Reporting to Task Manager #64853
I looked at the TM code and it looks right.
I started Kibana and saw that the Execute Report task did in fact have the limited concurrency expected.
I haven't tested that reports are generated or anything along those lines.
@streamich shared logs that appear to show reporting jobs running twice, so I'm holding off on this for a bit while I investigate.
@elasticmachine merge upstream
LGTM, tested on Mac/Chrome, everything seems to work, could not reproduce the errors I was seeing a few weeks ago.
This PR adds a breaking change: if you have pending reports in the cluster, they will not be continued after upgrading to a version with these changes. Because of this, @kobelb and I have decided it's best for this change to roll out in a major version. Therefore this PR will not be backported to 7.x.
💚 Build Succeeded
Summary
Remove the ES-Queue sub-service of Reporting and replace the functionality with Task Manager integration.
Part of #53900
Closes #69340
Closes #75603
Closes #75605
Closes #76411
Closes #87074
Blocked on:
Summary
Using Task Manager for Reporting can be done by registering 2 new task types:
Execute Report Task
The Execute Report task is a "one-off" task that Task Manager will run as soon as it is scheduled. The Reporting API that handles requests to generate new reports will be responsible for scheduling these tasks and for storing a historical document for the report output in the .reporting-* indices. The scheduled task references this placeholder report document.

This design splits report execution into 2 phases for async report generation:
Create Job phase: the CreateJobFn of an Export Type.
Run Task phase: the RunTaskFn of an Export Type.

[Diagram: Generate Report, Create Job Phase]
[Diagram: Generate Report, Run Task Phase]
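The two phases above can be sketched in a simplified, in-memory form. This is only an illustration under stated assumptions: the `ReportDocument`, `ScheduledTask`, and `ExportType` shapes, the `report:execute` task type name, and the `reportStore` and `taskQueue` stand-ins are all hypothetical and are not Kibana's or Task Manager's real APIs.

```typescript
// Hypothetical, simplified types. None of these names come from Kibana's real API.
type ReportStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface ReportDocument {
  id: string;
  status: ReportStatus;
  jobParams: Record<string, unknown>;
  output?: string;
}

interface ScheduledTask {
  taskType: 'report:execute'; // assumed task type name, for illustration only
  reportId: string; // reference to the placeholder report document
}

interface ExportType {
  createJob(jobParams: Record<string, unknown>): Record<string, unknown>;
  runTask(payload: Record<string, unknown>): string;
}

// In-memory stand-ins for the .reporting-* indices and the Task Manager queue.
const reportStore = new Map<string, ReportDocument>();
const taskQueue: ScheduledTask[] = [];

// Phase 1 (Create Job): store a placeholder document, then schedule a one-off task.
function handleGenerateRequest(
  id: string,
  exportType: ExportType,
  jobParams: Record<string, unknown>
): ReportDocument {
  const payload = exportType.createJob(jobParams);
  const doc: ReportDocument = { id, status: 'pending', jobParams: payload };
  reportStore.set(id, doc);
  taskQueue.push({ taskType: 'report:execute', reportId: id });
  return doc;
}

// Phase 2 (Run Task): pick up the next task and record the outcome on the document.
function runNextTask(exportType: ExportType): ReportDocument | undefined {
  const task = taskQueue.shift();
  if (!task) return undefined;
  const doc = reportStore.get(task.reportId);
  if (!doc) return undefined;
  doc.status = 'processing';
  try {
    doc.output = exportType.runTask(doc.jobParams);
    doc.status = 'completed';
  } catch {
    doc.status = 'failed';
  }
  return doc;
}
```

In this sketch the scheduled task carries only the report ID, mirroring the description that the task references the placeholder report document rather than holding the report itself.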
Monitor for Pending Reports Task
It is possible for a task in the run task phase to fail without rescheduling itself. This will happen in situations such as:
ESQueue was able to self-heal against this problem because it used the .reporting-* indices as both the queue index and the historical storage index. That means "stuck" pending jobs would be discovered in the historical data in the same cycle ESQueue uses to find queued work that needs to be claimed. ESQueue would immediately claim the stuck/expired jobs it found.

In Task Manager, we will need to define a 2nd task that discovers stuck jobs by searching for "Pending" or "Processing" jobs in the historical data and filtering on the expiration_time field. Unlike ESQueue, it will not immediately claim these jobs once discovered; instead it will re-schedule the execute report task using the job parameters in the historical document.

[Diagram: Monitor for Pending Reports]
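A minimal sketch of the monitoring logic described above, operating on an in-memory list instead of an Elasticsearch search. It assumes that `expiration_time` is an ISO-8601 timestamp and that a job counts as stuck once that time is in the past; the `StoredReport` and `RescheduleRequest` shapes and the `schedule` callback are hypothetical names, not the real implementation.

```typescript
// Hypothetical shape of a historical report document, for illustration only.
interface StoredReport {
  id: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  expiration_time: string; // assumed to be an ISO-8601 timestamp
  jobParams: Record<string, unknown>;
}

// Hypothetical request to schedule a fresh execute report task.
interface RescheduleRequest {
  reportId: string;
  jobParams: Record<string, unknown>;
}

// Find pending or processing reports whose expiration time has already passed.
function findStuckReports(reports: StoredReport[], now: Date): StoredReport[] {
  return reports.filter(
    (r) =>
      (r.status === 'pending' || r.status === 'processing') &&
      new Date(r.expiration_time).getTime() < now.getTime()
  );
}

// Unlike ESQueue, do not claim the stuck jobs here. Re-schedule the execute
// report task using the job parameters stored in the historical document.
function monitorPendingReports(
  reports: StoredReport[],
  now: Date,
  schedule: (req: RescheduleRequest) => void
): number {
  const stuck = findStuckReports(reports, now);
  for (const report of stuck) {
    schedule({ reportId: report.id, jobParams: report.jobParams });
  }
  return stuck.length;
}
```

Keeping discovery (`findStuckReports`) separate from re-scheduling makes the filter easy to check on its own; in a real deployment the filter would presumably be expressed as a search against the .reporting-* indices rather than an array scan.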
Release Note for breaking change
Make sure there are no pending reporting jobs before upgrading to 8.0. If there are any, they will remain in pending state indefinitely.