[Scheduled Reports] error handling (timeouts, errors, kbn restarts...) #75603
Labels: (Deprecated) Feature:Reporting, discuss
Note: Start with #75605, which contains info about the `reporting:execute` type of task.
Problem statement:
If a report execution fails, retries should be triggered immediately. Task Manager schedules retries with an exponential backoff for timeouts, which would create a bad user experience when someone has manually requested a report.
Proposed solution:
Reporting will handle tracking retries and marking jobs as failed when the number of retries reaches the maximum.
How it would work:
We can preserve the current ESQueue-like retry behavior when switching Reporting to Task Manager. Our `reporting:execute` run function will do the following:
Why it makes sense
We already have the reporting index, which contains all the reports and has fields that describe their state in a queue: the number of attempts, the time at which processing jobs expire, etc. By continuing to use those documents to describe the state in the queue, we can preserve the behavior of retrying immediately in case of an error.
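As a rough illustration of the idea, here is a minimal sketch of retry tracking driven by the report document itself. Everything in it is an assumption made for illustration: the `ReportDoc` field names, the `RunOutcome` values, and the `runReportTask` helper are hypothetical and are not the actual reporting index schema or Task Manager API.

```typescript
// Hypothetical report document; field names are illustrative assumptions,
// not the real reporting index mapping.
type ReportStatus = "pending" | "processing" | "completed" | "failed";

interface ReportDoc {
  id: string;
  status: ReportStatus;
  attempts: number;           // executions started so far
  maxAttempts: number;        // limit tracked by Reporting, not by Task Manager
  processExpiration?: number; // epoch ms after which a "processing" claim is stale
  error?: string;
}

// Hypothetical result that tells the caller what to do next.
type RunOutcome = "done" | "retryNow" | "gaveUp" | "skipped";

function runReportTask(
  report: ReportDoc,
  execute: (r: ReportDoc) => void,
  now: number,
  timeoutMs: number
): RunOutcome {
  // A job can be claimed if it is pending, or if a previous claim expired
  // (for example, the server crashed while the job was processing).
  const claimable =
    report.status === "pending" ||
    (report.status === "processing" && (report.processExpiration ?? 0) <= now);
  if (!claimable) return "skipped";

  // No attempts left: mark the job as failed instead of running it again.
  if (report.attempts >= report.maxAttempts) {
    report.status = "failed";
    report.error = report.error ?? "Max attempts reached";
    return "gaveUp";
  }

  // Claim the job and record the attempt on the document.
  report.status = "processing";
  report.attempts += 1;
  report.processExpiration = now + timeoutMs;

  try {
    execute(report);
    report.status = "completed";
    return "done";
  } catch (err) {
    report.error = err instanceof Error ? err.message : String(err);
    if (report.attempts >= report.maxAttempts) {
      report.status = "failed";
      return "gaveUp";
    }
    // Attempts remain: release the job so it can be retried without a backoff delay.
    report.status = "pending";
    return "retryNow";
  }
}

// Usage: a job that always fails is retried once, then marked as failed.
const report: ReportDoc = { id: "r1", status: "pending", attempts: 0, maxAttempts: 2 };
const alwaysFails = () => { throw new Error("timeout"); };
const first = runReportTask(report, alwaysFails, 0, 1000);  // "retryNow"
const second = runReportTask(report, alwaysFails, 0, 1000); // "gaveUp"
```

Because the attempt count and the expiration time live on the document, a worker that picks up a stale "processing" job after a crash can decide on its own whether to retry or give up, without relying on the scheduler's retry policy.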
What are the risks
Alternative options
We could work with the Task Manager owners on an enhancement that would let us override its retry logic and not use exponential backoff. Doing so would avoid ending up in a state where a server crash leaves processing jobs "stuck", since Task Manager would hold on to those tasks and run the retries.