[Task Manager] Support for limited concurrency Task Types #54916
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
For actions, and I think alerts, we create a new task type per action type. It may make sense to be able to set the Alternatively, we could probably also just have one taskType for all actions, and plumb more data into it - not sure what the pros/cons are to that. |
Part of the complication is in how TM claims tasks - we don't want to lose cycles where we claim 10 tasks and then drop them because we're at capacity for that specific type, while still having capacity for others. |
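One way to avoid claiming tasks only to drop them is to work out which task types still have room before the claim query is built. The following is a minimal sketch of that idea, not how Task Manager does it; `ConcurrencyState` and `typesWithCapacity` are hypothetical names invented for illustration.

```typescript
// Hypothetical shapes for illustration; none of these names come from Task Manager.
interface ConcurrencyState {
  // Maximum concurrent tasks per type; a type with no entry is treated as unlimited.
  maxConcurrency: { [taskType: string]: number | undefined };
  // Number of tasks of each type currently running on this Kibana instance.
  running: { [taskType: string]: number | undefined };
}

// Returns the task types that still have spare capacity, so that a claim
// query can be restricted to them before any task is claimed.
function typesWithCapacity(allTypes: string[], state: ConcurrencyState): string[] {
  return allTypes.filter((taskType) => {
    const limit = state.maxConcurrency[taskType];
    if (limit === undefined) {
      return true; // no limit configured for this type
    }
    const inFlight = state.running[taskType] ?? 0;
    return inFlight < limit;
  });
}
```

With a limit of 1 on a hypothetical `report` type and one report already running, only the other types would be passed to the claim query, so no claimed task needs to be released.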
Would it also be possible to use these settings to configure TM to completely disable itself from claiming a certain task type? Maybe that could be the same as setting the allowed concurrent tasks of a type to If Reporting uses Task Manager and I have an instance that I don't want to be able to execute Reports, this setting would give me what I need. |
That makes sense, but we will probably want an info message about this at startup, for diagnostic purposes. E.g., someone uses 0 on all instances, and then wonders why those tasks never run. |
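A rough sketch of that startup diagnostic, assuming the limits arrive as a plain map from task type to allowed concurrency. The function name, the config shape, and the message wording are all hypothetical.

```typescript
// Hypothetical helper: builds one info message per task type whose limit is 0,
// i.e. a type this instance will never claim.
function disabledTypeMessages(maxConcurrency: { [taskType: string]: number }): string[] {
  return Object.keys(maxConcurrency)
    .filter((taskType) => maxConcurrency[taskType] === 0)
    .map(
      (taskType) =>
        `Task type "${taskType}" has a concurrency limit of 0 and will not be claimed by this instance.`
    );
}
```

The caller would pass each returned string to its logger at info level during startup, so an operator who sets 0 on every instance has something to find in the logs.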
Came up with a possible direction, details over here: If @tsullivan & @joelgriffith feel this adequately addresses their needs and @elastic/kibana-alerting-services like the direction, then we can consider pulling this issue into the To Do list I think. |
Having discussed the issue with Alerting Services and Reporting, we've decided to go the route of adding limited support for concurrency which will specifically support Reporting, but we won't allow other task types to utilise it for the time being to avoid adding too many additional pollers. We feel comfortable adding a second poller for Reporting as they'll be removing their use of ES queue in that same version, meaning that, in effect, there's the same number of polls running in parallel as before. This work will follow the path spiked over here: #74883 |
I'm wondering if we would want to reframe this as a "one concurrent task poller", compared to just reporting. Would be for "large/expensive" tasks. Reporting today, probably more tomorrow ... |
That makes sense to me. Allow any app or service to register a "large/expensive" task definition, and the secondary poller could search for these tasks with a size of 1. Whichever large task has been waiting the longest would get singularly claimed with each poll interval. Scaling up with multiple instances of Kibana would help with keeping a backlog down. Perhaps the interval duration could be configurable if the machine has the hardware to do more work on the backlog. |
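The selection step described above (pick the single large task that has been waiting longest) could look roughly like this. It works on an in-memory list purely for illustration, whereas the real claim would be a query against the task store; `PendingTask` and `claimNextExpensive` are hypothetical names.

```typescript
// Hypothetical task shape for illustration only.
interface PendingTask {
  id: string;
  taskType: string;
  runAt: number; // epoch milliseconds at which the task became due
}

// Picks at most one task per poll: the due task of an "expensive" type
// that has been waiting the longest (smallest runAt).
function claimNextExpensive(
  pending: PendingTask[],
  expensiveTypes: string[],
  now: number
): PendingTask | undefined {
  let oldest: PendingTask | undefined;
  for (const task of pending) {
    if (expensiveTypes.indexOf(task.taskType) === -1) continue; // not a large task
    if (task.runAt > now) continue; // not due yet
    if (oldest === undefined || task.runAt < oldest.runAt) {
      oldest = task;
    }
  }
  return oldest;
}
```

Calling this once per poll interval gives the "size of 1" behaviour; shortening the interval or adding Kibana instances is what would work the backlog down faster.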
I spent a couple of days on this last week, and came to the conclusion that the direction the POC from a few months ago took was right, but the "fork" point, where we duplicate the mechanism was quite a bit off. In the POC we forked at the root of the Poller - which means the entire interval mechanism was duplicated.
Since then, we've also added a variety of other mechanisms into that stream that get duplicated as a result:
Duplicating these processes makes it much harder to reason about what's happening in TM, and overcomplicates our monitoring solution. Once I realized that our original approach was no longer suitable, I spiked another POC and found a much better place to fork the process. In addition, a list of other complications we hadn't quite considered revealed themselves (these are true no matter the forking point):
I think we're now on the right path. I've been able to get the entire e2e test suite green on my spike (but it's hacked together so all unit tests are red 😆 ), and need to add some additional ones to test for edge cases, but I'm feeling confident about this direction. |
Great read, Gidi! Thank you for the hard work going into this. |
Describe the feature:
Task Manager used to be able to limit how many concurrent instances of a specific task type run on a single Kibana instance.
We have also identified that there might be a need to limit the concurrency of specific tasks (or groups of tasks), as alert types also want to dynamically limit how many instances of a certain type can run concurrently.
Describe a specific use case for the feature:
We need to bring this feature back for Scheduled tasks and possibly others such as SIEM.
Edit / Note: There isn't currently a need to support this at the alert type level but definitely at the task manager level for reporting purposes.