Modify default Task Manager configuration for better throughput? #78851

mikecote · 2020-09-29T19:23:04Z

After the performance improvements below are completed, should we change the default maxWorkers and pollInterval that Task Manager currently has, to provide better throughput out of the box?

Batch the update and delete operations in Task Manager #65551
Eliminate the downtime between tasks completing and the next polling interval #65552
Apply back pressure in Task Manager whenever Elasticsearch responds with a 429 #65553

Note from @kobelb:
We should also assess the impact that increasing these default values has on the rest of Kibana. Since Task Manager shares a process with the rest of Kibana, we don't want to inhibit other operations, e.g. route handlers that service end-user-generated HTTP requests.

elasticmachine · 2020-09-29T19:23:06Z

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

mikecote · 2020-10-02T18:00:11Z

This question can be asked in the context of alerting GA to determine what performance benchmarks we want to deliver for GA. What default throughput do we want out of the box?

mikecote · 2020-10-26T20:46:13Z

It was mentioned during today's sync that working out a recommended change here is complex. Changing the configuration affects the rest of Kibana (HTTP requests, ingestion, etc.), and any throughput gain comes at the cost of background CPU usage that may not be acceptable on small deployments, which makes it hard to settle on a single recommendation.

The current theoretical throughput is 200 tasks per minute [10 tasks (max_workers) running every 3 seconds (poll_interval) = 200 tasks per minute, if they complete before the next poll interval]. Since there are no complaints about the current throughput at this time, it was agreed that it's better to document how to increase throughput, mentioning what the values depend on and what impact changing them will have.
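For reference, the arithmetic above can be sketched as a small calculation. This is only an illustration: `theoreticalTasksPerMinute` is a hypothetical helper, not something that exists in Task Manager. It assumes a single Kibana instance and that every task finishes before the next poll, so it gives an upper bound rather than a measured figure.

```typescript
// Hypothetical helper (not part of Kibana): upper bound on tasks per minute
// for one Kibana instance, assuming every task completes within one poll interval.
function theoreticalTasksPerMinute(maxWorkers: number, pollIntervalMs: number): number {
  if (maxWorkers <= 0 || pollIntervalMs <= 0) {
    throw new RangeError("maxWorkers and pollIntervalMs must be positive");
  }
  // Number of polling cycles that fit in one minute.
  const pollsPerMinute = 60000 / pollIntervalMs;
  // Each cycle can claim at most maxWorkers tasks.
  return maxWorkers * pollsPerMinute;
}

// Values quoted in this comment: max_workers = 10, poll_interval = 3 seconds.
const currentDefault = theoreticalTasksPerMinute(10, 3000); // 200
console.log(`Theoretical throughput: ${currentDefault} tasks per minute`);
```

Raising max_workers or lowering poll_interval increases this bound proportionally, which is exactly the trade-off against the rest of Kibana described above.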

I have added a note to the documentation meta issue (#81532) to document this and will work with @bmcconaghy to come up with content once we've completed our performance benchmark issue (#40264).

I will now close this issue, and we can re-open it if we ever think otherwise.

mikecote added the Feature:Task Manager and Team:ResponseOps labels Sep 29, 2020

mikecote changed the title ~~Modify default Task Manager configuration for better throughput~~ Modify default Task Manager configuration for better throughput? Sep 29, 2020

mikecote mentioned this issue Oct 13, 2020

Alerting GA #74788

Closed

36 tasks

mikecote self-assigned this Oct 21, 2020

mikecote closed this as completed Oct 26, 2020

kobelb added the needs-team label Jan 31, 2022

botelastic bot removed the needs-team label Jan 31, 2022
