Large amount of triggers and executions lead to performance issues #6024
Hi, as I understand it you have almost 2k executions created per minute, which is a lot, but still something Kestra should be able to handle. Do you know how many tasks you have per execution? All services are started in one big node, and you have a big database; regarding the database spec it should be enough, so I need to look at the query performance. I also see that there are queries on the executions_queued table, so you must have a flow with a concurrency limit. Is it the one triggered by webhook?
4 of the slow queries, the ones with an execution time of more than 1s, come from dashboards (home, flow, execution). If you have a lot of executions it is normal that those queries are slow; today the only option is to purge executions from the database more often. Reducing the dashboard period would also help, but we don't offer a global setting for that yet. The first query pops a queued execution from the list of queued executions; we use a lock so we can select and then delete the record from the database. I checked, and it correctly uses the index. Anyway, giving the database more resources would also help, as it seems the database cannot cope with the high number of executions you are creating.
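To decide how aggressively to purge, it can help to see how many executions the database retains per day. A rough sketch, assuming a PostgreSQL `executions` table whose jsonb `value` column carries a `state.startDate` field; the table and column names here are assumptions about the Kestra JDBC schema, so adjust them to the actual one:

```sql
-- Count retained executions per day, to size a purge/retention policy.
-- Table and column names are assumptions; adapt to your actual schema.
SELECT (value -> 'state' ->> 'startDate')::timestamptz::date AS day,
       count(*) AS executions
FROM executions
GROUP BY day
ORDER BY day DESC
LIMIT 30;
```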
Each flow has only 2 tasks: 1 to log which topic, and another to produce a kafka message
All 4 of my flows had a concurrency limit set in the screenshot above.
I set the concurrency limit on each of the 4 flows to about 125 (500 total)
So is the bottleneck here only the database? And is the fix simply to increase the specs? I'd imagine so.
Can you try to create an index to see if it provides any improvements?

```sql
create index execution_queued__flow_date
    on execution_queued (tenant_id, namespace, flow_id, "date");
```
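To check whether PostgreSQL actually uses the new index, you can run `EXPLAIN` on a query shaped like the pop query. The literal tenant/namespace/flow values below are placeholders, not taken from this issue, and the real pop query also takes a lock:

```sql
-- Placeholder values; verify the plan uses execution_queued__flow_date.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM execution_queued
WHERE tenant_id = 'main'
  AND namespace = 'company.team'
  AND flow_id = 'my-flow'
ORDER BY "date"
LIMIT 1;
```

The plan should show an Index Scan on `execution_queued__flow_date` rather than a Seq Scan over the table.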
It depends on how many executions you keep in the database. We have nice dashboards that show an execution overview for the last 30 days; those can consume a lot of database resources if they are displayed frequently. That's why I talked about purging executions.
I would not expect that!
By default, it will use 4 times the number of CPU cores, which is a sane default. If you see low CPU utilization under load, you can increase the number of threads, but the default configuration should be a good compromise.
Anyway, as you validated that the concurrency limit is what caused the issue, it helps us find some performance improvements in this area! Thanks for your detailed feedback, it helps us ;)
Describe the issue
When a large number of triggers/executions occurs, the application starts to slow down immensely.
Executions are created, but no tasks actually start even when the execution is in the Running state. And if tasks do start, they take a very long time to execute.
I tried setting up concurrency limits. With that I just get a huge backlog of executions to process.
This is a screenshot showing the current situation:
Examples showing an execution created hours ago that is still in the Running state, yet no tasks are created:

My current setup for the triggers includes:
- 1 constant load (200+ per minute)
- 1 hourly bulk load (30K+ per hour)
- 1 minimal load (50+ per hour)
- 1 constant load (roughly 1,500+ per minute)
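As a back-of-envelope check, the four trigger rates above add up to roughly the ~2k executions per minute mentioned earlier in the thread:

```sql
-- 200/min + 30,000/hour + 50/hour + 1,500/min, normalized to per-minute
SELECT 200 + (30000 / 60.0) + (50 / 60.0) + 1500 AS executions_per_minute;
-- ≈ 2200
```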
Using the pg_stat_statements PostgreSQL extension, I was able to identify the top slowest queries, which you can find in the Google Sheet linked HERE.
Screenshot for quick glance:
Just in case this is needed, I've included the database details:
Environment
I'm using AWS: