Open
Labels
area:DAG-processing, area:MetaDB, area:Scheduler, area:core, area:performance, needs-triage
Description
When the task_instance (TI) table is huge (~10M records), the _end_spans_of_externally_ended_ops method takes a very long time to finish (about 75 seconds in our case).
This particular SQLAlchemy query was taking too long:
tis_should_end: list[TaskInstance] = session.scalars(
    select(TaskInstance).where(TaskInstance.span_status == SpanStatus.SHOULD_END)
).all()
After adding the following index, performance improved significantly:
create index idx_span_status on task_instance (id, span_status);
In addition, creating an index on dag_version_id (create index idx_dag_version_id on task_instance (dag_version_id);) helped improve DAG Processor performance.
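For anyone triaging this, here is a minimal sketch of how one might check whether an index is actually picked up for the span_status filter. It makes several assumptions that are not from the report: it uses an in-memory SQLite database rather than the production metadata DB, a simplified three-column stand-in for task_instance, the literal 'should_end' as the status value, and a hypothetical single-column index named idx_span_status_only instead of the (id, span_status) index shown above. Query planners differ between databases, so the plan on the real backend should be checked with its own EXPLAIN.

```python
import sqlite3

# Stand-in for the slow query from the report; the literal value is an assumption.
QUERY = "SELECT * FROM task_instance WHERE span_status = 'should_end'"


def make_db(rows: int = 1000) -> sqlite3.Connection:
    """Build a simplified, hypothetical task_instance table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " id TEXT PRIMARY KEY,"
        " dag_version_id TEXT,"
        " span_status TEXT NOT NULL)"
    )
    # Roughly 1% of rows carry the status we filter on.
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?)",
        (
            (f"ti-{i}", f"dv-{i % 10}", "should_end" if i % 100 == 0 else "ended")
            for i in range(rows)
        ),
    )
    return conn


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's plan description for the given statement."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return "; ".join(row[-1] for row in plan_rows)


if __name__ == "__main__":
    conn = make_db()
    print("before:", query_plan(conn, QUERY))
    # Hypothetical single-column index, used here only for illustration.
    conn.execute("CREATE INDEX idx_span_status_only ON task_instance (span_status)")
    print("after: ", query_plan(conn, QUERY))
```

Without the index the plan should describe a full table scan; with it, the plan should name the index. The same before/after comparison on the actual backend would show whether the composite (id, span_status) index is used for this filter or whether an index leading with span_status serves the query better.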
Use case/motivation
Improve scheduler and DAG Processor performance when the TI table has a lot of records.
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct