create a new index for span_status, dag_version_id in task_instance table #53401

@rawwar

Description

When the TI table is huge (~10M records), the _end_spans_of_externally_ended_ops method (Link) takes a very long time to finish (about 75 seconds in our case).

This particular sqlalchemy query was taking too long:

tis_should_end: list[TaskInstance] = session.scalars(
    select(TaskInstance).where(TaskInstance.span_status == SpanStatus.SHOULD_END)
).all()

After adding the following index, performance improved significantly:

create index idx_span_status on task_instance (id, span_status);

In addition, creating an index on dag_version_id (create index idx_dag_version_id on task_instance (dag_version_id);) also improved DAG Processor performance.
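
For anyone who wants to reproduce the effect of the dag_version_id index locally, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration: the three-column table is a hypothetical stand-in for the real task_instance schema, the lookup query is an assumed example rather than the exact query the DAG Processor issues, and the planner of a production database (Postgres, MySQL) may behave differently from SQLite.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the detail text


# Hypothetical, heavily reduced stand-in for the task_instance table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "id TEXT PRIMARY KEY, span_status TEXT, dag_version_id TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [(f"ti-{i}", "ended", f"v{i % 50}") for i in range(5000)],
)

# Assumed example lookup; the real DAG Processor query may differ.
lookup = "SELECT * FROM task_instance WHERE dag_version_id = ?"

plan_before = query_plan(conn, lookup, ("v7",))

# Same index definition as proposed in this issue.
conn.execute("CREATE INDEX idx_dag_version_id ON task_instance (dag_version_id)")

plan_after = query_plan(conn, lookup, ("v7",))

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans shows whether the planner switches from scanning the table to searching via idx_dag_version_id. On the real metadata database, the equivalent check would be running EXPLAIN on the slow query before and after creating the index.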

Use case/motivation

Improve scheduler and DAG Processor performance when the TI table has a large number of records.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct
