Scheduler still runs span cleanup queries when tracing is disabled #53405

@pankajkoti

Apache Airflow version

main (development)

If "Other Airflow 2 version" selected, which one?

No response

What happened?

While looking further into #53401, I was trying to better understand the span and trace configuration. I noticed that span queries are still executed even though tracing is disabled.

The Airflow scheduler continues to:

  • update span_status on dag_run and task_instance, and
  • execute, on every scheduling loop, a query that scans those tables for:

WHERE span_status = 'should_end'

This adds an unnecessary SELECT … to every loop. On large task_instance tables, it showed up in pg_stat_activity as noticeably slow queries until we added an index on span_status.
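To illustrate why the index mattered, here is a minimal sketch using an in-memory SQLite database rather than the Postgres deployment described above. The table is a heavily simplified stand-in for task_instance, the index name idx_ti_span_status and the 'other' status value are made up for the example, and SQLite's planner output differs from Postgres, so treat it only as a rough analogy.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real task_instance table (assumption).
conn.execute(
    "CREATE TABLE task_instance ("
    "id INTEGER PRIMARY KEY, state TEXT, span_status TEXT)"
)
# 'other' is a placeholder status, not a real Airflow value.
conn.executemany(
    "INSERT INTO task_instance (state, span_status) VALUES (?, ?)",
    [("success", "other")] * 1000 + [("running", "should_end")] * 3,
)

SQL = "SELECT id FROM task_instance WHERE span_status = 'should_end'"

# Without an index the filter has to scan the whole table.
plan_before = query_plan(conn, SQL)

# Hypothetical index name; the issue only says an index on span_status was added.
conn.execute("CREATE INDEX idx_ti_span_status ON task_instance (span_status)")
plan_after = query_plan(conn, SQL)

print("before:", plan_before)
print("after: ", plan_after)
```

The index only makes the query cheaper. It does not remove the query, which still runs on every loop while tracing is off.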

Impact:

  • Unnecessary DB queries during Scheduler operations
  • Performance degradation in environments where tracing is not used
  • Missed heartbeats due to blocking/slow queries

What you think should happen instead?

I was checking the latest main branch and noticed, for example, that the call to the _end_spans_of_externally_ended_ops() method doesn’t appear to check whether tracing is enabled in the config. From what I can see, the Scheduler calls this method unconditionally.

There may be other similar cases, but in general, it would be good to ensure that no span-related logic or queries are executed when tracing is disabled (which is the default setting).
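A minimal sketch of the kind of guard being proposed is shown below. This is not the actual scheduler code: SchedulerSketch, tracing_enabled, queries_run, and run_scheduler_loop_once are hypothetical names used only for illustration, and in Airflow the flag would presumably be read from the tracing configuration instead of being passed in directly.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulerSketch:
    """Toy stand-in for the scheduler; not the real Airflow class."""

    # Hypothetical flag; defaults to False to mirror "tracing disabled by default".
    tracing_enabled: bool = False
    # Records the queries this toy scheduler would have issued.
    queries_run: list = field(default_factory=list)

    def _end_spans_of_externally_ended_ops(self):
        # Stand-in for the real method: just record the span cleanup query.
        self.queries_run.append("SELECT ... WHERE span_status = 'should_end'")

    def run_scheduler_loop_once(self):
        # Proposed behaviour: skip all span-related work when tracing is off.
        if self.tracing_enabled:
            self._end_spans_of_externally_ended_ops()
        # The rest of the scheduling loop would continue here.


disabled = SchedulerSketch(tracing_enabled=False)
disabled.run_scheduler_loop_once()

enabled = SchedulerSketch(tracing_enabled=True)
enabled.run_scheduler_loop_once()

print(len(disabled.queries_run), len(enabled.queries_run))
```

The same check would also need to cover the places where span_status is updated on dag_run and task_instance, so that no span-related writes happen either.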

How to reproduce

  1. Start up an Airflow deployment with the default configuration (i.e. tracing disabled).
  2. Observe the database activity: span-related queries are executed during Scheduler operation even though tracing is disabled.
  3. If the task_instance table contains a large number of records, these span-related queries can become slow and degrade performance (until this issue is addressed).
  4. These queries can be easily identified in the query logs or by profiling the database.
  5. Adding info logs to the _end_spans_of_externally_ended_ops() method would further confirm that it’s being invoked unconditionally (I haven’t tried this yet, but it’s straightforward to trace through the code and verify the call path).

Operating System

Linux

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
