
Refactor DAG file queuing and fix redundant processing #60124

Merged
kaxil merged 1 commit into apache:main from astronomer:cleanup_add_files_to_queue
Jan 6, 2026

Conversation

@jedcunningham
Member

Renamed `add_files_to_queue` to `_add_new_files_to_queue` in `DagFileProcessorManager` to reduce confusion with `_add_files_to_queue` and to better reflect its internal use for newly discovered files.

We also now call the method only after a bundle has refreshed. New files cannot be found without a refresh, so running the check on every loop iteration is wasteful.
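To illustrate the gating described above, here is a minimal sketch. It is not the code in this PR: the function name and all four of its parameters are hypothetical stand-ins, and it only shows the shape of "scan for new files only when the bundle refreshed".

```python
def scan_refreshed_bundles(bundle_names, refreshed, list_files, enqueue_new):
    """Scan only the bundles that refreshed in this loop iteration.

    Every parameter here is a hypothetical stand-in, not an Airflow API.
    """
    scanned = []
    for name in bundle_names:
        if name not in refreshed:
            # No refresh means the known file list is unchanged,
            # so looking for new files would be wasted work.
            continue
        enqueue_new(list_files(name))
        scanned.append(name)
    return scanned


calls = []
scanned = scan_refreshed_bundles(
    bundle_names=["dags-folder", "git-repo"],
    refreshed={"git-repo"},
    list_files=lambda name: [f"{name}/example.py"],
    enqueue_new=calls.append,
)
print(scanned)  # ['git-repo']
```

Only the refreshed bundle is scanned; the other one is skipped without touching the queue.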

The method now checks `_processors` in addition to `_file_stats` before adding files. This prevents a race condition in which files currently being processed (which don't yet have stats) were erroneously re-added to the parsing queue.
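The race can be reproduced with a toy model. This is a sketch under assumptions, not the real `DagFileProcessorManager`: `ToyManager` and its attributes are hypothetical, and plain strings stand in for whatever file objects the manager actually tracks.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyManager:
    """Hypothetical stand-in for the manager; not Airflow's real class."""

    file_stats: dict = field(default_factory=dict)  # files with recorded stats
    processors: dict = field(default_factory=dict)  # files being parsed right now
    queue: deque = field(default_factory=deque)     # files waiting to be parsed

    def add_new_files_to_queue(self, known_files):
        # A file counts as new only if it has no stats AND is not in flight.
        # Checking file_stats alone would re-queue "a.py" below.
        new_files = [
            f
            for f in known_files
            if f not in self.file_stats and f not in self.processors
        ]
        self.queue.extend(new_files)
        return new_files


# "a.py" is mid-parse: it has a processor but no stats yet.
manager = ToyManager(processors={"a.py": "in-flight"})
added = manager.add_new_files_to_queue(["a.py", "b.py"])
print(added)  # ['b.py']
```

With only the stats check, `a.py` would look new and be queued a second time while its first parse is still running.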

The method also now causes the `dag_processing.file_path_queue_size` gauge to be emitted after new files are added to the queue, and it reduces log noise by writing a single log line instead of one per file.
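A rough sketch of the logging and metrics behaviour, assuming a metrics client with a `gauge(name, value)` method. `RecordingStats` and `enqueue_new_files` are hypothetical names for illustration; only the gauge name comes from this PR. Skipping the log line and gauge when there is nothing to add is a choice made for the sketch, not a statement about the merged code.

```python
import logging
from collections import deque

log = logging.getLogger("toy_dag_processor")


class RecordingStats:
    """Hypothetical metrics client that just remembers the last gauge values."""

    def __init__(self):
        self.gauges = {}

    def gauge(self, name, value):
        self.gauges[name] = value


def enqueue_new_files(queue, new_files, stats):
    """Add new files, log once for the whole batch, then emit the queue size."""
    if not new_files:
        return
    # One summary line for the batch instead of one line per file.
    log.info("Adding %d new files to the queue", len(new_files))
    queue.extend(new_files)
    stats.gauge("dag_processing.file_path_queue_size", len(queue))


queue = deque(["old.py"])
stats = RecordingStats()
enqueue_new_files(queue, ["a.py", "b.py"], stats)
print(stats.gauges)  # {'dag_processing.file_path_queue_size': 3}
```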

@kaxil added this to the Airflow 3.1.6 milestone Jan 6, 2026
@kaxil merged commit 3cfe4b9 into apache:main Jan 6, 2026
70 checks passed
@kaxil deleted the cleanup_add_files_to_queue branch January 6, 2026 18:33
chirodip98 pushed a commit to chirodip98/airflow-contrib that referenced this pull request Jan 9, 2026
stegololz pushed a commit to stegololz/airflow that referenced this pull request Jan 9, 2026
ephraimbuddy pushed a commit that referenced this pull request Jan 27, 2026
@ephraimbuddy added the type:misc/internal label Jan 28, 2026
jhgoebbert pushed a commit to jhgoebbert/airflow_Owen-CH-Leung that referenced this pull request Feb 8, 2026

Labels

area:DAG-processing, type:misc/internal (Changelog: Misc changes that should appear in change log)
