fix(amazon): flush file buffer in S3Hook.download_file() before returning path by pippo995 · Pull Request #62078 · apache/airflow

pippo995 · 2026-02-17T15:23:11Z

Summary

S3Hook.download_file() writes S3 object content to a file via download_fileobj() but never calls flush() before returning the file path.
When the caller immediately reads the returned path, the file may contain 0 bytes because data is still in Python's write buffer.
Added file.flush() after download_fileobj() to ensure buffered content is written to disk.
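
A minimal sketch of the symptom summarized above, using only the standard library rather than S3Hook or boto3. It assumes CPython's default buffered binary file object and a payload much smaller than the write buffer; the 100-byte payload and the variable names are illustrative, not taken from the Airflow code.

```python
import os
import tempfile

payload = b"x" * 100  # illustrative: far smaller than the write buffer

# delete=False so the path can be inspected and removed explicitly.
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp.write(payload)
    # The bytes are still in the Python-level buffer, so the file looks empty.
    size_before_flush = os.path.getsize(tmp.name)
    tmp.flush()
    # flush() hands the buffered bytes to the operating system.
    size_after_flush = os.path.getsize(tmp.name)
finally:
    tmp.close()
    os.unlink(tmp.name)

print(size_before_flush, size_after_flush)
```

Under those assumptions the first size should be 0 and the second should equal the payload length, which matches the "0 bytes" behaviour a caller can observe when it opens the returned path while the handle is still open and unflushed.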

Details

The original implementation used a with context manager which auto-closes (and flushes) the file. When preserve_file_name support was added, the with was removed and the file is now left open and unflushed.
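
The two patterns contrasted above can be sketched roughly as follows. This is not the S3Hook code: `save_with_context_manager` and `save_and_keep_open` are hypothetical helpers, and a plain `write()` stands in for download_fileobj(). The second helper returns the handle alongside the path only so that the sketch keeps the file open explicitly.

```python
import tempfile


def save_with_context_manager(data: bytes, directory: str) -> str:
    """Write inside a `with` block; leaving the block closes and flushes the file."""
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as file:
        file.write(data)  # stand-in for download_fileobj()
    return file.name


def save_and_keep_open(data: bytes, directory: str):
    """Leave the file open, so flush explicitly before handing out the path."""
    file = tempfile.NamedTemporaryFile(dir=directory, delete=False)
    file.write(data)  # stand-in for download_fileobj()
    file.flush()  # without this, a small payload may still sit in the buffer
    return file, file.name
```

In both cases a caller that opens the returned path right away should see the full content; the difference is only whether the close or the explicit flush() is what pushes the buffer out.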

This particularly affects small files (< ~8KB) that fit entirely in the buffer. The bug is latent in all environments but was exposed by apache-airflow-providers-common-compat==1.13.1 (PR #61157), which changed the execution timing of get_hook_lineage_collector() between download_fileobj() and return file.name.

…ning path S3Hook.download_file() writes S3 object content to a file via download_fileobj() but never flushes the write buffer before returning the file path. When the caller immediately opens the returned path, the file may contain 0 bytes because the data is still in Python's write buffer. This particularly affects small files (< ~8KB) that fit entirely in the buffer, and was exposed by apache-airflow-providers-common-compat 1.13.1 which changed execution timing of get_hook_lineage_collector(). See also: boto/boto3#1304

boring-cyborg · 2026-02-17T15:23:21Z

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
Be sure to read the Airflow Coding style.
Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

vincbeck · 2026-02-17T17:02:15Z

Tests are failing

The tests for download_file() mocked NamedTemporaryFile to return a PosixPath instead of a file-like object. PosixPath lacks flush() and its .name property returns just the filename, not the full path like a real NamedTemporaryFile. Use the default MagicMock return value which properly supports file-like operations.

pippo995 · 2026-02-17T17:55:46Z

The tests were mocking NamedTemporaryFile to return a PosixPath instead of a file-like object. PosixPath has no .flush(), and its .name returns just the filename (not the full path, as a real file object's does). I'm going to change the tests.
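
A small sketch of why the old mock broke once flush() was called. It is not the actual Airflow test code: the path is hypothetical, PurePosixPath stands in for PosixPath so the snippet runs on any platform, and `tempfile.NamedTemporaryFile` is patched directly only for illustration.

```python
import tempfile
from pathlib import PurePosixPath
from unittest import mock

# A path object has no flush(), and .name is only the final path component.
path_like = PurePosixPath("/tmp/airflow_tmp_example")  # hypothetical path
path_has_flush = hasattr(path_like, "flush")
path_name = path_like.name

# With the default MagicMock return value, file-like calls just work
# and can be asserted on afterwards.
with mock.patch("tempfile.NamedTemporaryFile") as mock_ntf:
    file_like = tempfile.NamedTemporaryFile()
    file_like.flush()
    mock_ntf.return_value.flush.assert_called_once_with()
    mock_is_return_value = file_like is mock_ntf.return_value
```

Here `path_has_flush` is False and `path_name` is "airflow_tmp_example", while the MagicMock accepts flush() and records the call.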

boring-cyborg · 2026-02-17T20:26:53Z

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

pippo995 requested a review from o-nikolas as a code owner February 17, 2026 15:23

boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Feb 17, 2026

vincbeck approved these changes Feb 17, 2026

View reviewed changes

Merge branch 'main' into fix/s3hook-download-file-flush

f1d7128

pippo995 requested a review from vincbeck February 17, 2026 18:05

vincbeck merged commit 78dccfd into apache:main Feb 17, 2026
90 checks passed

vincbeck merged 3 commits into apache:main from pippo995:fix/s3hook-download-file-flush
