fix(amazon): flush file buffer in S3Hook.download_file() before returning path #62078
Merged
vincbeck merged 3 commits into apache:main on Feb 17, 2026
Conversation
S3Hook.download_file() writes S3 object content to a file via download_fileobj() but never flushes the write buffer before returning the file path. When the caller immediately opens the returned path, the file may contain 0 bytes because the data is still in Python's write buffer. This particularly affects small files (< ~8KB) that fit entirely in the buffer, and was exposed by apache-airflow-providers-common-compat 1.13.1, which changed the execution timing of get_hook_lineage_collector(). See also: boto/boto3#1304
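A minimal sketch of the change described here, assuming a plain boto3 client rather than the actual S3Hook internals; the function name, prefix, and arguments are illustrative, and only the file.flush() line reflects the fix itself:

```python
# Illustrative sketch only -- not the provider's download_file() implementation.
from __future__ import annotations

from tempfile import NamedTemporaryFile

import boto3


def download_to_temp_file(bucket: str, key: str, local_dir: str | None = None) -> str:
    """Download an S3 object to a local temp file and return its path."""
    s3 = boto3.client("s3")
    file = NamedTemporaryFile(dir=local_dir, prefix="airflow_tmp_", delete=False)
    s3.download_fileobj(bucket, key, file)
    # The fix: without this flush(), small objects (< ~8KB) may still sit entirely
    # in Python's write buffer when the caller opens file.name and reads 0 bytes.
    file.flush()
    return file.name
```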
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
vincbeck
approved these changes
Feb 17, 2026
Contributor
Tests are failing
The tests for download_file() mocked NamedTemporaryFile to return a PosixPath instead of a file-like object. PosixPath lacks flush() and its .name property returns just the filename, not the full path like a real NamedTemporaryFile. Use the default MagicMock return value which properly supports file-like operations.
Contributor
Author
The tests were mocking NamedTemporaryFile to return a PosixPath instead of a file-like object. PosixPath has no .flush(), and its .name returns just the filename (not the full path like a real file object). I'm going to change the tests.
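A sketch of the idea behind the test change, not the actual Airflow test module; the test name is made up, and it only demonstrates why a default MagicMock is a workable stand-in for NamedTemporaryFile where a path object is not:

```python
# Illustrative only: contrasts the two stand-ins discussed above.
from pathlib import PurePosixPath  # PurePosixPath so the sketch runs on any OS
from unittest import mock


def test_magicmock_is_a_file_like_stand_in():
    fake_file = mock.MagicMock()
    fake_file.flush()  # tolerated: MagicMock records the call instead of raising
    assert fake_file.name is not None  # usable wherever a path-like value is expected

    path_stand_in = PurePosixPath("/tmp/airflow_tmp_abc123")
    assert path_stand_in.name == "airflow_tmp_abc123"  # only the filename, not the full path
    assert not hasattr(path_stand_in, "flush")  # flush() would raise AttributeError
```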
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.
Summary
- S3Hook.download_file() writes S3 object content to a file via download_fileobj() but never calls flush() before returning the file path
- Adds file.flush() after download_fileobj() to ensure buffered content is written to disk

Details
The original implementation used a with context manager, which auto-closes (and flushes) the file. When preserve_file_name support was added, the with was removed, and the file is now left open and unflushed.

This particularly affects small files (< ~8KB) that fit entirely in the buffer. The bug is latent in all environments but was exposed by apache-airflow-providers-common-compat==1.13.1 (PR #61157), which changed the execution timing of get_hook_lineage_collector() between download_fileobj() and return file.name.
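For concreteness, a standalone reproduction of the buffering behavior described above (not Airflow code); the ~8KB figure corresponds to the default write buffer size on most platforms:

```python
import os
from tempfile import NamedTemporaryFile

payload = b"x" * 1024  # well under the default ~8KB write buffer

file = NamedTemporaryFile(delete=False)
file.write(payload)
print(os.path.getsize(file.name))  # typically 0: bytes still sit in the buffer

file.flush()
print(os.path.getsize(file.name))  # 1024: flush() pushed them to disk

file.close()
os.unlink(file.name)
```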