@jason810496

related: #49470, #54813

Why

After #49470 (Resolve OOM When Reading Large Logs in Webserver) and #54813 (Add stream method to RemoteIO), reading TaskInstance logs supports a memory-efficient, stream-based interface (the RemoteIO.stream method). However, each provider-side RemoteIO still needs its own stream implementation before the whole read path is memory efficient.

What

  • Add a stream method to GCSRemoteIO so that the TaskInstance log reading path is memory efficient
  • Refactor the read method to call the stream method instead of duplicating the common logic
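
As a rough illustration of the two points above, here is a minimal sketch of a read method built on top of a stream method. It is not the provider's actual code: the class name SketchRemoteLogIO, the in-memory dict standing in for a bucket, and the return types are all assumptions made for the example.

```python
from __future__ import annotations

import io
from collections.abc import Iterator
from typing import TextIO


class SketchRemoteLogIO:
    """Hypothetical stand-in for a remote log IO class (not GCSRemoteIO itself)."""

    def __init__(self, blobs: dict[str, str]) -> None:
        # Assumption: a dict of path -> content stands in for a remote bucket.
        self._blobs = blobs

    def _open(self, path: str) -> TextIO | None:
        # Return a file-like handle, or None when the log does not exist.
        content = self._blobs.get(path)
        return io.StringIO(content) if content is not None else None

    def stream(self, path: str) -> Iterator[str]:
        # Yield one line at a time so the caller never holds the whole log.
        handle = self._open(path)
        if handle is None:
            return
        with handle:
            for line in handle:
                yield line.rstrip("\n")

    def read(self, path: str) -> list[str] | None:
        # Early return when there is no log, otherwise reuse stream()
        # rather than duplicating the line-handling logic.
        if path not in self._blobs:
            return None
        return list(self.stream(path))


if __name__ == "__main__":
    remote = SketchRemoteLogIO({"dag/task/1.log": "first line\nsecond line\n"})
    for log_line in remote.stream("dag/task/1.log"):
        print(log_line)
```

In this sketch, read still materializes the full log, so only callers that consume stream directly get the memory benefit; read is kept for callers that expect the older interface.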

Verification

I tested the change across the following Airflow versions.

  • 3.2.0 (main branch)
    • calls the GCSRemoteIO.stream method
    • command: breeze start-airflow --backend postgres --mount-sources providers-and-tests --use-airflow-version apache/airflow:main
    • screenshot: apache/airflow:main
  • 3.1.5
    • calls the GCSRemoteIO.read method
    • command: breeze start-airflow --backend postgres --mount-sources providers-and-tests --use-airflow-version 3.1.5
    • screenshot: 3.1.5
  • 2.11.0
    • calls the GCSRemoteIO.read method
    • command: breeze start-airflow --backend postgres --mount-sources providers-and-tests --use-airflow-version 2.11.0
    • screenshot: 2.11.0
  • Screenshot of Google Cloud Storage
    • GCS Screenshot

@boring-cyborg boring-cyborg bot added area:logging area:providers provider:google Google (including GCP) related issues labels Dec 23, 2025
@jason810496 jason810496 force-pushed the refactor/logging/add-stream-method-for-gcs branch 2 times, most recently from ceb0aa3 to f6a6b6f Compare December 24, 2025 02:22
@jason810496 jason810496 force-pushed the refactor/logging/add-stream-method-for-gcs branch from f6a6b6f to 7600c5d Compare December 24, 2025 07:32
@jason810496 jason810496 marked this pull request as ready for review December 24, 2025 08:36
@jason810496 jason810496 merged commit 12f6fbd into apache:main Dec 26, 2025
87 checks passed
amoghrajesh pushed a commit to astronomer/airflow that referenced this pull request Dec 29, 2025
* Add stream method for GCSRemoteLogIO

* Fix TestGCSTaskHandler, add error handling for read

* Add test_upload

* Open stream outside of _get_log_stream, early return for read if logs is
None

* Add test_write and test_stream_and_read_methods

* Fix mistook import of RawLogStream

* Fix mypy error

Fix missing mock for get_credentials_and_project_id

Fix mypy error

Fix test

* Fix mypy and unit test

Skip 2.11 test

* Fix compat test

* Fix review comment
Subham-KRLX pushed a commit to Subham-KRLX/airflow that referenced this pull request Jan 2, 2026
stegololz pushed a commit to stegololz/airflow that referenced this pull request Jan 9, 2026