@kaxil kaxil commented Oct 16, 2025

The remote logging connection cache was using `@lru_cache` with the API client instance as a parameter. This caused client references to be retained in the cache indefinitely, preventing garbage collection and causing memory leaks when tasks created multiple client instances.

The new implementation ensures connection details are cached for performance while allowing client instances to be properly garbage collected after use.
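
To illustrate the leak pattern described above, here is a minimal sketch. It is not the code from this PR: `Client`, `leaky_lookup`, `cached_lookup`, and `client_survives` are hypothetical names used only for this example. It assumes the client is hashable by identity, so `@lru_cache` stores it as part of the cache key and keeps it alive, whereas a cache keyed only on the connection id does not.

```python
import gc
import weakref
from functools import lru_cache


class Client:
    """Hypothetical stand-in for an API client (not Airflow's class)."""

    def fetch(self, conn_id: str) -> dict:
        return {"conn_id": conn_id, "host": "example.invalid"}


@lru_cache(maxsize=None)
def leaky_lookup(conn_id: str, client: Client) -> dict:
    # The client is part of the cache key, so the cache keeps a strong
    # reference to every client instance it has ever been called with.
    return client.fetch(conn_id)


# Cache keyed on the connection id only (an assumption for this sketch).
_details: dict[str, dict] = {}


def cached_lookup(conn_id: str, client: Client) -> dict:
    # The client is used on a cache miss and never stored.
    if conn_id not in _details:
        _details[conn_id] = client.fetch(conn_id)
    return _details[conn_id]


def client_survives(lookup) -> bool:
    """Return True if the client is still alive after the caller drops it."""
    client = Client()
    ref = weakref.ref(client)
    lookup("remote_logs", client)
    del client
    gc.collect()
    return ref() is not None
```

Calling `client_survives(leaky_lookup)` shows the client being kept alive by the cache, while `client_survives(cached_lookup)` shows it being collected once the caller releases it.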

In Airflow 3.0.6, various tasks running on Celery failed with out-of-memory (OOM) errors because the memory leaks were significant. After applying the changes in this PR, memory usage stayed mostly flat and there were no task failures.


Celery worker with 4 GB memory on Airflow 3.0.6:
[Image]

Celery worker with 4 GB memory with the changes in this PR:
[Image]

Part of #56641


@kaxil kaxil added this to the Airflow 3.1.1 milestone Oct 16, 2025
@amoghrajesh amoghrajesh left a comment

Good investigation!

@wjddn279

@kaxil
Thank you for the fix. I’ll apply it along with the previous PR (#56692) and run another round of memory testing, then share the results.

@potiuk potiuk added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Oct 16, 2025
@potiuk potiuk merged commit 416c73e into apache:main Oct 16, 2025
83 checks passed
@github-actions

Backport failed to create: v3-1-test.

Status Branch Result
v3-1-test Commit Link

You can attempt to backport this manually by running:

cherry_picker 416c73e v3-1-test

This should apply the commit to the v3-1-test branch and leave the commit in a conflict state, marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

potiuk commented Oct 16, 2025

Looks like some earlier change needs to be cherry-picked first

snreddygopu pushed a commit to Teradata/airflow that referenced this pull request Oct 16, 2025
abdulrahman305 bot pushed a commit to abdulrahman305/airflow that referenced this pull request Oct 17, 2025
abdulrahman305 bot pushed a commit to abdulrahman305/airflow that referenced this pull request Oct 19, 2025
kaxil added a commit that referenced this pull request Oct 21, 2025
(cherry picked from commit 416c73e)
TyrellHaywood pushed a commit to TyrellHaywood/airflow that referenced this pull request Oct 22, 2025
vatsrahul1001 added a commit to astronomer/airflow that referenced this pull request Nov 4, 2025
potiuk pushed a commit that referenced this pull request Dec 1, 2025
* Revert "Fix memory leak in remote logging connection cache (#56695)"

This reverts commit 416c73e.

* enable e2e ui test to install pnpm if not installed
RoyLee1224 pushed a commit to RoyLee1224/airflow that referenced this pull request Dec 3, 2025
Copilot AI pushed a commit to jason810496/airflow that referenced this pull request Dec 5, 2025
itayweb pushed a commit to itayweb/airflow that referenced this pull request Dec 6, 2025
Labels: area:task-sdk, backport-to-v3-1-test, Memory issues
