
Conversation

@wjddn279
Contributor

related: #58143

Body

As discussed (not yet confirmed), this addresses the sudden memory usage spikes in worker processes when using LocalExecutor. Memory grows because garbage collection in the forked workers writes to objects inherited from the parent, which triggers copy-on-write (COW) on pages that would otherwise stay shared and effectively read-only. By calling gc.freeze, which moves all existing objects to the permanent generation so that the collector ignores them, we prevent that COW from occurring.
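A minimal sketch of the mechanism described above, using only the standard `gc` module. The helper names `frozen_after_freeze` and `frozen_after_unfreeze` are hypothetical and are not taken from this PR; they only show that `gc.freeze()` moves tracked objects into the permanent generation and `gc.unfreeze()` moves them back.

```python
import gc


def frozen_after_freeze() -> int:
    """Freeze every object the collector currently tracks.

    Frozen objects live in the permanent generation and are skipped by
    later collections, so the collector does not write to them.
    """
    gc.freeze()
    return gc.get_freeze_count()


def frozen_after_unfreeze() -> int:
    """Move frozen objects back to the oldest generation."""
    gc.unfreeze()
    return gc.get_freeze_count()


if __name__ == "__main__":
    print("objects in permanent generation:", frozen_after_freeze())
    print("after unfreeze:", frozen_after_unfreeze())
```

The exact count depends on the interpreter state, so the sketch prints it rather than assuming a value.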

In fork mode, we create many worker processes at once so that gc.freeze and gc.unfreeze are called as few times as possible. In spawn mode, we keep the existing approach to ensure stability.
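The batching idea for fork mode could look roughly like the sketch below. This is an illustration under assumptions, not the code merged in this PR: `fork_workers_frozen` and its `work` callback are hypothetical names, and it uses `os.fork` directly (POSIX only) rather than Airflow's own worker startup path.

```python
import gc
import os
from typing import Callable, List


def fork_workers_frozen(n_workers: int, work: Callable[[int], None]) -> List[int]:
    """Fork n_workers children between a single freeze/unfreeze pair."""
    gc.freeze()  # parent objects go to the permanent generation before forking
    try:
        pids = []
        for index in range(n_workers):
            pid = os.fork()
            if pid == 0:
                # Child: inherited objects stay frozen, so the child's
                # collector does not touch (and dirty) the shared pages.
                code = 0
                try:
                    work(index)
                except BaseException:
                    code = 1
                finally:
                    os._exit(code)  # never fall back into the parent's code path
            pids.append(pid)
        return pids
    finally:
        gc.unfreeze()  # parent only: resume normal collection of its objects


if __name__ == "__main__":
    for child in fork_workers_frozen(3, lambda i: None):
        os.waitpid(child, 0)
```

Forking the whole batch inside one freeze/unfreeze pair is what keeps the number of calls low; forking workers one at a time on demand would need a pair per fork.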

Benchmark

memory usage

Comparison of per-process memory usage in LocalExecutor before and after applying this PR. Measured in the same environment running 500 tasks per minute for 12 hours.

AS-IS | TO-BE
smem_as-is.txt | smem_to-be.txt
[image: AS-IS] | [image: TO-BE]

gc.freeze / unfreeze performance (elapsed time)

We measured the elapsed time of gc.freeze and gc.unfreeze on each scheduler loop iteration. Most calls took on the order of microseconds, so the impact is negligible. The operation itself is very lightweight: it only moves objects from the current generations into the permanent generation and does not copy any memory. See https://github.com/python/cpython/pull/3705/files

[airflow.jobs.scheduler_job_runner.SchedulerJobRunner] loc=scheduler_job_runner.py:1375
2025-11-15T05:13:36.911715Z [info     ] freeze: 18.83 μs               [airflow.jobs.scheduler_job_runner.SchedulerJobRunner] loc=scheduler_job_runner.py:1366
2025-11-15T05:13:36.912238Z [info     ] freeze_cpu: 10.08 μs           [airflow.jobs.scheduler_job_runner.SchedulerJobRunner] loc=scheduler_job_runner.py:1367
2025-11-15T05:13:36.912517Z [info     ] unfreeze: 10.92 μs             [airflow.jobs.scheduler_job_runner.SchedulerJobRunner] loc=scheduler_job_runner.py:1374
2025-11-15T05:13:36.913268Z [info     ] unfreeze_cpu: 6.87 μs  
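A measurement like the one logged above could be reproduced with a sketch along these lines. It is an assumption about how such numbers might be collected, not the instrumentation used in the PR; `timed_us` and `measure_freeze_cycle` are hypothetical helpers, and the dictionary keys simply mirror the labels in the log lines.

```python
import gc
import time
from typing import Callable, Dict, Tuple


def timed_us(fn: Callable[[], None]) -> Tuple[float, float]:
    """Run fn once and return (wall-clock, CPU) time in microseconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu_us = (time.process_time() - cpu_start) * 1e6
    wall_us = (time.perf_counter() - wall_start) * 1e6
    return wall_us, cpu_us


def measure_freeze_cycle() -> Dict[str, float]:
    """Time one gc.freeze / gc.unfreeze pair."""
    freeze_wall, freeze_cpu = timed_us(gc.freeze)
    unfreeze_wall, unfreeze_cpu = timed_us(gc.unfreeze)
    return {
        "freeze": freeze_wall,
        "freeze_cpu": freeze_cpu,
        "unfreeze": unfreeze_wall,
        "unfreeze_cpu": unfreeze_cpu,
    }


if __name__ == "__main__":
    for label, micros in measure_freeze_cycle().items():
        print(f"{label}: {micros:.2f} μs")
```

Actual values depend on the machine and on how many objects are tracked at the time of the call.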


@boring-cyborg boring-cyborg bot added the area:Executors-core LocalExecutor & SequentialExecutor label Nov 16, 2025
@wjddn279 wjddn279 changed the title from "Fix local executor issue caused by cow" to "Fix LocalExecutor memory spike by applying gc.freeze" Nov 16, 2025
Member

@potiuk potiuk left a comment


LGTM, but there is a comment on the immediate ramp-up of the remaining workers, and we definitely need more than one pair of eyes on this.

@potiuk potiuk added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Nov 16, 2025
@potiuk
Member

potiuk commented Nov 16, 2025

I also marked it as backportable to 3-1-test. That would be a fantastic bugfix for 3.1.4 Local Executor memory usage.

@jscheffl
Contributor

 __________________
< Wow! I like COWs >
 ------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

@wjddn279 wjddn279 force-pushed the fix-local-executor-cow-issue branch from 0728297 to e3188f9 Compare November 18, 2025 11:22
@wjddn279 wjddn279 force-pushed the fix-local-executor-cow-issue branch 3 times, most recently from 9f32807 to 83d1381 Compare November 24, 2025 02:04
@wjddn279 wjddn279 force-pushed the fix-local-executor-cow-issue branch from 435be23 to 88b1153 Compare November 24, 2025 08:56
@wjddn279
Contributor Author

@ashb I’ve incorporated the feedback!

@potiuk
Member

potiuk commented Dec 1, 2025

@ashb - any comment? I would love to merge this one today in preparation for upcoming 3.1.4

@potiuk potiuk merged commit baec49a into apache:main Dec 2, 2025
64 checks passed
@potiuk
Member

potiuk commented Dec 2, 2025

I merged this for 3.1.4. We can always iterate in the future, and the test results sound very plausible. Thanks @wjddn279! I think we can work on other memory optimisations after that.

github-actions bot pushed a commit that referenced this pull request Dec 2, 2025
)

* fix local executor issue caused by cow

* fix test

* fix test

* remove gc utils

* fix test to prevent timeout

* fix tests

* fix tests

* fix tests
(cherry picked from commit baec49a)

Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>
@github-actions

github-actions bot commented Dec 2, 2025

Backport successfully created: v3-1-test

Status Branch Result
v3-1-test PR Link

potiuk pushed a commit that referenced this pull request Dec 2, 2025
) (#58934)

@potiuk potiuk added this to the Airflow 3.1.4 milestone Dec 2, 2025
@ephraimbuddy ephraimbuddy added the type:bug-fix Changelog: Bug Fixes label Dec 2, 2025
ephraimbuddy pushed a commit that referenced this pull request Dec 3, 2025
) (#58934)

RoyLee1224 pushed a commit to RoyLee1224/airflow that referenced this pull request Dec 3, 2025
@wjddn279
Contributor Author

wjddn279 commented Dec 5, 2025

#protm self nominating
The memory usage improvement achieved by this is at a satisfactory level. #59033 (comment)

itayweb pushed a commit to itayweb/airflow that referenced this pull request Dec 6, 2025

Labels

area:Executors-core LocalExecutor & SequentialExecutor backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch type:bug-fix Changelog: Bug Fixes


5 participants