Add option to disable weakref conversion for last piecewise cudagraph in a module #22282
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small and essential subset of CI tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
Code Review
This pull request effectively addresses a critical bug in the piecewise CUDA graph compilation where intermediate outputs could be prematurely deallocated when multiple submodules are compiled. The introduction of a global last-graph check, distinguished from a local one, is a solid approach to fix this. The changes are well-contained and the added test case correctly verifies the fix. I have one high-severity comment regarding a protocol definition mismatch that should be addressed.
Minor question, otherwise LGTM.
@ProExpertProg the PR is now ready for review (the failing CI test is unrelated, due to a gateway error from HF Hub)
Looks good, thank you for providing extensive comments. One note about what happens with the flag when we redecorate.
This pull request has merge conflicts that must be resolved before it can be merged.
Not sure I understand the tests
vllm/compilation/counter.py
I think this is technically tracking whole submodels, not all piecewise graphs
For each module (either the entire model or a submodule of a model), it may or may not return a weakref for the last piecewise graph, so I think "number of piecewise graphs with weakref output" is accurate.
Do you think `num_weakref_cudagraph_captured` would be clearer, since there is `compilation_counter.num_cudagraph_captured` a few lines below?
vllm/compilation/cuda_graph.py, line 175 at 006477e:
```python
compilation_counter.num_cudagraph_captured += 1
```
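For context, a minimal sketch of how the proposed counter might sit alongside the existing one in `vllm/compilation/counter.py`; the existing field `num_cudagraph_captured` appears in the snippet above, while `num_weakref_cudagraph_captured` is only the name suggested in this thread, not necessarily what was merged:

```python
from dataclasses import dataclass


@dataclass
class CompilationCounter:
    # Existing counter: incremented once per captured cudagraph
    # (see the cuda_graph.py snippet above).
    num_cudagraph_captured: int = 0
    # Proposed counter from the discussion above: captures whose output
    # was converted to a weakref (field name is the suggestion, not final).
    num_weakref_cudagraph_captured: int = 0
```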
@ProExpertProg updated the PR to address comments. Also cc @youkaichao if you have any thoughts; I think you initially added this weakref logic.
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Purpose
#21044 shows an example where multiple submodules in a model are compiled instead of the top-level model being compiled.
This currently has a subtle bug. For example, if we have two submodules compiled in a model, module A followed by module B, then a call to module B will overwrite the outputs returned by module A.
This is because:

- Each compiled submodule sets `self.is_last_graph=True` for the last graph among its locally compiled piecewise graphs.
- The cudagraph wrapper checks `self.is_last_graph=True` and assumes that the last graph's output will not be used by any other cuda graph, so it converts the final output of module A to a weakref.

This PR adds a new argument to the `@support_torch_compile` decorator, `no_weak_ref_output`, which can be set to `True` to disable the weakref conversion for non-last submodules in a model (see the usage sketch below).
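A minimal usage sketch, assuming the decorator accepts the new keyword described above; the submodule names and forward bodies are illustrative, not taken from the PR:

```python
import torch
from torch import nn

from vllm.compilation.decorators import support_torch_compile


# Module A is not the last compiled submodule in the model, so its final
# output must stay alive; no_weak_ref_output=True (the argument added by
# this PR) disables the weakref conversion for its last piecewise graph.
@support_torch_compile(no_weak_ref_output=True)
class SubmoduleA(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1


# Module B is the last compiled submodule in the model; the default
# behavior (weakref conversion for the last graph) remains safe here.
@support_torch_compile
class SubmoduleB(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2
```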
Test Plan
Correctness test:
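The actual test body is not preserved in this page; below is a hedged sketch of the failure mode it guards against (all names are illustrative): with modules A and B both compiled piecewise, replaying B's cudagraph must not clobber the tensor returned by A's last piecewise graph.

```python
import torch


def check_submodule_outputs(module_a, module_b, x: torch.Tensor) -> None:
    ref_a = module_a(x).clone()  # snapshot of A's expected output
    out_a = module_a(x)          # before this PR, this buffer could be
                                 # weakref'd away and reused by B's graph
    _ = module_b(out_a)
    assert torch.allclose(out_a, ref_a), "module B overwrote module A's output"
```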
Test Result
Before this PR, the test fails because the compiled output does not pass `allclose` with eager.

After this PR, the test passes.