Conversation

Collaborator

@sarckk sarckk commented Aug 5, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison or e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and examples for a new model.

Purpose

#21044 shows an example where multiple submodules in a model are compiled instead of the top-level model being compiled.

This currently has a subtle bug. For example, if a model contains two compiled submodules, module A followed by module B, then a call to module B will overwrite the outputs returned by module A.

This is because:

  • Each compilation unit (module A and module B) independently sets self.is_last_graph=True for the last graph among its local compiled submodules.
  • vLLM's piecewise cudagraph backend converts the output to a weakref if self.is_last_graph=True here, as it assumes that the last graph's output will not be used by any other cuda graph. So it converts the final output of module A to a weakref.
  • Once converted to a weakref, the output is immediately released to save memory.
  • vLLM uses a single global graph pool among all its cuda graphs, so it reuses freed memory where it can.
  • Subsequent cuda graph replays for module B reuse the memory freed from module A's weakref output.
  • The value of module A's output is overwritten.
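The release-on-weakref step in the list above can be sketched in plain Python (illustrative only, no vLLM or CUDA involved): once the backend keeps only a weak reference to module A's output, the strong reference is dropped, the object is collected, and the allocator is free to hand that memory to module B.

```python
import weakref


class GraphOutput:
    """Stand-in for a CUDA graph's output buffer (illustrative only)."""

    def __init__(self, data):
        self.data = data


# Module A's last graph returns an output; the backend keeps only a weakref
out_a = GraphOutput([1.0, 2.0, 3.0])
ref_a = weakref.ref(out_a)

# Converting to a weakref means the strong reference is dropped early
del out_a

# The object is now gone, so a later replay (module B) may reuse its memory
assert ref_a() is None
```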

This PR adds a new argument to the @support_torch_compile decorator, no_weak_ref_output, which can be set to True to disable the weakref conversion for non-last submodules in a model.
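A minimal sketch of the resulting decision logic (the function name and exact flag handling here are illustrative, not vLLM's actual internals): the output is converted to a weakref only when the graph is the last one in its compilation unit and the new flag has not disabled the conversion.

```python
def should_return_weakref(is_last_graph: bool, no_weak_ref_output: bool) -> bool:
    """Illustrative decision: convert the graph output to a weakref only
    when this is the last graph of the compilation unit and weakref
    conversion has not been disabled via no_weak_ref_output."""
    return is_last_graph and not no_weak_ref_output


# A non-last submodule (module A) is decorated with no_weak_ref_output=True,
# so its final output keeps a strong reference until module B has run
assert should_return_weakref(True, True) is False

# The last submodule in the model can still release its output early
assert should_return_weakref(True, False) is True

# Non-last graphs within a unit were never converted in the first place
assert should_return_weakref(False, False) is False
```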

Test Plan

Correctness test:

pytest tests/compile/piecewise/test_multiple_graphs.py::test_multi_graph_piecewise_compile_outputs_equal

Test Result

Before this PR, the test fails due to:

  • piecewise compile + cudagraph not allclose with eager
  • piecewise compile + cudagraph not bitwise equal to piecewise compile

After this PR, test passes.


github-actions bot commented Aug 5, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@sarckk sarckk force-pushed the fix-weakref-multiple-graphs branch from 34eb180 to db57b76 on August 5, 2025 22:08
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively addresses a critical bug in the piecewise CUDA graph compilation where intermediate outputs could be prematurely deallocated when multiple submodules are compiled. The introduction of a global last-graph check, distinguished from a local one, is a solid approach to fix this. The changes are well-contained and the added test case correctly verifies the fix. I have one high-severity comment regarding a protocol definition mismatch that should be addressed.

@sarckk sarckk force-pushed the fix-weakref-multiple-graphs branch from 50bd0b4 to 8f19d19 on August 22, 2025 01:07
@sarckk sarckk marked this pull request as ready for review August 22, 2025 01:11
@sarckk sarckk force-pushed the fix-weakref-multiple-graphs branch from 52123ce to 7d62e0a on August 22, 2025 01:29
Collaborator

@ProExpertProg ProExpertProg left a comment


Minor question, otherwise LGTM

@mergify mergify bot added the v1 label Aug 22, 2025
@sarckk sarckk changed the title Only convert output to weakref for last graph across all compilation units Add option to disable weakref conversion for last piecewise cudagraph in a module Aug 22, 2025
@sarckk sarckk force-pushed the fix-weakref-multiple-graphs branch 2 times, most recently from 195916a to 0ea4adb on August 22, 2025 23:07
Collaborator Author

sarckk commented Aug 22, 2025

@ProExpertProg the PR is now ready for review (failing CI test is unrelated, due to gateway error from HF hub)

@sarckk sarckk force-pushed the fix-weakref-multiple-graphs branch from da4cb11 to 8532b62 on August 25, 2025 18:18
Collaborator

@ProExpertProg ProExpertProg left a comment


Looks good, thank you for providing extensive comments. One note about what happens with the flag when we redecorate.


mergify bot commented Aug 26, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @sarckk.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 26, 2025
@sarckk sarckk force-pushed the fix-weakref-multiple-graphs branch from 8532b62 to 4c7b9b8 on August 26, 2025 19:01
@mergify mergify bot removed the needs-rebase label Aug 26, 2025
Collaborator

@ProExpertProg ProExpertProg left a comment


Not sure I understand the tests

Collaborator


I think this is technically tracking whole submodels, not all piecewise graphs

Collaborator Author

@sarckk sarckk Aug 29, 2025


For each module (either the entire model or a submodel of a model), it may or may not return a weakref for the last piecewise graph, so I think the number of piecewise graphs w/ weakref output is accurate.

Do you think num_weakref_cudagraph_captured would be clearer since there is compilation_counter.num_cudagraph_captured a few lines below?

compilation_counter.num_cudagraph_captured += 1

Collaborator Author

sarckk commented Aug 29, 2025

@ProExpertProg updated PR to address comments.

Also cc @youkaichao if you have any thoughts; I think you initially added this weakref logic.

Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
@sarckk sarckk force-pushed the fix-weakref-multiple-graphs branch from 7f465a5 to 0396bff on September 19, 2025 19:49

mergify bot commented Sep 21, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @sarckk.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 21, 2025