Add test case for compiling multiple graphs #21044
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
cc: @zou3519 @youkaichao
Code Review
This pull request introduces a new ignore_torch_compile decorator and adds test cases for compiling multiple graphs. There is a potential issue with the inheritance of the IGNORE_COMPILE_KEY that needs to be addressed.
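To make the flagged pitfall concrete, here is a minimal sketch of how a class-attribute marker set by a decorator leaks into subclasses. The key name `IGNORE_COMPILE_KEY` is taken from the review comment; the decorator body below is a simplified stand-in, not the PR's actual implementation:

```python
IGNORE_COMPILE_KEY = "_ignore_compile"

def ignore_torch_compile(cls):
    # Simplified stand-in: record the opt-out as a plain class attribute.
    setattr(cls, IGNORE_COMPILE_KEY, True)
    return cls

@ignore_torch_compile
class Parent:
    pass

class Child(Parent):
    # Never decorated, yet attribute lookup falls back to Parent.
    pass

assert getattr(Child, IGNORE_COMPILE_KEY, False) is True  # inherited!
# Checking the class's own namespace avoids picking up the inherited value:
assert IGNORE_COMPILE_KEY not in vars(Child)
```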
Looks reasonable to me. One minor comment
cc @BoyuanFeng
looks good! a few minor comments
Force-pushed from c178193 to 0080859
@houseroad the failing tests don't seem related to this PR; can we merge it? E.g. for the model tests (voxtral unrecognized), the test also fails on main.
Force-pushed from 0080859 to 7ece902
```python
# InductorAdaptor (unfortunately) requires AOTAutogradCache
# to be turned off to run. It will fail to acquire the hash_str
# and error if not.
# StandaloneInductorAdaptor (PyTorch 2.8+) fixes this problem.
stack.enter_context(
    torch._functorch.config.patch(enable_autograd_cache=False))
```
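For context, `torch._functorch.config.patch(...)` returns a context manager, which is why it composes with an `ExitStack` as in the diff above. A minimal self-contained sketch of that pattern (the surrounding test body is omitted and hypothetical):

```python
import contextlib

import torch

with contextlib.ExitStack() as stack:
    # Disable AOTAutogradCache for the duration of the stack, so the
    # Inductor adaptor can read the inductor hash_str it needs.
    stack.enter_context(
        torch._functorch.config.patch(enable_autograd_cache=False))
    # ... build, compile, and run the model inside the patched scope ...
```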
btw @houseroad @yeqcharlotte this will fix a lot of the compiler cache issues people report
Purpose
In some cases, we might want to compile multiple subgraphs, e.g. if some submodules in the model are not traceable with torch dynamo. See #14913 and #19719 for motivating scenarios.
This PR adds a test case showcasing how this can be supported in vLLM for piecewise compilation. This requires several components:
- `set_model_tag(tag)` to ensure the compile cache is not shared between these subgraphs (initially added in #19064, "[torch.compile] reorganize the cache directory to support compiling multiple models")
- `ignore_torch_compile` helper decorator for scenarios where a class inherits from a parent module that has the `support_torch_compile` decorator applied but does not want to compile its `forward()`. Added `test_ignore_torch_compile_decorator` to test this. (A hedged usage sketch follows this list.)
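Since these decorators are the heart of the change, here is a hedged sketch of how they might compose. Only the names `support_torch_compile`, `ignore_torch_compile`, and `set_model_tag` come from the PR; the import paths, constructor signatures, and `vllm_config` plumbing below are simplified assumptions, not verified against the repo:

```python
import torch
import torch.nn as nn

# Import paths are assumptions based on vLLM's layout.
from vllm.compilation.backends import set_model_tag
from vllm.compilation.decorators import (ignore_torch_compile,
                                         support_torch_compile)

@support_torch_compile
class SubGraph(nn.Module):
    """Dynamo-traceable piece; compiled piecewise as its own graph."""

    def __init__(self, *, vllm_config, prefix: str = ""):
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x).relu()

@ignore_torch_compile
class EagerSubGraph(SubGraph):
    """Inherits the compiled parent but opts out, so forward() stays eager."""

class Model(nn.Module):
    def __init__(self, *, vllm_config, prefix: str = ""):
        super().__init__()
        # Distinct tags keep the two subgraphs' compile caches separate.
        with set_model_tag("subgraph_a"):
            self.a = SubGraph(vllm_config=vllm_config, prefix=f"{prefix}.a")
        with set_model_tag("subgraph_b"):
            self.b = SubGraph(vllm_config=vllm_config, prefix=f"{prefix}.b")
        self.eager = EagerSubGraph(vllm_config=vllm_config,
                                   prefix=f"{prefix}.eager")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.eager(self.b(self.a(x)))
```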
Also disables AOTAutogradCache in the Inductor adaptor, which is needed for compilation caching to work; `StandaloneInductorAdaptor` in PyTorch 2.8+ will fix this problem (thanks @zou3519).

Test Plan
Test Result
Both newly added unit tests (`test_ignore_torch_compile_decorator` and `test_multi_graph_piecewise_compile`) pass.