[torch.compile] A simple solution to recursively compile loaded model: using phi3-small as an example #8398

wschin · 2024-09-12T04:08:39Z

It's often hard to apply torch.compile directly to a whole model because of its many limitations. However, only a few key nn.Modules really need to be compiled to get max speed. This PR presents a short solution to that problem. With it, Phi3-small got a 15% latency improvement.

This PR contains 3 major functions:

- register_module_to_compile: registers which torch.nn.Module types to compile when child modules are scanned from top to bottom (e.g., from Phi3 down to MLP and layernorm).
- compile_child_modules: called after the model is loaded; it visits child nn.Modules from top (e.g., GPT2) to bottom (e.g., MLP). Once an nn.Module is compiled, its child nn.Modules are not compiled again.
- unregister_module_to_compile: undoes register_module_to_compile.

To enable this feature, set COMPILE_CHILD_NN_MODULES=1.
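The three functions described above could look roughly like the following. This is a minimal sketch based only on the description, not the PR's actual code: the registry `_MODULES_TO_COMPILE`, the `compile_fn` parameter, the integer return value, and the use of `isinstance` matching are all assumptions made for illustration. The traversal relies only on `named_children()` and `setattr`, which `torch.nn.Module` provides.

```python
import os
from typing import Callable, Optional, Set

# Hypothetical registry of module classes whose instances should be compiled.
_MODULES_TO_COMPILE: Set[type] = set()


def register_module_to_compile(module_cls: type) -> None:
    """Mark a module class as a compilation target."""
    _MODULES_TO_COMPILE.add(module_cls)


def unregister_module_to_compile(module_cls: type) -> None:
    """Undo register_module_to_compile; unknown classes are ignored."""
    _MODULES_TO_COMPILE.discard(module_cls)


def _compile_children(module, compile_fn: Callable) -> int:
    """Visit children top-down and return how many were compiled."""
    targets = tuple(_MODULES_TO_COMPILE)
    compiled = 0
    # Copy to a list first because children are replaced during the loop.
    for name, child in list(module.named_children()):
        if isinstance(child, targets):
            # Replace the child with its compiled wrapper and do NOT
            # descend further: its own children are covered by this compile.
            setattr(module, name, compile_fn(child))
            compiled += 1
        else:
            compiled += _compile_children(child, compile_fn)
    return compiled


def compile_child_modules(module, compile_fn: Optional[Callable] = None) -> int:
    """Compile registered child modules if the feature is enabled.

    The env var name is taken from the PR description; compile_fn is an
    assumed hook that defaults to torch.compile.
    """
    if os.environ.get("COMPILE_CHILD_NN_MODULES") != "1":
        return 0
    if compile_fn is None:
        import torch  # imported lazily so the sketch loads without PyTorch

        compile_fn = torch.compile
    return _compile_children(module, compile_fn)
```

With PyTorch installed, one would register a class such as an MLP block, load the model, and call `compile_child_modules(model)`; without an explicit `compile_fn` the sketch falls back to `torch.compile`. Stopping the descent at the first registered module is what keeps nested modules from being compiled twice.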

github-actions · 2024-09-12T04:08:54Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

wschin · 2024-09-25T01:53:36Z

@youkaichao, did you get a chance to review?
@comaniac, it'd be nice if you could help review. At the least, let me know whether I should continue writing tests. This solution is extremely simple and effective.

Thank you.

Simple infra to recursively torch.compile nn.Module's

c0d30e9

wschin changed the title ~~[torch.compile] A simple recursive solution to recursively compile loaded model: using phi3-small as an example~~ [torch.compile] A simple recursive to recursively compile loaded model: using phi3-small as an example Sep 12, 2024

wschin changed the title ~~[torch.compile] A simple recursive to recursively compile loaded model: using phi3-small as an example~~ [torch.compile] A simple solution to recursively compile loaded model: using phi3-small as an example Sep 12, 2024

wschin commented Sep 12, 2024 on vllm/model_executor/model_loader/loader.py (outdated)

mgoin requested a review from youkaichao September 12, 2024 15:34

wschin mentioned this pull request Sep 13, 2024

[RFC]: A Graph Optimization System in vLLM using torch.compile #6378

Open

wschin commented Sep 13, 2024 on vllm/model_executor/model_loader/loader.py (outdated)

wschin added 2 commits September 24, 2024 18:08

Refine infra

2b7f382

format

Merge branch 'main' of https://github.com/vllm-project/vllm into comp…

716ca36

…ile-phi

wschin force-pushed the compile-phi branch from 974b420 to 716ca36 Compare September 25, 2024 01:31

Rename control env var

50d0647

wschin mentioned this pull request Oct 1, 2024

[issue tracker] make vllm compatible with dynamo #8821

Closed

1 task
