Conversation

@Lucaskabela Lucaskabela commented Oct 30, 2025

Purpose

As we expand our usage of torch.compile (for instance, in the Qwen2_5_vl ViT in #23207), we found that the optimal place to compile for performance was around VisionBlock; however, this particular module is instantiated 32 times in the model.

The current integration attaches the compiled_codes and compilation artifacts at the instance level rather than the class level. This PR moves that abstraction to the class level in order to avoid redundant compilations and reduce compilation time.

For our example model, this cuts the compile time from roughly 18 s to 9 s.
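The idea can be sketched as follows. This is a minimal, hypothetical illustration of class-level caching (the names `VisionBlock`, `_compiled_forward`, and `_forward_impl` are invented for this sketch and are not vLLM's actual abstraction; `backend="eager"` is used so the example runs without an Inductor toolchain):

```python
import torch
import torch.nn as nn


class VisionBlock(nn.Module):
    # Class-level cache: one compiled forward shared by every instance,
    # instead of compiling anew per instance (hypothetical sketch).
    _compiled_forward = None

    def __init__(self, dim: int = 8):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if VisionBlock._compiled_forward is None:
            # Compile once for the class; later instances reuse the result.
            VisionBlock._compiled_forward = torch.compile(
                VisionBlock._forward_impl, backend="eager"
            )
        return VisionBlock._compiled_forward(self, x)

    def _forward_impl(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


# All 4 (or 32) instances share the single class-level compiled callable.
blocks = [VisionBlock() for _ in range(4)]
x = torch.randn(2, 8)
outs = [b(x) for b in blocks]
print(all(o.shape == (2, 8) for o in outs))
```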

Test Plan

Unit Test

python examples/offline_inference/vision_language.py -m qwen2_5_vl

Test Result

Unit Test

Before

(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:52 [backends.py:618] Using cache directory: /home/lucaskabela/.cache/vllm/torch_compile_cache/607a872fb9/rank_0_0/Qwen2_5_VisionBlock for vLLM's torch.compile
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:52 [backends.py:634] Dynamo bytecode transform time: 0.29 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:279] Compiling a graph for dynamic shape takes 0.12 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [monitor.py:34] torch.compile takes 1.02 s in total
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:634] Dynamo bytecode transform time: 0.21 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 0.026 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [monitor.py:34] torch.compile takes 1.25 s in total
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:634] Dynamo bytecode transform time: 0.21 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 0.027 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [monitor.py:34] torch.compile takes 1.49 s in total
...
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:05 [monitor.py:34] torch.compile takes 9.04 s in total
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:06 [backends.py:634] Dynamo bytecode transform time: 0.21 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:06 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 0.028 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:06 [monitor.py:34] torch.compile takes 9.28 s in total
...Using cache directory: /home/lucaskabela/.cache/vllm/torch_compile_cache/607a872fb9/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:12 [backends.py:634] Dynamo bytecode transform time: 5.46 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:15 [backends.py:279] Compiling a graph for dynamic shape takes 3.51 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:16 [monitor.py:34] torch.compile takes 18.46 s in total
...
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white petals covering the branches, and the clear blue
--------------------------------------------------
This image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white flowers covering the branches. The sky is clear
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The skytree is surrounded by cherry blossoms, which are in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are pink and appear to be in full bloom, indicating that the image was likely
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white flowers covering the branches. The clear blue sky

After

(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:33 [backends.py:618] Using cache directory: /home/lucaskabela/.cache/vllm/torch_compile_cache/3a7a33d338/rank_0_0/Qwen2_5_VisionBlock for vLLM's torch.compile
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:33 [backends.py:634] Dynamo bytecode transform time: 0.32 s
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:34 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 0.031 s
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:34 [monitor.py:34] torch.compile takes 0.84 s in total
...
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:40 [backends.py:618] Using cache directory: /home/lucaskabela/.cache/vllm/torch_compile_cache/3a7a33d338/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:40 [backends.py:634] Dynamo bytecode transform time: 5.55 s
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:42 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 1.997 s
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:42 [monitor.py:34] torch.compile takes 8.57 s in total
...
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white petals covering the branches, and the clear blue
--------------------------------------------------
This image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white flowers covering the branches. The sky is clear
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The skytree is surrounded by cherry blossoms, which are in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are pink and appear to be in full bloom, indicating that the image was likely
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white flowers covering the branches. The clear blue sky


Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
@Lucaskabela Lucaskabela changed the title [Bugfix][Multimodal][Torch Compile] Avoid compiling the same module definition many times [Compile] Avoid compiling the same module definition many times Oct 30, 2025
@Lucaskabela (Contributor, Author)

This is actually correct with respect to torch.compile: since each VisionBlock is a distinct instantiation of the nn.Module, Dynamo must trace each one separately. When a trace is identical to a previous definition, we get an FX cache hit, which is what we observe here when looking at tlparse.
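The behavior described above can be illustrated with a small sketch (the `Block` class is hypothetical, and `backend="eager"` stands in for the real backend so the example runs anywhere): each instance gets its own Dynamo trace, even though the module definitions are identical; the deduplication happens downstream in the FX/Inductor caches, not at the wrapping step.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


# Wrapping each instance separately: Dynamo traces each one distinctly,
# but identical graphs can hit the compilation cache downstream.
blocks = [torch.compile(Block(), backend="eager") for _ in range(3)]
x = torch.randn(2, 4)
for b in blocks:
    y = b(x)
    print(y.shape)  # each compiled instance runs independently
```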

Closing as such

@Lucaskabela Lucaskabela closed this Nov 3, 2025