Conversation

@Lucaskabela Lucaskabela commented Oct 30, 2025

Purpose

As we expand our usage of torch.compile (for instance, in the Qwen2_5_vl ViT in #23207), we found that the optimal place to compile for performance was around VisionBlock; however, this particular module is instantiated 32 times in the model.

The current integration attaches the compiled_codes and compilation artifacts at the instance level rather than the class level. This PR moves that abstraction to the class level in order to avoid redundant compilations and reduce compilation time.

For our example model, this cuts the compile time from roughly 18 s to 9 s.
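The idea can be sketched as follows. This is a minimal, hypothetical illustration of class-level caching (the names `VisionBlock`, `_compiled_forward`, and `_forward_impl` are invented for this sketch and are not vLLM's actual abstraction; `backend="eager"` is used so the example runs without an Inductor toolchain):

```python
import torch
import torch.nn as nn


class VisionBlock(nn.Module):
    # Class-level cache: one compiled forward shared by every instance,
    # instead of compiling anew per instance (hypothetical sketch).
    _compiled_forward = None

    def __init__(self, dim: int = 8):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if VisionBlock._compiled_forward is None:
            # Compile once for the class; later instances reuse the result.
            VisionBlock._compiled_forward = torch.compile(
                VisionBlock._forward_impl, backend="eager"
            )
        return VisionBlock._compiled_forward(self, x)

    def _forward_impl(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


# All 4 (or 32) instances share the single class-level compiled callable.
blocks = [VisionBlock() for _ in range(4)]
x = torch.randn(2, 8)
outs = [b(x) for b in blocks]
print(all(o.shape == (2, 8) for o in outs))
```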

Test Plan

Unit Test

python examples/offline_inference/vision_language.py -m qwen2_5_vl

Test Result

Unit Test

Before

(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:52 [backends.py:618] Using cache directory: /home/lucaskabela/.cache/vllm/torch_compile_cache/607a872fb9/rank_0_0/Qwen2_5_VisionBlock for vLLM's torch.compile
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:52 [backends.py:634] Dynamo bytecode transform time: 0.29 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:279] Compiling a graph for dynamic shape takes 0.12 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [monitor.py:34] torch.compile takes 1.02 s in total
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:634] Dynamo bytecode transform time: 0.21 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 0.026 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [monitor.py:34] torch.compile takes 1.25 s in total
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:634] Dynamo bytecode transform time: 0.21 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 0.027 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:30:53 [monitor.py:34] torch.compile takes 1.49 s in total
...
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:05 [monitor.py:34] torch.compile takes 9.04 s in total
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:06 [backends.py:634] Dynamo bytecode transform time: 0.21 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:06 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 0.028 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:06 [monitor.py:34] torch.compile takes 9.28 s in total
...Using cache directory: /home/lucaskabela/.cache/vllm/torch_compile_cache/607a872fb9/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:12 [backends.py:634] Dynamo bytecode transform time: 5.46 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:15 [backends.py:279] Compiling a graph for dynamic shape takes 3.51 s
(EngineCore_DP0 pid=2599546) INFO 10-30 13:31:16 [monitor.py:34] torch.compile takes 18.46 s in total
...
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white petals covering the branches, and the clear blue
--------------------------------------------------
This image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white flowers covering the branches. The sky is clear
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The skytree is surrounded by cherry blossoms, which are in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are pink and appear to be in full bloom, indicating that the image was likely
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white flowers covering the branches. The clear blue sky

After

(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:33 [backends.py:618] Using cache directory: /home/lucaskabela/.cache/vllm/torch_compile_cache/3a7a33d338/rank_0_0/Qwen2_5_VisionBlock for vLLM's torch.compile
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:33 [backends.py:634] Dynamo bytecode transform time: 0.32 s
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:34 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 0.031 s
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:34 [monitor.py:34] torch.compile takes 0.84 s in total
...
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:40 [backends.py:618] Using cache directory: /home/lucaskabela/.cache/vllm/torch_compile_cache/3a7a33d338/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:40 [backends.py:634] Dynamo bytecode transform time: 5.55 s
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:42 [backends.py:207] Directly load the compiled graph(s) for dynamic shape from the cache, took 1.997 s
(EngineCore_DP0 pid=2372353) INFO 10-30 13:25:42 [monitor.py:34] torch.compile takes 8.57 s in total
...
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white petals covering the branches, and the clear blue
--------------------------------------------------
This image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white flowers covering the branches. The sky is clear
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The skytree is surrounded by cherry blossoms, which are in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are pink and appear to be in full bloom, indicating that the image was likely
--------------------------------------------------
The image depicts the Tokyo Skytree, a tall broadcasting tower located in Sumida, Tokyo, Japan. The tower is surrounded by cherry blossom trees in full bloom, creating a picturesque and vibrant scene. The cherry blossoms are in various stages of bloom, with pink and white flowers covering the branches. The clear blue sky


Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
@Lucaskabela Lucaskabela changed the title [Bugfix][Multimodal][Torch Compile] Avoid compiling the same module definition many times [Compile] Avoid compiling the same module definition many times Oct 30, 2025
@Lucaskabela (Contributor, Author)

This is actually correct with respect to torch.compile: since each VisionBlock is a distinct instantiation of the nn.Module, Dynamo must trace each one separately. When a trace is identical to a previous definition, we get an FX cache hit, which is what we observe here when looking at tlparse.
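The behavior described above can be illustrated with a small sketch (the `Block` class is hypothetical, and `backend="eager"` stands in for the real backend so the example runs anywhere): each instance gets its own Dynamo trace, even though the module definitions are identical; the deduplication happens downstream in the FX/Inductor caches, not at the wrapping step.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


# Wrapping each instance separately: Dynamo traces each one distinctly,
# but identical graphs can hit the compilation cache downstream.
blocks = [torch.compile(Block(), backend="eager") for _ in range(3)]
x = torch.randn(2, 4)
for b in blocks:
    y = b(x)
    print(y.shape)  # each compiled instance runs independently
```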

Closing as such

@Lucaskabela Lucaskabela closed this Nov 3, 2025