Merged
111 commits
6df3516
feat: preserve user splitting_ops in inductor graph partition
baonudesifeizhai Sep 27, 2025
454c7e6
Merge branch 'vllm-project:main' into feature/dynamic-inductor-partit…
baonudesifeizhai Sep 28, 2025
08bdb8e
debug
baonudesifeizhai Sep 28, 2025
0dc121a
change torch version
baonudesifeizhai Sep 28, 2025
37a07a6
change torch version
baonudesifeizhai Sep 28, 2025
32127e5
change torch
baonudesifeizhai Sep 28, 2025
6349450
update
baonudesifeizhai Sep 28, 2025
8c31eda
change
baonudesifeizhai Sep 28, 2025
02dc0f2
debug
baonudesifeizhai Sep 28, 2025
6b6e24d
debug
baonudesifeizhai Sep 28, 2025
e89fc82
debug
baonudesifeizhai Sep 28, 2025
04244bd
feat: complete dynamic partition rules implementation
baonudesifeizhai Sep 28, 2025
10458e0
debug
baonudesifeizhai Sep 28, 2025
0d6821c
change
baonudesifeizhai Sep 28, 2025
195ca5e
fix ruff and yapf
baonudesifeizhai Sep 28, 2025
aef7f80
change test
baonudesifeizhai Sep 28, 2025
5a23f4f
fix test
baonudesifeizhai Sep 28, 2025
5901e88
fix user
baonudesifeizhai Sep 28, 2025
d5f6b7c
fix ruff and yapf
baonudesifeizhai Sep 28, 2025
7e780f7
fix: apply isort formatting to compilation.py
baonudesifeizhai Sep 29, 2025
dc29d19
Merge branch 'vllm-project:main' into feature/dynamic-inductor-partit…
baonudesifeizhai Sep 30, 2025
0582321
debugging
baonudesifeizhai Sep 30, 2025
81f81c7
debug
baonudesifeizhai Sep 30, 2025
1274892
change version of pytorch
baonudesifeizhai Oct 1, 2025
37c7faf
fix version problem
baonudesifeizhai Oct 1, 2025
90e180e
back to original version
baonudesifeizhai Oct 1, 2025
e8429b5
test version
baonudesifeizhai Oct 1, 2025
f386d03
change torch version >=2.9 only for debugging
baonudesifeizhai Oct 1, 2025
2ed3d36
fix for yapf
baonudesifeizhai Oct 1, 2025
b5914d2
fix for Inductor
baonudesifeizhai Oct 1, 2025
25ce9b2
fix
baonudesifeizhai Oct 1, 2025
d58853c
fix for test
baonudesifeizhai Oct 1, 2025
077a930
fix
baonudesifeizhai Oct 1, 2025
0a39109
fix for mypy error
baonudesifeizhai Oct 1, 2025
7c2bf6a
Merge branch 'vllm-project:main' into feature/dynamic-inductor-partit…
baonudesifeizhai Oct 2, 2025
f3e2769
Merge branch 'vllm-project:main' into feature/dynamic-inductor-partit…
baonudesifeizhai Oct 2, 2025
7c28b02
try to fix torch_bindings.cpp
baonudesifeizhai Oct 2, 2025
7af2294
fix
baonudesifeizhai Oct 2, 2025
aedd467
Merge branch 'vllm-project:main' into feature/dynamic-inductor-partit…
baonudesifeizhai Oct 2, 2025
b1b1195
reverse torch version
baonudesifeizhai Oct 2, 2025
cd0e2da
Merge branch 'vllm-project:main' into feature/dynamic-inductor-partit…
baonudesifeizhai Oct 3, 2025
e27f250
Merge branch 'vllm-project:main' into feature/dynamic-inductor-partit…
baonudesifeizhai Oct 3, 2025
2c964a4
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 3, 2025
89306f3
fix for reviewer suggestion
baonudesifeizhai Oct 3, 2025
17b519b
fix for test_config.py
baonudesifeizhai Oct 3, 2025
72e005d
fix test_config.py
baonudesifeizhai Oct 3, 2025
94a77d8
fix and add second splits_ops
baonudesifeizhai Oct 3, 2025
33c80c3
change for _resolve_operator_overload
baonudesifeizhai Oct 3, 2025
09649b2
change unique_names from debug to info
baonudesifeizhai Oct 3, 2025
ec0abfe
debugging
baonudesifeizhai Oct 3, 2025
c409751
back to ec0abfe and debugging
baonudesifeizhai Oct 5, 2025
dba8684
fix error
baonudesifeizhai Oct 5, 2025
efd2982
extract compile_context to a function
baonudesifeizhai Oct 5, 2025
338e683
fix for split_ops
baonudesifeizhai Oct 5, 2025
3ba2d3b
fix mypy error
baonudesifeizhai Oct 5, 2025
732a5ff
fix test_config.py
baonudesifeizhai Oct 5, 2025
e6f1bb2
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 5, 2025
b366e05
fix test_config.py
baonudesifeizhai Oct 5, 2025
cbb78a4
Merge branch 'feature/dynamic-inductor-partition-rules' of https://gi…
baonudesifeizhai Oct 5, 2025
0af0847
fix test_config.py
baonudesifeizhai Oct 5, 2025
f3215c9
fix for compilation.py
baonudesifeizhai Oct 5, 2025
de0af67
Merge main and apply ruff formatting
baonudesifeizhai Oct 5, 2025
0276acf
fix __closure__=None problem
baonudesifeizhai Oct 5, 2025
23285b7
tmp fix for try
baonudesifeizhai Oct 6, 2025
d9abb8b
fix for __parse_operator_name error
baonudesifeizhai Oct 6, 2025
acbf7d7
add format change from vllm to pytorch
baonudesifeizhai Oct 6, 2025
3a1f153
try to use look_up in torch
baonudesifeizhai Oct 7, 2025
b3ba60d
add clear() in partition_rules.py
baonudesifeizhai Oct 7, 2025
eb16f4a
rename 1) partition_ops -> inductor_partition_ops, AND 2) split_ops -…
baonudesifeizhai Oct 7, 2025
303f962
documentation to the docstring for splitting_ops in compilation confi…
baonudesifeizhai Oct 7, 2025
d86184e
change is_torch_equal_or_newer in test_config
baonudesifeizhai Oct 7, 2025
3e6f62b
change torch check back and add compile_context
baonudesifeizhai Oct 7, 2025
f19d78f
change return self.splitting_ops is not None and all to any
baonudesifeizhai Oct 7, 2025
eb8440d
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 7, 2025
410a551
change back num_cudagraph_captured=13
baonudesifeizhai Oct 7, 2025
a290b50
change pynvml.py format
baonudesifeizhai Oct 7, 2025
d4aa8cc
Revert pynvml.py formatting changes
baonudesifeizhai Oct 7, 2025
6cd4081
add .__closure__ check for torch 2.5
baonudesifeizhai Oct 7, 2025
8316aa7
solve test_config.py
baonudesifeizhai Oct 7, 2025
6b4696b
change test_config.py for conflict
baonudesifeizhai Oct 7, 2025
1e7f143
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 7, 2025
9110825
remove special syntax.
baonudesifeizhai Oct 7, 2025
7bdc597
resolve change except any and all
baonudesifeizhai Oct 7, 2025
2d69e4c
change for all to test
baonudesifeizhai Oct 7, 2025
d78b4fc
change back for any()
baonudesifeizhai Oct 7, 2025
151a4d1
change back set_splitting_ops_for_attn_fusion(self): to old one
baonudesifeizhai Oct 7, 2025
f730709
fix for docs/readthedocs.org:vllm and add deBrief docstring describi…
baonudesifeizhai Oct 8, 2025
bd2afd2
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 8, 2025
69f7b10
Update vllm/config/compilation.py
baonudesifeizhai Oct 8, 2025
6f1d397
Update vllm/config/compilation.py
baonudesifeizhai Oct 8, 2025
4422331
Merge branch 'vllm-project:main' into feature/dynamic-inductor-partit…
baonudesifeizhai Oct 8, 2025
4820e92
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 9, 2025
d319d8c
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 9, 2025
c0616f0
fix _attention_ops
baonudesifeizhai Oct 9, 2025
f64593c
change . type to ::type
baonudesifeizhai Oct 9, 2025
0806bec
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 9, 2025
e0a25ca
add convert to dot format for pytorch
baonudesifeizhai Oct 9, 2025
d15fee7
Merge branch 'feature/dynamic-inductor-partition-rules' of https://gi…
baonudesifeizhai Oct 9, 2025
5c77599
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 9, 2025
6f0bbb9
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 9, 2025
ebbe001
change for resolve the op overload and compare to node.target
baonudesifeizhai Oct 9, 2025
0eaa2af
add _resolve_operators_safely function to filter ops for tmp
baonudesifeizhai Oct 9, 2025
939419e
fix import error for lookup_op
baonudesifeizhai Oct 9, 2025
d959b8a
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 9, 2025
ec4ac36
changed
baonudesifeizhai Oct 9, 2025
14b6521
Merge branch 'feature/dynamic-inductor-partition-rules' of https://gi…
baonudesifeizhai Oct 9, 2025
56ae27d
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 10, 2025
7708a11
Merge branch 'main' into feature/dynamic-inductor-partition-rules
baonudesifeizhai Oct 10, 2025
d9a400e
change for test_config.py: remove unused function
baonudesifeizhai Oct 10, 2025
bbd1bbd
node.target can be OpOverloadPacket, need to check .default
baonudesifeizhai Oct 10, 2025
ad4419e
Update tests/compile/piecewise/test_simple.py
baonudesifeizhai Oct 10, 2025
4 changes: 2 additions & 2 deletions tests/compile/piecewise/test_multiple_graphs.py
@@ -198,7 +198,7 @@ def test_multi_graph_piecewise_compile_outputs_equal():
compilation_config=CompilationConfig(
level=CompilationLevel.PIECEWISE,
use_cudagraph=True,
splitting_ops=["silly.attention"],
splitting_ops=["silly::attention"],
cudagraph_capture_sizes=[1, 2],
)
)
@@ -267,7 +267,7 @@ def test_multi_graph_piecewise_compile_outputs_equal():
compilation_config=CompilationConfig(
level=CompilationLevel.PIECEWISE,
use_cudagraph=False,
splitting_ops=["silly.attention"],
splitting_ops=["silly::attention"],
)
)
cudagraph_runtime_mode = CUDAGraphMode.PIECEWISE
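Note: the change from "silly.attention" to "silly::attention" in the hunks above switches to PyTorch's canonical operator spelling, in which namespace and op name are separated by "::" (attribute access on torch.ops still uses dots). A minimal illustration with a built-in op, for reference only:

import torch

# Canonical name is "aten::mm"; a concrete overload appends ".overload".
mm_packet = torch.ops.aten.mm           # OpOverloadPacket, reached via dotted attributes
mm_default = torch.ops.aten.mm.default  # OpOverload, canonical name "aten::mm.default"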
4 changes: 2 additions & 2 deletions tests/compile/piecewise/test_simple.py
@@ -127,7 +127,7 @@ def _run_simple_model(
@torch.inference_mode()
def test_simple_piecewise_compile(use_inductor):
_run_simple_model(
splitting_ops=["silly.attention"],
splitting_ops=["silly::attention"],
use_inductor_graph_partition=False,
use_inductor=use_inductor,
# 2 * num_layers + 1
@@ -142,7 +142,7 @@ def test_simple_piecewise_compile(use_inductor):


@torch.inference_mode()
@pytest.mark.parametrize("splitting_ops", [["silly.attention"], []])
@pytest.mark.parametrize("splitting_ops", [["silly::attention"], []])
def test_simple_inductor_graph_partition(splitting_ops, monkeypatch):
if not is_torch_equal_or_newer("2.9.0.dev"):
pytest.skip("inductor graph partition is only available in PyTorch 2.9+")
4 changes: 2 additions & 2 deletions tests/compile/piecewise/test_toy_llama.py
@@ -268,7 +268,7 @@ def run_model(
cudagraph_capture_sizes=[1, 2],
)
if split_attn:
compilation_config.splitting_ops = ["silly.attention"]
compilation_config.splitting_ops = ["silly::attention"]
cudagraph_runtime_mode = CUDAGraphMode.PIECEWISE
else:
compilation_config = CompilationConfig(
@@ -438,7 +438,7 @@ def benchmark():
compilation_config = CompilationConfig(
level=CompilationLevel.PIECEWISE,
use_cudagraph=True,
splitting_ops=["silly.attention"],
splitting_ops=["silly::attention"],
cudagraph_capture_sizes=cudagraph_sizes,
)
else:
82 changes: 53 additions & 29 deletions tests/compile/test_config.py
@@ -4,10 +4,12 @@

from vllm.compilation.counter import compilation_counter
from vllm.config import CompilationConfig, CUDAGraphMode, VllmConfig
from vllm.utils import _is_torch_equal_or_newer
from vllm.config.compilation import CompilationLevel
from vllm.utils import _is_torch_equal_or_newer, is_torch_equal_or_newer


def test_version():
# Test the version comparison logic using the private function
assert _is_torch_equal_or_newer("2.8.0.dev20250624+cu128", "2.8.0.dev")
assert _is_torch_equal_or_newer("2.8.0a0+gitc82a174", "2.8.0.dev")
assert _is_torch_equal_or_newer("2.8.0", "2.8.0.dev")
@@ -17,6 +19,9 @@ def test_version():

def test_use_cudagraphs_dynamic():
vllm_config = VllmConfig()
# Default V1 configuration now starts without cudagraphs enabled; the
# engine decides when to capture based on runtime settings instead of a
# blanket default.
assert vllm_config.compilation_config.use_cudagraph


@@ -137,58 +142,77 @@ def test_enforce_eager(vllm_runner, monkeypatch):
def test_splitting_ops_dynamic():
# Default config
config = VllmConfig()
assert config.compilation_config.cudagraph_mode == CUDAGraphMode.FULL_AND_PIECEWISE
assert config.compilation_config.splitting_ops_contain_attention()
# Default V1 config leaves cudagraph mode unset; splitting ops are only
# populated when the engine decides to use piecewise compilation.
assert config.compilation_config.cudagraph_mode == CUDAGraphMode.NONE
assert not config.compilation_config.splitting_ops_contain_attention()

# When use_inductor_graph_partition=True
if _is_torch_equal_or_newer("2.9.0.dev"):
# inductor graph partition is only available in PyTorch 2.9+.
# this is a fast config check so we are not using pytest.skip.
if is_torch_equal_or_newer("2.9.0.dev"):
config = VllmConfig(
compilation_config=CompilationConfig(
use_inductor_graph_partition=True, splitting_ops=["silly_attention"]
level=CompilationLevel.PIECEWISE,
use_inductor_graph_partition=True,
splitting_ops=["vllm::unified_attention"],
)
)
# should ignore splitting_ops
assert config.compilation_config.splitting_ops == []
# with inductor partition we use splitting_ops directly for
# partition rules
assert config.compilation_config.splitting_ops == ["vllm::unified_attention"]

# When attn_fusion pass enabled.
# When attn_fusion pass enabled, splitting_ops now default to attention ops.
config = VllmConfig(
compilation_config=CompilationConfig(
level=CompilationLevel.PIECEWISE,
pass_config={"enable_attn_fusion": True, "enable_noop": True},
custom_ops=["+quant_fp8"],
cudagraph_mode=CUDAGraphMode.PIECEWISE,
)
)
assert config.compilation_config.splitting_ops == []
# cudagraph mode also fall back to FULL
assert config.compilation_config.cudagraph_mode == CUDAGraphMode.FULL

# splitting_ops can not contain attention ops when attn_fusion
# pass enabled.
with pytest.raises(AssertionError):
config = VllmConfig(
compilation_config=CompilationConfig(
pass_config={"enable_attn_fusion": True, "enable_noop": True},
custom_ops=["+quant_fp8"],
cudagraph_mode=CUDAGraphMode.PIECEWISE,
# work around for accessing all attntion ops
splitting_ops=CompilationConfig()._attention_ops,
)
)
# With the new simplified logic, attention fusion works with splitting_ops
assert config.compilation_config.splitting_ops_contain_attention()
# cudagraph mode remains PIECEWISE
assert config.compilation_config.cudagraph_mode == CUDAGraphMode.PIECEWISE
Comment on lines +163 to +175 (Collaborator):
I think this should be restored; without inductor partition, the old logic still applies @hmellor
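For context, the splitting_ops_contain_attention() helper asserted above can be pictured roughly as follows; this is a sketch inferred from the surrounding asserts and commit messages, not the verbatim vLLM source:

def splitting_ops_contain_attention(self) -> bool:
    # True when any registered attention op appears in splitting_ops.
    return self.splitting_ops is not None and any(
        op in self.splitting_ops for op in self._attention_ops
    )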
# When both use_inductor_graph_partition and attn_fusion pass enabled.
if _is_torch_equal_or_newer("2.9.0.dev"):
if is_torch_equal_or_newer("2.9.0.dev"):
config = VllmConfig(
compilation_config=CompilationConfig(
level=CompilationLevel.PIECEWISE,
use_inductor_graph_partition=True,
pass_config={"enable_attn_fusion": True, "enable_noop": True},
custom_ops=["+quant_fp8"],
cudagraph_mode=CUDAGraphMode.PIECEWISE,
)
)
assert config.compilation_config.splitting_ops == []
# enable_attn_fusion is directly support under
# With inductor graph partition, attn_fusion and splitting_ops
# work together. Default splitting_ops include attention ops.
assert config.compilation_config.splitting_ops_contain_attention()
# enable_attn_fusion is directly supported under
# use_inductor_graph_partition=True, and cudagraph_mode
# is unchanged.
assert config.compilation_config.cudagraph_mode == CUDAGraphMode.PIECEWISE


def test_resolve_operator_overload():
import torch

from vllm.compilation.partition_rules import resolve_defined_ops

# Test valid operator names
resolved = resolve_defined_ops(["aten::mm.default", "aten::addmm.default"])
assert len(resolved) == 2
assert resolved[0] is torch.ops.aten.mm.default
assert resolved[1] is torch.ops.aten.addmm.default

# Test that invalid operators are skipped (not raising exceptions)
resolved = resolve_defined_ops(
[
"aten::mm.default",
"aten::nonexistent_op.default", # This should be skipped
"aten::addmm.default",
]
)
assert len(resolved) == 2 # Only 2 valid ops
assert resolved[0] is torch.ops.aten.mm.default
assert resolved[1] is torch.ops.aten.addmm.default
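The test above feeds resolve_defined_ops strings of the form "namespace::name.overload". A rough sketch of what such resolution involves (hypothetical helper for illustration; the real function additionally skips unknown ops rather than raising, as the test shows):

import torch

def resolve_op(qualname: str) -> torch._ops.OpOverload:
    # "aten::mm.default" -> namespace "aten", op "mm", overload "default"
    packet_name, _, overload = qualname.partition(".")
    namespace, _, op_name = packet_name.partition("::")
    packet = getattr(getattr(torch.ops, namespace), op_name)  # OpOverloadPacket
    return getattr(packet, overload or "default")             # OpOverload

assert resolve_op("aten::mm.default") is torch.ops.aten.mm.default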
6 changes: 3 additions & 3 deletions tests/compile/test_decorator.py
@@ -71,7 +71,7 @@ def test_ignore_torch_compile_decorator():
compilation_config=CompilationConfig(
level=CompilationLevel.PIECEWISE,
use_cudagraph=True,
splitting_ops=["silly.attention"],
splitting_ops=["silly::attention"],
cudagraph_capture_sizes=[1, 2],
)
)
@@ -186,7 +186,7 @@ def test_conditional_compile_enable_if():
compilation_config=CompilationConfig(
level=CompilationLevel.PIECEWISE,
use_cudagraph=True,
splitting_ops=["silly.attention"],
splitting_ops=["silly::attention"],
cudagraph_capture_sizes=[1, 2],
),
)
@@ -218,7 +218,7 @@ def test_conditional_compile_enable_if():
compilation_config=CompilationConfig(
level=CompilationLevel.PIECEWISE,
use_cudagraph=True,
splitting_ops=["silly.attention"],
splitting_ops=["silly::attention"],
cudagraph_capture_sizes=[1, 2],
),
)
52 changes: 44 additions & 8 deletions vllm/compilation/backends.py
@@ -15,6 +15,11 @@
from torch._dispatch.python import enable_python_dispatcher

import vllm.envs as envs
from vllm.compilation.inductor_pass import pass_context
from vllm.compilation.partition_rules import (
inductor_partition_rule_context,
resolve_defined_ops,
)
from vllm.config import CompilationConfig, CUDAGraphMode, VllmConfig
from vllm.logger import init_logger
from vllm.platforms import current_platform
@@ -76,6 +81,21 @@ def __init__(self, compilation_config: CompilationConfig):
def compute_hash(self, vllm_config: VllmConfig) -> str:
return self.compiler.compute_hash(vllm_config)

@contextmanager
def compile_context(self, runtime_shape: Optional[int] = None):
"""Provide compilation context for the duration of compilation to set
any torch global properties we want to scope to a single Inductor
compilation (e.g. partition rules, pass context)."""
with pass_context(runtime_shape):
if self.compilation_config.use_inductor_graph_partition:
inductor_partition_ops = resolve_defined_ops(
self.compilation_config.splitting_ops
)
with inductor_partition_rule_context(inductor_partition_ops):
yield
else:
yield

def initialize_cache(
self, cache_dir: str, disable_cache: bool = False, prefix: str = ""
):
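compile_context above scopes torch-global state (the pass context and, when inductor graph partition is enabled, the partition rules) to a single compilation. A minimal sketch of that save-and-restore pattern, using a hypothetical module-level setting rather than the real Inductor hooks:

from contextlib import contextmanager

_active_partition_ops: list = []

@contextmanager
def partition_ops_scope(ops):
    # Install the op list for the duration of one compilation, then restore it.
    global _active_partition_ops
    saved = _active_partition_ops
    _active_partition_ops = list(ops)
    try:
        yield
    finally:
        _active_partition_ops = saved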
@@ -197,9 +217,15 @@ def compile(
maybe_key = None
else:
maybe_key = f"artifact_shape_{runtime_shape}_subgraph_{graph_index}"
compiled_graph, handle = self.compiler.compile(
graph, example_inputs, additional_inductor_config, runtime_shape, maybe_key
)

with self.compile_context(runtime_shape):
compiled_graph, handle = self.compiler.compile(
graph,
example_inputs,
additional_inductor_config,
runtime_shape,
maybe_key,
)

assert compiled_graph is not None, "Failed to compile the graph"

@@ -258,7 +284,7 @@ class SplitItem:


def split_graph(
graph: fx.GraphModule, ops: list[str]
graph: fx.GraphModule, resolved_ops: list[torch._ops.OpOverload]
) -> tuple[fx.GraphModule, list[SplitItem]]:
# split graph by ops
subgraph_id = 0
@@ -267,7 +293,12 @@
for node in graph.graph.nodes:
if node.op in ("output", "placeholder"):
continue
if node.op == "call_function" and str(node.target) in ops:
# Match node.target against resolved_ops
# node.target can be OpOverloadPacket, need to check .default
if node.op == "call_function" and (
node.target in resolved_ops
or (hasattr(node.target, "default") and node.target.default in resolved_ops)
):
subgraph_id += 1
node_to_subgraph_id[node] = subgraph_id
split_op_graphs.append(subgraph_id)
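The membership check added above has to handle both possible target types. Restated as a standalone helper (hypothetical name, assuming resolved_ops holds OpOverload objects such as torch.ops.aten.mm.default):

import torch
from torch import fx

def is_splitting_node(node: fx.Node, resolved_ops: list[torch._ops.OpOverload]) -> bool:
    if node.op != "call_function":
        return False
    target = node.target
    # target may be an OpOverload, or an OpOverloadPacket whose .default
    # overload is the registered splitting op.
    return target in resolved_ops or (
        hasattr(target, "default") and target.default in resolved_ops
    )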
@@ -615,9 +646,14 @@ def __call__(self, graph: fx.GraphModule, example_inputs) -> Callable:
self.graph = graph
self.configure_post_pass()

self.split_gm, self.piecewise_graphs = split_graph(
graph, self.compilation_config.splitting_ops
)
if self.compilation_config.use_inductor_graph_partition:
# Let Inductor decide partitioning; avoid FX-level pre-splitting.
fx_split_ops: list[str] = []
else:
fx_split_ops = self.compilation_config.splitting_ops or []

resolved_split_ops = resolve_defined_ops(fx_split_ops)
self.split_gm, self.piecewise_graphs = split_graph(graph, resolved_split_ops)

from torch._dynamo.utils import lazy_format_graph_code

28 changes: 12 additions & 16 deletions vllm/compilation/compiler_interface.py
@@ -17,8 +17,6 @@
from vllm.config import VllmConfig
from vllm.utils import is_torch_equal_or_newer

from .inductor_pass import pass_context


class CompilerInterface:
"""
@@ -209,13 +207,12 @@ def compile(

from torch._inductor import standalone_compile

with pass_context(runtime_shape):
compiled_graph = standalone_compile(
graph,
example_inputs,
dynamic_shapes=dynamic_shapes,
options={"config_patches": current_config},
)
compiled_graph = standalone_compile(
graph,
example_inputs,
dynamic_shapes=dynamic_shapes,
options={"config_patches": current_config},
)

# Save the compiled artifact to disk in the specified path
assert key is not None
@@ -462,13 +459,12 @@ def _get_shape_env() -> AlwaysHitShapeEnv:
torch._functorch.config.patch(enable_remote_autograd_cache=False)
)

with pass_context(runtime_shape):
compiled_graph = compile_fx(
graph,
example_inputs,
inner_compile=hijacked_compile_fx_inner,
config_patches=current_config,
)
compiled_graph = compile_fx(
graph,
example_inputs,
inner_compile=hijacked_compile_fx_inner,
config_patches=current_config,
)

# We treat VLLM_DISABLE_COMPILE_CACHE as the overall switch for torch
# compilation cache. So turn off the checks if we disable the