Conversation

@Chenyaaang (Contributor) commented Oct 16, 2025

Fix torch compile error on TPU platforms, reopen #26453

This PR includes:

  1. Set the compilation backend to openxla on the TPU platform.
  2. Make sure TPU uses forward_tpu when dispatching custom ops (a rough sketch of this dispatch pattern follows below).
  3. Bypass some backend checks that require either eager or inductor on non-TPU platforms.
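
For context, item 2 refers to the per-platform dispatch that vLLM-style custom ops perform: each op exposes forward_native, forward_tpu, and similar methods, and one of them is selected based on the current platform. The sketch below is illustrative only; the class and the is_tpu flag are stand-ins (the real selection uses vLLM's CustomOp base class and current_platform.is_tpu()), not the code changed in this PR.

import torch
from torch import nn

class GeluCustomOp(nn.Module):
    # Illustrative CustomOp-style module with per-platform forward methods.

    def __init__(self, is_tpu: bool):
        super().__init__()
        # Pick the platform-specific implementation once, at construction
        # time, so the compiler only ever traces the chosen path.
        self._forward = self.forward_tpu if is_tpu else self.forward_native

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        # Generic PyTorch implementation.
        return torch.nn.functional.gelu(x)

    def forward_tpu(self, x: torch.Tensor) -> torch.Tensor:
        # TPU path: here it simply reuses the native implementation, which
        # XLA can lower; a real op could swap in an XLA-friendly variant.
        return self.forward_native(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self._forward(x)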

Signed-off-by: Chenyaaang <chenyangli@google.com>
@gemini-code-assist (bot) left a comment

Code Review

This pull request aims to fix torch.compile errors on TPU platforms by introducing support for the openxla backend and adjusting configurations accordingly. The changes are generally in the right direction.

However, I've identified a critical issue in vllm/compilation/backends.py. The current implementation for the openxla backend on TPUs incorrectly falls back to using EagerAdaptor. This would silently disable compilation, negating the performance benefits of using openxla. I have provided a detailed comment with a suggested fix to prevent this incorrect behavior.

Comment on lines 57 to 68
else:
    assert compilation_config.backend == "eager", (
        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
    )
if current_platform.is_tpu():
    assert compilation_config.backend == "openxla", (
        "TPU platform should use openxla as compiler backend."
    )
else:
    assert compilation_config.backend == "eager", (
        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
    )

logger.debug("Using EagerAdaptor")
return EagerAdaptor()
critical

This logic incorrectly returns an EagerAdaptor for the openxla backend on TPU platforms. This will cause the model to run in eager mode instead of being compiled with openxla, which defeats the purpose of the changes in this PR. This is a critical issue as it silently disables compilation and its performance benefits.

A dedicated OpenXlaAdaptor should be implemented to correctly invoke the openxla compiler for subgraphs. Until then, it's safer to raise a NotImplementedError to prevent incorrect behavior.

Suggested change
-else:
-    assert compilation_config.backend == "eager", (
-        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
-    )
-if current_platform.is_tpu():
-    assert compilation_config.backend == "openxla", (
-        "TPU platform should use openxla as compiler backend."
-    )
-else:
-    assert compilation_config.backend == "eager", (
-        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
-    )
-logger.debug("Using EagerAdaptor")
-return EagerAdaptor()
+else:
+    if current_platform.is_tpu():
+        assert compilation_config.backend == "openxla", (
+            "TPU platform should use openxla as compiler backend."
+        )
+        # Returning EagerAdaptor for openxla backend is incorrect
+        # as it would disable compilation.
+        # A proper OpenXlaAdaptor is required.
+        raise NotImplementedError("OpenXlaAdaptor is not implemented.")
+    else:
+        assert compilation_config.backend == "eager", (
+            "Custom backends not supported with CompilationMode.VLLM_COMPILE"
+        )
+        logger.debug("Using EagerAdaptor")
+        return EagerAdaptor()
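
As an aside, purely for illustration: an adaptor along the lines the suggestion hints at could be as small as the sketch below. The class name, the callable interface (graph plus example inputs, mirroring how EagerAdaptor is used above), and the reliance on the torch_xla-registered "openxla" backend are all assumptions here, not code from this PR or from vLLM.

import torch
from torch import fx

class OpenXlaAdaptor:
    # Hypothetical adaptor: compile each fx subgraph with the "openxla"
    # backend (registered by torch_xla) instead of returning it unchanged
    # the way an eager adaptor would.

    def __call__(self, graph: fx.GraphModule, example_inputs):
        # example_inputs is accepted for interface compatibility; the
        # compiled callable is returned and executed later by the caller.
        return torch.compile(graph, backend="openxla")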

@Chenyaaang (Contributor, Author) commented:

@ProExpertProg can you please review this again? It looks like the TPU path is still broken.

Comment on lines +58 to +61
if current_platform.is_tpu():
    assert compilation_config.backend == "openxla", (
        "TPU platform should use openxla as compiler backend."
    )
A Member commented:
Can this platform-specific check live in vllm/platforms/tpu.py?

@Chenyaaang (Contributor, Author) commented Oct 17, 2025

Actually, we can remove it, because platforms/tpu.py assigns "openxla" to the backend. Do you think that would be better?
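
To make the reply concrete, this is roughly the shape such a platform-level default could take; the attribute and method names below follow vLLM's Platform interface as referenced elsewhere in this thread, but treat the details as an assumption rather than the exact contents of platforms/tpu.py.

class TpuPlatform:
    # Picked up by the generic config resolution quoted later in this
    # review ("if self.backend == "": ... simple_compile_backend"), so no
    # TPU-specific assert is needed at the call site.
    simple_compile_backend: str = "openxla"

    @classmethod
    def check_and_update_config(cls, vllm_config) -> None:
        # Only fill in the backend if the user left it unset.
        compilation_config = vllm_config.compilation_config
        if not compilation_config.backend:
            compilation_config.backend = "openxla"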

@ProExpertProg (Collaborator) left a comment

What issues are you running into? This should have been addressed in #26502.

assert compilation_config.backend == "eager", (
    "Custom backends not supported with CompilationMode.VLLM_COMPILE"
)
if current_platform.is_tpu():
We should never reach this code unless CompilationConfig.mode == 3 (CompilationMode.VLLM_COMPILE).

Does this mean we're using VLLM_COMPILE for TPU now? I thought we used DYNAMO_TRACE_ONCE?
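
For readers following the numbers here: the mode values map onto an enum along the lines of the sketch below. Only modes 2 and 3 are named in this thread; the rest is an assumption about how the enum is laid out, not a quote of vLLM's definition.

import enum

class CompilationMode(enum.IntEnum):
    NONE = 0                 # no compilation
    STOCK_TORCH_COMPILE = 1  # name assumed; plain torch.compile
    DYNAMO_TRACE_ONCE = 2    # what TPU was believed to be using
    VLLM_COMPILE = 3         # the only mode that should reach the code above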

self.backend = "inductor" if self.use_inductor else "eager"

if self.backend == "":
self.backend = current_platform.simple_compile_backend
This should already set the backend to "openxla"

Comment on lines +143 to +145
# Note: the default backend is set to inductor now
# we want to overwrite to openxla to execute the ops properly on TPU.
compilation_config.backend = "openxla"
This should already be set inside init_backend. Also, the default backend is NOT set to "inductor"; it is still "" the way I understand it.

Also, I see above that we ARE still using DYNAMO_TRACE_ONCE. What issue are you running into that this resolves?


Labels

tpu (Related to Google TPUs)