Conversation

@Chenyaaang (Contributor) commented Oct 16, 2025

Fix torch compile error on TPU platforms, reopen #26453

This PR includes:

  1. Set the compilation backend to openxla on the TPU platform.
  2. Make sure TPU uses forward_tpu when dispatching custom ops (a rough sketch of this dispatch pattern follows below).
  3. Bypass some backend checks that require either eager or inductor on non-TPU platforms.
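
For context, item 2 refers to the per-platform dispatch that vLLM-style custom ops perform: each op exposes forward_native, forward_tpu, and similar methods, and one of them is selected based on the current platform. The sketch below is illustrative only; the class and the is_tpu flag are stand-ins (the real selection uses vLLM's CustomOp base class and current_platform.is_tpu()), not the code changed in this PR.

import torch
from torch import nn

class GeluCustomOp(nn.Module):
    # Illustrative CustomOp-style module with per-platform forward methods.

    def __init__(self, is_tpu: bool):
        super().__init__()
        # Pick the platform-specific implementation once, at construction
        # time, so the compiler only ever traces the chosen path.
        self._forward = self.forward_tpu if is_tpu else self.forward_native

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        # Generic PyTorch implementation.
        return torch.nn.functional.gelu(x)

    def forward_tpu(self, x: torch.Tensor) -> torch.Tensor:
        # TPU path: here it simply reuses the native implementation, which
        # XLA can lower; a real op could swap in an XLA-friendly variant.
        return self.forward_native(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self._forward(x)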

Signed-off-by: Chenyaaang <chenyangli@google.com>
@gemini-code-assist (bot) left a comment

Code Review

This pull request aims to fix torch.compile errors on TPU platforms by introducing support for the openxla backend and adjusting configurations accordingly. The changes are generally in the right direction.

However, I've identified a critical issue in vllm/compilation/backends.py. The current implementation for the openxla backend on TPUs incorrectly falls back to using EagerAdaptor. This would silently disable compilation, negating the performance benefits of using openxla. I have provided a detailed comment with a suggested fix to prevent this incorrect behavior.

Comment on lines 57 to 68
else:
    assert compilation_config.backend == "eager", (
        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
    )
if current_platform.is_tpu():
    assert compilation_config.backend == "openxla", (
        "TPU platform should use openxla as compiler backend."
    )
else:
    assert compilation_config.backend == "eager", (
        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
    )

logger.debug("Using EagerAdaptor")
return EagerAdaptor()
critical

This logic incorrectly returns an EagerAdaptor for the openxla backend on TPU platforms. This will cause the model to run in eager mode instead of being compiled with openxla, which defeats the purpose of the changes in this PR. This is a critical issue as it silently disables compilation and its performance benefits.

A dedicated OpenXlaAdaptor should be implemented to correctly invoke the openxla compiler for subgraphs. Until then, it's safer to raise a NotImplementedError to prevent incorrect behavior.

Suggested change
-else:
-    assert compilation_config.backend == "eager", (
-        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
-    )
-if current_platform.is_tpu():
-    assert compilation_config.backend == "openxla", (
-        "TPU platform should use openxla as compiler backend."
-    )
-else:
-    assert compilation_config.backend == "eager", (
-        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
-    )
-logger.debug("Using EagerAdaptor")
-return EagerAdaptor()
+else:
+    if current_platform.is_tpu():
+        assert compilation_config.backend == "openxla", (
+            "TPU platform should use openxla as compiler backend."
+        )
+        # Returning EagerAdaptor for openxla backend is incorrect
+        # as it would disable compilation.
+        # A proper OpenXlaAdaptor is required.
+        raise NotImplementedError("OpenXlaAdaptor is not implemented.")
+    else:
+        assert compilation_config.backend == "eager", (
+            "Custom backends not supported with CompilationMode.VLLM_COMPILE"
+        )
+        logger.debug("Using EagerAdaptor")
+        return EagerAdaptor()
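
As an aside, purely for illustration: an adaptor along the lines the suggestion hints at could be as small as the sketch below. The class name, the callable interface (graph plus example inputs, mirroring how EagerAdaptor is used above), and the reliance on the torch_xla-registered "openxla" backend are all assumptions here, not code from this PR or from vLLM.

import torch
from torch import fx

class OpenXlaAdaptor:
    # Hypothetical adaptor: compile each fx subgraph with the "openxla"
    # backend (registered by torch_xla) instead of returning it unchanged
    # the way an eager adaptor would.

    def __call__(self, graph: fx.GraphModule, example_inputs):
        # example_inputs is accepted for interface compatibility; the
        # compiled callable is returned and executed later by the caller.
        return torch.compile(graph, backend="openxla")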

@Chenyaaang (Contributor, Author) commented:

@ProExpertProg can you please review this again? It looks like the TPU path is still broken.

Comment on lines +58 to +61
if current_platform.is_tpu():
    assert compilation_config.backend == "openxla", (
        "TPU platform should use openxla as compiler backend."
    )
A Member commented:
Can this platform-specific check live in vllm/platforms/tpu.py?

@Chenyaaang (Contributor, Author) commented Oct 17, 2025

Actually, we can remove it, because platforms/tpu.py assigns "openxla" to the backend. Do you think that would be better?
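
To make the reply concrete, this is roughly the shape such a platform-level default could take; the attribute and method names below follow vLLM's Platform interface as referenced elsewhere in this thread, but treat the details as an assumption rather than the exact contents of platforms/tpu.py.

class TpuPlatform:
    # Picked up by the generic config resolution quoted later in this
    # review ("if self.backend == "": ... simple_compile_backend"), so no
    # TPU-specific assert is needed at the call site.
    simple_compile_backend: str = "openxla"

    @classmethod
    def check_and_update_config(cls, vllm_config) -> None:
        # Only fill in the backend if the user left it unset.
        compilation_config = vllm_config.compilation_config
        if not compilation_config.backend:
            compilation_config.backend = "openxla"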

@ProExpertProg (Collaborator) left a comment

What issues are you running into? This should have been addressed in #26502.

assert compilation_config.backend == "eager", (
    "Custom backends not supported with CompilationMode.VLLM_COMPILE"
)
if current_platform.is_tpu():
We should never reach this code unless CompilationConfig.mode == 3 (CompilationMode.VLLM_COMPILE).

Does this mean we're using VLLM_COMPILE for TPU now? I thought we used DYNAMO_TRACE_ONCE?
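
For readers following the numbers here: the mode values map onto an enum along the lines of the sketch below. Only modes 2 and 3 are named in this thread; the rest is an assumption about how the enum is laid out, not a quote of vLLM's definition.

import enum

class CompilationMode(enum.IntEnum):
    NONE = 0                 # no compilation
    STOCK_TORCH_COMPILE = 1  # name assumed; plain torch.compile
    DYNAMO_TRACE_ONCE = 2    # what TPU was believed to be using
    VLLM_COMPILE = 3         # the only mode that should reach the code above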

self.backend = "inductor" if self.use_inductor else "eager"

if self.backend == "":
self.backend = current_platform.simple_compile_backend
This should already set the backend to "openxla"

Comment on lines +143 to +145
# Note: the default backend is set to inductor now
# we want to overwrite to openxla to execute the ops properly on TPU.
compilation_config.backend = "openxla"
This should already be set inside init_backend. Also, the default backend is NOT set to "inductor"; it is still "" the way I understand it.

Also, I see above that we ARE still using DYNAMO_TRACE_ONCE. What issue are you running into that this resolves?


Labels

tpu (Related to Google TPUs)