[TPU] Fix tpu torch compile error #27049
base: main
Conversation
Signed-off-by: Chenyaaang <chenyangli@google.com>
Code Review
This pull request aims to fix torch.compile errors on TPU platforms by introducing support for the openxla backend and adjusting configurations accordingly. The changes are generally in the right direction.
However, I've identified a critical issue in vllm/compilation/backends.py. The current implementation for the openxla backend on TPUs incorrectly falls back to using EagerAdaptor. This would silently disable compilation, negating the performance benefits of using openxla. I have provided a detailed comment with a suggested fix to prevent this incorrect behavior.
```diff
 else:
-    assert compilation_config.backend == "eager", (
-        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
-    )
+    if current_platform.is_tpu():
+        assert compilation_config.backend == "openxla", (
+            "TPU platform should use openxla as compiler backend."
+        )
+    else:
+        assert compilation_config.backend == "eager", (
+            "Custom backends not supported with CompilationMode.VLLM_COMPILE"
+        )

     logger.debug("Using EagerAdaptor")
     return EagerAdaptor()
```
This logic incorrectly returns an EagerAdaptor for the openxla backend on TPU platforms. This will cause the model to run in eager mode instead of being compiled with openxla, which defeats the purpose of the changes in this PR. This is a critical issue as it silently disables compilation and its performance benefits.
A dedicated OpenXlaAdaptor should be implemented to correctly invoke the openxla compiler for subgraphs. Until then, it's safer to raise a NotImplementedError to prevent incorrect behavior.
Suggested change:

```diff
 else:
-    assert compilation_config.backend == "eager", (
-        "Custom backends not supported with CompilationMode.VLLM_COMPILE"
-    )
     if current_platform.is_tpu():
         assert compilation_config.backend == "openxla", (
             "TPU platform should use openxla as compiler backend."
         )
+        # Returning EagerAdaptor for openxla backend is incorrect
+        # as it would disable compilation.
+        # A proper OpenXlaAdaptor is required.
+        raise NotImplementedError("OpenXlaAdaptor is not implemented.")
     else:
         assert compilation_config.backend == "eager", (
             "Custom backends not supported with CompilationMode.VLLM_COMPILE"
         )
-    logger.debug("Using EagerAdaptor")
-    return EagerAdaptor()
+        logger.debug("Using EagerAdaptor")
+        return EagerAdaptor()
```
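For reference, here is a minimal sketch of what such a dedicated adaptor could look like. This is an assumption, not code from this PR or from vLLM: the class name, the `compile` method signature, and how the compiler-interface layer would invoke it are all hypothetical. The only grounded detail is that `torch.compile` accepts `backend="openxla"` when torch_xla is installed.

```python
# Hypothetical sketch: class name and method signature are assumptions,
# not vLLM's actual compiler interface.
import torch


class OpenXlaAdaptor:
    """Compile a captured FX subgraph with the openxla Dynamo backend."""

    def compile(self, graph: torch.fx.GraphModule, example_inputs):
        # Unlike EagerAdaptor, which would run the subgraph as-is, this hands
        # the graph to torch.compile with the openxla backend so it is lowered
        # through XLA rather than executed eagerly.
        return torch.compile(graph, backend="openxla", dynamic=False)
```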
@ProExpertProg can you please review this again? Looks like the TPU path is still broken.
```python
if current_platform.is_tpu():
    assert compilation_config.backend == "openxla", (
        "TPU platform should use openxla as compiler backend."
    )
```
Can this platform-specific check live in vllm/platforms/tpu.py?
Actually, we can remove it, because platforms/tpu.py assigns "openxla" to the backend. Do you think that would be better?
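For context, a rough sketch of the platform-level default being referred to; the `simple_compile_backend` attribute name is taken from the snippet quoted later in this thread, while the class shape below is simplified and assumed:

```python
# Simplified sketch: the real vLLM Platform hierarchy is more involved.
# The point is that the TPU platform declares openxla as its default
# compile backend, so code that falls back to
# current_platform.simple_compile_backend picks it up automatically.
class Platform:
    simple_compile_backend: str = "inductor"


class TpuPlatform(Platform):
    simple_compile_backend: str = "openxla"


current_platform = TpuPlatform()
assert current_platform.simple_compile_backend == "openxla"
```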
What issues are you running into? This should have been addressed in #26502.
```python
assert compilation_config.backend == "eager", (
    "Custom backends not supported with CompilationMode.VLLM_COMPILE"
)
if current_platform.is_tpu():
```
We should never reach this code unless CompilationConfig.mode == 3 (CompilationMode.VLLM_COMPILE).
Does this mean we're using VLLM_COMPILE for TPU now? I thought we used DYNAMO_TRACE_ONCE?
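For readers following the thread, a rough sketch of the modes being contrasted; only `VLLM_COMPILE == 3` is stated above, so the other names and values below are assumptions:

```python
# Assumed enum layout; only VLLM_COMPILE = 3 is confirmed in this thread.
import enum


class CompilationMode(enum.IntEnum):
    NONE = 0
    STOCK_TORCH_COMPILE = 1
    DYNAMO_TRACE_ONCE = 2  # compile once via Dynamo with the platform's simple backend
    VLLM_COMPILE = 3       # vLLM's own compilation path (the backends.py code above)
```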
```python
self.backend = "inductor" if self.use_inductor else "eager"

if self.backend == "":
    self.backend = current_platform.simple_compile_backend
```
This should already set the backend to "openxla".
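A tiny illustration of the fallback behaviour this comment refers to, assuming `simple_compile_backend` resolves to "openxla" on TPU:

```python
# Sketch of the quoted fallback: an empty backend string means the user did
# not choose one, so the platform default is used. On TPU that default is
# assumed to be "openxla", which would make a later explicit overwrite redundant.
def resolve_backend(configured: str, platform_default: str) -> str:
    return configured if configured else platform_default


assert resolve_backend("", "openxla") == "openxla"     # nothing configured on TPU
assert resolve_backend("eager", "openxla") == "eager"  # an explicit choice wins
```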
```python
# Note: the default backend is set to inductor now
# we want to overwrite to openxla to execute the ops properly on TPU.
compilation_config.backend = "openxla"
```
This should already be set inside init_backend. Also, the default backend is NOT set to inductor; the way I understand it, it is still "".
Also, I see above that we ARE still using DYNAMO_TRACE_ONCE. What issue are you running into that this resolves?
Fix torch.compile error on TPU platforms; reopen #26453.
This PR includes: