HPU & TPU doesn't support torch.inference_mode #13014

Merged
merged 9 commits into from
May 9, 2022
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -185,7 +185,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed mismatching default values for the types of some arguments in the DeepSpeed and Fully-Sharded strategies which made the CLI unable to use them ([#12989](https://github.com/PyTorchLightning/pytorch-lightning/pull/12989))


--
+- Fixed an issue with unsupported torch.inference_mode() on hpu backends by making it use no_grad ([#13014](https://github.com/PyTorchLightning/pytorch-lightning/pull/13014))


-
1 change: 1 addition & 0 deletions docs/source/accelerators/hpu_basic.rst
@@ -81,3 +81,4 @@ Known limitations
* Multiple optimizers are not supported.
* `Habana dataloader <https://docs.habana.ai/en/latest/PyTorch_User_Guide/PyTorch_User_Guide.html#habana-data-loader>`__ is not supported.
* :class:`~pytorch_lightning.callbacks.device_stats_monitor.DeviceStatsMonitor` is not supported.
+* :func:`torch.inference_mode` is not supported
12 changes: 7 additions & 5 deletions pytorch_lightning/trainer/trainer.py
@@ -1319,7 +1319,7 @@ def _run_evaluate(self) -> _EVALUATE_OUTPUT:
         # reset trainer on this loop and all child loops in case user connected a custom loop
         self._evaluation_loop.trainer = self

-        with self.profiler.profile(f"run_{self.state.stage}_evaluation"), _evaluation_context():
+        with self.profiler.profile(f"run_{self.state.stage}_evaluation"), _evaluation_context(self.accelerator):
             eval_loop_results = self._evaluation_loop.run()

         # remove the tensors from the eval results
@@ -1335,7 +1335,7 @@ def _run_predict(self) -> Optional[_PREDICT_OUTPUT]:
         self.reset_predict_dataloader(self.lightning_module)
         # reset trainer on this loop and all child loops in case user connected a custom loop
         self.predict_loop.trainer = self
-        with _evaluation_context():
+        with _evaluation_context(self.accelerator):
             return self.predict_loop.run()

     def _run_sanity_check(self) -> None:
@@ -2801,11 +2801,13 @@ def configure_optimizers(self):


 @contextmanager
-def _evaluation_context() -> Generator:
-    # inference mode is not supported with gloo backend (#9431)
+def _evaluation_context(accelerator) -> Generator:
+    # inference mode is not supported with gloo backend (#9431) and hpu backend
     context_manager_class = (
         torch.inference_mode
-        if _TORCH_GREATER_EQUAL_1_9 and not (dist.is_initialized() and dist.get_backend() == "gloo")
+        if _TORCH_GREATER_EQUAL_1_9
+        and not (dist.is_initialized() and dist.get_backend() == "gloo")
+        and not (isinstance(accelerator, HPUAccelerator))
         else torch.no_grad
     )
     with context_manager_class():
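
The selection rule in `_evaluation_context` can be exercised without torch or an HPU by passing the context managers in as arguments. The following is a minimal sketch of that rule, not the code from this PR: `pick_eval_context`, `HPUAcceleratorStub`, `fake_inference_mode`, `fake_no_grad` and `active_mode` are hypothetical names introduced only for illustration, and the torch version check is assumed to be represented by passing `None` for `inference_mode`.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class HPUAcceleratorStub:
    """Hypothetical stand-in for pytorch_lightning's HPUAccelerator."""


def pick_eval_context(
    accelerator: Any,
    inference_mode: Optional[Callable[[], Any]],
    no_grad: Callable[[], Any],
    dist_backend: Optional[str] = None,
    hpu_cls: type = HPUAcceleratorStub,
) -> Callable[[], Any]:
    """Return the context-manager factory to wrap evaluation in.

    inference_mode is None when the installed torch has no inference_mode.
    dist_backend is None when the distributed package is not initialized.
    """
    # Mirrors the two exclusions in the diff: gloo backend and HPU accelerator.
    unsupported = dist_backend == "gloo" or isinstance(accelerator, hpu_cls)
    if inference_mode is not None and not unsupported:
        return inference_mode
    return no_grad


# Stand-ins for torch.inference_mode / torch.no_grad (assumptions for the demo).
@contextmanager
def fake_inference_mode() -> Iterator[str]:
    yield "inference_mode"


@contextmanager
def fake_no_grad() -> Iterator[str]:
    yield "no_grad"


def active_mode(
    accelerator: Any,
    dist_backend: Optional[str] = None,
    has_inference_mode: bool = True,
) -> str:
    """Enter the chosen context and report which stand-in was selected."""
    factory = pick_eval_context(
        accelerator,
        fake_inference_mode if has_inference_mode else None,
        fake_no_grad,
        dist_backend,
    )
    with factory() as name:
        return name


if __name__ == "__main__":
    print(active_mode(object()))              # plain accelerator
    print(active_mode(HPUAcceleratorStub()))  # HPU falls back
    print(active_mode(object(), "gloo"))      # gloo falls back
```

With a real installation, one would presumably pass `getattr(torch, "inference_mode", None)` and `torch.no_grad` as the two factories, and derive `dist_backend` from `torch.distributed.get_backend()` only when `torch.distributed.is_initialized()` returns true.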