🐛 Bug
move_metrics_to_cpu=True seems to move the loss to the CPU, which results in an error when training with native mixed precision.
This is related to the original issue reported in MIC-DKFZ/nnDetection#25 and did not occur with other Lightning versions (an older version and/or move_metrics_to_cpu=False works fine).
Error when training with mixed precision and move_metrics_to_cpu=True:
/usr/local/lib/python3.7/dist-packages/torch/cuda/amp/grad_scaler.py in scale(self, outputs)
159 # Short-circuit for the common case.
160 if isinstance(outputs, torch.Tensor):
--> 161 assert outputs.is_cuda or outputs.device.type == 'xla'
162 if self._scale is None:
163 self._lazy_init_scale_growth_tracker(outputs.device)
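The assertion at line 161 only accepts CUDA or XLA tensors, so a loss that has already been moved to the CPU fails before it can be scaled. Below is a minimal, dependency-free sketch of that check. FakeDevice, FakeTensor and scale_like_grad_scaler are hypothetical stand-ins written for illustration, not PyTorch or Lightning code; only the asserted condition is taken from the traceback, and the scale factor is an arbitrary illustrative value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeDevice:
    """Hypothetical stand-in for torch.device; only carries the device type."""
    type: str


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor; only tracks a value and a device."""
    value: float
    device: FakeDevice

    @property
    def is_cuda(self) -> bool:
        return self.device.type == "cuda"

    def cpu(self) -> "FakeTensor":
        # Return a copy that lives on the CPU, like Tensor.cpu() would.
        return FakeTensor(self.value, FakeDevice("cpu"))


def scale_like_grad_scaler(outputs: FakeTensor, scale: float = 1024.0) -> FakeTensor:
    """Apply the device check shown in the traceback, then scale the value."""
    # Same condition as the assert on line 161 in the traceback.
    assert outputs.is_cuda or outputs.device.type == "xla"
    return FakeTensor(outputs.value * scale, outputs.device)


loss_on_gpu = FakeTensor(0.5, FakeDevice("cuda"))
scaled = scale_like_grad_scaler(loss_on_gpu)  # passes the device check

loss_on_cpu = loss_on_gpu.cpu()  # what the report suspects happens to the loss
try:
    scale_like_grad_scaler(loss_on_cpu)
    failed = False
except AssertionError:
    failed = True  # same kind of failure as in the report
```

If this is indeed the mechanism, keeping the loss on the GPU while moving only the logged metrics to the CPU would avoid the assertion.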
To Reproduce
This can be reproduced with the boring model in Colab by passing the following flags to the Trainer:
precision=16, # native mixed precision
move_metrics_to_cpu=True,
gpus=[0], # use GPU
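For convenience, the three flags above can be collected into one dictionary and unpacked into the Trainer. This is only a sketch: TRAINER_FLAGS and build_trainer are hypothetical names, max_epochs=1 is an added assumption to keep the run short, and it assumes a Lightning version that still accepts these arguments (such as the 1.3.x line the issue template mentions) plus an available GPU.

```python
# Flags from the report, gathered so they can be passed as Trainer(**TRAINER_FLAGS).
TRAINER_FLAGS = {
    "precision": 16,              # native mixed precision
    "move_metrics_to_cpu": True,  # the flag that appears to trigger the error
    "gpus": [0],                  # use GPU
}


def build_trainer(flags=None):
    """Create a Trainer with the flags from the report (hypothetical helper)."""
    # Imported lazily so the flags can be inspected without Lightning installed.
    import pytorch_lightning as pl

    # max_epochs=1 is an assumption to keep the reproduction short.
    return pl.Trainer(max_epochs=1, **(flags or TRAINER_FLAGS))
```

Calling build_trainer().fit(model) with the boring model should then hit the assertion shown above, while passing a copy of the flags with move_metrics_to_cpu set to False is reported to work.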
Expected behavior
No error :)
Environment
PyTorch Lightning Version (e.g., 1.3.0):
PyTorch Version (e.g., 1.8)
Python version:
OS (e.g., Linux):
CUDA/cuDNN version:
GPU models and configuration:
How you installed PyTorch (conda, pip, source):
If compiling from source, the output of torch.__config__.show():
Any other relevant information:
Additional context