[fabric.example.rl] Not support torch.float64 for MPS device #19981

swyo · 2024-06-17T04:13:05Z

Bug description

I hit an error when running the example `pytorch-lightning/examples/fabric/reinforcement_learning` on an M2 Mac (device type `mps`).

Reproduce Error

```text
reinforcement_learning git:(master) ✗ fabric run train_fabric.py
W0617 12:53:22.541000 8107367488 torch/distributed/elastic/multiprocessing/redirects.py:27] NOTE: Redirects are currently not supported in Windows or MacOs.
[rank: 0] Seed set to 42
Missing logger folder: logs/fabric_logs/2024-06-17_12-53-24/CartPole-v1_default_42_1718596404
set default torch dtype as torch.float32
Traceback (most recent call last):
  File "/Users/user/workspace/pytorch-lightning/examples/fabric/reinforcement_learning/train_fabric.py", line 215, in <module>
    main(args)
  File "/Users/user/workspace/pytorch-lightning/examples/fabric/reinforcement_learning/train_fabric.py", line 154, in main
    rewards[step] = torch.tensor(reward, device=device).view(-1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
```
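The log shows the default dtype was already set to `torch.float32`, yet the conversion still tried to build a `float64` tensor. A minimal sketch of why, assuming the reward arrives as a `float64` NumPy array (as the error message implies): `torch.tensor` copies the dtype of a NumPy input and only applies the default dtype to plain Python floats. The reward values below are made up, and the sketch runs on CPU, so no Mac is needed.

```python
import numpy as np
import torch

# Mirror the example's setup: default floating-point dtype is float32.
torch.set_default_dtype(torch.float32)

# Stand-in for the per-environment reward array (assumption: np.float64).
reward = np.array([1.0, 1.0, 1.0, 1.0], dtype=np.float64)

from_numpy = torch.tensor(reward)       # dtype is taken from the NumPy array
from_python = torch.tensor([1.0, 1.0])  # Python floats use the default dtype

print(from_numpy.dtype)   # torch.float64
print(from_python.dtype)  # torch.float32
```

On MPS the first conversion is the one that fails, because the inferred dtype is `float64` regardless of the default.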

This bug can be fixed by checking `device.type` and casting the reward to `torch.float32` when running on MPS:

```diff
@@ -146,7 +146,7 @@ def main(args: argparse.Namespace):
             # Single environment step
             next_obs, reward, done, truncated, info = envs.step(action.cpu().numpy())
             done = torch.logical_or(torch.tensor(done), torch.tensor(truncated))
-            rewards[step] = torch.tensor(reward, device=device).view(-1)
+            rewards[step] = torch.tensor(reward, device=device, dtype=torch.float32 if device.type == 'mps' else None).view(-1)
```
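A possible device-agnostic alternative to the conditional `dtype=` above, sketched under assumptions and not the change that was merged: convert the reward to whatever dtype the preallocated `rewards` buffer already uses, on the host, and only then move it to the device. The helper name `store_reward` and the `(8, 4)` buffer shape are hypothetical; the real example's buffer may be shaped differently.

```python
import numpy as np
import torch


def store_reward(buffer: torch.Tensor, step: int, reward: np.ndarray) -> None:
    """Write one step of per-environment rewards into a preallocated buffer."""
    # Cast on the host first (float64 -> buffer.dtype), then move to the
    # buffer's device, so no float64 tensor is ever created on MPS.
    host = torch.as_tensor(reward, dtype=buffer.dtype)
    buffer[step] = host.to(buffer.device).reshape(-1)


# Hypothetical setup: 8 rollout steps x 4 environments.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
rewards = torch.zeros((8, 4), dtype=torch.float32, device=device)

# Stand-in for the float64 reward array returned by envs.step(...).
store_reward(rewards, 0, np.ones(4, dtype=np.float64))
print(rewards[0])
```

Keying the cast off `buffer.dtype` avoids special-casing `'mps'`: the same line works on CPU, CUDA and MPS, and the buffer's dtype stays the single place that decides the precision. The cost is one host-side conversion per step, which is typically small for a handful of scalar rewards.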

What version are you seeing the problem on?

master

Environment

Current environment

#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):  2.3.0
#- Lightning App Version (e.g., 0.5.2): 2.3.0
#- PyTorch Version (e.g., 2.0): 2.3.1
#- Python version (e.g., 3.9): 3.12.3
#- OS (e.g., Linux): Mac
#- CUDA/cuDNN version: MPS
#- GPU models and configuration: M2
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):


swyo added the bug and needs triage labels Jun 17, 2024

github-actions bot added the ver: 2.2.x label Jun 17, 2024

swyo added a commit to swyo/pytorch-lightning that referenced this issue Jun 17, 2024

fix example for mps; issue Lightning-AI#19981

c204de8

swyo mentioned this issue Jun 17, 2024

Fix dtype for MPS in reinforcement learning example #19982

Merged

awaelchli added the example label and removed the needs triage label Jun 17, 2024

awaelchli assigned swyo Jun 17, 2024

awaelchli closed this as completed in #19982 Jun 21, 2024
