Tensors must be CUDA and dense on using DDP #2529
Replies: 13 comments 8 replies
-
Got the exact same error with the newest release of PTL. If you move back down to version 0.8.1, it should work (pip install pytorch-lightning==0.8.1). @hadarishav
-
I also got the same error. The |
-
I am also experiencing this issue on PL 0.8.5, Python 3.7.8, torch 1.5.1. When using Is this solved in PL 0.9 and the issue was not closed? The traceback the OP posted and my own traceback do not map to lines in the current repo.
-
I also had this issue. In my case it was because I wasn't setting device=x.device when calculating the accuracy.
-
@velocityCavalry could you share the code snippet showing how you calculated the accuracy? I am wondering which tensors you had that weren't already on the GPU. The outputs that go into validation_epoch_end should be on the right device. However, if you create a new tensor and return it from validation_epoch_end, you need to make sure it is on "self.device".
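To illustrate the point about newly created tensors, here is a minimal sketch. It is not the code from this thread: the helper name `batch_accuracy` and its arguments are hypothetical. It only shows the `device=x.device` pattern mentioned above, where any tensor created inside the function is placed on the same device as the inputs.

```python
import torch


def batch_accuracy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: fraction of correct predictions as a 0-dim tensor."""
    preds = logits.argmax(dim=-1)
    # Result of an op on the inputs, so it already lives on the inputs' device.
    correct = (preds == targets).sum()
    # A *new* tensor: without device=..., it would be created on the CPU.
    total = torch.tensor(targets.numel(), dtype=torch.float32, device=targets.device)
    return correct.float() / total
```

Inside a LightningModule, passing `device=self.device` should have the same effect as `device=targets.device` here, assuming the inputs are already on the module's device.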
-
Same with pl 1.0.5
-
I've faced the same "Tensors must be CUDA and dense" issue with pl 1.0.5. @williamFalcon Here are my custom metric classes:
and in my test loop:
When running this code with the DDP backend, the "Tensors must be CUDA and dense" error happens at the end of the test.
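The poster's snippets are not reproduced here, so the following is only a sketch under assumptions, not their code. The class name `RunningAccuracy` and its fields are hypothetical. It shows one way a hand-written metric can keep its state on the same device as the model: state registered with `register_buffer` is moved by `.to(device)` or `.cuda()`, whereas a tensor stored as a plain attribute stays where it was created.

```python
import torch
from torch import nn


class RunningAccuracy(nn.Module):
    """Hypothetical metric; the name and fields are illustrative only."""

    def __init__(self) -> None:
        super().__init__()
        # Buffers follow the module on .to(device)/.cuda();
        # plain tensor attributes would be left behind on the CPU.
        self.register_buffer("correct", torch.zeros((), dtype=torch.long))
        self.register_buffer("total", torch.zeros((), dtype=torch.long))

    @torch.no_grad()
    def update(self, preds: torch.Tensor, targets: torch.Tensor) -> None:
        # Accumulate counts on whatever device the buffers live on.
        self.correct += (preds == targets).sum()
        self.total += targets.numel()

    def compute(self) -> torch.Tensor:
        # clamp avoids division by zero before the first update
        return self.correct.float() / self.total.clamp(min=1).float()
```

Because this is an `nn.Module`, assigning an instance as an attribute of the model makes it a submodule, so it is moved together with the model.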
-
Hi everybody! If one of you could provide a minimal script to reproduce, that'd be great. Then open a bug report in the issues tab with it. Discussion is better suited to... discuss :D
-
Got the same error on pl==1.6.0.
-
Same bug at pytorch/pytorch#88685
-
Using torch==2.0.1, torchmetrics==1.0.0, pytorch_lightning==2.0.4, deepspeed==0.9.5, I was still getting the error. The metric object was not loaded into the respective cuda device when
-
I ran into the same problem. Make sure your tensors are on the GPU if you use the config collect_device='cuda'. Otherwise, your collect device should be 'cpu'.
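As a rough sketch of that rule (not taken from any library; the helper `prepare_for_collective` and its parameters are hypothetical), a tensor can be made dense and moved to the target device before it is handed to a collective op on the NCCL backend, while other backends such as gloo can take CPU tensors as they are:

```python
import torch


def prepare_for_collective(
    t: torch.Tensor, backend: str, device: torch.device
) -> torch.Tensor:
    """Hypothetical helper: make `t` acceptable to the given backend."""
    if backend != "nccl":
        # e.g. gloo accepts CPU tensors, so nothing to do
        return t
    if t.is_sparse:
        # NCCL collectives expect dense tensors
        t = t.to_dense()
    # `device` is assumed to be this rank's CUDA device
    return t.to(device)
```

In a real DDP run, the backend string could be read from `torch.distributed.get_backend()` after the process group has been initialized.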
-
My code works fine when I am using a single GPU; however, when I switch to multi-GPU and DDP I get the following error -
Any help is appreciated.