🐛 Bug
When running the example code with ddp_cpu on a SLURM-based cluster, I get this error:
Traceback (most recent call last):
  File "image_classifier.py", line 99, in <module>
    cli_main()
  File "image_classifier.py", line 87, in cli_main
    trainer.fit(model, datamodule=dm)
  File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 472, in fit
    results = self.accelerator_backend.train()
  File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_hpc_accelerator.py", line 64, in train
    self.ddp_train(process_idx=self.task_idx, model=model)
  File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_hpc_accelerator.py", line 172, in ddp_train
    self.model_to_device(model)
TypeError: model_to_device() missing 1 required positional argument: 'process_idx'
Looking at the source, the model_to_device function requires process_idx as an argument, but the call in ddp_train (ddp_hpc_accelerator.py, line 172) does not pass it.
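The mismatch described above can be sketched in isolation. This is only an illustration of the call pattern seen in the traceback, not the Lightning source: `ToyAccelerator`, `ddp_train_broken`, `ddp_train_forwarding`, and `try_broken` are hypothetical names, and forwarding `process_idx` is just one possible fix, assuming the index received by `ddp_train` is the one `model_to_device` expects.

```python
class ToyAccelerator:
    """Hypothetical stand-in; NOT the actual pytorch_lightning accelerator class."""

    def model_to_device(self, model, process_idx):
        # Requires process_idx, mirroring the signature implied by the TypeError.
        return f"{model} on process {process_idx}"

    def ddp_train_broken(self, process_idx, model):
        # Same call shape as in the traceback: process_idx is not forwarded.
        return self.model_to_device(model)

    def ddp_train_forwarding(self, process_idx, model):
        # One possible fix (an assumption): pass process_idx through.
        return self.model_to_device(model, process_idx)


def try_broken():
    """Return the TypeError message raised by the broken call, or None."""
    try:
        ToyAccelerator().ddp_train_broken(process_idx=0, model="model")
    except TypeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(try_broken())
    print(ToyAccelerator().ddp_train_forwarding(process_idx=0, model="model"))
```

The broken variant raises the same kind of TypeError as reported, while the forwarding variant completes normally.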
Please reproduce using the BoringModel
I used this code:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pl_examples/basic_examples/simple_image_classifier.py
Along with this SLURM job script:
Environment
- GPU:
- available: False
- version: 10.2
- numpy: 1.19.2
- pyTorch_debug: False
- pyTorch_version: 1.7.1
- pytorch-lightning: 1.1.3
- tqdm: 4.56.0
- OS: Linux
- architecture:
  - 64bit
  - ELF
- processor: x86_64
- python: 3.8.5
- version: #1 SMP Mon Jul 29 17:46:05 UTC 2019