model_to_device() missing 1 required positional argument 'process_idx' #5465

adityabalu · 2021-01-11T17:46:20Z

🐛 Bug

When running with the `ddp_cpu` accelerator on a SLURM-based cluster, I get this error:

> Traceback (most recent call last):
>   File "image_classifier.py", line 99, in <module>
>     cli_main()
>   File "image_classifier.py", line 87, in cli_main
>     trainer.fit(model, datamodule=dm)
>   File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 472, in fit
>     results = self.accelerator_backend.train()
>   File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_hpc_accelerator.py", line 64, in train
>     self.ddp_train(process_idx=self.task_idx, model=model)
>   File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_hpc_accelerator.py", line 172, in ddp_train
>     self.model_to_device(model)
> TypeError: model_to_device() missing 1 required positional argument: 'process_idx'

Looking at the source, the `model_to_device` method being called requires `process_idx` as an argument, but `ddp_train` in `ddp_hpc_accelerator.py` (line 172) calls it with only `model`.
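
Assuming, as the traceback and the title of the fix PR (#5505) suggest, that the CPU variant of the HPC accelerator overrides `model_to_device` with an extra `process_idx` parameter while the inherited `ddp_train` still calls it with `model` alone, the failure can be reproduced in plain Python. The classes below are hypothetical, heavily simplified stand-ins and not the real Lightning code:

```python
# Hypothetical, simplified stand-ins; these are NOT the real Lightning classes.


class HPCAcceleratorSketch:
    """Plays the role of the base accelerator that defines ddp_train."""

    def model_to_device(self, model):
        return f"{model} on gpu"

    def ddp_train(self, process_idx, model):
        # Mirrors the call in the traceback: only `model` is passed.
        return self.model_to_device(model)


class CPUHPCAcceleratorBroken(HPCAcceleratorSketch):
    # The override demands an extra argument the caller never supplies.
    def model_to_device(self, model, process_idx):
        return f"{model} on cpu"


class CPUHPCAcceleratorFixed(HPCAcceleratorSketch):
    # Same signature as the base method, so the inherited call works.
    def model_to_device(self, model):
        return f"{model} on cpu"


def run(accelerator):
    """Call ddp_train and return either its result or the TypeError text."""
    try:
        return accelerator.ddp_train(process_idx=0, model="model")
    except TypeError as exc:
        return f"TypeError: {exc}"


print(run(CPUHPCAcceleratorBroken()))  # TypeError: ...missing 1 required positional argument: 'process_idx'
print(run(CPUHPCAcceleratorFixed()))   # model on cpu
```

Making the override's signature match the base method (or giving `process_idx` a default) removes the error in this sketch; exactly which change PR #5505 made is not shown here.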

Please reproduce using the BoringModel

I used this code:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pl_examples/basic_examples/simple_image_classifier.py

along with this SLURM job script:

> #!/bin/bash
> #SBATCH --job-name='pl_dist'
> #SBATCH --nodes=2
> #SBATCH -p RM
> #SBATCH --ntasks-per-node=1
> #SBATCH -t 1:00:00
> 
> module load anaconda3
> source activate /pylon5/softwares/pytorch
> 
> export NCCL_DEBUG=INFO
> export PYTHONFAULTHANDLER=1
> 
> srun -n 2 --ntasks-per-node 1 python image_classifier.py --accelerator 'ddp_cpu' --num_nodes 2 --num_processes 1 --max_epochs 50
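
With `--nodes=2` and `--ntasks-per-node=1`, `srun` starts two tasks, which should match `--num_nodes 2 --num_processes 1` (world size 2). The helper below is a hypothetical sketch, not part of Lightning: it reads the standard variables `srun` exports to each task (`SLURM_NTASKS`, `SLURM_PROCID`, `SLURM_LOCALID`, `SLURM_NODEID`) and checks them against the Trainer flags. How Lightning itself derives its ranks may differ.

```python
from typing import Mapping


def slurm_ddp_layout(env: Mapping[str, str], num_nodes: int, num_processes: int) -> dict:
    """Hypothetical helper: derive rank info from srun's environment and
    check it against the Trainer's --num_nodes / --num_processes flags."""
    ntasks = int(env["SLURM_NTASKS"])        # total tasks launched by srun
    expected = num_nodes * num_processes     # world size the Trainer expects
    if ntasks != expected:
        raise ValueError(
            f"SLURM launched {ntasks} tasks but the Trainer expects {expected}"
        )
    return {
        "global_rank": int(env["SLURM_PROCID"]),  # rank across all nodes
        "local_rank": int(env["SLURM_LOCALID"]),  # rank within this node
        "node_rank": int(env["SLURM_NODEID"]),    # index of this node
        "world_size": ntasks,
    }


# Simulated environment of the second task in the job above.
fake_env = {"SLURM_NTASKS": "2", "SLURM_PROCID": "1", "SLURM_LOCALID": "0", "SLURM_NODEID": "1"}
print(slurm_ddp_layout(fake_env, num_nodes=2, num_processes=1))
```

Inside a real job you would pass `os.environ` instead of the simulated mapping.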

Environment

- CUDA:
  - GPU:
  - available: False
  - version: 10.2
- Packages:
  - numpy: 1.19.2
  - pyTorch_debug: False
  - pyTorch_version: 1.7.1
  - pytorch-lightning: 1.1.3
  - tqdm: 4.56.0
- System:
  - OS: Linux
  - architecture:
    - 64bit
    - ELF
  - processor: x86_64
  - python: 3.8.5
  - version: #1 SMP Mon Jul 29 17:46:05 UTC 2019


github-actions · 2021-01-11T17:47:03Z

Hi! Thanks for your contribution! Great first issue!

adityabalu added the `bug` (Something isn't working) and `help wanted` (Open to be worked on) labels Jan 11, 2021

edenlightning added the environment: slurm label Jan 11, 2021

awaelchli linked a pull request on Jan 13, 2021 that will close this issue: [bugfix] Fix signature mismatch in DDPCPUHPCAccelerator's model_to_device #5505 (merged)


SeanNaren closed this as completed in #5505 Jan 16, 2021
