model_to_device() missing 1 required positional argument 'process_idx' #5465

Closed
adityabalu opened this issue Jan 11, 2021 · 1 comment · Fixed by #5505
Labels
bug Something isn't working environment: slurm help wanted Open to be worked on

Comments

@adityabalu

🐛 Bug

When running the example code with the ddp_cpu accelerator on a SLURM-based cluster, I get this error:

Traceback (most recent call last):
  File "image_classifier.py", line 99, in <module>
    cli_main()
  File "image_classifier.py", line 87, in cli_main
    trainer.fit(model, datamodule=dm)
  File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 472, in fit
    results = self.accelerator_backend.train()
  File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_hpc_accelerator.py", line 64, in train
    self.ddp_train(process_idx=self.task_idx, model=model)
  File "/pylon5/cis200022p/balu/softwares/pytorch/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_hpc_accelerator.py", line 172, in ddp_train
    self.model_to_device(model)
TypeError: model_to_device() missing 1 required positional argument: 'process_idx'

Looking at the source, the model_to_device function requires process_idx as an argument, but the call in ddp_train (ddp_hpc_accelerator.py, line 172) does not pass it.
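
The mismatch can be reproduced outside the library with a small sketch. The class and the method names `ddp_train_broken` and `ddp_train_fixed` below are hypothetical stand-ins, not PyTorch Lightning code; only the call shape `self.model_to_device(model)` is taken from the traceback. Forwarding `process_idx` is one possible way to resolve the mismatch and is an assumption here, not necessarily the change made in the linked fix.

```python
class SketchAccelerator:
    """Hypothetical stand-in for the accelerator in the traceback (not library code)."""

    def model_to_device(self, model, process_idx):
        # Assumption: the real method places the model on a device.
        # This sketch only returns its arguments so the call can be inspected.
        return model, process_idx

    def ddp_train_broken(self, process_idx, model):
        # Same call shape as in the traceback: process_idx is not forwarded.
        return self.model_to_device(model)

    def ddp_train_fixed(self, process_idx, model):
        # One possible resolution: forward the index the method requires.
        return self.model_to_device(model, process_idx)


def call_broken():
    """Return the TypeError message raised by the broken call, or None."""
    try:
        SketchAccelerator().ddp_train_broken(process_idx=0, model="model")
    except TypeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(call_broken())
    print(SketchAccelerator().ddp_train_fixed(process_idx=0, model="model"))
```

The broken variant raises the same kind of TypeError as reported above, while the variant that forwards the argument returns normally.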

Please reproduce using the BoringModel

I used this code:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pl_examples/basic_examples/simple_image_classifier.py

Along with this slurm job script:

> #!/bin/bash
> #SBATCH --job-name='pl_dist'
> #SBATCH --nodes=2
> #SBATCH -p RM
> #SBATCH --ntasks-per-node=1
> #SBATCH -t 1:00:00
> 
> module load anaconda3
> source activate /pylon5/softwares/pytorch
> 
> export NCCL_DEBUG=INFO
> export PYTHONFAULTHANDLER=1
> 
> srun -n 2 --ntasks-per-node 1 python image_classifier.py --accelerator 'ddp_cpu' --num_nodes 2 --num_processes 1 --max_epochs 50

Environment

  • CUDA:
    - GPU:
    - available: False
    - version: 10.2
  • Packages:
    - numpy: 1.19.2
    - pyTorch_debug: False
    - pyTorch_version: 1.7.1
    - pytorch-lightning: 1.1.3
    - tqdm: 4.56.0
  • System:
    - OS: Linux
    - architecture:
        - 64bit
        - ELF
    - processor: x86_64
    - python: 3.8.5
    - version: #1 SMP Mon Jul 29 17:46:05 UTC 2019
adityabalu added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Jan 11, 2021
@github-actions
Contributor

Hi! Thanks for your contribution! Great first issue!
