error: unrecognized arguments: --local_rank=1 #1302

TAICHIKF · 2022-02-18T03:49:14Z

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug
usage: train.py [-h]
train.py: error: unrecognized arguments: --local_rank=1 configs/fpn_crossformer_b_panda_40k.py --work-dir ./seg-output --launcher pytorch
usage: train.py [-h]
train.py: error: unrecognized arguments: --local_rank=2 configs/fpn_crossformer_b_panda_40k.py --work-dir ./seg-output --launcher pytorch
usage: train.py [-h]
train.py: error: unrecognized arguments: --local_rank=0 configs/fpn_crossformer_b_panda_40k.py --work-dir ./seg-output --launcher pytorch
usage: train.py [-h]
train.py: error: unrecognized arguments: --local_rank=3 configs/fpn_crossformer_b_panda_40k.py --work-dir ./seg-output --launcher pytorch
Traceback (most recent call last):
  File "/root/anaconda3/envs/CrossFormer/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/envs/CrossFormer/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/CrossFormer/lib/python3.6/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/root/anaconda3/envs/CrossFormer/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/CrossFormer/bin/python', '-u', './train.py', '--local_rank=3', 'configs/fpn_crossformer_b_panda_40k.py', '--work-dir', './seg-output', '--launcher', 'pytorch']' returned non-zero exit status 2.

Reproduction

What command or script did you run?

./dist_train.sh configs/fpn_crossformer_b_panda_40k.py 4 ckpt/backbone-corssformer-s.pth

Did you make any modifications on the code or config? Did you understand what you have modified?
no
What dataset did you use?

PANDA

Environment

Python3.6-based
mmcv-full==1.2.7 mmsegmentation==0.12.0
numpy scipy Pillow pyyaml torch==1.7.0 torchvision==0.8.1 timm==0.3.2

Please run python mmseg/utils/collect_env.py to collect necessary environment information and paste it here.
You may add any additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

If applicable, paste the error traceback here.

A placeholder for traceback.

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

MengzhangLI · 2022-02-18T05:03:36Z

Could you try running the commands from the "Train with multiple GPUs" section linked below?

https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/train.md#train-with-multiple-gpus

I think your error is caused by incorrect usage of train.py.
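
For context, the log shows torch.distributed.launch starting each worker as `./train.py --local_rank=N <remaining arguments>`, and the usage line `train.py [-h]` indicates that the parser in that script declares no arguments at all. A minimal sketch of a parser that would accept the command line from the log follows. It is only an illustration, not the actual contents of tools/train.py, and the help strings and the `choices` list are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Illustrative parser; argument names are taken from the command line in the log."""
    parser = argparse.ArgumentParser(description="Train a segmentor (sketch)")
    parser.add_argument("config", help="train config file path")
    parser.add_argument("--work-dir", help="directory to save logs and models")
    parser.add_argument(
        "--launcher",
        choices=["none", "pytorch", "slurm", "mpi"],  # assumed set of launchers
        default="none",
        help="job launcher",
    )
    # torch.distributed.launch passes --local_rank=N to every worker process,
    # so the script must declare it or argparse exits with status 2.
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args(argv)
    # Make the rank available through the environment as well, without
    # overwriting a value that the launcher may already have set.
    os.environ.setdefault("LOCAL_RANK", str(args.local_rank))
    return args


# The argument list from the failing worker in the log now parses cleanly.
args = parse_args([
    "--local_rank=1",
    "configs/fpn_crossformer_b_panda_40k.py",
    "--work-dir", "./seg-output",
    "--launcher", "pytorch",
])
print(args.local_rank, args.config, args.work_dir, args.launcher)
```

If the script being launched lacks these declarations, it is probably not the train.py that dist_train.sh expects, which would be consistent with the "incorrect usage" diagnosis.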

TAICHIKF · 2022-02-18T06:45:47Z

I used this command: ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments], but it gives the same error.

MengzhangLI · 2022-02-18T06:58:20Z

Could you try adding '--deterministic' to [optional arguments]?

TAICHIKF · 2022-02-18T07:34:40Z

When I use python3 -m torch.distributed.launch --nproc_per_node=4 /mnt/code/sicap_test1_dis.py, it works, but I get a new error: FileNotFoundError: [Errno 2] No such file or directory: '/mnt/code/mmsegmentation/run/sicap_pspnet_0211/.eval_hook'.
When training with 4 GPUs, this error occurs after the first validation pass, and the program then continues training on a single, seemingly random GPU.

TAICHIKF · 2022-02-18T11:12:46Z

MengzhangLI · 2022-02-26T05:23:48Z

Your mmcv and mmseg versions are too old. Could you upgrade to the latest versions and try again?

MengzhangLI self-assigned this Feb 18, 2022

MeowZheng closed this as completed Apr 9, 2022

wjkim81 pushed a commit to wjkim81/mmsegmentation that referenced this issue Dec 3, 2023

remove unsupported tags in pypi page (open-mmlab#1302)

b420024
