
Device mismatch error during training #24

Open
ForeUP opened this issue Jan 17, 2024 · 0 comments


ForeUP commented Jan 17, 2024

In the author's training code, train.py, the line

pos_mask = pid2clothes[pids]

raises a device-mismatch error. I'm not sure whether anyone else has run into this:

........
Model size: 23.51622M
==> Start training
pid2clothes: cpu pids: cuda:0
Traceback (most recent call last):
  File "main.py", line 203, in
    main(config)
  File "main.py", line 142, in main
    train_cal(config, epoch, model, classifier, clothes_classifier, criterion_cla, criterion_pair,
  File "/home/user/F/CCReID/Simple-CCReID/train.py", line 29, in train_cal
    pos_mask = pid2clothes[pids]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
pid2clothes: cpu pids: cuda:1
Traceback (most recent call last):
  File "main.py", line 203, in
    main(config)
  File "main.py", line 142, in main
    train_cal(config, epoch, model, classifier, clothes_classifier, criterion_cla, criterion_pair,
  File "/home/user/F/CCReID/Simple-CCReID/train.py", line 29, in train_cal
    pos_mask = pid2clothes[pids]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1448193) of binary: /home/......
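The failure in the log can be reproduced in isolation. Below is a minimal sketch, assuming (hypothetically) that pid2clothes is a boolean CPU tensor of shape (num_pids, num_clothes) and that pids has already been moved to a GPU, as the debug print above shows. The 3 x 4 table and the function name reproduce_device_mismatch are made up for illustration. On PyTorch versions that enforce this check, indexing the CPU table with CUDA indices raises the same RuntimeError; on a machine without CUDA the function simply returns None.

```python
import torch


def reproduce_device_mismatch():
    """Index a CPU lookup table with CUDA indices and return the error text, if any.

    Returns None when no GPU is available, or when the installed PyTorch
    version does not raise for this kind of cross-device indexing.
    """
    # Hypothetical stand-in for pid2clothes: 3 identities x 4 clothes, kept on the CPU.
    pid2clothes = torch.tensor(
        [[1, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]],
        dtype=torch.bool,
    )
    if not torch.cuda.is_available():
        return None  # The mismatch can only occur when a CUDA device exists.
    # Identity labels on the GPU, matching "pids: cuda:0" in the log.
    pids = torch.tensor([2, 0, 0], device="cuda:0")
    print("pid2clothes:", pid2clothes.device, "pids:", pids.device)
    try:
        pid2clothes[pids]  # CPU tensor indexed by CUDA indices
    except RuntimeError as err:
        return str(err)
    return None


msg = reproduce_device_mismatch()
print(msg)
```

On a GPU machine with a recent PyTorch, msg should contain the "indices should be either on cpu or on the same device" message seen in the traceback.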

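Judging from the printed output, the error is caused by pid2clothes and pids being on different devices. I'm using torch 1.13, and the line had to be changed to: pos_mask = pid2clothes[pids.cpu()]
before training would run normally. Until then I had assumed it was a distributed-training failure: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1448193) of binary: /home/....
That message only reports that a worker process exited with code 1; the actual cause is the device mismatch shown in each worker's own traceback: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
Recording this here for reference.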
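The fix described above can be written so it works whichever device the table is on. A minimal sketch, assuming the same hypothetical 3 x 4 boolean table: the helper lookup_pos_mask is a name introduced here for illustration, not a function from the repository. It moves the indices to the table's device, which for a CPU table is exactly pid2clothes[pids.cpu()].

```python
import torch


def lookup_pos_mask(pid2clothes: torch.Tensor, pids: torch.Tensor) -> torch.Tensor:
    """Return the clothes row of pid2clothes for each identity label in pids.

    The indices are moved to the table's device first. For a CPU table this is
    equivalent to pid2clothes[pids.cpu()]; for a table already on the GPU,
    GPU indices are used as they are.
    """
    return pid2clothes[pids.to(pid2clothes.device)]


# Hypothetical 3 identities x 4 clothes table, kept on the CPU.
pid2clothes = torch.tensor(
    [[1, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]],
    dtype=torch.bool,
)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pids = torch.tensor([2, 0, 0], device=device)

pos_mask = lookup_pos_mask(pid2clothes, pids)
print(pos_mask.device, pos_mask.int().tolist())
# prints: cpu [[0, 0, 0, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
```

An alternative is to move the table to the GPU once, before the training loop (pid2clothes = pid2clothes.to(device)), so that each batch indexes on the GPU without copying the labels back to the host. Calling pids.cpu() is the smaller change and matches the fix reported here; which one fits better depends on how pos_mask is used later in train_cal.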
