........
Model size: 23.51622M
==> Start training
pid2clothes: cpu pids: cuda:0
Traceback (most recent call last):
  File "main.py", line 203, in <module>
    main(config)
  File "main.py", line 142, in main
    train_cal(config, epoch, model, classifier, clothes_classifier, criterion_cla, criterion_pair,
  File "/home/user/F/CCReID/Simple-CCReID/train.py", line 29, in train_cal
    pos_mask = pid2clothes[pids]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
pid2clothes: cpu pids: cuda:1
Traceback (most recent call last):
  File "main.py", line 203, in <module>
    main(config)
  File "main.py", line 142, in main
    train_cal(config, epoch, model, classifier, clothes_classifier, criterion_cla, criterion_pair,
  File "/home/user/F/CCReID/Simple-CCReID/train.py", line 29, in train_cal
    pos_mask = pid2clothes[pids]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1448193) of binary: /home/......
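Judging from the printed devices, pid2clothes is on the CPU while pids is on the GPU (cuda:0 / cuda:1), and this mismatch is what raises the error. I am using torch 1.13.
The line needs to be changed to: pos_mask = pid2clothes[pids.cpu()]
Until then I had assumed it was a distributed-training error: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1448193) of binary: /home/....
It is actually a device-mismatch error: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
With that change, training runs normally. Posting it here for the record.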
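For anyone who wants to check the fix in isolation, here is a minimal sketch rather than the repository's actual code. The tensor shapes and values are made-up stand-ins for pid2clothes and pids, and the second option (moving the lookup table instead of the indices) is an alternative I am assuming would also work, not something taken from train.py.

```python
import torch

# Hypothetical stand-ins for the tensors in train.py; shapes and values
# are invented for illustration only.
num_pids, num_clothes = 4, 6
pid2clothes = torch.zeros(num_pids, num_clothes)  # on the CPU, as in the log
pid2clothes[0, :2] = 1
pid2clothes[1, 2:4] = 1
pid2clothes[2, 4] = 1
pid2clothes[3, 5] = 1

# Use a GPU when one is present so the mismatch is real; otherwise stay on CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pids = torch.tensor([2, 0, 3], device=device)

# Option 1 (the fix from this issue): bring the indices to the CPU.
pos_mask_cpu = pid2clothes[pids.cpu()]

# Option 2 (assumed alternative): move the lookup table to the device of
# the indices, so the mask ends up on the same device as the batch.
pos_mask_dev = pid2clothes.to(pids.device)[pids]

# Both options select the same rows; only the resulting device differs.
assert torch.equal(pos_mask_cpu, pos_mask_dev.cpu())
print(pos_mask_cpu.device, pos_mask_dev.device)
```

With option 1 the mask stays on the CPU, so any later code that expects it on the GPU would still have to move it there.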
The author's training code, train.py
Simple-CCReID/train.py
Line 28 in f773d01
has a device-mismatch error. I am not sure whether anyone else has run into it.