Test code bug in training #3

Open
chexiaoyu opened this issue Aug 2, 2021 · 3 comments

Comments

@chexiaoyu

Thanks for your research!
When I run python scripts/train_rpn_3d.py --config=kitti_3d_base --exp_name base, training runs normally, but an error appears during testing. My torch version is 0.4.1. It looks like a tensor type mismatch. Have you run into this bug before? Thank you!

Epoch:9
acc/fg: 0.950
acc/bg: 0.997
misc/z: 0.641
misc/ry: 0.311
acc/iou: 0.856
loss/ttloss: 0.551
testing
  0%|          | 0/3769 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "scripts/train_rpn_3d.py", line 323, in <module>
    main(args)
  File "scripts/train_rpn_3d.py", line 288, in main
    iou_3d = test_kitti_3d(dataset_val, rpn_net, conf, results_path, paths.data, writer=writer)
  File "/**/**/model/M3DSSD-master/lib/rpn_util.py", line 1794, in test_kitti_3d
    aboxes = im_detect_3d(im, net, rpn_conf, imobj)
  File "/**/**/model/M3DSSD-master/lib/rpn_util.py", line 1462, in im_detect_3d
    bbox_x3d = bbox_x3d * bbox_stds[0, 4] + bbox_means[0, 4]
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.DoubleTensor for argument #2 'other'

@chexiaoyu
Author

I just fixed the bug. Modify lines 1444, 1445, and 1446 of rpn_util.py so that the tensors are cast to float:

anchors = torch.from_numpy(rpn_conf.anchors).cuda().float()
bbox_means = torch.from_numpy(rpn_conf.bbox_means).cuda().float()
bbox_stds = torch.from_numpy(rpn_conf.bbox_stds).cuda().float()
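
A minimal sketch of why the casts above help, assuming the rpn_conf arrays are NumPy float64 arrays (which the DoubleTensor in the traceback suggests). The array values below are made-up stand-ins, and .cuda() is left out so the snippet runs on CPU. Note that newer PyTorch releases apply type promotion and may not raise the original error at all.

```python
import numpy as np
import torch

# Hypothetical stand-ins for rpn_conf.bbox_means / rpn_conf.bbox_stds.
# NumPy creates float64 arrays by default.
bbox_means = np.array([[0.0, 0.0, 0.0, 0.0, 0.1]])
bbox_stds = np.array([[1.0, 1.0, 1.0, 1.0, 0.5]])

# torch.from_numpy keeps the NumPy dtype, so this is a double tensor.
means_double = torch.from_numpy(bbox_means)

# .float() casts to float32, matching the network outputs.
means_float = torch.from_numpy(bbox_means).float()
stds_float = torch.from_numpy(bbox_stds).float()

# Stand-in for the float32 network output that gets de-normalized.
bbox_x3d = torch.ones(6)
result = bbox_x3d * stds_float[0, 4] + means_float[0, 4]

print(means_double.dtype, means_float.dtype, result.dtype)
```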

Moreover, at line 1516, sorted_inds = (-aboxes[:, 4]).argsort() fails because tensors in torch 0.4.1 don't support argsort(); torch.sort() should be used instead.
Finally, how can I train with multiple GPUs?
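
For the argsort() replacement mentioned above, one possible form is sketched below. It assumes aboxes is a 2-D tensor whose column 4 holds the detection score; the box values are invented for illustration. Newer PyTorch releases do provide Tensor.argsort(), so this only matters on old versions.

```python
import torch

# Hypothetical detections: [x1, y1, x2, y2, score].
aboxes = torch.tensor([
    [0.0, 0.0, 10.0, 10.0, 0.3],
    [1.0, 1.0, 12.0, 12.0, 0.9],
    [2.0, 2.0, 14.0, 14.0, 0.6],
])

# torch.sort returns (sorted_values, indices); sorting the scores in
# descending order gives the same ranking as (-scores).argsort().
_, sorted_inds = torch.sort(aboxes[:, 4], descending=True)

# Reorder the boxes from highest to lowest score.
aboxes_sorted = aboxes[sorted_inds]
print(sorted_inds.tolist())
```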

@mumianyuxin
Owner

Thanks for your contribution to fixing the bug. I haven't tested multi-GPU training, but you can modify the code to support it.

@revisitq
Collaborator

Hi! To train with multiple GPUs, modify the code at lib/core/init_training_model as follows:

        # Fall back to GPU 0 when no devices were selected.
        if 'CUDA_VISIBLE_DEVICES' not in os.environ:
            os.environ['CUDA_VISIBLE_DEVICES'] = '0'
        # Visible devices are renumbered from 0, so the ids are 0..n-1.
        device_ids = list(range(len(os.environ['CUDA_VISIBLE_DEVICES'].split(','))))
        network = torch.nn.DataParallel(network, device_ids)
        network.to('cuda')

Then run CUDA_VISIBLE_DEVICES='your gpu device ids' python scripts/train_rpn_3d.py --config=config --exp_name=exp_name to train on multiple GPUs. The device ids are a comma-separated list, for example CUDA_VISIBLE_DEVICES=0,1.
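
To make the device_ids step above easier to check in isolation, here is a small sketch of the same logic as a standalone helper. The function name visible_device_ids is hypothetical and not part of the repository. It relies on the fact that CUDA renumbers the visible devices from 0, so CUDA_VISIBLE_DEVICES=2,3 is seen by PyTorch as devices 0 and 1.

```python
import os


def visible_device_ids(environ=None):
    """Return DataParallel device ids for the GPUs made visible via the environment.

    Hypothetical helper mirroring the snippet above: it counts the entries in
    CUDA_VISIBLE_DEVICES (defaulting to a single GPU, '0') and returns 0..n-1.
    """
    if environ is None:
        environ = os.environ
    visible = environ.get('CUDA_VISIBLE_DEVICES', '0')
    return list(range(len(visible.split(','))))


# Physical GPUs 2 and 3 are exposed to the process as devices 0 and 1.
print(visible_device_ids({'CUDA_VISIBLE_DEVICES': '2,3'}))
```

The returned list could then be passed along as torch.nn.DataParallel(network, device_ids=visible_device_ids()).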
