
When I start two detection tasks at the same time on the same four GPUs, an error is reported. #1593

Closed
BangguWu opened this issue Oct 28, 2019 · 1 comment

Comments

@BangguWu

I modified the channels in configs/faster_rcnn_r50_fpn_1x.py to change the backbone to ResNet-18. When I then run two detection tasks at the same time on the same four GPUs, the following error is reported:

2019-10-28 04:51:52,751 - INFO - Distributed training: False
2019-10-28 04:51:52,946 - INFO - load model from: pretrained_model/MPNCOV/ResNet18_MPNCOV.pth.tar
Traceback (most recent call last):
  File "tools/train.py", line 114, in <module>
    main()
  File "tools/train.py", line 84, in main
    cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/builder.py", line 43, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/builder.py", line 15, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/ssd/wubanggu/code/mmdetection/mmdet/utils/registry.py", line 74, in build_from_cfg
    return obj_type(**args)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/detectors/faster_rcnn.py", line 27, in __init__
    pretrained=pretrained)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/detectors/two_stage.py", line 57, in __init__
    self.init_weights(pretrained=pretrained)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/detectors/two_stage.py", line 65, in init_weights
    self.backbone.init_weights(pretrained=pretrained)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/backbones/resnet.py", line 485, in init_weights
    load_checkpoint(self, pretrained, strict=False, logger=logger)
  File "/ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/mmcv-0.2.12-py3.7-linux-x86_64.egg/mmcv/runner/checkpoint.py", line 172, in load_checkpoint
    checkpoint = torch.load(filename, map_location=map_location)
  File "/ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/serialization.py", line 581, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 79470024 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
  what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 ASSERT FAILED at ../c10/util/intrusive_ptr.h:350, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at ../c10/util/intrusive_ptr.h:350)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7f3c19c2033d in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x50eb65 (0x7f3c1a165b65 in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #2: THStorage_free + 0x25 (0x7f3c1a787945 in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #3: <unknown function> + 0x511818 (0x7f3c45b5f818 in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #26: __libc_start_main + 0xf0 (0x7f3c54f1b830 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)
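The "unexpected EOF, expected ... more bytes" from torch.load means the checkpoint file on disk is shorter than its header claims, i.e. the .pth.tar itself was truncated or corrupted, possibly by another process writing to the same path while this one read it. A quick way to rule that out before launching either run is to checksum the file against a known-good copy. This is just a sketch; the helper name sha256_of is hypothetical and not part of mmdetection or PyTorch:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file in 1 MiB chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Compare against the digest of a known-good copy of the checkpoint; a
# mismatch means the file was truncated (e.g. by a concurrent writer) and
# should be re-copied before starting training:
# print(sha256_of("pretrained_model/MPNCOV/ResNet18_MPNCOV.pth.tar"))
```

If the digest differs between the two runs, giving each job its own private copy of the pretrained weights (or its own work directory) avoids the concurrent access entirely.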

@BangguWu BangguWu closed this as completed Nov 9, 2019
@footprinthere

The same error occurred for me. How did you solve it?

FANGAreNotGnu pushed a commit to FANGAreNotGnu/mmdetection that referenced this issue Oct 23, 2023