
When I start two detection tasks at the same time on the same four GPUs, an error is reported. #1593

Closed
BangguWu opened this issue Oct 28, 2019 · 1 comment

Comments

@BangguWu

I modified the channels in configs/faster_rcnn_r50_fpn_1x.py to change the backbone to ResNet-18. When I then run two detection tasks at the same time on the same four GPUs, the following error is reported:

2019-10-28 04:51:52,751 - INFO - Distributed training: False
2019-10-28 04:51:52,946 - INFO - load model from: pretrained_model/MPNCOV/ResNet18_MPNCOV.pth.tar
Traceback (most recent call last):
  File "tools/train.py", line 114, in <module>
    main()
  File "tools/train.py", line 84, in main
    cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/builder.py", line 43, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/builder.py", line 15, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/ssd/wubanggu/code/mmdetection/mmdet/utils/registry.py", line 74, in build_from_cfg
    return obj_type(**args)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/detectors/faster_rcnn.py", line 27, in __init__
    pretrained=pretrained)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/detectors/two_stage.py", line 57, in __init__
    self.init_weights(pretrained=pretrained)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/detectors/two_stage.py", line 65, in init_weights
    self.backbone.init_weights(pretrained=pretrained)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/backbones/resnet.py", line 485, in init_weights
    load_checkpoint(self, pretrained, strict=False, logger=logger)
  File "/ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/mmcv-0.2.12-py3.7-linux-x86_64.egg/mmcv/runner/checkpoint.py", line 172, in load_checkpoint
    checkpoint = torch.load(filename, map_location=map_location)
  File "/ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/serialization.py", line 581, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 79470024 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
  what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 ASSERT FAILED at ../c10/util/intrusive_ptr.h:350, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at ../c10/util/intrusive_ptr.h:350)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7f3c19c2033d in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x50eb65 (0x7f3c1a165b65 in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #2: THStorage_free + 0x25 (0x7f3c1a787945 in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #3: <unknown function> + 0x511818 (0x7f3c45b5f818 in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

frame #26: __libc_start_main + 0xf0 (0x7f3c54f1b830 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)
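The "unexpected EOF, expected ... more bytes" from torch.load means the checkpoint file on disk is shorter than its header claims, i.e. the .pth.tar itself was truncated or corrupted, possibly by another process writing to the same path while this one read it. A quick way to rule that out before launching either run is to checksum the file against a known-good copy. This is just a sketch; the helper name sha256_of is hypothetical and not part of mmdetection or PyTorch:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file in 1 MiB chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Compare against the digest of a known-good copy of the checkpoint; a
# mismatch means the file was truncated (e.g. by a concurrent writer) and
# should be re-copied before starting training:
# print(sha256_of("pretrained_model/MPNCOV/ResNet18_MPNCOV.pth.tar"))
```

If the digest differs between the two runs, giving each job its own private copy of the pretrained weights (or its own work directory) avoids the concurrent access entirely.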

@BangguWu BangguWu closed this as completed Nov 9, 2019
@footprinthere

The same error occurred for me. How did you solve it?

FANGAreNotGnu pushed a commit to FANGAreNotGnu/mmdetection that referenced this issue Oct 23, 2023