I modified configs/faster_rcnn_r50_fpn_1x.py to change the backbone to ResNet-18 (adjusting the channel settings to match). When I then try to run two detection jobs at the same time on the same four GPUs, I get the error below.
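For reference, the backbone change was roughly the following (a sketch of the relevant part of the config, not the full file; the exact values in my local copy may differ slightly):

```python
# Sketch of the relevant edits to configs/faster_rcnn_r50_fpn_1x.py:
# ResNet-18 in place of ResNet-50, with the FPN in_channels adjusted
# to ResNet-18's stage widths.
model = dict(
    type='FasterRCNN',
    pretrained='pretrained_model/MPNCOV/ResNet18_MPNCOV.pth.tar',
    backbone=dict(
        type='ResNet',
        depth=18,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 256, 512],  # ResNet-18 stage channels
        out_channels=256,
        num_outs=5),
    # rpn_head, bbox_head, etc. left unchanged from the original config
)
```

The full log and traceback: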
2019-10-28 04:51:52,751 - INFO - Distributed training: False
2019-10-28 04:51:52,946 - INFO - load model from: pretrained_model/MPNCOV/ResNet18_MPNCOV.pth.tar
Traceback (most recent call last):
  File "tools/train.py", line 114, in <module>
    main()
  File "tools/train.py", line 84, in main
    cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/builder.py", line 43, in build_detector
    return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/builder.py", line 15, in build
    return build_from_cfg(cfg, registry, default_args)
  File "/ssd/wubanggu/code/mmdetection/mmdet/utils/registry.py", line 74, in build_from_cfg
    return obj_type(**args)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/detectors/faster_rcnn.py", line 27, in __init__
    pretrained=pretrained)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/detectors/two_stage.py", line 57, in __init__
    self.init_weights(pretrained=pretrained)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/detectors/two_stage.py", line 65, in init_weights
    self.backbone.init_weights(pretrained=pretrained)
  File "/ssd/wubanggu/code/mmdetection/mmdet/models/backbones/resnet.py", line 485, in init_weights
    load_checkpoint(self, pretrained, strict=False, logger=logger)
  File "/ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/mmcv-0.2.12-py3.7-linux-x86_64.egg/mmcv/runner/checkpoint.py", line 172, in load_checkpoint
    checkpoint = torch.load(filename, map_location=map_location)
  File "/ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/serialization.py", line 581, in _load
    deserialized_objects[key].set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 79470024 more bytes. The file might be corrupted.
terminate called after throwing an instance of 'c10::Error'
  what(): owning_ptr == NullType::singleton() || owning_ptr->refcount.load() > 0 ASSERT FAILED at ../c10/util/intrusive_ptr.h:350, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at ../c10/util/intrusive_ptr.h:350)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7f3c19c2033d in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x50eb65 (0x7f3c1a165b65 in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #2: THStorage_free + 0x25 (0x7f3c1a787945 in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #3: <unknown function> + 0x511818 (0x7f3c45b5f818 in /ssd/wubanggu/anaconda3/envs/pytorch-1.2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #26: __libc_start_main + 0xf0 (0x7f3c54f1b830 in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)
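Since torch.load hits an unexpected EOF, the checkpoint file itself may be truncated, or another process may be writing to it while this job reads it. A minimal sketch to check whether the file deserializes on its own, outside of training (path taken from the log above):

```python
# Minimal sketch: try to deserialize the checkpoint by itself.
# If this also fails with "unexpected EOF", the .pth.tar file is truncated
# or corrupted (e.g. partially written, or overwritten by a concurrent job).
import torch

ckpt_path = 'pretrained_model/MPNCOV/ResNet18_MPNCOV.pth.tar'
checkpoint = torch.load(ckpt_path, map_location='cpu')

# Inspect what was stored: either a wrapper dict with 'state_dict' or a raw state_dict.
if isinstance(checkpoint, dict) and 'state_dict' in checkpoint:
    print('first keys:', list(checkpoint['state_dict'].keys())[:5])
else:
    print(type(checkpoint))
```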