Multi-GPU running error #42
@moyans: this type of error is difficult to reproduce. Can you provide the following information which may help us:
Thanks.
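For reference, one way the GPU/Caffe2 environment details could be gathered is sketched below. This is not an official Detectron script; it assumes the Caffe2 Python bindings are importable and uses standard `caffe2.python.workspace` calls (verify them against your build).

```python
# Minimal sketch (assumed setup, not an official diagnostic script) for
# collecting basic GPU/Caffe2 environment details.
from caffe2.python import workspace

print("Caffe2 GPU support:", workspace.has_gpu_support)
print("Number of CUDA devices:", workspace.NumCudaDevices())
# Peer-access matrix: entry [i][j] is True if GPU i can directly access GPU j's memory.
print("Peer access pattern:")
print(workspace.GetCudaPeerAccessPattern())
```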
@rbgirshick Thanks, here's my information:
@rbgirshick I got the same problem. My information is the same as @moyans'.
Hi, thanks for your excellent work. When I set "NUM_GPUS: 2" and run "e2e_mask_rcnn_R-50-C4_1x", I get this error:
INFO detector.py: 434: Changing learning rate 0.000000 -> 0.003333 at iter 0
I0126 09:53:34.445952 22227 context_gpu.cu:321] GPU 0: 428 MB
I0126 09:53:34.445999 22227 context_gpu.cu:321] GPU 1: 502 MB
I0126 09:53:34.446020 22227 context_gpu.cu:325] Total: 931 MB
I0126 09:53:34.452062 22230 context_gpu.cu:321] GPU 0: 428 MB
I0126 09:53:34.452098 22230 context_gpu.cu:321] GPU 1: 649 MB
I0126 09:53:34.452108 22230 context_gpu.cu:325] Total: 1078 MB
I0126 09:53:34.457942 22227 context_gpu.cu:321] GPU 0: 428 MB
I0126 09:53:34.457973 22227 context_gpu.cu:321] GPU 1: 781 MB
I0126 09:53:34.457991 22227 context_gpu.cu:325] Total: 1209 MB
I0126 09:53:34.462600 22227 context_gpu.cu:321] GPU 0: 428 MB
I0126 09:53:34.462627 22227 context_gpu.cu:321] GPU 1: 913 MB
I0126 09:53:34.462646 22227 context_gpu.cu:325] Total: 1341 MB
I0126 09:53:34.469861 22227 context_gpu.cu:321] GPU 0: 428 MB
I0126 09:53:34.469885 22227 context_gpu.cu:321] GPU 1: 1059 MB
I0126 09:53:34.469903 22227 context_gpu.cu:325] Total: 1488 MB
I0126 09:53:34.476531 22233 context_gpu.cu:321] GPU 0: 428 MB
I0126 09:53:34.476557 22233 context_gpu.cu:321] GPU 1: 1191 MB
I0126 09:53:34.476568 22233 context_gpu.cu:325] Total: 1620 MB
I0126 09:53:34.484189 22227 context_gpu.cu:321] GPU 0: 428 MB
I0126 09:53:34.484215 22227 context_gpu.cu:321] GPU 1: 1323 MB
I0126 09:53:34.484221 22227 context_gpu.cu:325] Total: 1751 MB
I0126 09:53:34.493640 22227 context_gpu.cu:321] GPU 0: 468 MB
I0126 09:53:34.493661 22227 context_gpu.cu:321] GPU 1: 1440 MB
I0126 09:53:34.493669 22227 context_gpu.cu:325] Total: 1909 MB
I0126 09:53:34.504007 22226 context_gpu.cu:321] GPU 0: 488 MB
I0126 09:53:34.504045 22226 context_gpu.cu:321] GPU 1: 1552 MB
I0126 09:53:34.504053 22226 context_gpu.cu:325] Total: 2041 MB
I0126 09:53:34.509182 22226 context_gpu.cu:321] GPU 0: 615 MB
I0126 09:53:34.509219 22226 context_gpu.cu:321] GPU 1: 1574 MB
I0126 09:53:34.509229 22226 context_gpu.cu:325] Total: 2190 MB
I0126 09:53:34.515350 22226 context_gpu.cu:321] GPU 0: 712 MB
I0126 09:53:34.515370 22226 context_gpu.cu:321] GPU 1: 1626 MB
I0126 09:53:34.515381 22226 context_gpu.cu:325] Total: 2338 MB
I0126 09:53:34.521685 22226 context_gpu.cu:321] GPU 0: 810 MB
I0126 09:53:34.521703 22226 context_gpu.cu:321] GPU 1: 1677 MB
I0126 09:53:34.521711 22226 context_gpu.cu:325] Total: 2487 MB
I0126 09:53:34.526744 22232 context_gpu.cu:321] GPU 0: 913 MB
I0126 09:53:34.526764 22232 context_gpu.cu:321] GPU 1: 1713 MB
I0126 09:53:34.526770 22232 context_gpu.cu:325] Total: 2627 MB
I0126 09:53:34.534111 22227 context_gpu.cu:321] GPU 0: 986 MB
I0126 09:53:34.534143 22227 context_gpu.cu:321] GPU 1: 1779 MB
I0126 09:53:34.534152 22227 context_gpu.cu:325] Total: 2766 MB
I0126 09:53:34.540236 22226 context_gpu.cu:321] GPU 0: 1054 MB
I0126 09:53:34.540256 22226 context_gpu.cu:321] GPU 1: 1839 MB
I0126 09:53:34.540262 22226 context_gpu.cu:325] Total: 2894 MB
I0126 09:53:34.561064 22226 context_gpu.cu:321] GPU 0: 1179 MB
I0126 09:53:34.561101 22226 context_gpu.cu:321] GPU 1: 1845 MB
I0126 09:53:34.561110 22226 context_gpu.cu:325] Total: 3025 MB
I0126 09:53:34.571882 22226 context_gpu.cu:321] GPU 0: 1228 MB
I0126 09:53:34.571910 22226 context_gpu.cu:321] GPU 1: 1927 MB
I0126 09:53:34.571919 22226 context_gpu.cu:325] Total: 3156 MB
I0126 09:53:34.584810 22226 context_gpu.cu:321] GPU 0: 1391 MB
I0126 09:53:34.584846 22226 context_gpu.cu:321] GPU 1: 1927 MB
I0126 09:53:34.584856 22226 context_gpu.cu:325] Total: 3318 MB
F0126 09:53:34.613024 22226 context_gpu.cu:387] Error at: /home/moyan/caffe2/caffe2/caffe2/core/context_gpu.cu:387: an illegal memory access was encountered
*** Check failure stack trace: ***
terminate called recursively
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
*** Aborted at 1516931614 (unix time) try "date -d @1516931614" if you are using GNU date ***
what(): [enforce fail at context_gpu.h:170] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/rpn_cls_logits_w_grad" input: "gpu_1/rpn_cls_logits_w_grad" output: "gpu_0/rpn_cls_logits_w_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
@ 0x7f71a16785cd google::LogMessage::Fail()
PC: @ 0x7f71b5133428 gsignal
@ 0x7f71a167a433 google::LogMessage::SendToLog()
*** SIGABRT (@0x3e800005153) received by PID 20819 (TID 0x7f705ffff700) from PID 20819; stack trace: ***
@ 0x7f71a167815b google::LogMessage::Flush()
@ 0x7f71b5be9390 (unknown)
@ 0x7f71a167ae1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f71b5133428 gsignal
@ 0x7f71b513502a abort
@ 0x7f71ae550b39 __gnu_cxx::__verbose_terminate_handler()
@ 0x7f71ae54f1fb __cxxabiv1::__terminate()
@ 0x7f71ae54f234 std::terminate()
@ 0x7f71ae56ac8a execute_native_thread_routine_compat
@ 0x7f71b5bdf6ba start_thread
@ 0x7f71a1e87085 caffe2::CUDAContext::Delete()
@ 0x7f71b52053dd clone
@ 0x7f71a1dafe96 caffe2::Tensor<>::ResizeLike<>()
@ 0x0 (unknown)
When I set "NUM_GPUS: 1" runing "e2e_mask_rcnn_R-50-C4_1x" is well .