When I train with one GPU there is no problem, but when I use two or four GPUs the error below occurs. The log output is:
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at context_gpu.h:170] . Encountered CUDA error: an illegal memory access was encountered Error from operator:
input: "gpu_0/rpn_cls_logits_fpn2_w_grad" input: "gpu_1/rpn_cls_logits_fpn2_w_grad" output: "gpu_0/rpn_cls_logits_fpn2_w_grad" name: "" type: "Add" device_option { device_type: 1 cuda_gpu_id: 0 }
*** Aborted at 1516866180 (unix time) try "date -d @1516866180" if you are using GNU date ***
terminate called recursively
terminate called recursively
terminate called recursively
PC: @ 0x7ff67559f428 gsignal
terminate called recursively
terminate called recursively
E0125 07:43:00.745853 55683 pybind_state.h:422] Exception encountered running PythonOp function: RuntimeError: [enforce fail at context_gpu.h:307] error == cudaSuccess. 77 vs 0. Error at: /mnt/hzhida/project/caffe2/caffe2/core/context_gpu.h:307: an illegal memory access was encountered
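For reference, the operator that fails in the log is Caffe2's cross-GPU gradient reduction: an `Add` pinned to GPU 0 whose second input (`gpu_1/rpn_cls_logits_fpn2_w_grad`) lives on GPU 1, so it relies on a working cross-device memory access path. Below is a minimal standalone sketch of that same pattern (not taken from my run; the blob shape is arbitrary and only for illustration). Running it on the same machine can show whether a plain cross-GPU `Add` succeeds outside of training:

```python
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

# Put one copy of the gradient blob on each GPU (shape chosen arbitrarily).
for gpu_id in (0, 1):
    with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, gpu_id)):
        workspace.FeedBlob(
            "gpu_{}/rpn_cls_logits_fpn2_w_grad".format(gpu_id),
            np.random.randn(3, 256, 1, 1).astype(np.float32),
        )

# The reduction step from the log: Add runs on GPU 0 but one of its inputs
# is a blob on GPU 1, mirroring the failing operator.
add_op = core.CreateOperator(
    "Add",
    ["gpu_0/rpn_cls_logits_fpn2_w_grad", "gpu_1/rpn_cls_logits_fpn2_w_grad"],
    ["gpu_0/rpn_cls_logits_fpn2_w_grad"],
    device_option=core.DeviceOption(caffe2_pb2.CUDA, 0),
)
workspace.RunOperatorOnce(add_op)
print(workspace.FetchBlob("gpu_0/rpn_cls_logits_fpn2_w_grad").shape)
```

If this small case also raises an illegal memory access, the problem is in cross-GPU access on the machine rather than in the training script itself.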