This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

RuntimeError: [enforce fail at context_gpu.cu:234] #842

Open
JeasonUESTC opened this issue Mar 18, 2019 · 3 comments

Comments

@JeasonUESTC

After I installed Detectron, I ran the test file with the line: python2 detectron/tests/test_spatial_narrow_as_op.py
and encountered the following error:
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /pytorch/build/lib/libcaffe2_detectron_ops_gpu.so
E..

ERROR: test_large_forward (__main__.SpatialNarrowAsOpTest)

Traceback (most recent call last):
File "detectron/tests/test_spatial_narrow_as_op.py", line 68, in test_large_forward
self._run_test(A, B)
File "detectron/tests/test_spatial_narrow_as_op.py", line 37, in _run_test
workspace.FeedBlob('A', A)
File "/pytorch/build/caffe2/python/workspace.py", line 335, in FeedBlob
return _Workspace_feed_blob(ws, name, arr, device_option)
File "/pytorch/build/caffe2/python/workspace.py", line 694, in _Workspace_feed_blob
return ws.create_blob(name).feed(arr, device_option)
File "/pytorch/build/caffe2/python/workspace.py", line 724, in _Blob_feed
return blob._feed(arg, device_option)
RuntimeError: [enforce fail at context_gpu.cu:234] error == cudaSuccess. 60 vs 0. Error at: /pytorch/caffe2/core/context_gpu.cu:234: peer mapping resources exhausted
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, void const*) + 0x76 (0x7f4036b33ad6 in /pytorch/build/lib/libc10.so)
frame #1: + 0x2345588 (0x7f400877e588 in /pytorch/build/lib/libcaffe2_gpu.so)
frame #2: caffe2::CUDAContext::CUDAContext(caffe2::DeviceOption const&) + 0x145 (0x7f4008780395 in /pytorch/build/lib/libcaffe2_gpu.so)
frame #3: + 0xfd53b (0x7f403948553b in /pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #4: + 0x1001e0 (0x7f40394881e0 in /pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #5: + 0x50d46 (0x7f40393d8d46 in /pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #6: + 0x924b0 (0x7f403941a4b0 in /pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #7: PyEval_EvalFrameEx + 0x9446 (0x4c5326 in python2)
frame #8: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #9: PyEval_EvalFrameEx + 0x58e6 (0x4c17c6 in python2)
frame #10: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #11: PyEval_EvalFrameEx + 0x58e6 (0x4c17c6 in python2)
frame #12: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #13: PyEval_EvalFrameEx + 0x58e6 (0x4c17c6 in python2)
frame #14: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #15: PyEval_EvalFrameEx + 0x58e6 (0x4c17c6 in python2)
frame #16: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #17: PyEval_EvalFrameEx + 0x6076 (0x4c1f56 in python2)
frame #18: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #19: python2() [0x4d57a3]
frame #20: PyObject_Call + 0x3e (0x4a587e in python2)
frame #21: PyEval_EvalFrameEx + 0x263e (0x4be51e in python2)
frame #22: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #23: python2() [0x4d5669]
frame #24: python2() [0x4eef5e]
frame #25: PyObject_Call + 0x3e (0x4a587e in python2)
frame #26: python2() [0x548fc3]
frame #27: PyEval_EvalFrameEx + 0x578d (0x4c166d in python2)
frame #28: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #29: python2() [0x4d57a3]
frame #30: PyObject_Call + 0x3e (0x4a587e in python2)
frame #31: PyEval_EvalFrameEx + 0x263e (0x4be51e in python2)
frame #32: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #33: python2() [0x4d5669]
frame #34: python2() [0x4eef5e]
frame #35: PyObject_Call + 0x3e (0x4a587e in python2)
frame #36: python2() [0x548fc3]
frame #37: PyEval_EvalFrameEx + 0x578d (0x4c166d in python2)
frame #38: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #39: python2() [0x4d57a3]
frame #40: PyObject_Call + 0x3e (0x4a587e in python2)
frame #41: PyEval_EvalFrameEx + 0x263e (0x4be51e in python2)
frame #42: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #43: python2() [0x4d5669]
frame #44: python2() [0x4eef5e]
frame #45: PyObject_Call + 0x3e (0x4a587e in python2)
frame #46: python2() [0x548fc3]
frame #47: PyEval_EvalFrameEx + 0x578d (0x4c166d in python2)
frame #48: PyEval_EvalFrameEx + 0x553f (0x4c141f in python2)
frame #49: PyEval_EvalFrameEx + 0x553f (0x4c141f in python2)
frame #50: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #51: python2() [0x4d5669]
frame #52: python2() [0x4eef5e]
frame #53: python2() [0x4eeb66]
frame #54: python2() [0x4aaafb]
frame #55: PyEval_EvalFrameEx + 0x578d (0x4c166d in python2)
frame #56: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #57: python2() [0x4eb69f]
frame #58: PyRun_FileExFlags + 0x82 (0x4e58f2 in python2)
frame #59: PyRun_SimpleFileExFlags + 0x186 (0x4e41a6 in python2)
frame #60: Py_Main + 0x54e (0x4938ce in python2)
frame #61: __libc_start_main + 0xf0 (0x7f403ccf7830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #62: _start + 0x29 (0x493299 in python2)


Ran 3 tests in 13.256s

FAILED (errors=1)
Can you give me some help? Thank you very much.
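
For reference, a minimal sketch of what the failing line in the test is doing: it feeds a NumPy array into the Caffe2 workspace under a CUDA device scope, and creating the CUDA context for that device is where the error above is raised. The blob name and array shape here are just illustrative, not taken from the test.

import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

A = np.random.rand(2, 3, 5, 7).astype(np.float32)  # example tensor, arbitrary shape

# Feed the blob under a CUDA device scope; constructing the CUDAContext for
# GPU 0 is the step that fails with "peer mapping resources exhausted" here.
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
    workspace.FeedBlob('A', A)
    print(workspace.FetchBlob('A').shape)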

@Vimos

Vimos commented Jun 24, 2019

@xieydd

xieydd commented Jul 4, 2019

When I use 5 GPUs there is no error, but when I set 8 GPUs I get the error "peer mapping resources exhausted".
And I set num_workers=0.
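
For what it's worth, that pattern fits the error in the traceback: error 60 with the message "peer mapping resources exhausted" is CUDA's cudaErrorTooManyPeers, and Caffe2 enables peer-to-peer access between GPU pairs when several GPUs are used, so more GPUs means more peer mappings. A quick way to inspect what your build detects is sketched below; it assumes a GPU-enabled Caffe2 build where NumCudaDevices and GetCudaPeerAccessPattern are available.

from caffe2.python import workspace

# Number of CUDA devices this Caffe2 build can see.
print("CUDA devices visible to Caffe2:", workspace.NumCudaDevices())

# Matrix of which GPU pairs can access each other's memory directly;
# entry [i][j] is True when GPU i can peer with GPU j.
print("Peer access pattern:")
print(workspace.GetCudaPeerAccessPattern())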

@lisentao

I just set
export CUDA_VISIBLE_DEVICES=0
and everything works well.
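
If a single GPU is enough for the test, limiting the visible devices before Caffe2/CUDA initializes avoids the peer-access setup entirely, which is likely why this workaround helps. Below is a sketch of the same idea applied from Python; the key point (an assumption about initialization order, not something stated in this thread) is that the environment variable must be set before caffe2 is imported.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # must be set before importing caffe2

from caffe2.python import workspace
print("GPUs visible to Caffe2:", workspace.NumCudaDevices())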
