This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

RuntimeError: [enforce fail at context_gpu.cu:234] #842

Open
JeasonUESTC opened this issue Mar 18, 2019 · 3 comments

Comments

@JeasonUESTC

After I installed Detectron, I ran the test file with the line: python2 detectron/tests/test_spatial_narrow_as_op.py
and encountered the following error:
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /pytorch/build/lib/libcaffe2_detectron_ops_gpu.so
E..

ERROR: test_large_forward (__main__.SpatialNarrowAsOpTest)

Traceback (most recent call last):
File "detectron/tests/test_spatial_narrow_as_op.py", line 68, in test_large_forward
self._run_test(A, B)
File "detectron/tests/test_spatial_narrow_as_op.py", line 37, in _run_test
workspace.FeedBlob('A', A)
File "/pytorch/build/caffe2/python/workspace.py", line 335, in FeedBlob
return _Workspace_feed_blob(ws, name, arr, device_option)
File "/pytorch/build/caffe2/python/workspace.py", line 694, in _Workspace_feed_blob
return ws.create_blob(name).feed(arr, device_option)
File "/pytorch/build/caffe2/python/workspace.py", line 724, in _Blob_feed
return blob._feed(arg, device_option)
RuntimeError: [enforce fail at context_gpu.cu:234] error == cudaSuccess. 60 vs 0. Error at: /pytorch/caffe2/core/context_gpu.cu:234: peer mapping resources exhausted
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, void const*) + 0x76 (0x7f4036b33ad6 in /pytorch/build/lib/libc10.so)
frame #1: + 0x2345588 (0x7f400877e588 in /pytorch/build/lib/libcaffe2_gpu.so)
frame #2: caffe2::CUDAContext::CUDAContext(caffe2::DeviceOption const&) + 0x145 (0x7f4008780395 in /pytorch/build/lib/libcaffe2_gpu.so)
frame #3: + 0xfd53b (0x7f403948553b in /pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #4: + 0x1001e0 (0x7f40394881e0 in /pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #5: + 0x50d46 (0x7f40393d8d46 in /pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #6: + 0x924b0 (0x7f403941a4b0 in /pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #7: PyEval_EvalFrameEx + 0x9446 (0x4c5326 in python2)
frame #8: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #9: PyEval_EvalFrameEx + 0x58e6 (0x4c17c6 in python2)
frame #10: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #11: PyEval_EvalFrameEx + 0x58e6 (0x4c17c6 in python2)
frame #12: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #13: PyEval_EvalFrameEx + 0x58e6 (0x4c17c6 in python2)
frame #14: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #15: PyEval_EvalFrameEx + 0x58e6 (0x4c17c6 in python2)
frame #16: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #17: PyEval_EvalFrameEx + 0x6076 (0x4c1f56 in python2)
frame #18: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #19: python2() [0x4d57a3]
frame #20: PyObject_Call + 0x3e (0x4a587e in python2)
frame #21: PyEval_EvalFrameEx + 0x263e (0x4be51e in python2)
frame #22: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #23: python2() [0x4d5669]
frame #24: python2() [0x4eef5e]
frame #25: PyObject_Call + 0x3e (0x4a587e in python2)
frame #26: python2() [0x548fc3]
frame #27: PyEval_EvalFrameEx + 0x578d (0x4c166d in python2)
frame #28: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #29: python2() [0x4d57a3]
frame #30: PyObject_Call + 0x3e (0x4a587e in python2)
frame #31: PyEval_EvalFrameEx + 0x263e (0x4be51e in python2)
frame #32: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #33: python2() [0x4d5669]
frame #34: python2() [0x4eef5e]
frame #35: PyObject_Call + 0x3e (0x4a587e in python2)
frame #36: python2() [0x548fc3]
frame #37: PyEval_EvalFrameEx + 0x578d (0x4c166d in python2)
frame #38: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #39: python2() [0x4d57a3]
frame #40: PyObject_Call + 0x3e (0x4a587e in python2)
frame #41: PyEval_EvalFrameEx + 0x263e (0x4be51e in python2)
frame #42: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #43: python2() [0x4d5669]
frame #44: python2() [0x4eef5e]
frame #45: PyObject_Call + 0x3e (0x4a587e in python2)
frame #46: python2() [0x548fc3]
frame #47: PyEval_EvalFrameEx + 0x578d (0x4c166d in python2)
frame #48: PyEval_EvalFrameEx + 0x553f (0x4c141f in python2)
frame #49: PyEval_EvalFrameEx + 0x553f (0x4c141f in python2)
frame #50: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #51: python2() [0x4d5669]
frame #52: python2() [0x4eef5e]
frame #53: python2() [0x4eeb66]
frame #54: python2() [0x4aaafb]
frame #55: PyEval_EvalFrameEx + 0x578d (0x4c166d in python2)
frame #56: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #57: python2() [0x4eb69f]
frame #58: PyRun_FileExFlags + 0x82 (0x4e58f2 in python2)
frame #59: PyRun_SimpleFileExFlags + 0x186 (0x4e41a6 in python2)
frame #60: Py_Main + 0x54e (0x4938ce in python2)
frame #61: __libc_start_main + 0xf0 (0x7f403ccf7830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #62: _start + 0x29 (0x493299 in python2)


Ran 3 tests in 13.256s

FAILED (errors=1)
Can you give me some help? Thank you very much.
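
For reference, a minimal sketch of what the failing line in the test is doing: it feeds a NumPy array into the Caffe2 workspace under a CUDA device scope, and creating the CUDA context for that device is where the error above is raised. The blob name and array shape here are just illustrative, not taken from the test.

import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace

A = np.random.rand(2, 3, 5, 7).astype(np.float32)  # example tensor, arbitrary shape

# Feed the blob under a CUDA device scope; constructing the CUDAContext for
# GPU 0 is the step that fails with "peer mapping resources exhausted" here.
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
    workspace.FeedBlob('A', A)
    print(workspace.FetchBlob('A').shape)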

@Vimos

Vimos commented Jun 24, 2019

@xieydd

xieydd commented Jul 4, 2019

When I use 5 GPUs there is no error, but when I set 8 GPUs I get the error "peer mapping resources exhausted".
And I set num_workers=0.
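
For what it's worth, that pattern fits the error in the traceback: error 60 with the message "peer mapping resources exhausted" is CUDA's cudaErrorTooManyPeers, and Caffe2 enables peer-to-peer access between GPU pairs when several GPUs are used, so more GPUs means more peer mappings. A quick way to inspect what your build detects is sketched below; it assumes a GPU-enabled Caffe2 build where NumCudaDevices and GetCudaPeerAccessPattern are available.

from caffe2.python import workspace

# Number of CUDA devices this Caffe2 build can see.
print("CUDA devices visible to Caffe2:", workspace.NumCudaDevices())

# Matrix of which GPU pairs can access each other's memory directly;
# entry [i][j] is True when GPU i can peer with GPU j.
print("Peer access pattern:")
print(workspace.GetCudaPeerAccessPattern())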

@lisentao

I just set
export CUDA_VISIBLE_DEVICES=0
and everything works well.
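
If a single GPU is enough for the test, limiting the visible devices before Caffe2/CUDA initializes avoids the peer-access setup entirely, which is likely why this workaround helps. Below is a sketch of the same idea applied from Python; the key point (an assumption about initialization order, not something stated in this thread) is that the environment variable must be set before caffe2 is imported.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # must be set before importing caffe2

from caffe2.python import workspace
print("GPUs visible to Caffe2:", workspace.NumCudaDevices())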
