
[OpenCL][Mxnet] Error: Entry function 'fuse_conv2d_relu_7__kernel2' uses too much shared data #1302

Closed
nguyenducthinhdl opened this issue Jun 20, 2018 · 4 comments



nguyenducthinhdl commented Jun 20, 2018

Dear contributors,

I hit an issue when targeting OpenCL with an MXNet model in TVM.

This is the error message:

[14:32:18] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.12.0. Attempting to upgrade...
[14:32:18] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
model compiled.
[14:32:23] src/runtime/opencl/opencl_device_api.cc:235: Initialize OpenCL platform 'NVIDIA CUDA '
[14:32:24] src/runtime/opencl/opencl_device_api.cc:260: opencl(0)='GeForce GTX 750 Ti ' cl_device_id=0x83dde60
Traceback (most recent call last):
File "/home/abc/work/code.py", line 81, in
m.run()
File "/usr/local/lib/python3.5/dist-packages/tvm-0.4.0-py3.5-linux-x86_64.egg/tvm/contrib/graph_runtime.py", line 113, in run
self._run()
File "/usr/local/lib/python3.5/dist-packages/tvm-0.4.0-py3.5-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 183, in call
ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
File "/usr/local/lib/python3.5/dist-packages/tvm-0.4.0-py3.5-linux-x86_64.egg/tvm/_ffi/base.py", line 66, in check_call
raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: [14:32:26] src/runtime/module_util.cc:52: Check failed: ret == 0 (-1 vs. 0) [14:32:26] src/runtime/opencl/opencl_module.cc:141: OpenCL build error for device=0x83dde60ptxas error : Entry function 'fuse_conv2d_relu_9__kernel2' uses too much shared data (0x1ca64 bytes, 0xc000 max)
ptxas error : Entry function 'fuse_conv2d_relu_7__kernel2' uses too much shared data (0xda44 bytes, 0xc000 max)

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/tvm-0.4.0-py3.5-linux-x86_64.egg/tvm/libtvm.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7f61e38f8b7a]
[bt] (1) /usr/local/lib/python3.5/dist-packages/tvm-0.4.0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x5b26ae) [0x7f61e3cea6ae]
[bt] (2) /usr/local/lib/python3.5/dist-packages/tvm-0.4.0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x607127) [0x7f61e3d3f127]
[bt] (3) /usr/local/lib/python3.5/dist-packages/tvm-0.4.0-py3.5-linux-x86_64.egg/tvm/libtvm.so(+0x6055f7) [0x7f61e3d3d5f7]
[bt] (4) /usr/local/lib/python3.5/dist-packages/tvm-0.4.0-py3.5-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x5e) [0x7f61e3cd7bde]
[bt] (5) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f622aae9e20]
[bt] (6) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7f622aae988b]
[bt] (7) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7f622aae401a]
[bt] (8) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7f622aad7fcb]
[bt] (9) /usr/bin/python3.5(PyObject_Call+0x47) [0x5c1797]
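For reference, the sizes in the ptxas error are hexadecimal byte counts. Decoding them (a quick sketch of my own, with the values copied from the error above) shows how far over the 0xc000 (48 KiB) shared-memory limit each kernel is:

```python
# Decode the shared-memory figures from the ptxas error (hex byte counts).
limit = 0xc000  # 49,152 bytes = 48 KiB of shared memory per block
kernels = {
    "fuse_conv2d_relu_9__kernel2": 0x1ca64,
    "fuse_conv2d_relu_7__kernel2": 0xda44,
}
for name, used in kernels.items():
    print(f"{name}: {used:,} bytes ({used / limit:.2f}x the {limit:,}-byte limit)")
```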

Do you have any idea how to resolve this?

Thanks a lot


tqchen commented Jun 20, 2018

Please use https://discuss.tvm.ai/ for general support questions.

tqchen closed this as completed Jun 20, 2018

FrozenGene commented Jun 21, 2018

I also encountered this problem when running a complex model, so I want to answer here, since I didn't find this question on discuss.tvm.ai. I tracked down what is wrong in the generated OpenCL kernel code. My offending OpenCL kernel code is:

__local float pad_temp_global_global_shared[33800];

I think your model produces similar OpenCL code. The problem is the 33800: that many floats overflows local memory. NVIDIA only allows 48 KB of shared data per work-group, and Intel's GPUs allow 64 KB. You can run the clinfo command to check your device's limit.
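The arithmetic behind that check, as a quick sketch (assuming 4-byte floats, which is what the `__local float` declaration implies):

```python
# Size of the offending __local array: 33800 floats at 4 bytes each.
num_floats = 33800
bytes_used = num_floats * 4          # 135,200 bytes

nvidia_limit = 48 * 1024             # 48 KB local memory per work-group (NVIDIA)
intel_limit = 64 * 1024              # 64 KB on Intel GPUs

print(f"array needs {bytes_used:,} bytes")
print(f"over NVIDIA limit: {bytes_used > nvidia_limit}")
print(f"over Intel limit:  {bytes_used > intel_limit}")
```

So this array blows past both vendors' limits, which is why the kernel fails to build.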

You can refer to #525. When I tuned as suggested there, I was able to reduce the array size below 32K. I think it can help you too.

nguyenducthinhdl (Author) commented

Many thanks, @FrozenGene.

I will try your solution.

Regards

ndcuong91 commented

@FrozenGene thanks for the information. But I think the fused function here uses too much memory (~111 KB), so we need to reduce the array size as you suggest. Can you share in detail how you tune your model?
