ImageNet Issues on Pascal GPUs #4567

amithr1 · 2017-01-06T16:38:39Z

Hi All,

I was able to compile MXNet on Pascal GPUs after adding the -gencode arch=compute_60,code=compute_60 flags. The system uses CUDA 8.0.
When I compile OpenCV with CUDA support turned off and run ImageNet, I get only 20-25 images/sec with two GPUs. I thought OpenCV was limiting performance, so I switched to OpenCV with CUDA support turned on.
But when I do that, I get seg faults. The backtrace shows that even simple functions such as cudaSetDevice() fail. I'm attaching the backtrace below. I'm not sure whether this is a bug in OpenCV or MXNet.

#0 0x00003fffb7c9af54 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x00003fff6388d588 in cudbgApiDetach () from /usr/lib/nvidia/libcuda.so.1
#2 0x00003fff638600f8 in cudbgApiDetach () from /usr/lib/nvidia/libcuda.so.1
#3 0x00003fff63886cd0 in cudbgApiDetach () from /usr/lib/nvidia/libcuda.so.1
#4 0x00003fff63972360 in cuVDPAUCtxCreate () from /usr/lib/nvidia/libcuda.so.1
#5 0x00003fff638907c4 in cudbgApiDetach () from /usr/lib/nvidia/libcuda.so.1
#6 0x00003fff638924dc in cudbgApiDetach () from /usr/lib/nvidia/libcuda.so.1
#7 0x00003fff63849368 in cudbgApiDetach () from /usr/lib/nvidia/libcuda.so.1
#8 0x00003fff63744644 in ?? () from /usr/lib/nvidia/libcuda.so.1
#9 0x00003fff638bbd30 in cuInit () from /usr/lib/nvidia/libcuda.so.1
#10 0x00003fff9fdf4b9c in __cudaInitManagedRuntime () from /usr/local/cuda/lib64/libcudart.so.8.0
#11 0x00003fff9fdf7618 in __cudaInitManagedRuntime () from /usr/local/cuda/lib64/libcudart.so.8.0
#12 0x00003fffb7c9fa2c in pthread_once () from /lib64/libpthread.so.0
#13 0x00003fff9fe378c8 in cudaGraphicsVDPAURegisterOutputSurface () from /usr/local/cuda/lib64/libcudart.so.8.0
#14 0x00003fff9fdee9f8 in __cudaInitManagedRuntime () from /usr/local/cuda/lib64/libcudart.so.8.0
#15 0x00003fff9fdf8fa4 in _cudaInitManagedRuntime () from /usr/local/cuda/lib64/libcudart.so.8.0
#16 0x00003fff9fe13760 in cudaSetDevice () from /usr/local/cuda/lib64/libcudart.so.8.0
#17 0x00003fffa22b3924 in mxnet::StorageImpl::ActivateDevice (ctx=...) at src/storage/storage.cc:47
#18 0x00003fffa22b1754 in mxnet::StorageImpl::Alloc (this=0x3fff0c0073d0, size=1204224, ctx=...) at src/storage/storage.cc:95
#19 0x00003fffa14e73c0 in mxnet::NDArray::Chunk::CheckAndAlloc (this=0x111dcea8) at include/mxnet/./ndarray.h:346
#20 0x00003fffa14e731c in mxnet::NDArray::Chunk::Chunk (this=0x111dcea8, size=301056, ctx=..., delay_alloc=false, dtype=0) at include/mxnet/./ndarray.h:341
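
A side note on the compile flags mentioned above: `-gencode arch=compute_60,code=compute_60` embeds only PTX for compute capability 6.0, which the driver JIT-compiles when the library is loaded. Adding a `code=sm_60` target also embeds native GPU code. This may or may not be related to the seg fault. The helper below is a minimal sketch for generating both kinds of flags; the function name `gencode_flags` and its parameters are hypothetical and not part of MXNet or nvcc.

```python
def gencode_flags(capabilities, include_ptx=True):
    """Build nvcc -gencode flags for compute capabilities such as "60" or "61".

    Hypothetical helper for illustration only. Emits one native (sm_XX)
    target per capability and, optionally, a PTX target for the newest
    capability so that later GPU generations can still JIT-compile the code.
    """
    flags = []
    for cc in capabilities:
        # Native machine code for this exact architecture.
        flags.append(f"-gencode arch=compute_{cc},code=sm_{cc}")
    if include_ptx and capabilities:
        newest = max(capabilities, key=int)
        # PTX only: compiled by the driver at load time.
        flags.append(f"-gencode arch=compute_{newest},code=compute_{newest}")
    return " ".join(flags)
```

For example, `gencode_flags(["60"])` yields a native `sm_60` target followed by the PTX-only target used in the original build.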

piiswrong · 2017-01-06T18:23:00Z

Don't use GPU-enabled OpenCV. It doesn't offer any speedup, as we don't use OpenCV's GPU features.

mli · 2017-01-07T20:38:59Z

Try the --test-io option; it will tell you how fast the data can be read:

https://github.com/dmlc/mxnet/tree/master/example/image-classification#speed
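
The idea behind such a check can be sketched in a few lines of plain Python: iterate over the batches without doing any training and report images per second. This is only an illustration of the measurement, not the code behind --test-io; `measure_read_speed` and `fake_loader` are hypothetical names, and the fake loader simulates a slow reader with `time.sleep`.

```python
import time


def measure_read_speed(batches, batch_size):
    """Consume an iterable of batches without training; return images/sec.

    Hypothetical helper: `batches` can be any iterable that yields one
    batch per step, and `batch_size` is the number of images per batch.
    """
    start = time.perf_counter()
    images = 0
    for _ in batches:
        images += batch_size  # only count; no forward/backward pass
    elapsed = time.perf_counter() - start
    return images / elapsed if elapsed > 0 else float("inf")


def fake_loader(num_batches, delay_sec):
    """Stand-in for a real data iterator; sleeps to simulate slow I/O."""
    for i in range(num_batches):
        time.sleep(delay_sec)
        yield i
```

Calling `measure_read_speed(fake_loader(20, 0.01), batch_size=32)` gives the read-only throughput of the simulated loader. If the number measured this way for the real data iterator is close to the training throughput, the input pipeline rather than the GPUs is likely the limiting factor.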

amithr1 · 2017-01-10T21:12:37Z

Thanks. I tested it again today with this option. It looks like I/O is the bottleneck.

yajiedesign · 2017-09-28T08:05:52Z

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!

yajiedesign closed this as completed Sep 28, 2017
