Mask R-CNN C++ Deployment Not Working in GPU Mode and CPU Mode #15207

zheshipinyinMc · 2019-06-11T01:14:25Z

1. Trained Mask R-CNN on the COCO dataset.
2. Testing the saved model in Python works fine.
3. Deploying Mask R-CNN with the GluonCV C++ deployment fails in both GPU mode and CPU mode.

MXNet: 1.4
System: Ubuntu 16.04
Gluon CV: 0.4.0

Errors:
In GPU mode, the error is "incubator-mxnet/cpp-package/include/mxnet-cpp/ndarray.hpp:242: Check failed: MXNDArrayWaitAll() == 0 (-1 vs. 0) : [08:43:52] src/storage/./pooled_storage_manager.h:157: cudaMalloc failed: out of memory".
In CPU mode, the error is "incubator-mxnet/cpp-package/include/mxnet-cpp/ndarray.hpp:242: Check failed: MXNDArrayWaitAll() == 0 (-1 vs. 0) : [08:46:48] src/ndarray/ndarray.cc:752: Check failed: !IsMKLDNNData(): We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first".

My GPU has 6 GB of memory, and the machine has 32 GB of RAM.

incubator-mxnet make command:
"make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CPP_PACKAGE=1 USE_CUDA=1 USE_MKLDNN=1 USE_CUDNN=1 USE_CUDA_PATH=/usr/local/cuda"

However, GluonCV YOLOv3 works in both GPU mode and CPU mode.
@zhreshold

mxnet-label-bot · 2019-06-11T01:14:29Z

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: C++

zheshipinyinMc · 2019-06-11T09:21:18Z

@pluskid @piiswrong

zhreshold · 2019-06-11T17:53:45Z

@zheshipinyinMc For CPU, try disabling MKLDNN in your build and see if it works.

For GPU, the model may work in Python imperative mode because the network can run inference section by section, whereas in C++ all the memory is allocated at once before execution, and you only have 6 GB of GPU memory.
Try reducing the input image size and see whether you can run inference on a small input. Let me know what size fits in 6 GB of memory, and we can probably figure out a way to improve it.

pengzhao-intel · 2019-06-11T23:09:24Z

@zheshipinyinMc This issue in the MKLDNN backend should be fixed by #15038.
Could you try the nightly build (pip install --pre mxnet-mkl)?

zheshipinyinMc · 2019-06-12T02:00:54Z

@zhreshold I tested on a server with a GPU: the 1000x591 (w x h) image needs about 10 GB of memory, and the 500x295 image needs about 6 GB. Everything is OK in CPU mode. But on my own computer it still does not work, even after resizing the image to 150x150.
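
For trying smaller inputs as suggested earlier in the thread, one option is to cap the longer side of the image while keeping the aspect ratio. The following is only a minimal sketch: `FitWithin` and `max_side` are hypothetical names, not part of MXNet or GluonCV, and the function only computes the target size (the actual resize would still be done with an image library such as OpenCV). It says nothing about how much GPU memory a given size needs.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical helper (not an MXNet/GluonCV API): shrink (width, height) so
// that the longer side is at most max_side, keeping the aspect ratio.
// Integer division truncates, so sizes are rounded down (but never below 1).
std::pair<int, int> FitWithin(int width, int height, int max_side) {
  const long long longer = std::max(width, height);
  if (longer <= max_side) {
    return {width, height};  // already small enough, leave unchanged
  }
  const int w = static_cast<int>(
      std::max<long long>(1, static_cast<long long>(width) * max_side / longer));
  const int h = static_cast<int>(
      std::max<long long>(1, static_cast<long long>(height) * max_side / longer));
  return {w, h};
}
```

With a cap of 500, `FitWithin(1000, 591, 500)` gives 500x295, the same reduced size reported above, while a 150x150 image is left unchanged.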

zheshipinyinMc · 2019-06-12T05:57:05Z

@zhreshold I just rebuilt incubator-mxnet with the command 'make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CPP_PACKAGE=1 USE_CUDA=1 USE_MKLDNN=0 USE_GPERFTOOLS=1 USE_CUDNN=1 USE_CUDA_PATH=/usr/local/cuda'. The demo now works in CPU mode, but it takes 107343 ms for a 600x655 (w x h) image and 105212 ms for a 137x150 image.

zheshipinyinMc · 2019-06-13T10:48:11Z

@zhreshold
auto ids = exec->outputs[0].Copy(Context(kCPU, 0));
auto scores = exec->outputs[1].Copy(Context(kCPU, 0));
auto bboxes = exec->outputs[2].Copy(Context(kCPU, 0));
if (exec->outputs.size() > 3) {
  auto masks = exec->outputs[3].Copy(Context(kCPU, 0));
}

scores has shape 1x1x1000, so a score can be read with scores.At(0, 0, i).
bboxes has shape 1x1000x4, so a box can be read with bboxes.At(0, i, 0), bboxes.At(0, i, 1), bboxes.At(0, i, 2), bboxes.At(0, i, 3).
masks has shape 1x1000x14x14. How do I read a mask value? There is no NDArray::At(input1, input2, input3, input4).

zheshipinyinMc · 2019-06-13T10:51:37Z

@pengzhao-intel

pengzhao-intel · 2019-06-13T11:01:51Z

Quoting @zheshipinyinMc: "I just rebuilt incubator-mxnet with the command 'make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CPP_PACKAGE=1 USE_CUDA=1 USE_MKLDNN=0 USE_GPERFTOOLS=1 USE_CUDNN=1 USE_CUDA_PATH=/usr/local/cuda'. The demo now works in CPU mode, but it takes 107343 ms for a 600x655 (w x h) image and 105212 ms for a 137x150 image."

Please build with USE_MKLDNN=1 USE_GPERFTOOLS=0

zheshipinyinMc · 2019-06-13T12:17:11Z

@pengzhao-intel I will try this. Another question:
auto ids = exec->outputs[0].Copy(Context(kCPU, 0));
auto scores = exec->outputs[1].Copy(Context(kCPU, 0));
auto bboxes = exec->outputs[2].Copy(Context(kCPU, 0));
if (exec->outputs.size() > 3) {
  auto masks = exec->outputs[3].Copy(Context(kCPU, 0));
}

scores has shape 1x1x1000, so a score can be read with scores.At(0, 0, i).
bboxes has shape 1x1000x4, so a box can be read with bboxes.At(0, i, 0), bboxes.At(0, i, 1), bboxes.At(0, i, 2), bboxes.At(0, i, 3).
masks has shape 1x1000x14x14. How do I read a mask value? There is no NDArray::At(input1, input2, input3, input4).

pengzhao-intel · 2019-06-13T12:19:46Z

@xinyu-intel can help you with this question :)

zhreshold · 2019-06-13T18:07:21Z

@zheshipinyinMc

const mx_float *mask_ptr = exec->outputs[3].GetData();
// calculate offset and access the elements
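
The offset calculation mentioned in the comment can be sketched as follows, assuming the mask output is a contiguous row-major buffer of shape 1x1000x14x14 (the shape reported earlier in this thread), that `mx_float` is `float`, and that the data is readable from the CPU (for example after copying the output to a CPU context). `Offset4D` and `MaskAt` are hypothetical helper names, not mxnet-cpp functions.

```cpp
#include <cstddef>

// Hypothetical helper: row-major offset of element (i0, i1, i2, i3) in a
// 4-D array whose last three dimensions are d1, d2, d3. The size of the
// first dimension is not needed to compute the offset.
inline std::size_t Offset4D(std::size_t i0, std::size_t i1, std::size_t i2,
                            std::size_t i3, std::size_t d1, std::size_t d2,
                            std::size_t d3) {
  return ((i0 * d1 + i1) * d2 + i2) * d3 + i3;
}

// Hypothetical helper: value at (row y, column x) of the mask of detection n,
// assuming a batch of one and the 1x1000x14x14 layout reported in the thread.
inline float MaskAt(const float* data, std::size_t n, std::size_t y,
                    std::size_t x) {
  const std::size_t kNumDetections = 1000;  // assumed from the thread
  const std::size_t kMaskHeight = 14;       // assumed from the thread
  const std::size_t kMaskWidth = 14;        // assumed from the thread
  return data[Offset4D(0, n, y, x, kNumDetections, kMaskHeight, kMaskWidth)];
}
```

With the pointer from the snippet above, a call would look like `MaskAt(mask_ptr, i, y, x)`. If the exported model uses a different number of detections or mask size, the constants need to be changed to match the actual output shape.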

zheshipinyinMc · 2019-06-14T00:49:18Z

@zhreshold Thanks, but I found that the mask values from the Python deployment and the C++ deployment are different, and the detected bboxes also deviate slightly.
python:[467.61517 95.62402 820.6834 469.75653]
c++: [460.774 111.568 819.672 453.766]

zhreshold · 2019-06-14T01:05:20Z

It might be due to different input values.

zheshipinyinMc · 2019-06-14T01:13:46Z

Maybe. Also, how can I get an intermediate layer's output from a GluonCV model? With an MXNet model we can get an intermediate layer's output like this, just by changing the key passed to all_layers[]:
import mxnet as mx
from easydict import EasyDict as edict

# ctx, prefix, epoch and image_shape are defined elsewhere
net = edict()
net.ctx = ctx
net.sym, net.arg_params, net.aux_params = mx.model.load_checkpoint(prefix, epoch)
all_layers = net.sym.get_internals()
# pick the layer to expose, e.g. 'conv_6dw7_7_batchnorm_output' or 'fc1_output'
net.sym = all_layers['fc1_output']
net.model = mx.mod.Module(symbol=net.sym, context=net.ctx, label_names=None)
net.model.bind(data_shapes=[('data', (1, 3, image_shape[1], image_shape[2]))])
net.model.set_params(net.arg_params, net.aux_params)

zheshipinyinMc · 2019-06-14T01:43:18Z

@pengzhao-intel I get the same error with USE_MKLDNN=1 USE_GPERFTOOLS=0.

xinyu-intel · 2019-06-14T01:46:57Z

@zheshipinyinMc Which version of MXNet are you using, and can you please provide steps to reproduce?

zheshipinyinMc · 2019-06-14T02:02:16Z

@xinyu-intel
mxnet 1.4.1
gluoncv 0.4.0
In the end I got the mask values with "masks.GetData()[index]". But I am still curious about "NDArray::At(size_t c, size_t h, size_t w)" versus a four-index "NDArray(index1,index2,index3,index4)". I also found that an NDArray can be constructed like this:
// construct NDArray from data buffer
NDArray(data_buffer, Shape(1, rgb_image.rows, rgb_image.cols, 3), ctx);
So could you add NDArray::At(index1, index2, index3, index4)?
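
Until such an accessor exists, a user-side helper can turn any number of indices into the flat position used with "masks.GetData()[index]". This is only a sketch under the assumption that the buffer is contiguous and row-major; `FlatIndex` is a hypothetical free function, not part of mxnet-cpp, and the shape has to be supplied by the caller.

```cpp
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not an mxnet-cpp API): flat row-major index for an
// array of arbitrary rank. Throws if the number of indices does not match
// the rank of the shape or if any index is out of range.
std::size_t FlatIndex(const std::vector<std::size_t>& shape,
                      std::initializer_list<std::size_t> indices) {
  if (indices.size() != shape.size()) {
    throw std::invalid_argument("number of indices must match the rank");
  }
  std::size_t offset = 0;
  std::size_t dim = 0;
  for (std::size_t i : indices) {
    if (i >= shape[dim]) {
      throw std::out_of_range("index exceeds dimension size");
    }
    offset = offset * shape[dim] + i;  // Horner-style accumulation
    ++dim;
  }
  return offset;
}
```

For the 1x1000x14x14 mask output, `FlatIndex({1, 1000, 14, 14}, {0, n, y, x})` gives the position to pass to `GetData()[...]`. The same function covers an image buffer laid out as `Shape(1, rows, cols, 3)`.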

xinyu-intel · 2019-06-14T02:04:13Z

Please try pip install mxnet-mkl --pre

zheshipinyinMc · 2019-06-14T02:06:16Z

@xinyu-intel Thanks. What about "NDArray::At(size_t c, size_t h, size_t w)" vs "NDArray(index1,index2,index3,index4)"?

zachgk added the C++, Memory, Pending Requester Info, and Python labels on Jun 12, 2019
