This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Mask R-CNN C++ Deployment Not Working in GPU Mode and CPU Mode #15207

Open
zheshipinyinMc opened this issue Jun 11, 2019 · 20 comments

Comments

@zheshipinyinMc

zheshipinyinMc commented Jun 11, 2019

1. Trained Mask R-CNN on the COCO dataset.
2. Testing the saved model in Python works fine.
3. Deploying Mask R-CNN with the GluonCV C++ deployment demo fails: the model does not work in either GPU mode or CPU mode.

MXNet: 1.4
System: Ubuntu 16.04
Gluon CV: 0.4.0

errors:
In GPU mode, the error is "incubator-mxnet/cpp-package/include/mxnet-cpp/ndarray.hpp:242: Check failed: MXNDArrayWaitAll() == 0 (-1 vs. 0) : [08:43:52] src/storage/./pooled_storage_manager.h:157: cudaMalloc failed: out of memory".
In CPU mode, the error is "incubator-mxnet/cpp-package/include/mxnet-cpp/ndarray.hpp:242: Check failed: MXNDArrayWaitAll() == 0 (-1 vs. 0) : [08:46:48] src/ndarray/ndarray.cc:752: Check failed: !IsMKLDNNData(): We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first".

My GPU has 6 GB of memory, and the machine has 32 GB of RAM.

incubator-mxnet make command :
"make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CPP_PACKAGE=1 USE_CUDA=1 USE_MKLDNN=1 USE_CUDNN=1 USE_CUDA_PATH=/usr/local/cuda"

However, GluonCV YOLOv3 works in both GPU mode and CPU mode.
@zhreshold

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: C++

@zheshipinyinMc
Author

@pluskid @piiswrong

@zhreshold
Member

@zheshipinyinMc For CPU, you can try disabling MKLDNN in your build and see if it works.

For GPU, your model may work in Python imperative mode because the network can run inference section by section, but in C++ all the memory is allocated at once before execution, and you only have 6 GB of GPU memory.
Try reducing the input image size and see whether you can run inference on a small input. Let me know what size fits in 6 GB and we can probably figure out a way to improve it.
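
To make the "reduce the input image size" suggestion concrete, here is a minimal sketch of how a smaller target size could be computed while keeping the aspect ratio. `FitWithin` and `max_long_side` are hypothetical names, not part of MXNet or GluonCV, and the actual resize (for example with OpenCV) is left out. It only shows the arithmetic for trying progressively smaller inputs.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical helper (not an MXNet/GluonCV API): shrink (width, height) so
// that the longer side is at most max_long_side, preserving the aspect ratio.
// Integer division truncates, and images already within the limit are
// returned unchanged.
inline std::pair<int, int> FitWithin(int width, int height, int max_long_side) {
  const int long_side = std::max(width, height);
  if (long_side <= max_long_side) {
    return {width, height};
  }
  const auto scale = [&](int v) {
    const std::int64_t scaled =
        static_cast<std::int64_t>(v) * max_long_side / long_side;
    return std::max(1, static_cast<int>(scaled));  // never collapse to 0
  };
  return {scale(width), scale(height)};
}
```

One possible use is to start from the original size and halve `max_long_side` until the executor binds without a cudaMalloc failure, then report the largest size that fits.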

@pengzhao-intel
Contributor

@zheshipinyinMc This issue in the MKLDNN backend should be fixed by #15038.
Could you try the nightly build (pip install --pre mxnet-mkl)?

@zheshipinyinMc
Author

@zhreshold I tested on a server with a GPU: a 1000x591 (w x h) image needs about 10 GB of memory, and a 500x295 image needs about 6 GB. Everything is OK in CPU mode there. But even when I resize the image to 150x150, it still does not work on my computer.

@zheshipinyinMc
Author

@zhreshold I just rebuilt incubator-mxnet with the command 'make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CPP_PACKAGE=1 USE_CUDA=1 USE_MKLDNN=0 USE_GPERFTOOLS=1 USE_CUDNN=1 USE_CUDA_PATH=/usr/local/cuda'. The demo now works normally in CPU mode, but it takes 107343 ms for a 600x655 (w x h) image (105212 ms for a 137x150 image).

@zheshipinyinMc
Author

zheshipinyinMc commented Jun 13, 2019

@zhreshold
auto ids = exec->outputs[0].Copy(Context(kCPU, 0));
auto scores = exec->outputs[1].Copy(Context(kCPU, 0));
auto bboxes = exec->outputs[2].Copy(Context(kCPU, 0));
// Declared outside the if-block so the masks remain usable afterwards.
NDArray masks;
if (exec->outputs.size() > 3) {
  masks = exec->outputs[3].Copy(Context(kCPU, 0));
}

scores is 1x1x1000, so a score can be read with scores.At(0, 0, i).
bboxes is 1x1000x4, so a box can be read with bboxes.At(0, i, 0), bboxes.At(0, i, 1), bboxes.At(0, i, 2), bboxes.At(0, i, 3).
masks is 1x1000x14x14. How do I get a mask value? There is no NDArray::At(input1, input2, input3, input4).

@zheshipinyinMc
Author

@pengzhao-intel

@pengzhao-intel
Contributor

pengzhao-intel commented Jun 13, 2019

> @zhreshold I just rebuilt incubator-mxnet with the command 'make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CPP_PACKAGE=1 USE_CUDA=1 USE_MKLDNN=0 USE_GPERFTOOLS=1 USE_CUDNN=1 USE_CUDA_PATH=/usr/local/cuda'. The demo now works normally in CPU mode, but it takes 107343 ms for a 600x655 (w x h) image (105212 ms for a 137x150 image).

Please build with USE_MKLDNN=1 USE_GPERFTOOLS=0

@zheshipinyinMc
Author

@pengzhao-intel I will try this. Another question:
auto ids = exec->outputs[0].Copy(Context(kCPU, 0));
auto scores = exec->outputs[1].Copy(Context(kCPU, 0));
auto bboxes = exec->outputs[2].Copy(Context(kCPU, 0));
// Declared outside the if-block so the masks remain usable afterwards.
NDArray masks;
if (exec->outputs.size() > 3) {
  masks = exec->outputs[3].Copy(Context(kCPU, 0));
}

scores is 1x1x1000, so a score can be read with scores.At(0, 0, i).
bboxes is 1x1000x4, so a box can be read with bboxes.At(0, i, 0), bboxes.At(0, i, 1), bboxes.At(0, i, 2), bboxes.At(0, i, 3).
masks is 1x1000x14x14. How do I get a mask value? There is no NDArray::At(input1, input2, input3, input4).

@pengzhao-intel
Contributor

@xinyu-intel will help you with this question :)

@zhreshold
Member

@zheshipinyinMc

const mx_float *mask_ptr = exec->outputs[3].GetData();
// calculate offset and access the elements

@zheshipinyinMc
Author

@zhreshold Thanks, but I found that the mask values from the Python deployment and the C++ deployment are different, and the detected bboxes also deviate a little.
python:[467.61517 95.62402 820.6834 469.75653]
c++: [460.774 111.568 819.672 453.766]
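
One way to put a number on this deviation is the intersection-over-union (IoU) of the two boxes. The sketch below assumes both boxes are in [x1, y1, x2, y2] corner format, which is an assumption about the output layout rather than something confirmed in this thread. `Box` and `IoU` are illustrative names.

```cpp
#include <algorithm>
#include <array>

// Assumed layout: {x1, y1, x2, y2} in pixels (corner format).
using Box = std::array<double, 4>;

// Intersection-over-union of two axis-aligned boxes. Returns 0 when the
// boxes do not overlap or the union area is not positive.
inline double IoU(const Box& a, const Box& b) {
  const double iw = std::max(0.0, std::min(a[2], b[2]) - std::max(a[0], b[0]));
  const double ih = std::max(0.0, std::min(a[3], b[3]) - std::max(a[1], b[1]));
  const double inter = iw * ih;
  const double area_a = (a[2] - a[0]) * (a[3] - a[1]);
  const double area_b = (b[2] - b[0]) * (b[3] - b[1]);
  const double uni = area_a + area_b - inter;
  return uni > 0.0 ? inter / uni : 0.0;
}
```

Calling IoU on the Python and C++ boxes above gives a single score between 0 and 1, which makes it easier to check whether a change in preprocessing brings the two deployments closer.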

@zhreshold
Member

This might be due to different input values.

@zheshipinyinMc
Author

zheshipinyinMc commented Jun 14, 2019

Maybe. And how can I get an intermediate layer's output from a GluonCV model? With a plain MXNet model we can get it like this, just by changing the key passed to all_layers[]:
import mxnet as mx
from easydict import EasyDict as edict  # assumed source of edict

# ctx, prefix, epoch and image_shape are defined elsewhere.
net = edict()
net.ctx = ctx
net.sym, net.arg_params, net.aux_params = mx.model.load_checkpoint(prefix, epoch)
all_layers = net.sym.get_internals()
# Pick the internal output to expose, e.g. conv_6dw7_7_batchnorm_output or fc1_output.
net.sym = all_layers['fc1_output']
net.model = mx.mod.Module(symbol=net.sym, context=net.ctx, label_names=None)
net.model.bind(data_shapes=[('data', (1, 3, image_shape[1], image_shape[2]))])
net.model.set_params(net.arg_params, net.aux_params)

@zheshipinyinMc
Author

@pengzhao-intel I get the same error with USE_MKLDNN=1 USE_GPERFTOOLS=0.

@xinyu-intel
Contributor

@zheshipinyinMc Which version of MXNet are you using, and could you please share steps to reproduce?

@zheshipinyinMc
Author

@xinyu-intel
MXNet 1.4.1
GluonCV 0.4.0
I finally got the mask values with "masks.GetData()[index]". But I am still curious about "NDArray::At(size_t c, size_t h, size_t w)" vs "NDArray(index1,index2,index3,index4)". I found that we can construct a 4-D NDArray like this:
// construct NDArray from data buffer
NDArray(data_buffer, Shape(1, rgb_image.rows, rgb_image.cols, 3), ctx);
So could you add NDArray::At(index1, index2, index3, index4)?

@xinyu-intel
Contributor

please try pip install mxnet-mkl --pre

@zheshipinyinMc
Author

@xinyu-intel Thanks. What about "NDArray::At(size_t c, size_t h, size_t w)" vs "NDArray(index1,index2,index3,index4)"?
