When use a exported mask rcnn caffe2 model to infer an image, get error [enforce fail at batch_permutation_op.cu:66] X.dim32(0) > 0. 0 vs 0 #1895

Closed
Julymycin opened this issue Aug 12, 2020 · 7 comments
Labels
upstream issues Issues in other libraries

Comments

@Julymycin

Instructions To Reproduce the 🐛 Bug:

  1. what changes you made (git diff) or what code you wrote:
None
  2. what exact command you run:
cd tools/deploy/ && ./caffe2_converter.py --config-file ../../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml \
--output ./caffe2_model --run-eval \
MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl \
MODEL.DEVICE cuda
#########
cd project_directory
mkdir build && cd build
cmake ..
make -j10
./project_name
  3. what you observed (including full logs):

terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at batch_permutation_op.cu:66] X.dim32(0) > 0. 0 vs 0
Error from operator:
input: "614" input: "609" output: "input.68" name: "" type: "BatchPermutation" device_option { device_type: 1 device_id: 0 }
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x67 (0x7f1c2eae5787 in /usr/local/libtorch/lib/libc10.so)
frame #1: caffe2::BatchPermutationOp<float, caffe2::CUDAContext>::RunOnDevice() + 0x440 (0x7f1be3fb3670 in /usr/local/libtorch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x35d36e2 (0x7f1be3f6d6e2 in /usr/local/libtorch/lib/libtorch_cuda.so)
frame #3: caffe2::SimpleNet::Run() + 0x196 (0x7f1c1f339336 in /usr/local/libtorch/lib/libtorch_cpu.so)
frame #4: caffe2::Workspace::RunNet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x8a2 (0x7f1c1f385162 in /usr/local/libtorch/lib/libtorch_cpu.so)
frame #5: newrun(cv::Mat&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x570 (0x4716d3 in ./mhp_parsing)
frame #6: main + 0x269 (0x474ddc in ./mhp_parsing)
frame #7: __libc_start_main + 0xf0 (0x7f1bdf665830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: _start + 0x29 (0x46fc09 in ./mhp_parsing)

Aborted (core dumped)

  4. please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.
Using the exported model to infer images, I get the error with an image like this one: ![1](https://user-images.githubusercontent.com/24741969/89980596-c836c000-dca4-11ea-93a3-f6f3f35c8a2f.jpg)

Expected behavior:

I have searched the issues and found a similar one, #1580, but I could not work out from it how to fix this problem.
When I run some of my fine-tuned models on this image, one works while the others fail.

Environment:

Provide your environment information using the following command:

wget -nc -q https://github.com/facebookresearch/detectron2/raw/master/detectron2/utils/collect_env.py && python collect_env.py

sys.platform linux
Python 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
numpy 1.18.5
detectron2 0.2 @/home/qiu/Projects/detectron2/detectron2
Compiler GCC 7.5
CUDA compiler CUDA 10.1
detectron2 arch flags sm_61
DETECTRON2_ENV_MODULE
PyTorch 1.5.1 @/home/qiu/anaconda3/envs/de2/lib/python3.7/site-packages/torch
PyTorch debug build False
GPU available True
GPU 0 GeForce GTX 1080 Ti
CUDA_HOME /usr/local/cuda-10.1
Pillow 7.2.0
torchvision 0.6.0a0+35d732a @/home/qiu/anaconda3/envs/de2/lib/python3.7/site-packages/torchvision
torchvision arch flags sm_35, sm_50, sm_60, sm_70, sm_75
fvcore 0.1.1.post20200716
cv2 4.3.0

PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.3
Magma 2.5.2
Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
If your issue looks like an installation issue / environment issue,
please first try to solve it yourself with the instructions in
https://detectron2.readthedocs.io/tutorials/install.html#common-installation-issues

@Julymycin
Author

Sorry, here is the example image (attachment "1").

@Julymycin Julymycin changed the title from "When use a exported mask ecnn caffe2 model to infer an image, get error [enforce fail at batch_permutation_op.cu:66] X.dim32(0) > 0. 0 vs 0" to "When use a exported mask rcnn caffe2 model to infer an image, get error [enforce fail at batch_permutation_op.cu:66] X.dim32(0) > 0. 0 vs 0" Aug 12, 2020
@ppwwyyxx
Contributor

Before pytorch/pytorch#39851, the model throws an exception if no object is detected.
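
To make the failure mode concrete, here is a minimal C++ sketch. It assumes that detections are filtered by a score threshold before the mask branch runs, so an image whose boxes all fall below the threshold produces an empty batch. `Detection`, `filter_by_score`, and `run_mask_stage` are hypothetical names used only for illustration, not Detectron2 or Caffe2 APIs, and the thrown message merely mimics the enforce text from the report.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one detection; not a Detectron2 or Caffe2 type.
struct Detection {
    float score;
};

// Keep only detections whose score is at least `threshold`.
std::vector<Detection> filter_by_score(const std::vector<Detection>& dets,
                                       float threshold) {
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        if (d.score >= threshold) kept.push_back(d);
    }
    return kept;
}

// Mimics the enforce in the error message (X.dim32(0) > 0):
// an empty batch is rejected instead of being passed through.
std::size_t run_mask_stage(const std::vector<Detection>& kept) {
    if (kept.empty()) {
        throw std::runtime_error("X.dim32(0) > 0. 0 vs 0");
    }
    return kept.size();
}
```

With scores {0.30, 0.12}, a threshold of 0.5 keeps nothing and `run_mask_stage` throws, while a threshold of 0.1 keeps both boxes. That would be consistent with objects showing up in a visualization only once the score threshold is lowered.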

@ppwwyyxx ppwwyyxx added the upstream issues Issues in other libraries label Aug 12, 2020
@Julymycin
Author

Julymycin commented Aug 13, 2020

@ppwwyyxx Thanks for your answer!
So should I retrain the network with a newer version of PyTorch, or download a newer version of libtorch to build and run the Caffe2 model for inference?
I have tried libtorch 1.6.0 with protobuf 3.11.x (the version libtorch 1.6.0 is built with), and the following errors occur at the end of the build process:

undefined reference to `google::protobuf::RepeatedPtrField<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::begin() const'
undefined reference to `google::protobuf::RepeatedPtrField<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::end() const'

Both errors point to the same line of code:
for (auto& str : predictNet_.external_input()) {

    Workspace workSpace;
    for (auto& str : predictNet_.external_input()) {
        workSpace.CreateBlob(str);
    }
    CAFFE_ENFORCE(workSpace.CreateNet(predictNet_));
    CAFFE_ENFORCE(workSpace.RunNetOnce(initNet_));

Could you please tell me how to modify the code in caffe2_mask_rcnn.cpp?

I have also seen you say in #1724 that this error can be caught, but when I use

try {
    CAFFE_ENFORCE(workSpace.RunNet(predictNet_.name()));
} catch (const c10::Error& e) {
    std::cout << e.what() << std::endl;
}

it does not seem to work. Is there another way to catch it?
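
One possible pattern, sketched here without libtorch so that it compiles on its own: catch by const reference and read the message from the caught object. `c10::Error` derives from `std::exception`, so a handler for `const std::exception&` should also match it. `run_guarded` is a hypothetical helper, and the `std::runtime_error` used to exercise it is only a stand-in for the real enforce failure.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Run `step` and report a failure instead of letting the exception
// terminate the program. Returns true on success; on failure the
// exception message is stored in `error` and printed to stderr.
bool run_guarded(const std::function<void()>& step, std::string& error) {
    try {
        step();
        return true;
    } catch (const std::exception& e) {  // catch by const reference
        error = e.what();
        std::cerr << "inference failed: " << error << std::endl;
        return false;
    }
}
```

In the deployment code, the corresponding call might look like `run_guarded([&] { CAFFE_ENFORCE(workSpace.RunNet(predictNet_.name())); }, err)`, assuming the enforce failure is thrown on the calling thread. An image with no detections could then be skipped rather than aborting the process.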

By the way, I am a bit confused by the definition of "no object is detected". When I run the example image through the Python code and visualize the result, one or more objects do appear if I reduce the confidence score threshold to 0.1 or lower.

@ppwwyyxx
Contributor

Unfortunately, with PyTorch 1.6, protobuf has to be compiled separately and linked into the program.
To build it in the docker:

# install the correct version of protobuf:
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.11.4/protobuf-cpp-3.11.4.tar.gz && tar xf protobuf-cpp-3.11.4.tar.gz
cd protobuf-3.11.4
export CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0
./configure --prefix=$HOME/.local && make && make install
export CPATH=$HOME/.local/include
export LIBRARY_PATH=$HOME/.local/lib
export LD_LIBRARY_PATH=$HOME/.local/lib

To link to it:

--- i/tools/deploy/CMakeLists.txt
+++ w/tools/deploy/CMakeLists.txt
@@ -10,7 +10,7 @@ find_package(OpenCV REQUIRED)
 add_executable(caffe2_mask_rcnn caffe2_mask_rcnn.cpp)
 target_link_libraries(
   caffe2_mask_rcnn
-  "${TORCH_LIBRARIES}" gflags glog ${OpenCV_LIBS})
+  "${TORCH_LIBRARIES}" gflags glog protobuf ${OpenCV_LIBS})
 set_property(TARGET caffe2_mask_rcnn PROPERTY CXX_STANDARD 14)

@Julymycin
Author

@ppwwyyxx Thanks for your answer, but after I compiled protobuf 3.11.4 and linked it into the program, the build still fails with this error:

[build] hp.cpp:46: undefined reference to `google::protobuf::RepeatedPtrField<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::begin() const'
[build] hp.cpp:46: undefined reference to `google::protobuf::RepeatedPtrField<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::end() const'
[build] collect2: error: ld returned 1 exit status

for this line of code:
for (auto& str : predictNet_.external_input()) {

    Workspace workSpace;
    for (auto& str : predictNet_.external_input()) {
        workSpace.CreateBlob(str);
    }
    CAFFE_ENFORCE(workSpace.CreateNet(predictNet_));
    CAFFE_ENFORCE(workSpace.RunNetOnce(initNet_));

The output of ld -lprotobuf --verbose is:

==================================================
attempt to open //usr/local/lib/x86_64-linux-gnu/libprotobuf.so failed
attempt to open //usr/local/lib/x86_64-linux-gnu/libprotobuf.a failed
attempt to open //lib/x86_64-linux-gnu/libprotobuf.so failed
attempt to open //lib/x86_64-linux-gnu/libprotobuf.a failed
attempt to open //usr/lib/x86_64-linux-gnu/libprotobuf.so failed
attempt to open //usr/lib/x86_64-linux-gnu/libprotobuf.a failed
attempt to open //usr/lib/x86_64-linux-gnu64/libprotobuf.so failed
attempt to open //usr/lib/x86_64-linux-gnu64/libprotobuf.a failed
attempt to open //usr/local/lib64/libprotobuf.so failed
attempt to open //usr/local/lib64/libprotobuf.a failed
attempt to open //lib64/libprotobuf.so failed
attempt to open //lib64/libprotobuf.a failed
attempt to open //usr/lib64/libprotobuf.so failed
attempt to open //usr/lib64/libprotobuf.a failed
attempt to open //usr/local/lib/libprotobuf.so succeeded
-lprotobuf (//usr/local/lib/libprotobuf.so)
libz.so.1 needed by //usr/local/lib/libprotobuf.so
attempt to open /usr/local/lib/libz.so.1 failed
attempt to open //usr/local/cuda-10.1/targets/x86_64-linux/lib/libz.so.1 failed
attempt to open //usr/lib/x86_64-linux-gnu/libfakeroot/libz.so.1 failed
attempt to open //usr/local/lib/i386-linux-gnu/libz.so.1 failed
attempt to open //usr/lib/i386-linux-gnu/libz.so.1 failed
attempt to open //usr/local/lib/i686-linux-gnu/libz.so.1 failed
attempt to open //lib/i686-linux-gnu/libz.so.1 failed
attempt to open //usr/lib/i686-linux-gnu/libz.so.1 failed
attempt to open //usr/local/lib/libz.so.1 failed
attempt to open //usr/local/lib/libz.so.1 failed
attempt to open //usr/local/lib/x86_64-linux-gnu/libz.so.1 failed
found libz.so.1 at //lib/x86_64-linux-gnu/libz.so.1
libstdc++.so.6 needed by //usr/local/lib/libprotobuf.so
attempt to open /usr/local/lib/libstdc++.so.6 failed
attempt to open //usr/local/cuda-10.1/targets/x86_64-linux/lib/libstdc++.so.6 failed
attempt to open //usr/lib/x86_64-linux-gnu/libfakeroot/libstdc++.so.6 failed
attempt to open //usr/local/lib/i386-linux-gnu/libstdc++.so.6 failed
attempt to open //lib/i386-linux-gnu/libstdc++.so.6 failed
attempt to open //usr/local/lib/i686-linux-gnu/libstdc++.so.6 failed
attempt to open //lib/i686-linux-gnu/libstdc++.so.6 failed
attempt to open //usr/lib/i686-linux-gnu/libstdc++.so.6 failed
attempt to open //usr/local/lib/libstdc++.so.6 failed
attempt to open //usr/local/lib/libstdc++.so.6 failed
attempt to open //usr/local/lib/x86_64-linux-gnu/libstdc++.so.6 failed
attempt to open //lib/x86_64-linux-gnu/libstdc++.so.6 failed
found libstdc++.so.6 at //usr/lib/x86_64-linux-gnu/libstdc++.so.6
libc.so.6 needed by //usr/local/lib/libprotobuf.so
attempt to open /usr/local/lib/libc.so.6 failed
attempt to open //usr/local/cuda-10.1/targets/x86_64-linux/lib/libc.so.6 failed
attempt to open //usr/lib/x86_64-linux-gnu/libfakeroot/libc.so.6 failed
attempt to open //usr/local/lib/i386-linux-gnu/libc.so.6 failed
attempt to open //usr/lib/i386-linux-gnu/libc.so.6 failed
attempt to open //usr/local/lib/i686-linux-gnu/libc.so.6 failed
attempt to open //lib/i686-linux-gnu/libc.so.6 failed
attempt to open //usr/lib/i686-linux-gnu/libc.so.6 failed
attempt to open //usr/local/lib/libc.so.6 failed
attempt to open //usr/local/lib/libc.so.6 failed
attempt to open //usr/local/lib/x86_64-linux-gnu/libc.so.6 failed
attempt to open //usr/lib/x86_64-linux-gnu/libc.so.6 failed
attempt to open //usr/local/lib/x86_64-linux-gnu/libc.so.6 failed
attempt to open //usr/lib/x86_64-linux-gnu/libc.so.6 failed
attempt to open //usr/lib/x86_64-linux-gnu64/libc.so.6 failed
attempt to open //usr/local/lib64/libc.so.6 failed
attempt to open //lib64/libc.so.6 failed
attempt to open //usr/lib64/libc.so.6 failed
attempt to open //usr/local/lib/libc.so.6 failed
attempt to open //lib/libc.so.6 failed
attempt to open //usr/lib/libc.so.6 failed
attempt to open //usr/x86_64-linux-gnu/lib64/libc.so.6 failed
attempt to open //usr/x86_64-linux-gnu/lib/libc.so.6 failed
attempt to open /usr/local/lib/libc.so.6 failed
attempt to open //usr/local/cuda-10.1/targets/x86_64-linux/lib/libc.so.6 failed
attempt to open //usr/lib/x86_64-linux-gnu/libfakeroot/libc.so.6 failed
attempt to open //usr/local/lib/i386-linux-gnu/libc.so.6 failed
attempt to open //usr/lib/i386-linux-gnu/libc.so.6 failed
attempt to open //usr/local/lib/i686-linux-gnu/libc.so.6 failed
attempt to open //lib/i686-linux-gnu/libc.so.6 failed
attempt to open //usr/lib/i686-linux-gnu/libc.so.6 failed
attempt to open //usr/local/lib/libc.so.6 failed
attempt to open //usr/local/lib/libc.so.6 failed
attempt to open //usr/local/lib/x86_64-linux-gnu/libc.so.6 failed
found libc.so.6 at //lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 needed by //usr/local/lib/libprotobuf.so
attempt to open /usr/local/lib/ld-linux-x86-64.so.2 failed
attempt to open //usr/local/cuda-10.1/targets/x86_64-linux/lib/ld-linux-x86-64.so.2 failed
attempt to open //usr/lib/x86_64-linux-gnu/libfakeroot/ld-linux-x86-64.so.2 failed
attempt to open //usr/local/lib/i386-linux-gnu/ld-linux-x86-64.so.2 failed
attempt to open //lib/i386-linux-gnu/ld-linux-x86-64.so.2 failed
attempt to open //usr/lib/i386-linux-gnu/ld-linux-x86-64.so.2 failed
attempt to open //usr/local/lib/i686-linux-gnu/ld-linux-x86-64.so.2 failed
attempt to open //lib/i686-linux-gnu/ld-linux-x86-64.so.2 failed
attempt to open //usr/lib/i686-linux-gnu/ld-linux-x86-64.so.2 failed
attempt to open //usr/local/lib/ld-linux-x86-64.so.2 failed
attempt to open //usr/local/lib/ld-linux-x86-64.so.2 failed
attempt to open //usr/local/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 failed
found ld-linux-x86-64.so.2 at //lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 needed by //usr/local/lib/libprotobuf.so
attempt to open /usr/local/lib/libgcc_s.so.1 failed
attempt to open //usr/local/cuda-10.1/targets/x86_64-linux/lib/libgcc_s.so.1 failed
attempt to open //usr/lib/x86_64-linux-gnu/libfakeroot/libgcc_s.so.1 failed
attempt to open //usr/local/lib/i386-linux-gnu/libgcc_s.so.1 failed
attempt to open //usr/lib/i386-linux-gnu/libgcc_s.so.1 failed
attempt to open //usr/local/lib/i686-linux-gnu/libgcc_s.so.1 failed
attempt to open //lib/i686-linux-gnu/libgcc_s.so.1 failed
attempt to open //usr/lib/i686-linux-gnu/libgcc_s.so.1 failed
attempt to open //usr/local/lib/libgcc_s.so.1 failed
attempt to open //usr/local/lib/libgcc_s.so.1 failed
attempt to open //usr/local/lib/x86_64-linux-gnu/libgcc_s.so.1 failed
found libgcc_s.so.1 at //lib/x86_64-linux-gnu/libgcc_s.so.1
libm.so.6 needed by //usr/lib/x86_64-linux-gnu/libstdc++.so.6
attempt to open /usr/local/lib/libm.so.6 failed
attempt to open //usr/local/cuda-10.1/targets/x86_64-linux/lib/libm.so.6 failed
attempt to open //usr/lib/x86_64-linux-gnu/libfakeroot/libm.so.6 failed
attempt to open //usr/local/lib/i386-linux-gnu/libm.so.6 failed
attempt to open //usr/lib/i386-linux-gnu/libm.so.6 failed
attempt to open //usr/local/lib/i686-linux-gnu/libm.so.6 failed
attempt to open //lib/i686-linux-gnu/libm.so.6 failed
attempt to open //usr/lib/i686-linux-gnu/libm.so.6 failed
attempt to open //usr/local/lib/libm.so.6 failed
attempt to open //usr/local/lib/libm.so.6 failed
attempt to open //usr/local/lib/x86_64-linux-gnu/libm.so.6 failed
found libm.so.6 at //lib/x86_64-linux-gnu/libm.so.6
ld: warning: cannot find entry symbol _start; not setting start address

@ppwwyyxx
Contributor

The above command works in the official docker. In your environment it may need export CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=1 instead of 0.
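
A small sketch of how one might check which ABI setting is in play. The undefined references above name `std::__cxx11::basic_string`, which is the string type used when `_GLIBCXX_USE_CXX11_ABI` is 1, so a protobuf built with the flag set to 0 would not export matching symbols. The helper names below are hypothetical; the first reports the macro value the compiler actually sees, the second tests whether a demangled symbol refers to the new-ABI string type.

```cpp
#include <string>

// Value of _GLIBCXX_USE_CXX11_ABI seen by this translation unit,
// or -1 when the macro is undefined (standard library is not libstdc++).
int compiled_cxx11_abi() {
#ifdef _GLIBCXX_USE_CXX11_ABI
    return _GLIBCXX_USE_CXX11_ABI;
#else
    return -1;
#endif
}

// True when a demangled symbol refers to the new-ABI string type.
bool symbol_uses_cxx11_abi(const std::string& demangled) {
    return demangled.find("std::__cxx11::") != std::string::npos;
}
```

Compiling this with the same flags as the application and printing `compiled_cxx11_abi()` shows the setting the program uses. The protobuf build would need to use the same value for the string-based symbols to resolve at link time.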

@Julymycin
Author

@ppwwyyxx Thanks, it works!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 14, 2021