Failed to run demo.py with undefined symbol, how can I solve this problem? #232

Open
soldatjiang opened this issue Nov 6, 2017 · 16 comments

Comments

@soldatjiang

    soldat@soldat:~/Program/Faster-RCNN_TF$ python ./tools/demo.py --model ./data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel
    Traceback (most recent call last):
      File "./tools/demo.py", line 11, in <module>
        from networks.factory import get_network
      File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
        from .VGGnet_train import VGGnet_train
      File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
        from networks.network import Network
      File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
        import roi_pooling_layer.roi_pooling_op as roi_pool_op
      File "/home/soldat/Program/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
        _roi_pooling_module = tf.load_op_library(filename)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
        lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
        c_api.TF_GetCode(self.status.status))
    tensorflow.python.framework.errors_impl.NotFoundError: /home/soldat/Program/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
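
For readers unfamiliar with mangled C++ names: the missing symbol decodes to `typeinfo for tensorflow::OpKernel`, which is defined in TensorFlow's own shared library rather than in roi_pooling.so. From TensorFlow 1.4 on, that library is libtensorflow_framework.so, so a custom op that does not link against it is left with this symbol unresolved. Normally `c++filt` does the decoding; the sketch below is a hand-rolled, hypothetical helper that handles only this one simple `_ZTIN...E` shape, to show how the name is put together.

```python
def demangle_typeinfo(symbol):
    """Decode a mangled name of the form _ZTIN<len><id><len><id>...E.

    Hypothetical helper for illustration only: it handles just this one
    pattern (typeinfo for a nested name). Use c++filt for real work.
    """
    prefix = "_ZTIN"
    if not symbol.startswith(prefix) or not symbol.endswith("E"):
        raise ValueError("unsupported symbol shape: %r" % symbol)
    body = symbol[len(prefix):-1]
    parts = []
    i = 0
    while i < len(body):
        # Each identifier is preceded by its length in decimal digits.
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("expected a length prefix at offset %d" % i)
        length = int(body[i:j])
        parts.append(body[j:j + length])
        i = j + length
    return "typeinfo for " + "::".join(parts)


print(demangle_typeinfo("_ZTIN10tensorflow8OpKernelE"))
# typeinfo for tensorflow::OpKernel
```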

@awilliamson

I am getting the same issue. Perhaps tensorflow/tensorflow#13607 is related?

@apennisi

I was not able to solve the problem. Were you?

@awilliamson

@apennisi It was the culmination of a few days' worth of bashing my head against a wall and collating fixes from many sources on the fly.
I have my fork with the 2to3 conversion (which is what I presume caused your issue). Most of the changes were Makefile changes (here).

  1. Ensure the CUDA_PATH at the top is changed to your path, or alternatively replace it in-line. This way the CUDA section gets executed.
  2. Define TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())') and modify the g++ call to include -L$TF_LIB -ltensorflow_framework
  3. The default arch set by this repository does not account for 10-series cards. I was running on a few GTX Titan XP ( and 1080 ). Therefore I set -arch=sm_61.
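
As a rough illustration of step 2, the sketch below assembles the g++ link command with the extra `-L $TF_LIB -ltensorflow_framework` arguments. The function names are hypothetical, the example paths are placeholders, and `$CXXFLAGS` is left out for brevity; `tf.sysconfig.get_include()` and `tf.sysconfig.get_lib()` are the same calls used in the step above.

```python
def gpp_link_command(tf_inc, tf_lib, cuda_path):
    """Return the g++ argument list for roi_pooling.so, mirroring the
    command discussed in this thread plus the TF_LIB additions of step 2."""
    return [
        "g++", "-std=c++11", "-shared",
        "-o", "roi_pooling.so",
        "roi_pooling_op.cc", "roi_pooling_op.cu.o",
        "-I", tf_inc,
        "-D", "GOOGLE_CUDA=1", "-fPIC",
        "-lcudart", "-L", cuda_path + "/lib64",
        # Step 2: link against TensorFlow's framework library.
        "-L", tf_lib, "-ltensorflow_framework",
    ]


def paths_from_tensorflow():
    """Ask the installed TensorFlow for its include and library directories."""
    import tensorflow as tf  # imported lazily so the sketch loads without TF
    return tf.sysconfig.get_include(), tf.sysconfig.get_lib()


# Placeholder paths; on a real machine use paths_from_tensorflow() instead.
cmd = gpp_link_command(
    tf_inc="/usr/local/lib/python2.7/dist-packages/tensorflow/include",
    tf_lib="/usr/local/lib/python2.7/dist-packages/tensorflow",
    cuda_path="/usr/local/cuda",
)
print(" ".join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.check_call(cmd)` without shell quoting problems.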

I'm not claiming this will fix your issues. You may then encounter issues when running the demo. This is due to encoding issues caused by the 2to3 conversion. The solution to these was a combination of eragonruan/text-detection-ctpn and CharlesShang/TFFRCNN

This may be a little beyond the scope of your original error, but I believe the cause was your attempt at 2to3 conversion, alongside the Makefile issues on your system. If you could give feedback on any of the above steps, that would be very useful; it may also give others who, like us, were struggling with these errors a single place to look.

@apennisi

@awilliamson I already tried all these fixes without success; I always get that error. I already converted from Python 2 to Python 3, and on my MacBook (CPU) it works. I am trying on a server with a Tesla K80 and I get this error. Do you have any other suggestions?

@awilliamson

@apennisi Not quite sure without more information regarding your environment etc. It does sound odd, as the fix for your specific undefined symbol is TF_LIB linking in step 2. You shouldn't be getting that error on a CPU only implementation to my knowledge (ensure you pass the cpu only flag to Faster-RCNN). Additionally for a K80, it is a different architecture. This article shows some of the sm_XX codes for various cards and their respective CUDA variants.
I admit, it is a hard problem to solve, and took me a day or two to collate enough information to solve it for my specific platform. Feel free to e-mail me on my institutional e-mail address ( shouldn't be hard to find / figure out ;) ) if you want to discuss this further. If we can figure out your problem, then it might be suitable to respond here once found.
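
For the cards named in this thread, the sm_XX values can be kept in a small lookup. This is only a sketch: the table covers just these GPUs, the helper name is hypothetical, and any other card needs its compute capability looked up in NVIDIA's documentation.

```python
# Compute capabilities of the GPUs mentioned in this thread.
ARCH_BY_GPU = {
    "Tesla K80": "sm_37",
    "GTX 1080": "sm_61",
    "GTX 1080 Ti": "sm_61",
    "Titan Xp": "sm_61",
}


def nvcc_arch_flag(gpu_name):
    """Return the nvcc -arch flag for a known GPU, or raise KeyError."""
    try:
        return "-arch=" + ARCH_BY_GPU[gpu_name]
    except KeyError:
        raise KeyError("unknown GPU %r; look up its compute capability" % gpu_name)


print(nvcc_arch_flag("Tesla K80"))  # -arch=sm_37
```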

@apennisi

Of course, I changed the architecture! Did you change anything else?

@ambr89

ambr89 commented Dec 15, 2017

I solved it.

I downgraded TensorFlow to 1.3.

I have a GTX 1080 Ti. I changed demo.py at line 114:

    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)

But your step 2 doesn't work for me. In make.sh:

    g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
        roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
        -D_GLIBCXX_USE_CXX11_ABI=0 -lcudart -L $CUDA_PATH/lib64 -L $TF_LIB -ltensorflow_framework

fails with:

    /usr/bin/ld: cannot find -ltensorflow_framework
    collect2: error: ld returned 1 exit status

@trikim

trikim commented Dec 31, 2017

I think the problem is that your TensorFlow version is too new.
My CUDA version is 8.0.
My cuDNN version is 6.0.
The first time, I used "pip install --user tensorflow-gpu", which installed TensorFlow 1.4.1, and I hit the same problem described above.
The second time, I downloaded the "Linux GPU: Python 2" package from https://github.com/tensorflow/tensorflow and installed it with "pip install tf_nightly_gpu-1.head-cp27-none-linux_x86_64.whl". This changed the TensorFlow version to 1.4.0-dev20170920.
In Faster-RCNN_TF/lib, before running "make", I edited ~/.local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/platform/default/mutex.h as described in #245.
Finally, I succeeded in running demo.py:

    python ./tools/demo.py --model ./models/VGGnet_fast_rcnn_iter_70000.ckpt

@xtanitfy

xtanitfy commented Feb 8, 2018

awilliamson is right! I used his approach and it solved the problem.
Add this linker flag:

    LIBS_FLGAS=-L/usr/local/lib/python2.7/dist-packages/tensorflow -ltensorflow_framework

@wtliao

wtliao commented Mar 1, 2018

@awilliamson Hi, thanks for your solution, but it does not work for me. I now get a new error:

tensorflow.python.framework.errors_impl.NotFoundError: /home/wtliao/work_space/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumES3_
It is only slightly different from the original one. Could you help me? Thanks.
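
A note on this second symbol: the `B5cxx11` fragment is the `[abi:cxx11]` tag that GCC 5 and later attach under the newer libstdc++ ABI. It suggests roi_pooling.so was compiled with the new ABI while the installed TensorFlow exports only the old-ABI `StrCat`. The usual remedy, already present in some of the commands in this thread, is to compile with `-D_GLIBCXX_USE_CXX11_ABI=0`. A rough check, using hypothetical helper names:

```python
def uses_cxx11_abi(mangled_symbol):
    """True if the mangled name carries GCC's [abi:cxx11] tag (B5cxx11)."""
    return "B5cxx11" in mangled_symbol


def suggested_flag(mangled_symbol):
    """Suggest the old-ABI define when the symbol shows the cxx11 tag."""
    if uses_cxx11_abi(mangled_symbol):
        return "-D_GLIBCXX_USE_CXX11_ABI=0"
    return None


symbol = "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumES3_"
print(suggested_flag(symbol))  # -D_GLIBCXX_USE_CXX11_ABI=0
```

Newer TensorFlow releases also expose `tf.sysconfig.get_compile_flags()`, which, as far as I know, includes the ABI define that matches the installed build.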

@wtliao

wtliao commented Mar 2, 2018

@awilliamson The only way I could fix this problem was to use TF 1.3 + CUDA 8.0 + cuDNN 6.0... so sad.

@ChanChiChoi

ChanChiChoi commented Jul 9, 2018

My environment is CUDA 9.0, TensorFlow 1.8.0, Python 3.6.
This is my solution; just change:

    g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
        roi_pooling_op.cu.o -I $TF_INC  -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
        -lcudart -L $CUDA_PATH/lib64

to

    TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
    g++ -std=c++11 -shared  -o roi_pooling.so  roi_pooling_op.cc  \
         roi_pooling_op.cu.o -I $TF_INC  -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS  \
         -lcudart -L $CUDA_PATH/lib64  -L $TF_LIB -ltensorflow_framework

@cfh3c

cfh3c commented Jul 10, 2018

You can use both include and lib to solve it:

    TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
    TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')

    nvcc -std=c++11 -c -o roi_pooling_op_gpu.cu.o roi_pooling_op_gpu.cu.cc \
        -I $TF_INC -L $TF_LIB -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CXXFLAGS

    g++ -std=c++11 -D_GLIBCXX_USE_CXX11_ABI=0 -shared -o ./build/roi_pooling.so roi_pooling_op.cc \
        roi_pooling_op_gpu.cu.o -I $TF_INC -fPIC $CXXFLAGS -lcudart -L $CUDA_HOME/lib64 -L $TF_LIB -ltensorflow_framework

    rm -rf roi_pooling_op_gpu.cu.o

@chenyanyin

chenyanyin commented Jul 9, 2019

@ChanChiChoi Hello, my environment is the same as yours (CUDA 9.0 too), but after making the change you described (adding -L $TF_LIB -ltensorflow_framework to the g++ command) I got this error:

    ImportError: Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
        from tensorflow.python.pywrap_tensorflow_internal import *
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
        _pywrap_tensorflow_internal = swig_import_helper()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
        _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
    ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory
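
This ImportError looks like a different mismatch from the undefined symbol: the TensorFlow build that Python 2.7 imports here wants libcusolver.so.8.0, that is, CUDA 8.0, while the machine is reported to have CUDA 9.0. A small sketch for checking which CUDA library versions the dynamic loader can actually open; `can_load` is a made-up helper name and the two library names are simply the ones relevant to this report.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


for name in ("libcusolver.so.8.0", "libcusolver.so.9.0"):
    status = "loadable" if can_load(name) else "NOT found"
    print("%s: %s" % (name, status))
```

If only the 9.0 library loads, installing a TensorFlow build made for CUDA 9.0 (or adding the CUDA 8.0 libraries to the loader path) is the likely way forward.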

@emilyfy

emilyfy commented Aug 2, 2019

@ambr89 I got the same error as you, compiling with -ltensorflow_framework didn't work. I tried to look for libtensorflow_framework.so and couldn't find it but found libtensorflow_framework.so.1 instead inside /usr/local/lib/python2.7/dist-packages/tensorflow. So I made a copy called libtensorflow_framework.so and that fixed it. Hope that helps!
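
Copying the file works; a symlink, or GNU ld's `-l:filename` form (which links against an exact file name), should achieve the same without a duplicate. The sketch below looks in the TensorFlow library directory and picks a flag accordingly. `framework_link_flag` is a hypothetical helper, not part of TensorFlow, and the temporary directory only stands in for the real dist-packages/tensorflow folder.

```python
import glob
import os
import tempfile


def framework_link_flag(tf_lib_dir):
    """Pick a linker flag for libtensorflow_framework in tf_lib_dir.

    Returns -ltensorflow_framework when the unversioned .so exists,
    otherwise -l:<versioned file name> (a GNU ld feature), or None
    when no framework library is found at all.
    """
    plain = os.path.join(tf_lib_dir, "libtensorflow_framework.so")
    if os.path.exists(plain):
        return "-ltensorflow_framework"
    versioned = sorted(glob.glob(plain + ".*"))
    if versioned:
        return "-l:" + os.path.basename(versioned[0])
    return None


# Demo: a directory that contains only the versioned library file.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "libtensorflow_framework.so.1"), "w").close()
print(framework_link_flag(demo_dir))  # -l:libtensorflow_framework.so.1
```

Later TensorFlow releases provide `tf.sysconfig.get_link_flags()`, which, to my knowledge, emits the appropriate form for the installed version.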

@lijf138

lijf138 commented Jun 22, 2020

@soldatjiang I get the same error:

tensorflow.python.framework.errors_impl.NotFoundError: /home/ii/app/Faster-RCNN_TF-master/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE

@awilliamson I hope you can help!
My environment is CUDA 9.0, cuDNN 7.1.2, TensorFlow 1.10.0, Python 3.5.5.
