Undefined symbol: _ZTIN10tensorflow8OpKernelE #108

GaryWooCN opened this issue Nov 7, 2017 · 20 comments

@GaryWooCN

Hi, I am running the master branch and encountered this error when training. Could anyone help with this? Thanks.

File "./faster_rcnn/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in
_roi_pooling_module = tf.load_op_library(filename)
File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
File "/usr/local/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./faster_rcnn/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
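
For reference, the missing symbol demangles to the C++ typeinfo for tensorflow::OpKernel, which is normally provided by TensorFlow's own shared library. A rough way to inspect this on a typical Linux setup (the .so path follows the repo layout above; adjust as needed):

# demangle the symbol; prints "typeinfo for tensorflow::OpKernel"
c++filt _ZTIN10tensorflow8OpKernelE

# list the undefined TensorFlow symbols the custom op expects at load time
nm -D ./lib/roi_pooling_layer/roi_pooling.so | grep OpKernel

If the symbol stays unresolved, the .so was either built with a different C++ ABI than the installed TensorFlow or, on TF 1.4+, was not linked against libtensorflow_framework.so; both fixes are discussed below.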

The ./lib/make.sh is as follows:

#!/usr/bin/env bash
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
echo $TF_INC

CUDA_PATH=/usr/local/cuda-8.0/

cd roi_pooling_layer

nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_60

# if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
	roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

# for gcc5-built tf
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=1 -o roi_pooling.so roi_pooling_op.cc \
#	roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

cd ..

# add building psroi_pooling layer
cd psroi_pooling_layer

nvcc -std=c++11 -c -o psroi_pooling_op.cu.o psroi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_60

#g++ -std=c++11 -shared -o psroi_pooling.so psroi_pooling_op.cc \
#	psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

# if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o psroi_pooling.so psroi_pooling_op.cc \
	psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

cd ..

@freeksg66

tensorflow/tensorflow#13607
I followed that issue and it fixed the problem for me.

@Kongsea

Kongsea commented Dec 6, 2017

I encountered exactly the same error.
Have you solved it yet?

@Kongsea

Kongsea commented Dec 6, 2017

I downloaded roi_pooling.so from https://github.com/CharlesShang/TFFRCNN/blob/roi_pooling/lib/roi_pooling_layer/roi_pooling.so and replaced my compiled roi_pooling.so with it, as suggested by @CharlesShang.
That produced another error:
tensorflow.python.framework.errors_impl.NotFoundError: faster_rcnn/../lib/roi_pooling_layer/roi_pooling.so: invalid ELF header

@Kongsea

Kongsea commented Dec 6, 2017

I finally downgraded tensorflow from 1.4 to 1.3 and added -D_GLIBCXX_USE_CXX11_ABI=0, then this problem was solved.

@yh284914425

Where should I add -D_GLIBCXX_USE_CXX11_ABI=0?
I am using tensorflow_gpu-1.4.0-cp27-none-linux_x86_64.whl and my gcc version is 5.4.0. My ./lib/make.sh is as follows. How should the file be modified? Can you help me? Thanks.

#!/usr/bin/env bash
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
echo $TF_INC

CUDA_PATH=/usr/local/cuda/

cd roi_pooling_layer

nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52

# if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
#	roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

# for gcc5-built tf
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
	roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS -D_GLIBCXX_USE_CXX11_ABI=0 \
	-lcudart -L $CUDA_PATH/lib64

cd ..

# add building psroi_pooling layer
cd psroi_pooling_layer

nvcc -std=c++11 -c -o psroi_pooling_op.cu.o psroi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_52

g++ -std=c++11 -shared -o psroi_pooling.so psroi_pooling_op.cc \
	psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

# if you install tf using already-built binary, or gcc version 4.x, uncomment the two lines below
#g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o psroi_pooling.so psroi_pooling_op.cc \
#	psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64

cd ..
@Kongsea

@yh284914425

I don't know which lines need to be commented out and which need to be modified. Please help me, @Kongsea.

@Kongsea

Kongsea commented Dec 11, 2017

Downgrade your tensorflow to r1.3.

Try to modify this line
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS -D_GLIBCXX_USE_CXX11_ABI=0 -lcudart -L $CUDA_PATH/lib64

to

g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS -D_GLIBCXX_USE_CXX11_ABI=0 -lcudart -L $CUDA_PATH/lib64

@selinachenxi

You don't have to downgrade to 1.3. I am using 1.4 with gcc 5.4.
In the make.sh file, add
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
at the beginning, then add
-L $TF_LIB -ltensorflow_framework
after -L $CUDA_PATH/lib64.
Re-run make; it works.
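
As a sketch, the roi_pooling link step from the scripts above would then look roughly like this (whether you also keep -D_GLIBCXX_USE_CXX11_ABI=0 depends on how your TensorFlow wheel was built; the full script posted further down combines both):

TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')

# link the op against libtensorflow_framework.so, which ships with TF >= 1.4
g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
	roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64 \
	-L $TF_LIB -ltensorflow_framework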

@zhangweilion

@selinachenxi
My tensorflow is 1.4 and gcc is 5.4.
I modified make.sh as below, and it doesn't work:

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')

#adding by zw
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
#end adding by zw

CUDA_PATH=/usr/local/cuda/
CXXFLAGS=''

if [[ "$OSTYPE" =~ ^darwin ]]; then
	CXXFLAGS+='-undefined dynamic_lookup'
fi

cd roi_pooling_layer

if [ -d "$CUDA_PATH" ]; then
	nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
		-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $CXXFLAGS \
		-arch=sm_37

	g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
		roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
		-lcudart -L $TF_LIB -ltensorflow_framework -L $CUDA_PATH/lib64
else
	g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
		-I $TF_INC -fPIC $CXXFLAGS
fi

cd ..

@Kongsea

Kongsea commented Feb 22, 2018

This bash works:

#!/usr/bin/env bash
TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
echo $TF_INC

CUDA_PATH=/usr/local/cuda/

cd roi_pooling_layer

nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_61

g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o roi_pooling.so roi_pooling_op.cc \
	roi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64 -L $TF_LIB -ltensorflow_framework

cd ..

cd psroi_pooling_layer

nvcc -std=c++11 -c -o psroi_pooling_op.cu.o psroi_pooling_op_gpu.cu.cc \
	-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_61

g++ -std=c++11 -shared -D_GLIBCXX_USE_CXX11_ABI=0 -o psroi_pooling.so psroi_pooling_op.cc \
	psroi_pooling_op.cu.o -I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64 -L $TF_LIB -ltensorflow_framework

cd ..
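
A quick sanity check after rebuilding (run from the lib/ directory; paths follow the layout used in this thread):

# the framework library should now be recorded as a dependency
# (ldd may print "not found" for it outside Python, which is fine:
#  importing tensorflow loads it before the op is loaded)
ldd roi_pooling_layer/roi_pooling.so | grep tensorflow_framework

# the op should now load without the undefined-symbol error
python -c "import tensorflow as tf; tf.load_op_library('./roi_pooling_layer/roi_pooling.so'); print('ok')"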

@xmeng525

xmeng525 commented Feb 21, 2019

I had a similar problem because of a namespace issue. I changed my "new_op.cu.cc" from

namespace tensorflow{
// my code
}

to

using namespace tensorflow;
// my code

and it was fixed.

@vllsm

vllsm commented Mar 8, 2019

You don't have to downgrade to 1.3. I am using 1.4 with gcc 5.4.
In the make.sh file, add
TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
at the beginning, then add
-L $TF_LIB -ltensorflow_framework
after -L $CUDA_PATH/lib64.
Re-run make; it works.

THX so much

@leavewave

I had a similar problem because of a namespace issue. I changed my "new_op.cu.cc" from

namespace tensorflow{
// my code
}

to

using namespace tensorflow;
// my code

and it was fixed.

Hi, where is this file? I cannot find it.

@helinwang

helinwang commented Mar 11, 2021

I ran into a similar issue. The problem was that I had manually compiled TF and then tried to load another TF operator library; the two *.so files were compiled with different ABIs.
The fix for me was compiling my custom TF with --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"

E.g.,

bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --config=v2 --copt=-mavx --copt=-msse4.2 //tensorflow/tools/pip_package:build_pip_package 
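
As a follow-up sketch: on newer 1.x releases (roughly 1.5 and later; older versions only expose get_include()/get_lib()), you can ask the installed TensorFlow which compile and link flags, including the ABI define, a custom op should be built with:

# typically prints something like: -I/.../include -D_GLIBCXX_USE_CXX11_ABI=0
python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))'

# typically prints something like: -L/.../tensorflow -ltensorflow_framework
python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))'

Building the op with exactly these flags avoids guessing the ABI by hand.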

@ArmageddonKnight

Adding this linking option works for me: -Wl,--no-as-needed

Reference: https://stackoverflow.com/questions/48189818/undefined-symbol-ztin10tensorflow8opkernele
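
As a sketch of where that option goes, assuming the -L $TF_LIB -ltensorflow_framework link approach discussed earlier in this thread (the flag is positional, so it must come before the library it should apply to):

g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc roi_pooling_op.cu.o \
	-I $TF_INC -fPIC -lcudart -L $CUDA_PATH/lib64 \
	-Wl,--no-as-needed -L $TF_LIB -ltensorflow_framework

--no-as-needed tells the linker to keep the DT_NEEDED entry for libtensorflow_framework.so even when no symbol appears to be required at link time.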

@FeiDao7943

I avoided this issue by changing the versions of g++, gcc, TF, and CUDA.
This works on both Colab and physical machines.
You can try this environment; it may not seem entirely reasonable, but it is effective.

Ubuntu 18.04.5 LTS
tensorflow-gpu==1.13.1
numpy==1.16.0 (this might be the key)
gcc (Ubuntu 5.5.0-12ubuntu1) 5.5.0
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CUDA 10.0

And "-D_GLIBCXX_USE_CXX11_ABI=0" in the "tf_xxxx_complie.sh" should be deleted

@Brunda02

My current environment is
tensorflow-gpu==1.13.1
gcc==7.5.0
CUDA=10.0
I am getting the same error.
Can anyone suggest which environment I should use?

@FeiDao7943

@Brunda02:
I hope this list is useful for you, especially where it differs from your setup. By the way, this environment was only tested on Google Colab and my PC, so I am not sure it will work on other machines.

List:
Ubuntu 18.04.5 LTS
tensorflow-gpu==1.13.1
numpy==1.16.0 (this might be the key)
gcc (Ubuntu 5.5.0-12ubuntu1) 5.5.0
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CUDA 10.0

And "-D_GLIBCXX_USE_CXX11_ABI=0" in the "tf_xxxx_complie.sh" should be deleted

@Brunda02

@FeiDao7943 what is tf_xxxx_compile.sh?

@FeiDao7943
Copy link

@Brunda02 There are 3 tf_xxxx_compile.sh files in total. In ./frustum-pointnets-master/models/tf_ops/ there are 3 folders, and each folder contains exactly one .sh file named tf_xxxx_compile.sh, where xxxx is the name of the folder.

And the "-D_GLIBCXX_USE_CXX11_ABI=0" in tf_xxxx_compile.sh should be deleted; if it does not exist, ignore it.
