
undefined symbol in roi_pooling_layer #90

Open
chaochaow opened this issue Feb 24, 2017 · 25 comments

@chaochaow

chaochaow commented Feb 24, 2017

I followed the instructions and tried to run the training script on CPU, but got the following error message. It looks like there is a problem loading the roi_pooling.so library. Has anybody else encountered this problem? I am using TensorFlow 0.11.0.

./experiments/scripts/faster_rcnn_end2end.sh cpu 0 VGG16 pascal_voc

Traceback (most recent call last):
  File "./tools/train_net.py", line 16, in <module>
    from networks.factory import get_network
  File "/home/Projects/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/home/Projects/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/home/Projects/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/home/Projects/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 63, in load_op_library
    raise errors._make_specific_exception(None, None, error_msg, error_code)
tensorflow.python.framework.errors.NotFoundError: /home/Projects/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE

@showkeyjar

The same error here.

@Peratham

Same here.

@Peratham

I set the CUDA path in lib/make.sh and it works for me.
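
For reference, that is the CUDA_PATH variable in the config section at the top of lib/make.sh; the value shown below is the usual default location and may differ on your system:

# lib/make.sh -- point this at your local CUDA install (default path shown)
CUDA_PATH=/usr/local/cuda/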

@chaochaow

Thanks for the information, @Peratham. Did you run the training script with the GPU version?

@chaochaow

@Peratham: I tried the GPU version today and it gave an error stating that roi_pooling.o cannot be found when trying to build the Cython modules in lib.

@Peratham

Peratham commented Mar 1, 2017

Yes, I ran it with the GPU version.

I also set the nvcc -arch argument based on the CUDA compute capability of the card I am using, but I think the default value works on most GPUs anyway, so that may not be the issue.
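
For reference, that is the -arch flag on the nvcc line in lib/make.sh. sm_37 is the script's default; the right value depends on your GPU's compute capability (for example, a GTX 1050 would be sm_61):

nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
    -I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC \
    -arch=sm_37    # set to match your card's compute capability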

@Peratham

Peratham commented Mar 1, 2017

Also, do you have Eigen installed on your system and on the include path?

@XinliangZhu

@Peratham Hi Peratham, when I run demo.py I get "ImportError: No module named gpu_nms". I checked FRCN_ROOT/lib/nms and there is no gpu_nms.so, so I guess it may be a problem with building the Cython modules. I am using CUDA 8.0, Anaconda with Python 2.7, and TensorFlow 1.0. Do you have any idea? Thank you!

@Peratham

Peratham commented Mar 1, 2017

@XinliangZhu Have you run make.sh in the /lib directory? What is the output when you run it? Are there any errors?

@XinliangZhu

@Peratham Yes, I have run it. No error occurred.

@XinliangZhu

@Peratham I have solved the problem of generating gpu_nms.so by changing the cudaconfig dictionary in lib/setup.py so that its 'lib64' entry is pjoin(home, 'lib') instead of pjoin(home, 'lib64').
However, this leads to a new problem just like chaochaow's. By the way, I am using the GPU.
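
For clarity, the relevant part of lib/setup.py after that change looks roughly like this. The point is that the 'lib64' entry has to name the directory that actually contains the CUDA libraries (libcudart) on your machine, which is lib64 on most 64-bit Linux installs but lib on some setups:

# lib/setup.py -- cudaconfig after the change (sketch; adjust to wherever
# libcudart actually lives on your system)
cudaconfig = {'home': home,                       # CUDA install root
              'nvcc': nvcc,                       # path to the nvcc binary
              'include': pjoin(home, 'include'),  # CUDA headers
              'lib64': pjoin(home, 'lib')}        # directory holding libcudart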

@Peratham

Peratham commented Mar 1, 2017

@XinliangZhu Is your machine 32-bit or 64-bit?

@chaochaow

Found a similar discussion here: #50

I still have the problem with the CPU version after adding -D_GLIBCXX_USE_CXX11_ABI=0. I will try the GPU version tomorrow.

@XinliangZhu

@Peratham Thanks! Mine is 64-bit. I have completely solved the problem. To summarize: first I changed cudaconfig in /lib/setup.py as above, which guarantees that gpu_nms.so gets generated. Then I revised CUDA_PATH and added -D_GLIBCXX_USE_CXX11_ABI=0 in /lib/make.sh. I just followed your instructions and CharlesShang/TFFRCNN#2. Thanks!
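
For anyone following along, a sketch of what the GPU branch of /lib/make.sh looks like after those edits; the CUDA path shown is the common default and may differ on your system:

# /lib/make.sh (GPU branch) -- after adding the ABI flag
CUDA_PATH=/usr/local/cuda/   # adjust to your CUDA install

g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
    roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
    -lcudart -L $CUDA_PATH/lib64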

@chaochaow

chaochaow commented Mar 1, 2017

@Peratham @XinliangZhu Now, with the CPU version, I have the same problem that nsivaramakrishnan posted in CharlesShang/TFFRCNN#2.

Looks like there is no solution for the CPU version yet, according to #36.

@XinliangZhu

XinliangZhu commented Mar 1, 2017

@chaochaow Based on my experience, this may be caused by a mismatched g++ version. Is your g++ version 5.*, and where did you add -D_GLIBCXX_USE_CXX11_ABI=0? Since you are using CPU only, you should add it in the else branch of /lib/make.sh.

@chaochaow

@XinliangZhu @Peratham: when I ran the GPU version today, I got the following error after changing the g++ line to

g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
    roi_pooling_op.cu.o -I $TF_INC -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS -D_GLIBCXX_USE_CXX11_ABI=0 \
    -lcudart -L $CUDA_PATH/lib64

Is there anything else I need to do? It looks like my setup is able to locate nvcc.

skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
rm -rf build
bash make.sh
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(36): error: identifier "__builtin_ia32_monitorx" is undefined

/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(42): error: identifier "__builtin_ia32_mwaitx" is undefined

2 errors detected in the compilation of "/tmp/tmpxft_000016e3_00000000-7_roi_pooling_op_gpu.cu.cpp1.ii".
g++: error: roi_pooling_op.cu.o: No such file or directory

@XinliangZhu

@chaochaow Since you are trying CPU-only mode, I guess you should put -D_GLIBCXX_USE_CXX11_ABI=0 in the else block of /lib/make.sh, like this:

g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
    -I $TF_INC -fPIC $CXXFLAGS

To solve your current error, please refer to tensorflow/tensorflow#1066.

@chaochaow

chaochaow commented Mar 2, 2017

@XinliangZhu Thanks a lot for the help! I added -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES to the nvcc line and that fixed the compilation issue. Then I ran into the gpu_nms issue like yours and fixed it by changing cudaconfig in setup.py according to your previous post. Now the error I get is in the sparse_softmax_cross_entropy_with_logits call. Any suggestions?
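
For reference, the nvcc line in lib/make.sh with those two defines added would look roughly like this (the -arch value is just the script's default):

nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc \
    -I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC \
    -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES \
    -arch=sm_37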

I googled it and saw this exact same issue posted for TF 1.0 here:
tensorflow/models#857

Normalizing targets
done
Solving...
Traceback (most recent call last):
  File "./tools/train_net.py", line 96, in <module>
    max_iters=args.max_iters)
  File "/media/garmin/Data/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 264, in train_net
    sw.train_model(sess, max_iters)
  File "/media/garmin/Data/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 114, in train_model
    rpn_cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(rpn_cls_score, rpn_label))
  File "/home/garmin/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1684, in sparse_softmax_cross_entropy_with_logits
    labels, logits)
  File "/home/garmin/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1533, in _ensure_xent_args
    "named arguments (labels=..., logits=..., ...)" % name)
ValueError: Only call sparse_softmax_cross_entropy_with_logits with named arguments (labels=..., logits=..., ...)
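
For anyone hitting the same error: under TensorFlow 1.0 this call has to pass the tensors as keyword arguments, so the line from the traceback above would become something like this:

# lib/fast_rcnn/train.py -- TF 1.0 requires logits/labels as keyword arguments
rpn_cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=rpn_cls_score, labels=rpn_label))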

@XinliangZhu

XinliangZhu commented Mar 2, 2017 via email

@chaochaow

Yeah, I updated all those math operators and it runs properly now. It looks like this code takes a lot of GPU memory, and my 1050 runs out of memory pretty quickly.
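
For readers doing the same update: besides the keyword-argument change above, TensorFlow 1.0 renamed several element-wise math ops. Which call sites need updating depends on the code, but the renames are along these lines:

import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)

# TF 0.x spellings such as tf.mul(a, b), tf.sub(a, b) and tf.neg(a) become:
prod = tf.multiply(a, b)
diff = tf.subtract(a, b)
neg  = tf.negative(a)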

@XinliangZhu

XinliangZhu commented Mar 2, 2017 via email

@nanyigongzi

@Peratham Hi, could you tell me where exactly to add the CUDA path in make.sh?

@TomHeaven

I'm using Mac OS X. The correct way to fix the problem there is to use clang++ instead of g++ in make.sh.

The updated make.sh looks like this:

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')


# config
CXX=g++
CUDA_PATH=/usr/local/cuda/
NVCCFLAGS='--expt-relaxed-constexpr'
CXXFLAGS=''

if [[ "$OSTYPE" =~ ^darwin ]]; then
    CXX=clang++
    CXXFLAGS+=' -undefined dynamic_lookup '
    TF_INC=/Library/Python/2.7/site-packages/tensorflow/include
fi

cd roi_pooling_layer

if [ -d "$CUDA_PATH" ]; then
	nvcc -std=c++11 -c -o roi_pooling_op.cu.o roi_pooling_op_gpu.cu.cc  \
		-I $TF_INC -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC $NVCCFLAGS \
		-arch=sm_37

	$CXX -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
		roi_pooling_op.cu.o -I $TF_INC  -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS \
		-lcudart -L $CUDA_PATH/lib64
else
	$CXX -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc -D_GLIBCXX_USE_CXX11_ABI=0 \
		-I $TF_INC -fPIC $CXXFLAGS
fi

cd ..
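
If it helps, the script is run the same way as before, from the repository's lib directory:

cd lib        # from the repository root (FRCN_ROOT)
bash make.sh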

@danaodai

danaodai commented Feb 1, 2018

My problem was that the CUDA path was wrong.
