undefined symbol in roi_pooling_layer #90
Comments
The same error.
Same here.
I set the CUDA path in lib/make.sh and it works for me.
Thanks for the information, @Peratham. Did you run the training script with the GPU version?
@Peratham: I tried the GPU version today and it gave an error stating that roi_pooling.o cannot be found when trying to build the Cython module in lib.
Yes, I ran it with the GPU version. I also set the nvcc argument to match the CUDA compute capability of the card that I am using. But I think the default value works on most GPUs anyway, so this may not be the cause.
Also, do you have Eigen installed on your system and on the system path?
@Peratham Hi Peratham, when I run demo.py, I get "ImportError: No module named gpu_nms". I checked FRCN_ROOT/lib/nms, and there is no gpu_nms.so, so I guess the problem may be in building the Cython extensions. I am using CUDA 8.0, Anaconda with Python 2.7, and TensorFlow 1.0. Do you have any idea? Thank you!
@XinliangZhu Have you run make.sh in the /lib directory? What is the output when you run it? Are there any errors?
@Peratham Yes, I have run it. No error occurred.
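Since make.sh can finish without a visible error and still leave gpu_nms.so missing, it may help to list which compiled modules actually exist. The following is only a sketch: the helper name find_built_extensions is hypothetical, and the subdirectory names (nms, roi_pooling_layer) and the .so suffix are assumptions taken from the paths mentioned in this thread.

```python
import glob
import os


def find_built_extensions(lib_root, subdirs=("nms", "roi_pooling_layer")):
    """Return a dict mapping each subdirectory to the sorted .so files found in it."""
    found = {}
    for sub in subdirs:
        # Assumption: compiled modules are placed directly in these folders.
        pattern = os.path.join(lib_root, sub, "*.so")
        found[sub] = sorted(os.path.basename(p) for p in glob.glob(pattern))
    return found


if __name__ == "__main__":
    # "lib" is a placeholder; point it at the lib directory of your checkout.
    for sub, files in find_built_extensions("lib").items():
        print(sub, files if files else "no .so files found")
```

An empty list for nms would match the "No module named gpu_nms" symptom described above.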
@Peratham I have solved the problem of generating gpu_nms.so by revising
@XinliangZhu Is your machine 32-bit or 64-bit?
Found a similar discussion here: #50. I still have a problem with the CPU version after adding -D_GLIBCXX_USE_CXX11_ABI=0. Will try the GPU version tomorrow.
@Peratham Thanks! Mine is 64-bit. I have completely solved the problem. To summarize, I first change the
@Peratham @XinliangZhu Now I have the same problem with the CPU version as the one nsivaramakrishnan posted in CharlesShang/TFFRCNN#2. It looks like there is no solution for the CPU version yet, according to #36.
@chaochaow Based on my experience, this may be caused by a mismatched g++ version. So is your g++ 5.* and where did you add
@XinliangZhu @Peratham: When I ran the GPU version today, I got the following error after adding
Is there anything else I need to do? It looks like my setup is able to locate nvcc.
skipping 'utils/nms.c' Cython extension (up-to-date)
/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(42): error: identifier "__builtin_ia32_mwaitx" is undefined
2 errors detected in the compilation of "/tmp/tmpxft_000016e3_00000000-7_roi_pooling_op_gpu.cu.cpp1.ii".
@chaochaow Since you are trying CPU-only mode, I guess you should put
To solve your current error, please refer to tensorflow/tensorflow#1066.
@XinliangZhu Thanks a lot for the help! I added -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES to the nvcc line and it fixed the compilation issue. Then I ran into the gpu_nms issue like yours and fixed it by changing cudaconfig in setup.py according to your previous post. The error I have now is in the sparse_softmax_cross_entropy_with_logits call. Any suggestions? I googled it and saw this exact same issue posted with TF 1.0.
Normalizing targets
This is due to your TF version. You can google it; there are many discussions about it.
On Mar 2, 2017, at 12:27 AM, chaochaow wrote:
@XinliangZhu Thanks a lot for the help! I added -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES to the nvcc line and it fixed the compilation issue. Then I ran into the gpu_nms issue like yours and fixed it by changing cudaconfig in setup.py according to your previous post. The error I have now is in the sparse_softmax_cross_entropy_with_logits call. Any suggestions?
Normalizing targets
done
Solving...
Traceback (most recent call last):
  File "./tools/train_net.py", line 96, in <module>
    max_iters=args.max_iters)
  File "/media/garmin/Data/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 264, in train_net
    sw.train_model(sess, max_iters)
  File "/media/garmin/Data/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 114, in train_model
    rpn_cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(rpn_cls_score, rpn_label))
  File "/home/garmin/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1684, in sparse_softmax_cross_entropy_with_logits
    labels, logits)
  File "/home/garmin/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1533, in _ensure_xent_args
    "named arguments (labels=..., logits=..., ...)" % name)
ValueError: Only call sparse_softmax_cross_entropy_with_logits with named arguments (labels=..., logits=..., ...)
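The ValueError above comes from an argument check that TensorFlow 1.0 added: the function takes a leading `_sentinel` parameter and rejects anything passed positionally. The sketch below uses a plain-Python stand-in with the same parameter order (`_sentinel`, `labels`, `logits`, `name`) so it runs without TensorFlow. The stand-in body and the placeholder values are illustrative assumptions, not the library's implementation.

```python
def sparse_softmax_cross_entropy_with_logits(_sentinel=None, labels=None,
                                             logits=None, name=None):
    """Stand-in that mimics only the argument check, not the real computation."""
    if _sentinel is not None:
        raise ValueError(
            "Only call sparse_softmax_cross_entropy_with_logits with "
            "named arguments (labels=..., logits=..., ...)")
    if labels is None or logits is None:
        raise ValueError("Both labels and logits must be provided.")
    return ("xent", labels, logits)  # placeholder result for illustration


rpn_cls_score, rpn_label = "scores", "labels"  # placeholders for the real tensors

# Positional style, as in train.py line 114: rejected by the check.
try:
    sparse_softmax_cross_entropy_with_logits(rpn_cls_score, rpn_label)
    positional_ok = True
except ValueError:
    positional_ok = False

# Keyword style: accepted.
result = sparse_softmax_cross_entropy_with_logits(
    logits=rpn_cls_score, labels=rpn_label)
```

Applied to train.py, this would presumably mean calling tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label), assuming rpn_cls_score holds the logits and rpn_label the labels, as their names and the pre-1.0 positional order suggest.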
Yeah, I updated all those math operators and it runs properly now. It looks like this code uses a lot of resources, and my 1050 runs out of memory pretty quickly.
Try a smaller model like VGG1024.
On Mar 2, 2017, at 12:47 AM, chaochaow wrote:
Yeah, I updated all those math operators and it runs properly now. It looks like this code uses a lot of resources, and my 1050 runs out of memory pretty quickly.
@Peratham Hi, could you tell me where exactly in make.sh to add the CUDA path?
I'm using Mac OS X. The correct way to fix the problem is to use
The updated
My problem was that the CUDA path was wrong.
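Several replies here trace the failure to a wrong CUDA path. Below is a small sketch for finding a plausible CUDA root before editing lib/make.sh. The helper name guess_cuda_home is hypothetical, the lookup order (environment variables, then nvcc on PATH, then /usr/local/cuda) is an assumed convention rather than something the repository prescribes, and the code needs Python 3 for shutil.which. Where make.sh consumes the path is not shown in this thread, so that part is left out.

```python
import os
import shutil


def guess_cuda_home(env=None, default="/usr/local/cuda"):
    """Return a likely CUDA root directory, or None if nothing plausible is found."""
    env = os.environ if env is None else env
    # 1. Explicit environment variables take priority.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(var)
        if value and os.path.isdir(value):
            return value
    # 2. Derive the root from nvcc on PATH (expected layout: <root>/bin/nvcc).
    #    If env has no PATH entry, shutil.which falls back to the process PATH.
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))
    # 3. Fall back to the conventional install location, if it exists.
    return default if os.path.isdir(default) else None


if __name__ == "__main__":
    print(guess_cuda_home())
```

If this prints None, nvcc is probably not installed or not visible to the shell that runs make.sh.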
I followed the instructions and tried to run the training script on CPU, but got the following error message. It looks like there is an issue loading the roi_pooling.so library. Has anybody encountered this problem? I am using TensorFlow 0.11.0.
./experiments/scripts/faster_rcnn_end2end.sh cpu 0 VGG16 pascal_voc
Traceback (most recent call last):
  File "./tools/train_net.py", line 16, in <module>
    from networks.factory import get_network
  File "/home/Projects/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/home/Projects/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/home/Projects/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/home/Projects/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 63, in load_op_library
    raise errors._make_specific_exception(None, None, error_msg, error_code)
tensorflow.python.framework.errors.NotFoundError: /home/Projects/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE
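The undefined symbol contains the marker B5cxx11, which is how g++ 5 and later mangle names that use the newer libstdc++ string ABI. That suggests roi_pooling.so was compiled with the new ABI while the installed TensorFlow library exports the symbol without it, which would be consistent with the -D_GLIBCXX_USE_CXX11_ABI=0 flag mentioned in the comments. The check below is only a heuristic sketch; the helper name uses_cxx11_abi is hypothetical, and a plain substring test could in principle match an unrelated identifier.

```python
CXX11_ABI_TAG = "B5cxx11"


def uses_cxx11_abi(mangled_symbol):
    """True if a mangled C++ name appears to carry the [abi:cxx11] tag."""
    return CXX11_ABI_TAG in mangled_symbol


# Symbol copied from the NotFoundError above.
symbol = "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE"
print(uses_cxx11_abi(symbol))  # prints True
```

If the symbol in your own error message lacks this tag, the cause is likely something other than an ABI mismatch.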