cannot run demo on CPU mode #36

Open
teddybearz opened this issue Nov 30, 2016 · 38 comments

Comments

@teddybearz

running inside the latest docker tensorflow:

docker run -it -p 8888:8888 tensorflow/tensorflow

`

root@f54905c5bdaf:/notebooks/Faster-RCNN_TF# python ./tools/demo.py --model /VGGnet_fast_rcnn_iter_70000.ckpt
Traceback (most recent call last):
  File "./tools/demo.py", line 11, in <module>
    from networks.factory import get_network
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/notebooks/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/notebooks/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 63, in load_op_library
    raise errors._make_specific_exception(None, None, error_msg, error_code)
tensorflow.python.framework.errors.NotFoundError: /notebooks/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE

root@f54905c5bdaf:/notebooks/Faster-RCNN_TF# nm -gC lib/roi_pooling_layer/roi_pooling.so |grep GpuDevice
U ROIPoolForwardLaucher(float const*, float, int, int, int, int, int, int, float const*, float*, int*, Eigen::GpuDevice const&)
U ROIPoolBackwardLaucher(float const*, float, int, int, int, int, int, int, int, float const*, float*, int const*, Eigen::GpuDevice const&)
U Eigen::GpuDevice const& tensorflow::OpKernelContext::eigen_device<Eigen::GpuDevice>() const

`
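
In the `nm` listing above, the `U` marker means the symbol is referenced by roi_pooling.so but not defined in it. As a rough sketch, the Python helper below filters such a listing for undefined symbols that mention `Eigen::GpuDevice`. The function name `undefined_gpu_symbols` is illustrative and not part of the repository, and the last sample line (a defined symbol) is made up to show what gets skipped.

```python
def undefined_gpu_symbols(nm_output):
    """Return undefined ('U') symbols from `nm -gC` output that mention GpuDevice."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.strip().split(None, 1)
        # Undefined symbols have no address column, so the line starts with 'U'.
        if len(parts) == 2 and parts[0] == 'U' and 'GpuDevice' in parts[1]:
            symbols.append(parts[1])
    return symbols


# The first two lines are copied from the listing in this thread;
# the third line is a hypothetical defined symbol added for contrast.
sample = """\
U ROIPoolForwardLaucher(float const*, float, int, int, int, int, int, int, float const*, float*, int*, Eigen::GpuDevice const&)
U ROIPoolBackwardLaucher(float const*, float, int, int, int, int, int, int, int, float const*, float*, int const*, Eigen::GpuDevice const&)
0000000000001234 T some_defined_symbol
"""

for name in undefined_gpu_symbols(sample):
    # Print only the function name, without the argument list.
    print(name.split('(')[0])
```

Any symbol reported this way has to be provided by another object file or library at load time, which is why the import fails when the CUDA part was never built.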

@teddybearz
Author

To reproduce (after downloading VGGnet_fast_rcnn_iter_70000.ckpt to ~/):

`
docker run -v ~/VGGnet_fast_rcnn_iter_70000.ckpt:/VGGnet_fast_rcnn_iter_70000.ckpt -it -p 8888:8888 tensorflow/tensorflow bash

sudo apt-get update
sudo apt-get install -y git
sudo apt-get install -y python-opencv
sudo apt-get install -y python-tk

pip install cython
pip install easydict
pip install image

sudo ln /dev/null /dev/raw1394

git clone --recursive https://github.com/smallcorgi/Faster-RCNN_TF.git

cd Faster-RCNN_TF/lib
make
cd ..
python ./tools/demo.py --model /VGGnet_fast_rcnn_iter_70000.ckpt

`

@tyyyang

tyyyang commented Dec 10, 2016

I also encountered the same problem.

@donnyyou

I have encountered the same error, too. Is there a solution to this problem? Thanks!

@jacobunderlinebenseal

me too

@jaig

jaig commented Jan 9, 2017

I am facing a similar problem when I try to train on CPU or run the demo. Is there a solution for this?

@nsivaramakrishnan

Hi,
I am getting the same error while trying to run demo.py:
tensorflow.python.framework.errors_impl.NotFoundError: /home/fmc/rcnn/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling.so: undefined symbol: _Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE
I added "-D_GLIBCXX_USE_CXX11_ABI=0" in make.sh. I use g++ version 5.4.0 and TF v0.12. By the way, I am trying to run this on CPU. Any help is highly appreciated.
-Siva

@jaig

jaig commented Jan 11, 2017

Can we train this model using only the CPU?

@oplkqingy

I hit a similar issue on Ubuntu 16.04 with g++ version 5.4.0 and TF v0.12. Before adding "-D_GLIBCXX_USE_CXX11_ABI=0" to make.sh, the demo reports the undefined symbol "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE"; after adding it, the demo reports "_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE".

I don't have a GPU. How can I run the demo in CPU-only mode?

@raviv

raviv commented Jan 22, 2017

Having the same problem (_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE) when trying to train on CPU.
Adding "-D_GLIBCXX_USE_CXX11_ABI=0" to the g++ command in make.sh and re-making didn't help.
Thanks.

@civilman628

civilman628 commented Feb 5, 2017

g++ -std=c++11 -shared -o roi_pooling.so roi_pooling_op.cc \
	roi_pooling_op.cu.o -I $TF_INC  -D GOOGLE_CUDA=1 -fPIC $CXXFLAGS  -D_GLIBCXX_USE_CXX11_ABI=0 \
	-lcudart -L $CUDA_PATH/lib64

@DiegoGLagash

same problem here.

@pbarker

pbarker commented Feb 12, 2017

same problem here as well

@EunmiKang

me too :(

@andresrommier

I had to modify the make.sh file to change the GPU architecture to match mine (sm_61), and then change the CUDA path (on Arch Linux it is /opt/cuda).

@wxwang0601

Same problem!
Before adding "-D_GLIBCXX_USE_CXX11_ABI=0" to make.sh, the demo reports "_ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE"; after adding it, the demo reports "_Z22ROIPoolBackwardLaucherPKffiiiiiiiS0_PfPKiRKN5Eigen9GpuDeviceE".

@googleios @raviv have you solved the problem?

@louisquinn

Hi all, I've figured out a workaround to use only the CPU. I have only tested this method for the demo script, not sure if it will work for training, but it should.

Download and Install CUDA:
https://developer.nvidia.com/cuda-downloads

Compile for GPU OR Copy my .so
You can download my .so file from here: https://drive.google.com/open?id=0B-0d5quIGY5XVEJvYU9XRkVJTWM
Or you can run make.sh and compile with CUDA (not sure if this will work)

Include these lines of code at the top of your Python scripts
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''
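
A minimal sketch of that last step, assuming the variable only needs to be in place before TensorFlow initializes CUDA. `hide_gpus` is an illustrative helper name, not something from the repository, and the TensorFlow import is left commented out so the snippet runs without it.

```python
import os


def hide_gpus():
    """Make CUDA report no devices to libraries initialized later in this process."""
    os.environ['CUDA_VISIBLE_DEVICES'] = ''


hide_gpus()
# Import TensorFlow (and create any session) only after the variable is set:
# import tensorflow as tf
print(repr(os.environ['CUDA_VISIBLE_DEVICES']))
```

Setting the variable at the very top of the script, before any other import, is the most conservative placement.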

@guotong1988

I succeeded in running another Faster R-CNN implementation on CPU from this repo

@shinyke

shinyke commented Apr 7, 2017

@louisquinn I succeeded with your method. Thanks!

@jhcruvinel

@louisquinn,
I would like to know how you managed to install and run the example you mentioned without a GPU

@jhcruvinel

@guotong1988, tf-faster-rcnn requires a GPU. How did you manage to install it without a GPU?

@jhcruvinel

@louisquinn, I was able to reproduce your script. It worked.

@ghost

ghost commented May 7, 2017

How did you manage to install it without a GPU?

@jhcruvinel

I installed the CUDA driver, although the machine does not have the card. Then I set it to use CPU only. It worked!

@liydxl

liydxl commented May 12, 2017

@louisquinn, hi, I added "import os" and "os.environ['CUDA_VISIBLE_DEVICES'] = ''" to the files "demo.py", "_init_paths.py" and "setup.py", but it does not seem to work. The error message is "RuntimeError: Invalid DISPLAY variable".
Which file should "os.environ['CUDA_VISIBLE_DEVICES'] = ''" be added to?
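
A note on the error itself: "RuntimeError: Invalid DISPLAY variable" is most likely raised by matplotlib trying to open a window on a machine without an X display, so CUDA_VISIBLE_DEVICES would not affect it. The sketch below selects matplotlib's non-interactive Agg backend. It assumes matplotlib is the source of the error, and it tolerates matplotlib being absent so that it runs anywhere.

```python
import os

# Assumption: the "Invalid DISPLAY variable" error comes from matplotlib's
# interactive backend because no X display is available.
os.environ['MPLBACKEND'] = 'Agg'  # must be set before matplotlib is imported

try:
    import matplotlib
    matplotlib.use('Agg')  # explicitly select the non-interactive backend
except ImportError:
    matplotlib = None  # matplotlib is not installed; nothing else to configure

print(os.environ['MPLBACKEND'])
```

With Agg selected, a `plt.show()` call will not open a window, so results would have to be written to a file with `plt.savefig` instead.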

@sidak

sidak commented Jul 6, 2017

The method of installing Cuda mentioned by @louisquinn works for me! Thanks! 😄

@louisquinn

@xiaoqo
Apologies for the late reply!
You should add the line to "demo.py", however it MUST be before the Tensorflow session is created, so before line 112.

Also, you guys will be interested in this: https://github.com/tensorflow/models/tree/master/object_detection
Official API for deep learning object detection with various state of the art models and frameworks, no more VGG16! It's really easy to use. If you install Tensorflow for CPU it will run out of the box, however if you installed for GPU and wish to run CPU only, you will have to use the same method I mentioned in this thread.

@sunzj

sunzj commented Jul 13, 2017

Hi

I found the root cause of the issue. In CPU-only mode without CUDA installed, the library roi_pooling.so is still compiled with a reference to the function "ROIPoolBackwardLaucher". However, that function is implemented in the CUDA-related module and exists only for GPU builds. So when the demo runs, the implementation of ROIPoolBackwardLaucher cannot be found and the crash happens.

I prepared a patch for the issue and verified that the issue is gone after applying it.
When I tried to push the patch, I found that a patch for this already exists but has not been merged:

you can refer to:
0dcb55c

or use my patch:
https://drive.google.com/file/d/0BxlQuWrSazOxd29PNjVIenZneHM/view?usp=sharing

Best wishes!
Zhuojin

@lfc87

lfc87 commented Jul 19, 2017

@louisquinn I did the following:

  1. Installed CUDA.
  2. Downloaded your .so file and used it to replace the one in Faster-RCNN_TF/lib/roi_pooling_layer.
  3. Pasted these two lines into setup.py in Faster-RCNN_TF/lib:
    import os
    os.environ['CUDA_VISIBLE_DEVICES'] = ''
  4. Now I run make and receive an error:

`python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
skipping 'nms/gpu_nms.cpp' Cython extension (up-to-date)
rm -rf build
bash make.sh
/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1254): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1259): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(133): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(138): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(212): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/liverpool/.local/lib/python3.5/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(217): warning: calling a constexpr host function from a host device function is not all`

And if I run sudo make, I receive the following:

`

python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
skipping 'utils/nms.c' Cython extension (up-to-date)
skipping 'nms/cpu_nms.c' Cython extension (up-to-date)
skipping 'nms/gpu_nms.cpp' Cython extension (up-to-date)
rm -rf build
bash make.sh
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named tensorflow
make.sh: line 13: nvcc: command not found
g++: error: GOOGLE_CUDA=1: No such file or directory
    `
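
The `sudo make` log above points at two missing prerequisites in the environment that `sudo` uses: TensorFlow is not importable by that Python, and `nvcc` is not on that PATH. The g++ error is probably a consequence of the failed TensorFlow import leaving the include path empty. A small pre-flight check along these lines can confirm both before building. `check_build_prereqs` is a hypothetical helper, and the sketch assumes Python 3 (for `shutil.which` and `importlib.util.find_spec`).

```python
import importlib.util
import shutil


def check_build_prereqs(module='tensorflow', tool='nvcc'):
    """Report whether the module is importable and the tool is on PATH."""
    return {
        module: importlib.util.find_spec(module) is not None,
        tool: shutil.which(tool) is not None,
    }


for name, found in check_build_prereqs().items():
    print('{}: {}'.format(name, 'found' if found else 'MISSING'))
```

Running the check once as the normal user and once under `sudo` shows whether the two environments differ.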

Can anyone help me with that?

Kind Regards
Igor

@liuqi05

liuqi05 commented Sep 1, 2017

@louisquinn, hi, I followed your advice and copied your roi_pooling.so file to my repo. I also modified demo.py to add os.environ['CUDA_VISIBLE_DEVICES'] = ''. Then I ran the demo, but it displays:
Traceback (most recent call last):
  File "./tools/demo.py", line 11, in <module>
    from networks.factory import get_network
  File "/home/joseph/test/Faster-RCNN_TF/tools/../lib/networks/__init__.py", line 8, in <module>
    from .VGGnet_train import VGGnet_train
  File "/home/joseph/test/Faster-RCNN_TF/tools/../lib/networks/VGGnet_train.py", line 2, in <module>
    from networks.network import Network
  File "/home/joseph/test/Faster-RCNN_TF/tools/../lib/networks/network.py", line 3, in <module>
    import roi_pooling_layer.roi_pooling_op as roi_pool_op
  File "/home/joseph/test/Faster-RCNN_TF/tools/../lib/roi_pooling_layer/roi_pooling_op.py", line 5, in <module>
    _roi_pooling_module = tf.load_op_library(filename)
  File "/home/joseph/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: libcudart.so.8.0: cannot open shared object file: No such file or directory
By the way, I am trying to run this on CPU and my computer has no GPU. Can you give me some advice about this error? Thank you in advance.

@louisquinn

@liuqi05
It looks like you didn't install CUDA 8.0 and CuDNN 5.1.
For this method to work, you have to set your system up as though you do have a GPU.
Replacing the roi_pooling.so file is just so you don't have to compile it yourself.

I would like to refer you to the official Tensorflow Object Detection API:
https://github.com/tensorflow/models/tree/master/object_detection
All you need to do is add the os.environ['CUDA_VISIBLE_DEVICES'] = '' line to run on CPU with this framework

@liuqi05

liuqi05 commented Sep 1, 2017

@louisquinn, thank you for your quick reply. I would like to know which file I should add the os.environ['CUDA_VISIBLE_DEVICES'] = '' line to in order to run on CPU with the framework you suggest. Is it the train.py and eval.py files?

@louisquinn

For the official framework:
If you installed Tensorflow without GPU support and you don't have a GPU, it will automatically process on the CPU.

If you have a GPU and installed with GPU support you will have to add the os.environ line.
If you add the os.environ line it should be defined at any point before you define your tf.Session

@liuqi05

liuqi05 commented Sep 1, 2017

@louisquinn, now I understand. I do not need to add the line to any files, because I installed TensorFlow without GPU support. Thank you for your patience. I am now trying to run it locally step by step. If I run into problems, I may need your help again. Thank you again.

@louisquinn

@liuqi05
No worries! I recommend starting with one of the pre-trained models to learn how the framework works.
You can email me direct at louisquinn.contact@gmail.com

@liuqi05

liuqi05 commented Sep 1, 2017

@louisquinn, Thank you very much. I will send mail to you.

@dongdongrj

Hi all, I want to know whether this project can run with Anaconda3 and Python 3.6.
In my environment the error log reports the following:
ModuleNotFoundError: No module named 'easydict'
(tensorflow) dongdong@ubuntu:~/ai/tensorflow/Faster-RCNN_TF$ conda install -c https://conda.anaconda.org/auto easydict
Fetching package metadata .............
Solving package specifications: .

UnsatisfiableError: The following specifications were found to be in conflict:

  • easydict -> python 2.7* -> openssl 1.0.1*
  • python 3.6*
    Use "conda info " to see the dependencies for each package.

Thanks!

@dongdongrj

@louisquinn
Hi, I want to know whether this project can run with Anaconda3 and Python 3.6.
In my environment the error log reports the following:
ModuleNotFoundError: No module named 'easydict'
(tensorflow) dongdong@ubuntu:~/ai/tensorflow/Faster-RCNN_TF$ conda install -c https://conda.anaconda.org/auto easydict
Fetching package metadata .............
Solving package specifications: .

UnsatisfiableError: The following specifications were found to be in conflict:

easydict -> python 2.7* -> openssl 1.0.1*
python 3.6*
Use "conda info " to see the dependencies for each package.
Thanks!

@Nofcity

Nofcity commented Nov 27, 2017

@jhcruvinel, I have no NVIDIA card, but I ran make.sh, compiled with CUDA, and installed the CUDA driver. When I run "python demo.py --cpu --model /Faster-RCNN_TF-master/input_model/VGGnet_fast_rcnn_iter_70000.ckpt", the result is:
Loaded network /Faster-RCNN_TF-master/input_model/VGGnet_fast_rcnn_iter_70000.ckpt
NVIDIA: no NVIDIA devices found
unknown error
So what should I do? Thanks!
