Issue compiling on Ubuntu and tensorflow 1.4 #28
Open
clausmichele opened this issue Mar 5, 2018 · 24 comments

Comments

clausmichele commented Mar 5, 2018

Dear all,

I have an issue trying to compile the code with tensorflow 1.4. I already solved the missing cuda_config.h problem by following a previously solved issue.
Here is the output of make all:

make all
nvcc -g -std=c++11 -I`python -c "import tensorflow; print(tensorflow.sysconfig.get_include())"` -I"/usr/local/cuda/include" -DGOOGLE_CUDA=1 -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -D__STRICT_ANSI__ -D_GLIBCXX_USE_CXX11_ABI=0 -c -gencode=arch=compute_30,code=sm_30 src/ops/preprocessing/kernels/data_augmentation.cu.cc -x cu -Xcompiler -fPIC -o src/ops/build/data_augmentation.o
/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(133): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(138): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(208): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(213): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/google/protobuf/arena_impl.h(52): warning: integer conversion resulted in a change of sign

/usr/local/lib/python2.7/dist-packages/tensorflow/include/google/protobuf/arena_impl.h(147): warning: integer conversion resulted in a change of sign

/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(572): error: calling a constexpr host function("real") from a device function("CudaAtomicSub") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(572): error: calling a constexpr host function("imag") from a device function("CudaAtomicSub") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(577): error: calling a constexpr host function("real") from a device function("CudaAtomicSub") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(577): error: calling a constexpr host function("imag") from a device function("CudaAtomicSub") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

4 errors detected in the compilation of "/tmp/tmpxft_000009d2_00000000-7_data_augmentation.cu.cpp1.ii".
Makefile:63: recipe for target 'preprocessing' failed
make: *** [preprocessing] Error 2

@zhouqixian

Add the '--expt-relaxed-constexpr' flag when compiling with nvcc.

Here is my Makefile for tf1.4:

OUT_DIR = ./build

TF_INC = $(shell python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
TF_LIB = $(shell python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
TF_NSYNC = $(TF_INC)/external/nsync/public
CUDA_HOME = /usr/local/cuda

GPUFLAGS = -I $(TF_INC) -I$(TF_NSYNC) -I$(CUDA_HOME)/include -I/usr/local -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
CFLAGS = -I $(TF_INC) -I$(TF_NSYNC) -fPIC -L$(CUDA_HOME)/lib -L$(CUDA_HOME)/lib64 -lcudart -L$(TF_LIB) -ltensorflow_framework

all: downsample.so flow_warp.so preprocessing.so correlation.so

downsample_kernel_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
		-o $(OUT_DIR)/downsample_kernel_gpu.o downsample/downsample_kernel_gpu.cu.cc \
		$(GPUFLAGS)

downsample.so: downsample_kernel_gpu.o
	g++ -std=c++11 -shared \
		-o $(OUT_DIR)/downsample.so \
		downsample/downsample_kernel.cc downsample/downsample_op.cc \
		$(OUT_DIR)/downsample_kernel_gpu.o \
		$(CFLAGS)

flow_warp_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
		-o $(OUT_DIR)/flow_warp_gpu.o flow_warp/flow_warp.cu.cc \
		$(GPUFLAGS)

flow_warp_grad_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
		-o $(OUT_DIR)/flow_warp_grad_gpu.o flow_warp/flow_warp_grad.cu.cc \
		$(GPUFLAGS)

flow_warp.so: flow_warp_gpu.o flow_warp_grad_gpu.o
	g++ -std=c++11 -shared \
		-o $(OUT_DIR)/flow_warp.so \
		flow_warp/flow_warp_op.cc flow_warp/flow_warp.cc flow_warp/flow_warp_grad.cc \
		$(OUT_DIR)/flow_warp_gpu.o $(OUT_DIR)/flow_warp_grad_gpu.o \
		$(CFLAGS)

data_augmentation.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
		-o $(OUT_DIR)/data_augmentation.o preprocessing/kernels/data_augmentation.cu.cc \
		$(GPUFLAGS)

flow_augmentation_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
		-o $(OUT_DIR)/flow_augmentation_gpu.o preprocessing/kernels/flow_augmentation_gpu.cu.cc \
		$(GPUFLAGS)

preprocessing.so: data_augmentation.o flow_augmentation_gpu.o
	g++ -std=c++11 -shared \
		-o $(OUT_DIR)/preprocessing.so \
		preprocessing/preprocessing.cc preprocessing/kernels/flow_augmentation.cc \
		preprocessing/kernels/augmentation_base.cc preprocessing/kernels/data_augmentation.cc \
		$(OUT_DIR)/data_augmentation.o $(OUT_DIR)/flow_augmentation_gpu.o \
		$(CFLAGS)

correlation_kernel_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
		-o $(OUT_DIR)/correlation_kernel_gpu.o correlation/correlation_kernel.cu.cc \
		$(GPUFLAGS)

correlation_grad_kernel_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
		-o $(OUT_DIR)/correlation_grad_kernel_gpu.o correlation/correlation_grad_kernel.cu.cc \
		$(GPUFLAGS)

correlation_pad_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
		-o $(OUT_DIR)/correlation_pad_gpu.o correlation/pad.cu.cc \
		$(GPUFLAGS)

correlation.so: correlation_kernel_gpu.o correlation_grad_kernel_gpu.o correlation_pad_gpu.o
	g++ -std=c++11 -shared \
		-o $(OUT_DIR)/correlation.so \
		correlation/correlation_kernel.cc correlation/correlation_grad_kernel.cc correlation/correlation_op.cc \
		$(OUT_DIR)/correlation_kernel_gpu.o $(OUT_DIR)/correlation_grad_kernel_gpu.o $(OUT_DIR)/correlation_pad_gpu.o \
		$(CFLAGS)

clean:
	rm -f $(OUT_DIR)/*
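
If make all finishes, a quick smoke test (my own sketch, assuming the .so files end up in ./build as in OUT_DIR above) is to load the libraries from Python; an undefined-symbol or ABI mismatch shows up immediately at load time instead of later at run time:

import tensorflow as tf

# Try to load each compiled op library; a bad link raises tf.errors.NotFoundError
# with the offending symbol in the message.
for name in ['downsample', 'flow_warp', 'preprocessing', 'correlation']:
    tf.load_op_library('./build/{}.so'.format(name))
    print(name + '.so', 'loaded OK')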

@clarkren

@zhouqixian
I used the "Makefile for tf1.4", but got the error "Makefile:15: *** missing separator. Stop."
I just don't know what's wrong.

oneTimePad commented Apr 26, 2018

@clarkren That issue occurs because you need to indent the recipe lines beneath each target line (the ones ending with a colon, for example downsample_kernel_gpu.o:) with a tab character.

@zhouqixian Even with this Makefile I am still getting the error "__CUDACC_VER__ is no longer supported". I am using tf1.7 / CUDA 9.0. I read that this issue was supposedly fixed in TF a while back.

CQFIO commented Apr 28, 2018

I finally made it work, but only with TF 1.2. I could not make it run on TF 1.7.

zhouqixian commented Apr 28, 2018

@oneTimePad It works for me with tf1.4, cuda 8.0 and cudnn v6. NVCC is the CUDA compiler, and errors can occur when you use a newer CUDA version (CUDA 9.0) with this code.
If you only need the flow-warping operation in TF, here is code that works without any custom ops:

import tensorflow as tf

def get_pixel_value(img, x, y):
    """
    Utility function to get pixel values for coordinate
    tensors x and y from a 4D tensor image.
    Input
    -----
    - img: tensor of shape (B, H, W, C)
    - x: tensor of shape (B, H, W)
    - y: tensor of shape (B, H, W)
    Returns
    -------
    - output: tensor of shape (B, H, W, C)
    """
    shape = tf.shape(x)
    batch_size = shape[0]
    height = shape[1]
    width = shape[2]

    batch_idx = tf.range(0, batch_size)
    batch_idx = tf.reshape(batch_idx, (batch_size, 1, 1))
    b = tf.tile(batch_idx, (1, height, width))

    indices = tf.stack([b, y, x], 3)

    return tf.gather_nd(img, indices)

def tf_warp(img, flow, H, W):
    """
    Input:
    img: [B, H, W, C] of float32
    flow: [B, H, W, 2] of float32
    """
    flow = tf.transpose(flow, [0, 3, 1, 2])

    x, y = tf.meshgrid(tf.range(W), tf.range(H))
    x = tf.expand_dims(x, 0)
    x = tf.expand_dims(x, 0)

    y = tf.expand_dims(y, 0)
    y = tf.expand_dims(y, 0)

    x = tf.cast(x, tf.float32)
    y = tf.cast(y, tf.float32)
    grid = tf.concat([x, y], axis=1)

    flows = grid + flow
    max_y = tf.cast(H - 1, tf.int32)
    max_x = tf.cast(W - 1, tf.int32)
    zero = tf.zeros([], dtype=tf.int32)

    x = flows[:, 0, :, :]
    y = flows[:, 1, :, :]

    x = tf.clip_by_value(x, tf.cast(zero, tf.float32), tf.cast(max_x, tf.float32))
    y = tf.clip_by_value(y, tf.cast(zero, tf.float32), tf.cast(max_y, tf.float32))

    x0 = x
    y0 = y
    x0 = tf.cast(x0, tf.int32)
    x1 = x0 + 1
    y0 = tf.cast(y0, tf.int32)
    y1 = y0 + 1

    # clip to range [0, H/W] to not violate img boundaries
    x0 = tf.clip_by_value(x0, zero, max_x)
    x1 = tf.clip_by_value(x1, zero, max_x)
    y0 = tf.clip_by_value(y0, zero, max_y)
    y1 = tf.clip_by_value(y1, zero, max_y)

    # get pixel value at corner coords
    Ia = get_pixel_value(img, x0, y0)
    Ib = get_pixel_value(img, x0, y1)
    Ic = get_pixel_value(img, x1, y0)
    Id = get_pixel_value(img, x1, y1)

    # recast as float for delta calculation
    x0 = tf.cast(x0, tf.float32)
    x1 = tf.cast(x1, tf.float32)
    y0 = tf.cast(y0, tf.float32)
    y1 = tf.cast(y1, tf.float32)

    # calculate deltas
    wa = (x1 - x) * (y1 - y)
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)

    # add dimension for addition
    wa = tf.expand_dims(wa, axis=3)
    wb = tf.expand_dims(wb, axis=3)
    wc = tf.expand_dims(wc, axis=3)
    wd = tf.expand_dims(wd, axis=3)

    # compute output
    out = tf.add_n([wa * Ia, wb * Ib, wc * Ic, wd * Id])
    return out
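
A minimal usage sketch for the snippet above (the shapes, placeholders, and zero flow are my own illustration, not part of the original comment):

import numpy as np
import tensorflow as tf

B, H, W = 1, 4, 6
img = tf.placeholder(tf.float32, [B, H, W, 3])
flow = tf.placeholder(tf.float32, [B, H, W, 2])
warped = tf_warp(img, flow, H, W)

with tf.Session() as sess:
    result = sess.run(warped, feed_dict={
        img: np.random.rand(B, H, W, 3).astype(np.float32),
        flow: np.zeros((B, H, W, 2), dtype=np.float32),  # zero displacement field
    })
print(result.shape)  # (1, 4, 6, 3)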

@shoutashi

Hi @zhouqixian, I have tried your Makefile (tf 1.4.1, cuda 8.0, cudnn v6 and python 3.5). The compile succeeds, but at test time I get the error "correlation.so: undefined symbol: _ZTIN10tensorflow8OpKernelE". Could you help me, please? Thank you in advance.

alisaaalehi commented May 23, 2018

I'm getting the same error as @shoutashi : "correlation.so: undefined symbol:_ZTIN10tensorflow8OpKernelE". Could you please help me with this?

@DehaiZhao

Hi, @shoutashi, @alisaaalehi , I'm facing the same problem
undefined symbol: _ZTIN10tensorflow8OpKernelE
Have you solved it? Thanks

alisaaalehi commented Jun 25, 2018

Hey @dehaisea, in my case removing -D_GLIBCXX_USE_CXX11_ABI=0 from the Makefile and rebuilding the project fixed it.
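
For anyone hitting this: the -D_GLIBCXX_USE_CXX11_ABI value has to match the ABI your installed TensorFlow wheel was built with (gcc 5+ defaults to the new C++11 ABI, while the pip wheels of this era were typically built with the old one). A quick way to check what your TF expects, assuming your TF version already exposes tf.sysconfig.get_compile_flags()/get_link_flags() (I believe 1.4+ does), is a sketch like this:

import tensorflow as tf

# Print what the installed TensorFlow was built with, so the Makefile flags can match.
print(tf.sysconfig.get_include())        # header dir (TF_INC)
print(tf.sysconfig.get_lib())            # dir containing libtensorflow_framework.so (TF_LIB)
print(tf.sysconfig.get_compile_flags())  # includes the -D_GLIBCXX_USE_CXX11_ABI=... value to use
print(tf.sysconfig.get_link_flags())     # includes -L... -ltensorflow_framework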

@Iamanorange

@shoutashi @alisaaalehi @dehaisea
For error "correlation.so: undefined symbol:_ZTIN10tensorflow8OpKernelE":
Modify Makefile:
TF_LIB = `python -c "import tensorflow; print(tensorflow.sysconfig.get_lib())"`
CGPUFLAGS = -L$(CUDA_HOME)/lib -L$(CUDA_HOME)/lib64 -lcudart -L$(TF_LIB) -ltensorflow_framework

If tensorflow.sysconfig.get_lib() does not return the correct dir, link it manually:
CGPUFLAGS = -L$(CUDA_HOME)/lib -L$(CUDA_HOME)/lib64 -lcudart -L /home/<user_name>/.local/lib/python2.7/site-packages/tensorflow -ltensorflow_framework

Env:
cuda 9.2
cudnn 7.1
tensorflow 1.9.0
Ubuntu 16.04

For more information:
#41
tensorflow/tensorflow#13607

Here is my Makefile. I made some other changes to get it to build successfully.
Makefile.txt
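
If you are not sure whether $(TF_LIB) resolves to the right directory, a quick sanity check (my own sketch) before touching the Makefile:

import os
import tensorflow as tf

# Print the directory the Makefile should pass to -L, and confirm the framework
# library is actually in it (file name assumed; TF 1.x wheels ship it unversioned).
lib_dir = tf.sysconfig.get_lib()
print(lib_dir)
print([f for f in os.listdir(lib_dir) if f.startswith('libtensorflow_framework')])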

SHENG-KAI-HUANG commented Sep 21, 2018

Hello @Iamanorange, I have tried your Makefile on Ubuntu 16.04, tensorflow 1.10.1, cuda 9.0, cudnn 7.3 and it works well! Thanks for your help!

aa-samad commented Nov 1, 2018

I was facing the same issues.
Env:
tensorflow 1.11 - cuda 9.0 - python 2.7 - ubuntu 16.04

Story:

1- Removing -D_GLIBCXX_USE_CXX11_ABI=0 can only work for gcc < 5.0.0. Reference: mgharbi/hdrnet_legacy#2
2- The Makefile that @Iamanorange provided did not work for me (compile error on correlation.so).
The solution suggested around the internet is to remove -D GOOGLE_CUDA=1
-> that compiles, but then you get that strange undefined symbol: _ZTIN10tensorflow8OpKernelE error,
just like #41
3- It turns out the build is incomplete with python 2.7 and tensorflow 1.11 (I don't know the cause yet!)

Solution:

Switch to python 3 😄
1- Change python in the Makefile provided by @Iamanorange to python3
2- Compile
3- Edit src/flowlib.py:
add from __future__ import print_function at the beginning (see the sketch below)
change every print ... statement to print(...)

This worked for me!
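
For step 3, the change is just (illustrative values only; the real print statements are in src/flowlib.py):

from __future__ import print_function  # must appear at the top of the file

# Python-2-style `print ...` becomes a function call that works on 2.7 and 3:
print('average flow magnitude:', 1.5)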

@BibratRanjan

(Quoting @Iamanorange's fix above for "correlation.so: undefined symbol: _ZTIN10tensorflow8OpKernelE".)

After modifying the Makefile, I ran into another "undefined symbol" problem: tensorflow.python.framework.errors_impl.NotFoundError: /media/cds-iisc/DATA/Undertaker/ANT/testing/flownet2-tf-master/src/./ops/build/correlation.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv

Any help?

@BibratRanjan

I'm using Ubuntu 16.04, tensorflow 1.10.1, cuda 9.0, cudnn 7

@BibratRanjan

and python 3.6.4

@BibratRanjan

Removing -D_GLIBCXX_USE_CXX11_ABI=0 solved the issue.

@alisaaalehi

Hi @BibratRanjan, anything you change will cause another problem. I've encountered lots of problems, and my final and simple solution is this:

  • Use tensorflow 1.2.0-gpu to fix most of the problems. It is better to use the Docker image of that version (tensorflow/tensorflow:1.2.1-gpu); it has everything needed to run this code. Since you are using the GPU version of tensorflow, remember to use nvidia-docker to create a container from the image.

  • You should also update g++ to version 4.8: apt-get install g++-4.8

  • and update the Makefile to match this new version: change CC = gcc -O2 -pthread to
    CC = gcc-4.8 -O2 -pthread and CXX = g++ to CXX = g++-4.8

fjchange commented Dec 12, 2018

@zhouqixian

That works! Thanks.
It can be simplified, though:
just add "--expt-relaxed-constexpr" at the end of line 11 of the original Makefile.

@mengyaaa

(Quoting @Iamanorange's Makefile fix above.)

Thank you so much for your Makefile. After make all succeeded, I ran python -m src.flownet2.test --input_a data/samples/0img0.ppm --input_b data/samples/0img1.ppm --out ./
It shows:
WARNING:tensorflow:From /home/lab226/Downloads/flownet2-tf-master/src/net.py:22: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/lab226/Downloads/flownet2-tf-master/src/flownet_cs/flownet_cs.py:26: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-12-18 11:25:54.254685: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1261, in _run_fn
self._extend_graph()
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1295, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Correlation' with these attrs. Registered devices: [CPU], Registered kernels:
device='GPU'

 [[Node: FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2](FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3/lrelu/add, FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3_1/lrelu/add)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1725, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Correlation' with these attrs. Registered devices: [CPU], Registered kernels:
device='GPU'

 [[Node: FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2](FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3/lrelu/add, FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3_1/lrelu/add)]]

Caused by op 'FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation', defined at:
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/test.py", line 51, in
main()
File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/test.py", line 18, in main
out_path=FLAGS.out,
File "/home/lab226/Downloads/flownet2-tf-master/src/net.py", line 62, in test
predictions = self.model(inputs, training_schedule)
File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/flownet2.py", line 22, in model
net_css_predictions = self.net_css.model(inputs, training_schedule, trainable=False)
File "/home/lab226/Downloads/flownet2-tf-master/src/flownet_css/flownet_css.py", line 18, in model
net_cs_predictions = self.net_cs.model(inputs, training_schedule, trainable=False)
File "/home/lab226/Downloads/flownet2-tf-master/src/flownet_cs/flownet_cs.py", line 18, in model
net_c_predictions = self.net_c.model(inputs, training_schedule, trainable=False)
File "/home/lab226/Downloads/flownet2-tf-master/src/flownet_c/flownet_c.py", line 40, in model
cc = correlation(conv_a_3, conv_b_3, 1, 20, 1, 2, 20)
File "/home/lab226/Downloads/flownet2-tf-master/src/correlation.py", line 14, in correlation
padding)
File "", line 53, in correlation
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'Correlation' with these attrs. Registered devices: [CPU], Registered kernels:
device='GPU'

 [[Node: FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2](FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3/lrelu/add, FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3_1/lrelu/add)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/test.py", line 51, in
main()
File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/test.py", line 18, in main
out_path=FLAGS.out,
File "/home/lab226/Downloads/flownet2-tf-master/src/net.py", line 68, in test
saver.restore(sess, checkpoint)
File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1759, in restore
err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'Correlation' with these attrs. Registered devices: [CPU], Registered kernels:
device='GPU'

 [[Node: FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2](FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3/lrelu/add, FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3_1/lrelu/add)]]

Env: cuda 9.0, cudnn 9.0, tensorflow 1.10.0, Ubuntu 16.04
I still can't get a result... Can anyone help? Thank you very much.

@Iamanorange

@mengyaaa You should run FlowNet2 on GPU only; as your log shows, the Correlation op only has a kernel registered for device='GPU'.
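
A quick way to confirm TensorFlow can actually see the GPU before running the test script (a sketch using the TF 1.x API):

import tensorflow as tf

# The custom Correlation kernel is only registered for GPU, so this must print True.
print(tf.test.is_gpu_available())

# Optional: log where every op gets placed when the FlowNet2 graph is built and run.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    pass  # build and run the graph here to see the per-op device log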

@aminzabardast

(Quoting @Iamanorange's Makefile fix above.)

Thank you! This solved my compiling issue on Tensorflow 1.10.0 with CUDA 9.0.

ZCMax commented Jun 30, 2019

(Quoting @alisaaalehi's Docker-based solution above.)

I used the Docker image of that version (tensorflow/tensorflow:1.2.1-gpu) and successfully ran the Makefile, but I got some problems when installing python-tk in the Docker container. I tried "apt-get install python-tk", and it tells me that I have to run "apt-get update", but "apt-get update" always stops at "0% [Working]". Did you encounter a similar problem?

@Iamanorange

but I got some problems when I install the python-tk in the Docker container, I try to use " apt-get install python-tk "

Tkinter (python-tk) is used to draw GUIs. It is not necessary in this case; you can ignore it.
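
If the python-tk requirement only comes from matplotlib's default TkAgg backend (my assumption, not confirmed in this thread), switching matplotlib to a non-GUI backend avoids the dependency entirely:

import matplotlib
matplotlib.use('Agg')  # select a non-GUI backend before pyplot is imported
import matplotlib.pyplot as plt  # no Tkinter/python-tk needed from here on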

@AloshkaD

Changing "-D_GLIBCXX_USE_CXX11_ABI=0" to "-D_GLIBCXX_USE_CXX11_ABI=1" worked for me.
