Doesn't work on Theano 1: theano.sandbox.cuda.dnn was removed in the new version #3

Open
cdb0y511 opened this issue Mar 7, 2018 · 7 comments

Comments

@cdb0y511

cdb0y511 commented Mar 7, 2018

Could you update your source file layer.py?
theano.sandbox.cuda.dnn was removed in Theano 1 (anything newer than Theano 0.9).
from theano.sandbox.cuda.dnn import gpu_contiguous, GpuDnnConvDesc, gpu_alloc_empty, GpuDnnConv3dGradW no longer works, and neither does if lasagne.utils.theano.sandbox.cuda.dnn_available() in similarityNet.py.
Could you use theano.gpuarray.dnn instead?
I can't replace gpu_contiguous, GpuDnnConvDesc, gpu_alloc_empty, GpuDnnConv3dGradW with the classes of theano.gpuarray.dnn by myself.
And I can't go back to Theano 0.9 either, because the new version of cuDNN does not support the old Theano and pygpu.
Please help me, thanks.

@Rubikplayer

Rubikplayer commented Mar 8, 2018

Nice work on 3D reconstruction! I have some similar issues here.

@mjiUST Could you give us some tips for getting the code running on a newer system?
My system is:

  • Ubuntu 16.04.2 LTS (amd64)
  • CUDA 8.0 / 9.1, cuDNN 7.1 (edit: I installed cuDNN 5.1 instead)

Or:

Do you have suggestions for running/training without cuDNN?

I noticed there are some if-branches, like this one in similarityNet.py:

if lasagne.utils.theano.sandbox.cuda.dnn_available(): # when cuDNN available
    from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer 
else:
    from lasagne.layers import Conv2DLayer as ConvLayer

But in layers.py and SurfaceNet.py, some cuDNN functions are hardcoded:

  • from lasagne.layers.dnn import Conv3DDNNLayer, Pool3DDNNLayer
  • from theano.sandbox.cuda.dnn import gpu_contiguous, GpuDnnConvDesc, gpu_alloc_empty, GpuDnnConv3dGradW

Following the same logic as the if-branch, I might be able to hack Conv3DDNNLayer and Pool3DDNNLayer to:

from lasagne.layers import Conv3DLayer as Conv3DDNNLayer
from lasagne.layers import Pool3DLayer as Pool3DDNNLayer

But for other functions like gpu_contiguous, I haven't found any replacements so far. If you have any suggestions, please let us know! Thanks!
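The fallback idea above can be written as a small helper, so each layer name is resolved from the first module that actually imports. This is only a sketch: `first_available` is a hypothetical helper name, and the Lasagne module and class names in the commented usage are the ones quoted in this thread, not verified against any particular Lasagne release. Only `ImportError` is caught; other failures while importing a backend would still propagate.

```python
import importlib


def first_available(candidates):
    """Return the first attribute importable from a list of (module_path, name) pairs."""
    for module_path, name in candidates:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            continue  # module (or the backend it needs) is not importable
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError("none of the candidates could be imported: %r" % (candidates,))


# Hypothetical usage with the names from this thread (not executed here):
# Conv3DDNNLayer = first_available([
#     ("lasagne.layers.dnn", "Conv3DDNNLayer"),  # cuDNN-backed layer
#     ("lasagne.layers", "Conv3DLayer"),         # fallback without cuDNN
# ])
```

This does not help with gpu_contiguous and the other low-level ops, since no drop-in replacement for them has been identified in this thread.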

@cdb0y511 How are things going with you?

@mjiUST
Owner

mjiUST commented Mar 8, 2018

Dear @cdb0y511 @Rubikplayer ,

Thanks for the issue report. I have specified the older Theano version:

conda install -c rdonnelly theano -y # 0.9.0 version theano

Since the 3D dilated conv layer was implemented using some APIs in CUDNN, I'm not sure whether we could easily discard CUDNN.

If you are worried that the installation may affect the versions of your existing packages, please feel free to use SurfaceNet/installEnv.sh, which will not change anything in your existing python, theano, or ~/.bashrc. All you need to do is specify the CUDA/CUDNN paths accordingly. Please refer to the updated README.

Hope this may help.

@cdb0y511
Author

cdb0y511 commented Mar 8, 2018

@mjiUST
Thanks a lot, and well done. I am a Ph.D. candidate too. Maybe we can discuss your work one day.
But first, I want to figure out how it works.
I have read installEnv.sh, and I fully understand how to use conda to install a specific Theano 0.9 (even though your scripts install the latest Theano).
You don't need to discard CUDNN.
The problem is that theano.sandbox is an old backend. You'd better switch to the new backend, theano.gpuarray. Please see https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end(gpuarray)
Otherwise new drivers and new CUDA versions may not be compatible with it. I know you use the nvidia driver 375, cuda 8.0, cudnn v5.1. But I need cuda 9.0 and cudnn v7.1.1 for tensorflow 1.6, so the latest nvidia driver has been installed.

Even with Theano 0.9, I get:
Exception: ('The following error happened while compiling the node', <theano.sandbox.cuda.DnnVersion object at 0x7f9028151110>(), '\n', 'The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads')

The only options are switching to the new backend theano.gpuarray, or giving up cuda 9.0 and cudnn v7.1.1 and going back to nvidia driver 375, cuda 8.0, cudnn v5.1. It's hard to choose, and it certainly limits your work.
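To make the version boundary concrete, here is a minimal sketch of a check that code could run before choosing which backend to import. It assumes, as stated in this thread, that theano.sandbox.cuda is only present in Theano releases older than 1.0; `uses_legacy_backend` is a hypothetical helper name, and the string passed to it would typically come from `theano.__version__`.

```python
import re


def uses_legacy_backend(theano_version):
    """Return True if the version string is older than 1.0 (old theano.sandbox.cuda era)."""
    match = re.match(r"(\d+)\.(\d+)", theano_version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (theano_version,))
    major, minor = int(match.group(1)), int(match.group(2))
    # Compare as integer tuples so that "0.10" would sort after "0.9".
    return (major, minor) < (1, 0)
```

A caller could then import from theano.sandbox.cuda.dnn when this returns True and from theano.gpuarray.dnn otherwise, though the op names differ between the two backends and would still need porting.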

@Rubikplayer I can't find gpu_contiguous either, even in Theano 0.9's documentation. So I guess only the original author can fix it.

@mjiUST
Owner

mjiUST commented Mar 8, 2018

@cdb0y511
Thanks for your interest; I look forward to further discussion.

I don't know whether you have tried this method: say you have both /usr/local/cuda-8.0 and /usr/local/cuda, which links to cuda-9.0. Change the 1st line of ~/miniconda2/envs/SurfaceNet/etc/conda/activate.d/activate-cuda.sh to export CUDA_ROOT=/usr/local/cuda-8.0, which will not affect your settings in .bashrc before you source activate SurfaceNet. This way, even if you have multiple CUDA versions on your PC, a particular one can be selected without ANY influence on your other projects (for example, tensorflow and pytorch).

Similarly, one can also select a cudnn without affecting other projects by changing the 1st line of ~/miniconda2/envs/SurfaceNet/etc/conda/activate.d/activate-cudnn.sh to whatever path the cudnn folder is located at, e.g., export CUDNN_ROOT=/home/<user-name>/libs/cudnn-8.0-v5.1.

I highly recommend installing CUDNN outside of the CUDA folder, so that you can have any combination of CUDA+CUDNN by defining specific environment variables in different conda envs.

Please feel free to post any queries.

@Rubikplayer

Rubikplayer commented Mar 8, 2018

@mjiUST @cdb0y511
Yes, yesterday I did the following, and main.py now starts running (although some other error occurs):

  • Install CuDNN 5.1 (as you mentioned in "install outside cuda folder")
  • Install theano 0.9, by conda install theano=0.9
  • Specify the CUDA version by exporting environment variables:
export CUDA_ROOT=/usr/local/cuda-8.0
export PATH=$PATH:$CUDA_ROOT/bin
export LD_LIBRARY_PATH=$CUDA_ROOT/lib64:$LD_LIBRARY_PATH
export CPATH=$CUDA_ROOT/include:$CPATH
export LIBRARY_PATH=$CUDA_ROOT/lib64:$LIBRARY_PATH

and setting theano config in ~/.theanorc:

[cuda] 
root=/usr/local/cuda-8.0
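The same selection can be done per process instead of per shell, which keeps other projects untouched. The sketch below builds an environment dictionary with the variables listed above; `cuda_env` is a hypothetical helper name. One deliberate difference from the exports above: the CUDA bin directory is prepended to PATH rather than appended, on the assumption that the selected toolkit's nvcc should win over any other nvcc already on PATH.

```python
import os


def cuda_env(cuda_root, base_env=None):
    """Return a copy of the environment with the given CUDA root put first."""
    env = dict(os.environ if base_env is None else base_env)

    def prepend(var, directory):
        # Put `directory` in front of whatever the variable already holds.
        old = env.get(var, "")
        env[var] = directory + (":" + old if old else "")

    env["CUDA_ROOT"] = cuda_root
    prepend("PATH", cuda_root + "/bin")
    prepend("LD_LIBRARY_PATH", cuda_root + "/lib64")
    prepend("CPATH", cuda_root + "/include")
    prepend("LIBRARY_PATH", cuda_root + "/lib64")
    return env


# Hypothetical usage (not executed here):
# import subprocess
# subprocess.call(["python", "main.py"], env=cuda_env("/usr/local/cuda-8.0"))
```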

@cdb0y511 You can give this a try too. I have multiple CUDA versions installed. I also installed two versions of CuDNN (although I might have overwritten 7.1 with 5.1).

For the error I encountered, I will open another issue. Thanks for the feedback!
Edit: new issue opened: (#4)

@mjiUST
Owner

mjiUST commented Mar 9, 2018

@Rubikplayer
Thank you for the feedback. To be precise,

  • before we point to an outside cudnn, the original one should be removed OR unlinked (removed from the env variables LD_LIBRARY_PATH, CPATH, and LIBRARY_PATH)

  • to install the 0.9 version of Theano, please use the command:

    conda install -c rdonnelly theano -y # 0.9.0 version theano
    since the one you mentioned, conda install theano=0.9, results in a 0.9 version with a different commit hash.
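As an illustration of the "unlink" step in the first point, this sketch removes one directory from a colon-separated variable such as LD_LIBRARY_PATH. `drop_from_path_var` is a hypothetical helper name, and the paths in the usage comment are examples only. Note that it also drops empty entries, which some tools interpret as the current directory.

```python
def drop_from_path_var(value, unwanted_dir):
    """Remove every occurrence of unwanted_dir from a colon-separated path list."""
    unwanted = unwanted_dir.rstrip("/")
    kept = [
        entry
        for entry in value.split(":")
        # Skip empty entries and entries equal to the unwanted directory,
        # ignoring a trailing slash on either side.
        if entry and entry.rstrip("/") != unwanted
    ]
    return ":".join(kept)


# Hypothetical usage (not executed here):
# import os
# os.environ["LD_LIBRARY_PATH"] = drop_from_path_var(
#     os.environ.get("LD_LIBRARY_PATH", ""), "/usr/local/cuda/lib64")
```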

@Rubikplayer

@mjiUST Thanks for the response.

  • Yes, as I found in another thread, indeed different versions of CuDNN can result in errors.
  • Thanks for the info! It seems the conda-installed version is okay for now. If any problem comes up, I will switch back to the version you specified.
