Cuda 9 - [Question] #443

Open

bododge opened this issue Jan 29, 2018 · 6 comments
bododge commented Jan 29, 2018

Should I expect any problems when updating to CUDA 9? I have several GPU renderers whose newer versions will require the update, but I'm wondering if it might cause problems for my favorite terminal app, neural-style. Has anyone tried it yet?

@ProGamerGov

I've been running CUDA version 9.0.176 for a while now without issue. I did run into some installation issues, which were solved by disabling the half operators (which also improves memory usage): torch/cutorch#797

Memory usage seems a bit worse with CUDA 9, as per this issue.
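
For reference, the workaround from torch/cutorch#797 boils down to exporting the flag before rebuilding Torch. Roughly (a minimal sketch, assuming the standard ~/torch checkout from the distro installer):

# Sketch of the half-operator workaround (torch/cutorch#797); assumes ~/torch exists.
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"   # disable CUDA half operators
cd ~/torch
./clean.sh      # clear any previous partial build
./install.sh    # rebuild Torch (including cutorch/cunn) with the flag in effect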

@subzerofun

@ProGamerGov Sorry to bother you here with this issue, but it seems you got past the cutorch "(...)THCudaHalfTensor(...)" error, and I thought maybe you know a different way (or have some insight into this specific issue) to fix the Torch install, other than the mentioned export of the "-D__CUDA_NO_HALF_OPERATORS__" variable before starting install.sh. I'm really desperate to fix this damn Torch install. The funny thing is, I just wanted to test your Deep Photo Style Transfer repo again after playing around with it for weeks when it came out last year. I thought reinstalling Torch with CUDA 9 would just take ~20 min. Now I'm running out of ideas about what to do. I don't want to revert back to CUDA 8 ... I hope you have some idea how else this error could be fixed.

ProGamerGov commented Feb 17, 2018

@subzerofun What's the terminal output in relation to the error?

It looks like I might have solved the issue like this, according to my .bash_history:

cd torch
./clean.sh
bash install-deps;
./install.sh
cd ..
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
cd torch
./clean.sh
./install.sh
cd ..
cd torch
luarocks list

Then I appear to have had a few issues with cuDNN:

cd ~
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-9.0/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda-9.0/include/
nvidia-smi
sudo apt-get install libprotobuf-dev protobuf-compiler
luarocks install loadcaffe
sudo apt-get install luarocks
luarocks install loadcaffe
sudo apt-get remove luarocks
bash install-deps;
cd troch
bash install-deps
cd torch
bash install-deps
. ~/torch/install/bin/torch-activate
luarocks
th
cd ..
luarocks install loadcaffe
cd neural-style
sh models/download_models.sh
th neural_style.lua -gpu -1 -print_iter 1
luarocks install cutorch
luarocks install cunn
th -e "require 'cutorch'; require 'cunn'; print(cutorch)"
cd ..
luarocks install cudnn
th
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
cd ..
sudo dpkg -i libcudnn7_7.0.3.11-1+cuda9.0_amd64.deb
th neural_style.lua -gpu 0 -backend cudnn
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
cd ..
tar -xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h
/usr/local/cuda/lib64/libcudnn*
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
cd ..
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
luarocks install cudnn
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
echo $LD_LIBRARY_PATH
cd ..
export CUDNN_ROOT="/usr/local/cuda-8.0/lib64/libcudnn.so.5" # you can maybe add this line into ~/.bashrc for future use

luarocks install cudnn
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
luarocks install cutorch
luarocks install cunn
luarocks install cudnn
th neural_style.lua -gpu 0 -backend cudnn
export CUDNN_ROOT="/usr/local/cuda-9.0/lib64/libcudnn.so.5
cd ..
export CUDNN_ROOT="/usr/local/cuda-9.0/lib64/libcudnn.so.5


export CUDNN_ROOT="/usr/local/cuda-9.0/lib64/libcudnn.so.5
th neural_style.lua -gpu 0 -backend cudnn
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
cd ..
export CUDNN_ROOT="/usr/local/cuda-9.0/lib64/libcudnn.so.5" # you can maybe add this line into ~/.bashrc for future use
export CUDNN_ROOT="/usr/local/cuda-9.0/lib64/libcudnn.so.7" #
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
cd ..
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
luarocks install cutorch
cd ..
sudo dpkg -i libcudnn7-doc_7.0.3.11-1+cuda9.0_amd64.deb
sudo chmod a+r /usr/local/cuda/include/cudnn.h
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
export CUDNN_PATH = "/usr/lib/aarch64-linux-gnu//libcudnn.so.7"
export CUDNN_PATH="/usr/local/cuda-9.0/lib64/libcudnn.so.7"
th neural_style.lua -gpu 0 -backend cudnn
luarocks install cudnn
th neural_style.lua -gpu 0 -backend cudnn
git clone https://github.com/soumith/cudnn.torch.git -b R7 && cd cudnn.torch && luarocks make cudnn-scm-1.rockspec
cd ..
th neural_style.lua -gpu 0 -backend cudnn
sudo rm -rf /cudnn.torch
cd ..
cd torch
git clone https://github.com/soumith/cudnn.torch.git -b R7 && cd cudnn.torch && luarocks make cudnn-scm-1.rockspec
cd ..
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
sudo reboot
cd neural-style
th neural_style.lua -gpu 0 -backend cudnn
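
If I distill that history, the cuDNN part that finally worked seems to boil down to roughly this (not a verified recipe; the tarball name and paths are the ones from the log above, so adjust for your setup):

# Rough summary of the cuDNN 7 / CUDA 9.0 steps above (sketch, not a verified recipe).
tar -xzvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda-9.0/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-9.0/lib64/
sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h /usr/local/cuda-9.0/lib64/libcudnn*
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
# cuDNN 7 needs the R7 branch of cudnn.torch:
git clone https://github.com/soumith/cudnn.torch.git -b R7
cd cudnn.torch && luarocks make cudnn-scm-1.rockspec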

This was what I had up until the issue with Torch:

cd ~/
curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; ./install.sh
sudo apt-get install cmake
sudo apt-get install python-pip
sudo apt-get install python3-pip
cd ..
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/ 
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-key adv --fetch-keys 
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
sudo apt install nvidia-cuda-toolkit
which nvcc
nvcc --version
nvidia-smi
sudo apt remove nvidia-cuda-toolkit
export PATH=/usr/local/cuda-8.0/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-9.0/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
nvidia-smi
sudo dpkg -i libcudnn7_7.0.3.11-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.0.3.11-1+cuda9.0_amd64.deb
uname -a
git clone https://github.com/jcjohnson/neural-style
git clone https://github.com/nagadomi/waifu2x
git clone https://github.com/chuanli11/CNNMRF
git clone https://github.com/leongatys/NeuralImageSynthesis
git clone https://github.com/leongatys/fast-neural-style
git clone https://github.com/jcjohnson/fast-neural-style
sudo apt-get install libprotobuf-dev protobuf-compiler
luarocks install loadcaffe
sudo apt install luarocks
luarocks install loadcaffe
cd torch
source ~/.bashrc
cd ..
luarocks
luarocks install loadcaffe
# in a terminal, run the commands WITHOUT sudo
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
sudo apt-get remove luarocks
# in a terminal, run the commands WITHOUT sudo
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
nvidia-smi
sudo reboot
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-get install cuda
sudo apt-get update
sudo apt-get install cuda
sudo apt autoremove
nvidia-smi
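
The CUDA 9.0 environment setup buried in there is essentially just the two exports plus a sanity check (a rough summary, assuming the default install location from the .deb packages):

# Condensed CUDA 9.0 environment setup from the history above.
export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
nvcc --version    # should report release 9.0
nvidia-smi        # confirm the driver sees the GPU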

There may or may not have been some manual things that I did, so this may be incomplete.

@subzerofun

Thanks for the quick reply and the bash commands!

The first part is nearly a mirror of my attempts; I tried it ~4 times with slightly different commands and env vars, but cutorch simply refused to compile without errors.

I've tried the nvcc flag export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__", but it did not seem to "turn off half operators" ... I got dozens of errors from every compiled source file.

The warnings look like this:

warning: 'THCudaHalfStorage_get' has C-linkage specified, but returns user-defined type 'half' (aka '__half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfStorage_get(THCState * state, const THCudaHalfStorage *, ptrdiff_t);

For some reason the special nvcc flag did not help in my case, although I've tried both:

  • TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" ./install.sh and
  • export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
    ./install.sh

And this line (cmake output) irritates me: the "no_half..." flag is set twice, even though my environment variables are clean!
-- CUDA_NVCC_FLAGS: -D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_OPERATORS__;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_35,code=sm_35;-DCUDA_HAS_FP16=1

compute_61 = GTX 1080 Ti, compute_35 = GTX 780


Since the standard install (the first part of your history) always failed for me at [21%] (regardless of the flag), I followed another tip:

1. Comment out the cutorch and cunn packages in install.sh:

https://github.com/torch/distro/blob/0219027e6c4644a0ba5c5bf137c989a0a8c9e01b/install.sh#L132-L137

if [ -x "$path_to_nvcc" ]
then
    echo "Found CUDA on your machine. Installing CUDA packages"
    #cd ${THIS_DIR}/extra/cutorch && $PREFIX/bin/luarocks make rocks/cutorch-scm-1.rockspec || exit 1
    #cd ${THIS_DIR}/extra/cunn    && $PREFIX/bin/luarocks make rocks/cunn-scm-1.rockspec    || exit 1
fi

With that I can finally get to [100%], but with literally hundreds of "HalfTensor" warnings. That can't be too good, can it? I can't remember my last Torch install producing that many warnings...


2. Activate torch paths + environment:

export PATH=/Users/thatsme/torch/install/bin:$PATH
. /Users/thatsme/torch/install/bin/torch-activate

3. Manually install the two missing packages from the source repos (spelled out in the sketch after this list):

  • git clone (...)/[cutorch, cunn]
  • cd [cutorch, cunn]
  • TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" ~/torch/install/bin/luarocks make rocks/[cutorch, cunn]-scm-1.rockspec
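
Spelled out for cutorch (cunn works the same way; this assumes the standard torch/cutorch repo layout with the rockspec under rocks/), that's roughly:

# Illustrative expansion of step 3 for cutorch (cunn is analogous); a sketch, not verified.
git clone https://github.com/torch/cutorch
cd cutorch
TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" ~/torch/install/bin/luarocks make rocks/cutorch-scm-1.rockspec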

And again, hundreds of warnings like:

warning: 'THC_float2half' has C-linkage specified, but returns user-defined type 'half' (aka '__half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THC_float2half(float a);
                ^
/Users/squanchy-birdup/torch/install/include/THC/generic/THCStorage.h:28:17: warning: 'THCudaHalfStorage_get' has C-linkage specified, but returns user-defined type 'half' (aka '__half') which is incompatible with C [-Wreturn-type-c-linkage]
extern "C" half THCudaHalfStorage_get(THCState * state, const THCudaHalfStorage *, ptrdiff_t);

4. Run test.sh:

The first batch of 164/164 passes and then the second fails here:

134/210 Kmeans ............ [WAIT]./test.sh: line 34: 89405 Abort trap: 6           
th -lnn -e "nn.test()"

My TensorFlow 1.4 + PyTorch installs were relatively painless and both frameworks are working, so I'm sure that the cuDNN files (include + lib) are in the right place. My shell profile only has entries for CUDA 9 & cuDNN 7. Since macOS has some small deviations in the CUDA directories compared to Linux, I can't follow your second part exactly. One thing I don't understand in your commands: you export PATH=/usr/local/cuda-8.0/bin:$PATH, but you are using CUDA 9 right now.

Sorry for the excessively long message 📝, but I've gone through the issue sections of torch/distro and cutorch numerous times, and somehow nothing is bringing me closer to a working install ...

Do you have any idea what's causing my issues, or why -D__CUDA_NO_HALF_OPERATORS__ is not working?

@ProGamerGov

@subzerofun Unfortunately I don't know a whole lot about Torch itself or cutorch, so I wouldn't know where to start with regard to your issue.

@subzerofun

@ProGamerGov OK, thanks nonetheless for the help! I wasted a few hours too many trying to force Torch to work with CUDA 9.0 or 9.1 ... I reverted back to 8.0 and now the install went fine.

The export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" fix apparently only works on Linux (maybe Windows too?).

Now I have three different CUDA versions installed 🙃 ... I will write a bash function that can toggle between CUDA 9.0 and 8.0 for Torch: since the standard CUDA_ROOT path on macOS is always /usr/local/cuda, I just need to automate switching all the CUDA & cuDNN symlinks and update the relevant shell env variables, and that should do the trick. Hopefully. Otherwise I would need to compile TensorFlow and PyTorch again with CUDA 8.0 ...
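
Roughly something like this (just a sketch: the function name is made up, and on macOS the versioned toolkits normally live under /Developer/NVIDIA/ and use DYLD_LIBRARY_PATH rather than LD_LIBRARY_PATH):

# Hypothetical helper for flipping the /usr/local/cuda symlink between toolkit versions.
cuda_switch() {
    local ver="$1"                                    # e.g. "8.0" or "9.0"
    sudo rm -f /usr/local/cuda
    sudo ln -s "/Developer/NVIDIA/CUDA-${ver}" /usr/local/cuda
    export PATH="/usr/local/cuda/bin:$PATH"
    export DYLD_LIBRARY_PATH="/usr/local/cuda/lib:$DYLD_LIBRARY_PATH"
    nvcc --version                                    # confirm which toolkit is now active
}
# Usage: cuda_switch 8.0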

A way to create a completely self-contained Torch install (incl. all CUDA 8.0 libs and source files) would be a better solution ... Docker would be perfect for it, but GPU passthrough doesn't work in the macOS version.
