[WIP] CUDA 10.0 and tensorflow 1.14 for docker install #682

Closed · wants to merge 3 commits
2 changes: 1 addition & 1 deletion Dockerfile
@@ -125,7 +125,7 @@ RUN apt-get update && \
ARG CUDA_SUPPORT
ENV CUDA_SUPPORT=${CUDA_SUPPORT}
RUN if [ "$CUDA_SUPPORT" = "yes" ]; then \
- /tmp/components/cuda/install.sh; \
+ /tmp/components/cuda/install-cuda-10-0.sh; \
fi

# TODO: CHANGE URL
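For context, CUDA_SUPPORT is a Docker build argument; in this layout it is presumably supplied by the CUDA compose overlay changed below. A rough sketch of the expected build invocation (an assumption based on CVAT's component layout of that era; the exact command may differ, check the repository docs):

    # Build CVAT with the CUDA overlay so the CUDA_SUPPORT build arg reaches the Dockerfile
    docker-compose \
        -f docker-compose.yml \
        -f components/cuda/docker-compose.cuda.yml \
        build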
2 changes: 1 addition & 1 deletion components/cuda/docker-compose.cuda.yml
@@ -15,4 +15,4 @@ services:
environment:
NVIDIA_VISIBLE_DEVICES: all
NVIDIA_DRIVER_CAPABILITIES: compute,utility
- NVIDIA_REQUIRE_CUDA: "cuda>=9.0"
+ NVIDIA_REQUIRE_CUDA: "cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411"
Contributor commented:
Does the string work for "Tesla" only?

Contributor Author (dustindorroh) commented:

Good question. It's not clear whether a Tesla GPU is actually required to use CUDA 10.0: https://docs.nvidia.com/cuda/archive/10.0/cuda-toolkit-release-notes/index.html

It may be more of a requirement of running nvidia-docker:
https://github.com/NVIDIA/nvidia-docker/wiki/CUDA

I took this line from: https://gitlab.com/nvidia/container-images/cuda/blob/master/dist/ubuntu16.04/10.0/base/Dockerfile
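One way to sanity-check the constraint on a particular machine is to pass the same string to a throwaway container and see whether it starts. A rough sketch, assuming nvidia-docker 2 with the nvidia runtime and the public nvidia/cuda:10.0-base image:

    # If the host driver does not satisfy the constraint, the NVIDIA container
    # runtime refuses to start the container with a requirement error.
    docker run --rm --runtime=nvidia \
        -e NVIDIA_VISIBLE_DEVICES=all \
        -e NVIDIA_REQUIRE_CUDA="cuda>=10.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=410,driver<411" \
        nvidia/cuda:10.0-base nvidia-smi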

43 changes: 43 additions & 0 deletions components/cuda/install-cuda10-0.sh
@@ -0,0 +1,43 @@
#!/usr/bin/env bash
#
# cuda 10.0 base - https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/10.0/base/Dockerfile
# cuda 10.0 runtime - https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/10.0/runtime/Dockerfile
# cudnn7 - https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/10.0/runtime/cudnn7/Dockerfile
Contributor commented:
I see that you combined all of these Dockerfiles and put their instructions here. It will be difficult for us to maintain them, because NVIDIA can change the original files in the future and that could break our code.

@azhavoro, I'm thinking about a separate container that can execute registered functions (e.g. TF annotation) using https://docs.python.org/3/library/xmlrpc.html. Could you please recommend something here?

Contributor Author (dustindorroh) commented on Sep 3, 2019:

@nmanovic I agree. I was following the previous example: https://github.com/opencv/cvat/blob/develop/components/cuda/install.sh
It looked to me like the base, runtime, and cuDNN Dockerfiles had been combined into one file there, and I was unsure whether that was intentional on your part, perhaps to keep the number of Docker layers down.
If there is interest, I can split it up that way instead; it may help with composing different CUDA versions.
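A purely illustrative sketch of what such a split might look like, mirroring NVIDIA's base/runtime/cudnn7 layering (none of these file names exist in this PR; they are assumptions for illustration only):

    # Hypothetical dispatcher that runs one script per NVIDIA image layer:
    #   components/cuda/10.0/base.sh     - apt repos, cuda-cudart, cuda-compat
    #   components/cuda/10.0/runtime.sh  - cuda-libraries, cuda-nvtx, libnccl2
    #   components/cuda/10.0/cudnn7.sh   - libcudnn7
    set -e
    for stage in base runtime cudnn7; do
        bash "/tmp/components/cuda/10.0/${stage}.sh"
    done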

#
#
set -e

# Add NVIDIA's apt repositories (CUDA and machine-learning) and verify the repository signing key
apt-get update && apt-get install -y --no-install-recommends ca-certificates apt-transport-https gnupg-curl && \
rm -rf /var/lib/apt/lists/* && \
NVIDIA_GPGKEY_SUM=d1be581509378368edeec8c1eb2958702feedf3bc3d17011adbf24efacce4ab5 && \
NVIDIA_GPGKEY_FPR=ae09fe4bbd223a84b2ccfce3f60f4b3d7fa2af80 && \
apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub && \
apt-key adv --export --no-emit-version -a ${NVIDIA_GPGKEY_FPR} | tail -n +5 > cudasign.pub && \
echo "${NVIDIA_GPGKEY_SUM} cudasign.pub" | sha256sum -c --strict - && rm cudasign.pub && \
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list

# Pin the CUDA, NCCL, and cuDNN versions installed below and register the NVIDIA library and binary paths
CUDA_VERSION=10.0.130
NCCL_VERSION=2.4.2
CUDNN_VERSION=7.6.0.64
CUDA_PKG_VERSION="10-0=${CUDA_VERSION}-1"
echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf
echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
echo 'export PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}' >> ${HOME}/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}' >> ${HOME}/.bashrc

# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
apt-get update && apt-get install -y --no-install-recommends \
cuda-cudart-${CUDA_PKG_VERSION} \
cuda-compat-10-0 \
cuda-libraries-${CUDA_PKG_VERSION} \
cuda-nvtx-${CUDA_PKG_VERSION} \
libnccl2=${NCCL_VERSION}-1+cuda10.0 \
libcudnn7=${CUDNN_VERSION}-1+cuda10.0 && \
apt-mark hold libnccl2 libcudnn7 && \
ln -s cuda-10.0 /usr/local/cuda && \
rm -rf /var/lib/apt/lists/* \
/etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list

# Replace the CPU-only TensorFlow package with the GPU build that matches CUDA 10.0 and cuDNN 7
pip3 uninstall -y tensorflow
pip3 install --no-cache-dir tensorflow-gpu==1.14.0
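A quick sanity check that the GPU build is actually picked up inside the container (assuming it is run with the nvidia runtime):

    # Prints True when TensorFlow 1.14 can see a CUDA device
    python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"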