[CICD] update dev container #402

Merged
merged 1 commit into from
Apr 29, 2024
8 changes: 4 additions & 4 deletions README.md
@@ -144,8 +144,8 @@ At the same time, we find some OPs used by TRFA have better performance, so we h

For all developers, we recommend you use the development docker containers which are all GPU enabled:
```sh
docker pull tfra/dev_container:latest-python3.9  # "3.9" and "3.10" are both available.
docker run --privileged --gpus all -it --rm -v $(pwd):$(pwd) tfra/dev_container:latest-3.8
docker pull tfra/dev_container:latest-tf2.15.1-python3.9 # Available tensorflow and python combinations can be found [here](https://www.tensorflow.org/install/source#linux)
docker run --privileged --gpus all -it --rm -v $(pwd):$(pwd) tfra/dev_container:latest-tf2.15.1-3.9
```
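
The image tag has to be the same string in `docker pull` and `docker run`. As a minimal sketch, assuming the `latest-tf<TF_VERSION>-python<PY_VERSION>` tag format that `tools/build_dev_container.sh` in this PR passes to `docker build -t`, the tag can be composed once and reused. The helper name `dev_container_tag` is hypothetical and not part of the repository, and the commands are only printed here, not executed.

```sh
# Hypothetical helper (not in the repository): compose the dev container tag
# from a TensorFlow version and a Python version.
dev_container_tag() {
  tf_version=$1
  py_version=$2
  printf 'tfra/dev_container:latest-tf%s-python%s\n' "$tf_version" "$py_version"
}

# Build the tag once so that pull and run reference the same image.
IMAGE=$(dev_container_tag 2.15.1 3.9)
echo "docker pull $IMAGE"
echo "docker run --privileged --gpus all -it --rm -v \$(pwd):\$(pwd) $IMAGE"
```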

#### CPU Only
@@ -182,9 +182,9 @@ export CUDNN_INSTALL_PATH="/usr/lib/x86_64-linux-gnu"
python configure.py
```
And then build the pip package and install:
```sh
```sh`
Collaborator
bash

bazel build --enable_runfiles build_pip_pkg
bazel-bin/build_pip_pkg artifacts
bazel-bin/build_pip_pkg artifacts`
pip install artifacts/tensorflow_recommenders_addons_gpu-*.whl
```

82 changes: 77 additions & 5 deletions tools/build_dev_container.sh
@@ -1,12 +1,84 @@
#!/usr/bin/env bash

# To push a new version, run:
# $ TF_VERSION=2.15.1 PY_VERSION=3.10 bash ./tools/build_dev_container.sh
# $ docker push tfra/dev_container:latest-tf2.15.1-python3.10

set -x -e

if [ -z $TF_VERSION ] ; then
export TF_VERSION='2.15.1'
fi

if [ -z $PY_VERSION ] ; then
export PY_VERSION='3.9'
fi

if [ -z $HOROVOD_VERSION ] ; then
export HOROVOD_VERSION='0.28.1'
fi

export TF_NEED_CUDA=1
export TF_NAME='tensorflow'

# if tensorflow version >= 2.6.0 and <= 2.15.9
if [[ "$TF_VERSION" =~ ^2\.(16)\.[0-9]+$ ]] ; then
export BUILD_IMAGE="tfra/nosla-cuda12.3-cudnn8.9-ubuntu20.04-manylinux2014-python$PY_VERSION"
export TF_CUDA_VERSION="12.3"
export TF_CUDNN_VERSION="8.9"
elif [[ "$TF_VERSION" =~ ^2\.(15)\.[0-9]+$ ]] ; then
export BUILD_IMAGE="tfra/nosla-cuda12.2-cudnn8.9-ubuntu20.04-manylinux2014-python$PY_VERSION"
export TF_CUDA_VERSION="12.2"
export TF_CUDNN_VERSION="8.9"
elif [[ "$TF_VERSION" =~ ^2\.(14)\.[0-9]+$ ]] ; then
export BUILD_IMAGE="tfra/nosla-cuda11.8-cudnn8.7-ubuntu20.04-manylinux2014-python$PY_VERSION"
export TF_CUDA_VERSION="11.8"
export TF_CUDNN_VERSION="8.7"
elif [[ "$TF_VERSION" =~ ^2\.(12|13)\.[0-9]+$ ]] ; then
export BUILD_IMAGE="tfra/nosla-cuda11.8-cudnn8.6-ubuntu20.04-manylinux2014-python$PY_VERSION"
export TF_CUDA_VERSION="11.8"
export TF_CUDNN_VERSION="8.6"
elif [[ "$TF_VERSION" =~ ^2\.([6-9]|10|11)\.[0-9]+$ ]] ; then
export BUILD_IMAGE="tfra/nosla-cuda11.2-cudnn8-ubuntu20.04-manylinux2014-python$PY_VERSION"
export TF_CUDA_VERSION="11.2"
export TF_CUDNN_VERSION="8.1"
elif [ $TF_VERSION == "2.4.1" ] ; then
export BUILD_IMAGE='tfra/nosla-cuda11.0-cudnn8-ubuntu18.04-manylinux2010-multipython'
export TF_CUDA_VERSION="11.0"
export TF_CUDNN_VERSION="8.0"
elif [ $TF_VERSION == "1.15.2" ] ; then
export BUILD_IMAGE='tfra/nosla-cuda10.0-cudnn7-ubuntu16.04-manylinux2010-multipython'
export TF_CUDA_VERSION="10.0"
export TF_CUDNN_VERSION="7.6"
else
echo "TF_VERSION is invalid: $TF_VERSION!"
exit 1
fi

echo "BUILD_IMAGE is $BUILD_IMAGE"
echo "TF_CUDA_VERSION is $TF_CUDA_VERSION"
echo "TF_CUDNN_VERSION is $TF_CUDNN_VERSION"

if [ -z $HOROVOD_VERSION ] ; then
export HOROVOD_VERSION='0.28.1'
fi

export PROTOBUF_VERSION='3.19.6'
if [[ "$TF_VERSION" =~ ^2\.1[3-9]\.[0-9]$ ]] ; then
export PROTOBUF_VERSION='4.23.4'
fi

docker build \
-f tools/docker/dev_container.Dockerfile \
--build-arg TF_VERSION=2.15.1 \
--build-arg TF_PACKAGE=tensorflow \
--build-arg PY_VERSION=$PY_VERSION \
--build-arg HOROVOD_VERSION=$HOROVOD_VERSION \
--build-arg PY_VERSION \
--build-arg TF_VERSION \
--build-arg TF_NAME \
--build-arg TF_NEED_CUDA \
--build-arg TF_CUDA_VERSION \
--build-arg TF_CUDNN_VERSION \
--build-arg HOROVOD_VERSION \
--build-arg BUILD_IMAGE \
--build-arg PROTOBUF_VERSION \
--no-cache \
--target dev_container \
-t tfra/dev_container:latest-python$PY_VERSION ./
-t tfra/dev_container:latest-tf$TF_VERSION-python$PY_VERSION ./
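
The version-to-toolchain mapping in this script can also be written as functions, which makes it possible to check a version without running `docker build`. The following is a sketch under stated assumptions, not the script from this PR: the function names `tf_cuda_versions` and `tf_protobuf_version` are hypothetical, the CUDA, cuDNN and protobuf values are copied from the branches above, and `case` glob patterns stand in for the regular expressions, so they are slightly looser (any patch suffix is accepted, including multi-digit ones).

```sh
# Hypothetical helper: print "<CUDA version> <cuDNN version>" for a TF version.
# Returns non-zero for versions the script above would reject.
tf_cuda_versions() {
  case "$1" in
    2.16.*)                  echo "12.3 8.9" ;;
    2.15.*)                  echo "12.2 8.9" ;;
    2.14.*)                  echo "11.8 8.7" ;;
    2.12.*|2.13.*)           echo "11.8 8.6" ;;
    2.[6-9].*|2.10.*|2.11.*) echo "11.2 8.1" ;;
    2.4.1)                   echo "11.0 8.0" ;;
    1.15.2)                  echo "10.0 7.6" ;;
    *) echo "TF_VERSION is invalid: $1!" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the protobuf version pinned for a TF version.
tf_protobuf_version() {
  case "$1" in
    2.1[3-9].*) echo "4.23.4" ;;
    *)          echo "3.19.6" ;;
  esac
}

# Example lookup for the default TF version used by the script.
demo_version=2.15.1
versions=$(tf_cuda_versions "$demo_version")
echo "TF_CUDA_VERSION is ${versions% *}"
echo "TF_CUDNN_VERSION is ${versions#* }"
echo "PROTOBUF_VERSION is $(tf_protobuf_version "$demo_version")"
```

Quoting `"$1"` in the `case` statement avoids the word-splitting problems that unquoted tests such as `[ -z $TF_VERSION ]` can run into when a variable contains spaces.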
98 changes: 60 additions & 38 deletions tools/docker/dev_container.Dockerfile
@@ -1,67 +1,89 @@
#syntax=docker/dockerfile:1.1.5-experimental
ARG IMAGE_TYPE
ARG TF_VERSION
ARG PY_VERSION
ARG TF_NEED_CUDA
ARG TF_NAME
ARG HOROVOD_VERSION
ARG BUILD_IMAGE
ARG PROTOBUF_VERSION

# Currently all of our dev images are GPU capable but at a cost of being quite large.
# See https://github.com/tensorflow/build/pull/47
FROM tensorflow/build:latest-python$PY_VERSION as dev_container
ARG TF_PACKAGE
ARG TF_VERSION
FROM ${BUILD_IMAGE} as dev_container

RUN echo "#! /usr/bin/python2.7" >> /usr/bin/lsb_release2
RUN cat /usr/bin/lsb_release >> /usr/bin/lsb_release2
RUN mv /usr/bin/lsb_release2 /usr/bin/lsb_release

ARG PY_VERSION
RUN ln -sf /usr/local/bin/python$PY_VERSION /usr/bin/python

RUN pip uninstall $TF_PACKAGE -y
RUN pip install --default-timeout=1000 $TF_PACKAGE==$TF_VERSION
ENV PATH=/dt8/usr/bin:${PATH}
ENV LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
ENV LD_LIBRARY_PATH=/dt8/user/lib64:${LD_LIBRARY_PATH}
ENV LD_LIBRARY_PATH=/dt8/user/lib:${LD_LIBRARY_PATH}
ENV MANPATH=/dt8/user/share/man:${LD_LIBRARY_PATH}
ENV INFOPATH=/dt8/user/share/info

RUN rm -rf /usr/lib/python3
RUN rm -rf /usr/lib/python
RUN ln -sf /usr/lib/python$PY_VERSION /usr/lib/python
RUN ln -sf /usr/lib/python$PY_VERSION /usr/lib/python3
ARG TF_VERSION
ARG TF_NAME
ARG HOROVOD_VERSION
ARG PROTOBUF_VERSION

RUN python -m pip install --upgrade pip
RUN python -m pip install --default-timeout=1000 $TF_NAME==$TF_VERSION

COPY tools/install_deps /install_deps
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /install_deps/yapf.txt \
-r /install_deps/typedapi.txt \
-r /tmp/requirements.txt
COPY tools/docker/install/install_horovod.sh /install/
RUN /install/install_horovod.sh $HOROVOD_VERSION

RUN pip install setuptools
COPY tools/install_deps/ ./
COPY tools/docker/install/install_pytest.sh /install/
RUN bash /install/install_pytest.sh

RUN bash /install_deps/buildifier.sh
RUN bash /install_deps/clang-format.sh
COPY requirements.txt .
RUN python -m pip install -r requirements.txt

COPY tools/docker/install/install_bazel.sh /install/
RUN /install/install_bazel.sh "5.1.1"
RUN python -m pip install tensorflow-io

ENV ADDONS_DEV_CONTAINER="1"
RUN python -m pip install --upgrade protobuf==$PROTOBUF_VERSION

RUN apt-get update && apt-get install -y \
openssh-client \
cmake
COPY tools/install_deps/yapf.txt ./
RUN pip install -r ./yapf.txt

RUN apt-get update && apt-get remove -y python3-apt \
&& apt-get install -y python3-apt
RUN pip install setuptools

COPY tools/docker/install/install_openmpi.sh /install/
RUN /install/install_openmpi.sh "4.1.1"
COPY tools/install_deps/buildifier.sh ./buildifier.sh
RUN bash buildifier.sh

COPY tools/docker/install/install_nccl.sh /install/
RUN /install/install_nccl.sh "2.8.4-1+cuda11.2"
COPY tools/install_deps/clang-format.sh ./clang-format.sh
Member Author
Hi @jq, I just updated this to add the clang-format installation. Could you review and approve it when you have time? Thanks!

RUN bash clang-format.sh

COPY tools/docker/install/install_horovod.sh /install/
RUN /install/install_horovod.sh $HOROVOD_VERSION
ARG IMAGE_TYPE
ARG TF_VERSION
ARG PY_VERSION
ARG TF_NEED_CUDA
ARG TF_CUDA_VERSION
ARG TF_CUDNN_VERSION
ARG TF_NAME
ARG HOROVOD_VERSION
ARG BUILD_IMAGE
ARG PROTOBUF_VERSION

# write default env for user
RUN echo "export PATH=$PATH" >> ~/.bashrc
RUN echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> ~/.bashrc
RUN echo "export MANPATH=$MANPATH" >> ~/.bashrc
RUN echo "export INFOPATH=$INFOPATH" >> ~/.bashrc
RUN echo "export TF_VERSION=$TF_VERSION" >> ~/.bashrc
RUN echo "export PY_VERSION=$PY_VERSION" >> ~/.bashrc
RUN echo "export TF_NAME=$TF_NAME" >> ~/.bashrc
RUN echo "export IMAGE_TYPE=$IMAGE_TYPE" >> ~/.bashrc
RUN echo "export HOROVOD_VERSION=$HOROVOD_VERSION" >> ~/.bashrc
RUN echo "export TF_NAME=$TF_NAME" >> ~/.bashrc
RUN echo "export PROTOBUF_VERSION=$PROTOBUF_VERSION" >> ~/.bashrc
RUN echo "export TF_NEED_CUDA=1" >> ~/.bashrc
RUN echo "export TF_CUDA_VERSION=11.2" >> ~/.bashrc
RUN echo "export TF_CUDNN_VERSION=8.1" >> ~/.bashrc
RUN echo "export TF_CUDA_VERSION=$TF_CUDA_VERSION" >> ~/.bashrc
RUN echo "export TF_CUDNN_VERSION=$TF_CUDNN_VERSION" >> ~/.bashrc
RUN echo "export CUDA_TOOLKIT_PATH='/usr/local/cuda'" >> ~/.bashrc
RUN echo "export CUDNN_INSTALL_PATH='/usr/lib/x86_64-linux-gnu'" >> ~/.bashrc

# Clean up
RUN apt-get autoremove -y \
&& apt-get clean -y \
&& rm -rf /var/lib/apt/lists/*
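
The `echo ... >> ~/.bashrc` lines in this Dockerfile all follow one pattern: append an `export NAME=value` line per variable. As a hedged sketch of that pattern outside the Dockerfile, the loop below writes the exports for a list of variable names to a file. The function name `write_env_exports` and the temporary file are illustrative assumptions; the Dockerfile in this PR uses separate `RUN echo` instructions instead.

```sh
# Hypothetical helper: append "export NAME='value'" lines for the named
# variables to the file given as the first argument.
# Assumes values do not themselves contain single quotes.
write_env_exports() {
  out_file=$1
  shift
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names are trusted here).
    eval "value=\${$name-}"
    printf "export %s='%s'\n" "$name" "$value" >> "$out_file"
  done
}

# Example values matching the TF 2.15 branch of the build script.
TF_VERSION=2.15.1
PY_VERSION=3.9
TF_CUDA_VERSION=12.2
TF_CUDNN_VERSION=8.9

env_file=$(mktemp)
write_env_exports "$env_file" TF_VERSION PY_VERSION TF_CUDA_VERSION TF_CUDNN_VERSION
cat "$env_file"
```

Listing each name exactly once in the call also guards against writing the same export twice.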