[Ready to be reviewed] Support Customized HCTR Repo in the unified containers #85

Merged Jan 26, 2022 (35 commits)

Commits
c2c9f34
Update dockerfile.ctr
zehuanw Dec 12, 2021
75d85ad
Update dockerfile.ctr
zehuanw Dec 12, 2021
2074b53
Update dockerfile.ctr
zehuanw Dec 12, 2021
4acaa53
Update dockerfile.ctr
zehuanw Dec 19, 2021
3d87c5c
Update dockerfile.ctr
zehuanw Dec 19, 2021
2f6b385
Create dockerfile.ctr.test
zehuanw Dec 19, 2021
c790e24
Update dockerfile.ctr.test
zehuanw Dec 19, 2021
3e57ebd
Update dockerfile.ctr.test
zehuanw Dec 19, 2021
c093962
Update dockerfile.ctr
zehuanw Dec 19, 2021
192ab23
Update dockerfile.tf
zehuanw Dec 19, 2021
0664115
Update dockerfile.tri
zehuanw Dec 19, 2021
800d53d
Update dockerfile.tf
zehuanw Dec 19, 2021
ee145fd
Update dockerfile.ctr
zehuanw Dec 19, 2021
f366bd2
Update dockerfile.tri
zehuanw Dec 19, 2021
d13d492
Delete dockerfile.ctr.test
zehuanw Dec 20, 2021
a7d6222
Update dockerfile.tri
zehuanw Dec 20, 2021
ccbe0c6
Update dockerfile.ctr
zehuanw Dec 21, 2021
3ca9968
Update dockerfile.tf
zehuanw Dec 21, 2021
fe5098c
Update dockerfile.tri
zehuanw Dec 21, 2021
2df9f30
Update dockerfile.ctr
zehuanw Dec 21, 2021
5f0a260
Update dockerfile.tf
zehuanw Dec 21, 2021
e400922
refine dockerfile
shijieliu Dec 28, 2021
eab7b12
refine dockerfile
shijieliu Dec 28, 2021
8b492c6
refine dockerfile
shijieliu Dec 29, 2021
e0ee69e
solve hugectr env issue
shijieliu Dec 30, 2021
f38afe9
disable install arrow with orc
shijieliu Dec 30, 2021
d8bf047
Merge pull request #1 from shijieliu/alex
zehuanw Dec 31, 2021
c11f0f6
replace DEV_MODE with HCTR_DEV_MODE
shijieliu Dec 31, 2021
c9ec460
replace HCTR_DEV_MODE with HUGECTR_DEV_MODE
shijieliu Dec 31, 2021
44784d9
Merge pull request #2 from shijieliu/alex
zehuanw Dec 31, 2021
74b8695
resolve confilicts
shijieliu Jan 19, 2022
b3bb9f8
add flag to skip nvt in dockerfile.tf
shijieliu Jan 20, 2022
7024ff0
Merge pull request #3 from shijieliu/main
zehuanw Jan 21, 2022
ab88433
add hdfs dependency
shijieliu Jan 26, 2022
6187d65
Merge pull request #4 from shijieliu/main
shijieliu Jan 26, 2022
63 changes: 49 additions & 14 deletions docker/dockerfile.ctr
@@ -102,7 +102,35 @@ RUN apt-get update -y && \
./configure && make -j$(nproc) && make install && \
rm -rf /var/tmp/librdkafka

#Install Java
RUN mkdir -p /var/tmp && cd /var/tmp && wget https://download.java.net/java/GA/jdk16.0.2/d4a915d82b4c4fbb9bde534da945d746/7/GPL/openjdk-16.0.2_linux-x64_bin.tar.gz && \
mkdir -p /usr/java && tar -zxvf ./openjdk-16.0.2_linux-x64_bin.tar.gz -C /usr/java && \
rm -rf ./openjdk-16.0.2_linux-x64_bin.tar.gz

#Install libhdfs client
RUN mkdir -p /var/tmp && cd /var/tmp && wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz && \
tar -zxvf ./hadoop-3.3.1.tar.gz && rm -rf hadoop-3.3.1.tar.gz && \
cp ./hadoop-3.3.1/lib/native/libhdfs.so.0.0.0 /usr/local/lib/ && cp hadoop-3.3.1/include/hdfs.h /usr/local/include/ && \
mv ./hadoop-3.3.1 /usr/local/hadoop && cd /usr/local/lib/ && ln -s libhdfs.so.0.0.0 libhdfs.so

ENV JAVA_HOME=/usr/java/jdk-16.0.2
ENV PATH=$JAVA_HOME/bin:$PATH
ENV LD_LIBRARY_PATH=$JAVA_HOME/lib/server
ENV HADOOP_HOME=/usr/local/hadoop
ENV PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
ENV HDFS_NAMENODE_USER=root
ENV HDFS_DATANODE_USER=root
ENV HDFS_SECONDARYNAMENODE_USER=root
ENV YARN_RESOURCEMANAGER_USER=root
ENV YARN_NODEMANAGER_USER=root

# Arguments "_XXXX" are only valid when $HUGECTR_DEV_MODE==false
ARG HUGECTR_DEV_MODE=false
ARG _HUGECTR_BRANCH=master
ARG _HUGECTR_REPO="github.com/NVIDIA-Merlin/HugeCTR.git"
ARG _CI_JOB_TOKEN=""

RUN pip3 install --no-cache-dir mpi4py ortools sklearn onnx onnxruntime
# Install HugeCTR
ENV CPATH=/usr/local/include:$CPATH

@@ -125,20 +153,27 @@ RUN ln -s /usr/lib/x86_64-linux-gnu/libibverbs.so.1 /usr/lib/x86_64-linux-gnu/li
RUN rm -rf /usr/lib/x86_64-linux-gnu/libibverbs.so && \
ln -s /usr/lib/x86_64-linux-gnu/libibverbs.so.1.14.36.0 /usr/lib/x86_64-linux-gnu/libibverbs.so

RUN git clone https://github.com/NVIDIA-Merlin/HugeCTR.git /hugectr && \
cd /hugectr && if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ]; then git fetch --all --tags && git checkout tags/${HUGECTR_VER}; else git checkout master; fi && \
git submodule update --init --recursive && \
mkdir build && cd build && \
LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs/:$LD_LIBRARY_PATH && \
export PATH=$PATH:/usr/local/cuda-${CUDA_SHORT_VERSION}/compat/ && \
cmake -DCMAKE_CXX_COMPILER=/usr/bin/g++ -DCMAKE_C_COMPILER=/usr/bin/gcc -DCMAKE_BUILD_TYPE=Release -DSM="60;61;70;75;80" \
-DENABLE_MULTINODES=ON .. && \
make -j$(nproc) && make install && \
chmod +x /usr/local/hugectr/bin/* && \
chmod +x /usr/local/hugectr/lib/* && \
cd /hugectr/onnx_converter && \
python3 setup.py install && \
rm -rf /hugectr/build
# Install hugectr
RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \
git clone https://${_CI_JOB_TOKEN}${_HUGECTR_REPO} /hugectr && cd /hugectr; \
if [ "$RELEASE" == "true" ] && [ "$HUGECTR_VER" != "vnightly" ]; then \
git fetch --all --tags && git checkout tags/${HUGECTR_VER}; \
else \
git checkout ${_HUGECTR_BRANCH}; \
fi; \
git submodule update --init --recursive && \
mkdir build && cd build && \
LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs/:$LD_LIBRARY_PATH && \
export PATH=$PATH:/usr/local/cuda-${CUDA_SHORT_VERSION}/compat/ && \
cmake -DCMAKE_CXX_COMPILER=/usr/bin/g++ -DCMAKE_C_COMPILER=/usr/bin/gcc -DCMAKE_BUILD_TYPE=Release -DSM="60;61;70;75;80" \
-DENABLE_MULTINODES=ON .. && \
make -j$(nproc) && make install && \
chmod +x /usr/local/hugectr/bin/* && \
chmod +x /usr/local/hugectr/lib/* && \
cd /hugectr/onnx_converter && \
python3 setup.py install && \
rm -rf /hugectr; \
fi

ENV PATH=/usr/local/hugectr/bin:$PATH
ENV LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH
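
The rewritten `RUN` step forms the clone URL by concatenating `_CI_JOB_TOKEN` and `_HUGECTR_REPO`, and picks the ref to check out from `RELEASE`, `HUGECTR_VER`, and `_HUGECTR_BRANCH`. The sketch below reproduces only that string handling outside Docker, which may help when choosing values for a customized repo. The function names `clone_url` and `checkout_ref`, the host `git.example.com`, the `ci-user:s3cret@` credential shape, and the tag `v0.0.0` are assumptions made for illustration. The diff shows only that the token string is prepended verbatim, so any separator such as a trailing `@` would have to be part of the token value itself.

```shell
# Sketch only: mirrors the string handling of the RUN step, not the build itself.

# Compose the clone URL as the Dockerfile does: token + repo, no separator added.
# $1 = _CI_JOB_TOKEN, $2 = _HUGECTR_REPO
clone_url() {
  printf 'https://%s%s\n' "$1" "$2"
}

# Pick the ref: a release tag when RELEASE is "true" and the version is not
# "vnightly", otherwise the branch named by _HUGECTR_BRANCH.
# $1 = RELEASE, $2 = HUGECTR_VER, $3 = _HUGECTR_BRANCH
# Uses POSIX "=" for comparison; the Dockerfile's "==" needs a bash-compatible shell.
checkout_ref() {
  if [ "$1" = "true" ] && [ "$2" != "vnightly" ]; then
    printf 'tags/%s\n' "$2"
  else
    printf '%s\n' "$3"
  fi
}

# Defaults from the diff: empty token, public repo.
clone_url "" "github.com/NVIDIA-Merlin/HugeCTR.git"

# Hypothetical private mirror; the credential prefix is an assumed format.
clone_url "ci-user:s3cret@" "git.example.com/team/HugeCTR.git"

# Release build with a placeholder tag, then a non-release build on a custom branch.
checkout_ref "true" "v0.0.0" "master"
checkout_ref "false" "v0.0.0" "my-feature"
```

With the defaults, the URL is the public HugeCTR repository and the ref is `master`, which matches the behavior of the removed `RUN` step.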
41 changes: 30 additions & 11 deletions docker/dockerfile.tf
@@ -51,11 +51,14 @@ RUN pip install dask==2021.09.1 distributed==2021.09.1 dask[dataframe]==2021.09.
RUN pip install gevent==21.8.0
RUN git clone https://github.com/rapidsai/asvdb.git /repos/asvdb && cd /repos/asvdb && python setup.py install

ARG INSTALL_NVT=true
# Install NVTabular
ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION='python'
RUN git clone https://github.com/NVIDIA-Merlin/NVTabular.git /nvtabular/ && \
cd /nvtabular/; if [ "$RELEASE" == "true" ] && [ ${NVTAB_VER} != "vnightly" ] ; then git fetch --all --tags && git checkout tags/${NVTAB_VER}; else git checkout main; fi; \
python setup.py develop;
RUN if [ "$INSTALL_NVT" == "true" ]; then \
git clone https://github.com/NVIDIA-Merlin/NVTabular.git /nvtabular/ && \
cd /nvtabular/; if [ "$RELEASE" == "true" ] && [ ${NVTAB_VER} != "vnightly" ] ; then git fetch --all --tags && git checkout tags/${NVTAB_VER}; else git checkout main; fi; \
python setup.py develop; \
fi

# Install Transformers4Rec
RUN git clone https://github.com/NVIDIA-Merlin/Transformers4Rec.git /transformers4rec && \
@@ -67,19 +70,35 @@ ENV LD_LIBRARY_PATH=/usr/local/hugectr/lib:$LD_LIBRARY_PATH \
LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH \
PYTHONPATH=/usr/local/hugectr/lib:$PYTHONPATH

RUN git clone https://github.com/rapidsai/asvdb.git build-env && \
pushd build-env && \
python setup.py install && \
popd && \
rm -rf build-env

# Arguments "_XXXX" are only valid when $HUGECTR_DEV_MODE==false
ARG HUGECTR_DEV_MODE=false
ARG _HUGECTR_BRANCH=master
ARG _HUGECTR_REPO="github.com/NVIDIA-Merlin/HugeCTR.git"
ARG _CI_JOB_TOKEN=""

RUN mkdir -p /usr/local/nvidia/lib64 && \
ln -s /usr/local/cuda/lib64/libcusolver.so /usr/local/nvidia/lib64/libcusolver.so.10

RUN ln -s /usr/lib/x86_64-linux-gnu/libibverbs.so.1 /usr/lib/x86_64-linux-gnu/libibverbs.so

RUN git clone https://github.com/NVIDIA-Merlin/HugeCTR.git build-env && \
pushd build-env && \
if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ] ; then git fetch --all --tags && git checkout tags/${HUGECTR_VER}; else echo ${HUGECTR_VER} && git checkout master; fi && \
cd sparse_operation_kit && \
python setup.py install && \
popd && \
rm -rf build-env && \
rm -rf /var/tmp/HugeCTR
RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \
git clone https://${_CI_JOB_TOKEN}${_HUGECTR_REPO} build-env && pushd build-env && git fetch --all; \
if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ]; then \
git fetch --all --tags && git checkout tags/${HUGECTR_VER}; \
else \
git checkout ${_HUGECTR_BRANCH}; \
fi; \
cd sparse_operation_kit && \
python setup.py install && \
popd && \
rm -rf build-env; \
fi

# Clean up
RUN rm -rf /repos
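
Both new switches in `docker/dockerfile.tf` are ordinary build arguments, so they can be set with `--build-arg` at build time. The sketch below only prints a candidate `docker build` command rather than running it. The helper name `build_cmd`, the image tag `merlin-tf:dev`, and the build context `.` are assumptions, and the Dockerfile may need further arguments that are not visible in this diff.

```shell
# Sketch only: prints the command instead of invoking docker.
# $1 = INSTALL_NVT (true|false), $2 = HUGECTR_DEV_MODE (true|false)
build_cmd() {
  printf 'docker build -f docker/dockerfile.tf --build-arg INSTALL_NVT=%s --build-arg HUGECTR_DEV_MODE=%s -t merlin-tf:dev .\n' "$1" "$2"
}

# Skip the NVTabular clone and the sparse_operation_kit install.
build_cmd false true

# Defaults from the diff: install NVTabular and build from the configured repo.
build_cmd true false
```

Per the diff, any value of `HUGECTR_DEV_MODE` other than `false` skips the clone and install step, so the resulting image carries no sparse_operation_kit build.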
69 changes: 53 additions & 16 deletions docker/dockerfile.tri
@@ -89,7 +89,7 @@ RUN git clone --branch apache-arrow-5.0.0 --recurse-submodules https://github.co
-DCMAKE_LIBRARY_PATH=${CUDA_CUDA_LIBRARY} \
-DARROW_FLIGHT=ON \
-DARROW_GANDIVA=OFF \
-DARROW_ORC=ON \
-DARROW_ORC=OFF \
-DARROW_WITH_BZ2=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
@@ -109,7 +109,7 @@ RUN git clone --branch apache-arrow-5.0.0 --recurse-submodules https://github.co
pushd python && \
export PYARROW_WITH_PARQUET=ON && \
export PYARROW_WITH_CUDA=ON && \
export PYARROW_WITH_ORC=ON && \
export PYARROW_WITH_ORC=OFF && \
export PYARROW_WITH_DATASET=ON && \
python setup.py build_ext --build-type=release bdist_wheel && \
pip install dist/*.whl && \
@@ -208,31 +208,68 @@ RUN apt-get update -y && \
./configure && make -j$(nproc) && make install && \
rm -rf /var/tmp/librdkafka

# Install Java
RUN mkdir -p /var/tmp && cd /var/tmp && wget https://download.java.net/java/GA/jdk16.0.2/d4a915d82b4c4fbb9bde534da945d746/7/GPL/openjdk-16.0.2_linux-x64_bin.tar.gz && \
mkdir -p /usr/java && tar -zxvf ./openjdk-16.0.2_linux-x64_bin.tar.gz -C /usr/java && \
rm -rf ./openjdk-16.0.2_linux-x64_bin.tar.gz

#Install libhdfs client
RUN mkdir -p /var/tmp && cd /var/tmp && wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz && \
tar -zxvf ./hadoop-3.3.1.tar.gz && rm -rf hadoop-3.3.1.tar.gz && \
cp ./hadoop-3.3.1/lib/native/libhdfs.so.0.0.0 /usr/local/lib/ && cp hadoop-3.3.1/include/hdfs.h /usr/local/include/ && \
mv ./hadoop-3.3.1 /usr/local/hadoop && cd /usr/local/lib/ && ln -s libhdfs.so.0.0.0 libhdfs.so

ENV JAVA_HOME=/usr/java/jdk-16.0.2
ENV PATH=$JAVA_HOME/bin:$PATH
ENV LD_LIBRARY_PATH=$JAVA_HOME/lib/server
ENV HADOOP_HOME=/usr/local/hadoop
ENV PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Arguments "_XXXX" are only valid when $HUGECTR_DEV_MODE==false
ARG HUGECTR_DEV_MODE=false
ARG _HUGECTR_BRANCH=master
ARG _HUGECTR_REPO="github.com/NVIDIA-Merlin/HugeCTR.git"
ARG _HUGECTR_BACKEND_BRANCH=main
ARG _HUGECTR_BACKEND_REPO="github.com/triton-inference-server/hugectr_backend"

ARG _CI_JOB_TOKEN=""

# Install HugeCTR
ENV CPATH=/usr/local/include:$CPATH
RUN git clone https://github.com/NVIDIA-Merlin/HugeCTR.git /hugectr && \
cd /hugectr && if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ]; then git fetch --all --tags && git checkout tags/${HUGECTR_VER}; else git checkout master; fi && \
RUN apt update -y && apt install rapidjson-dev -y
RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \
git clone https://${_CI_JOB_TOKEN}${_HUGECTR_REPO} /hugectr && cd /hugectr && git fetch --all; \
if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ]; then \
git fetch --all --tags && git checkout tags/${HUGECTR_VER}; \
else \
git checkout ${_HUGECTR_BRANCH}; \
fi; \
git submodule update --init --recursive && \
mkdir -p build && cd build &&\
cmake -DCMAKE_BUILD_TYPE=Release -DSM="60;61;70;75;80" -DENABLE_INFERENCE=ON .. && \
make -j$(nproc) && make install && \
chmod +x /usr/local/hugectr/bin/* &&\
export CPATH=/usr/local/hugectr/include:$CPATH && \
export LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH && \
git clone https://github.com/triton-inference-server/hugectr_backend /repos/hugectr_inference_backend && \
cd /repos/hugectr_inference_backend && if [ "$RELEASE" == "true" ] && [ ${HUGECTR_BACKEND_VER} != "vnightly" ] ; then git fetch --all --tags && git checkout tags/${HUGECTR_BACKEND_VER}; else git checkout main; fi && \
chmod +x /usr/local/hugectr/bin/*; \
fi

ENV CPATH=/usr/local/hugectr/include:$CPATH
ENV LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/hugectr/lib:$LD_LIBRARY_PATH
ENV PATH=/usr/local/hugectr/bin:$PATH

RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \
git clone https://${_CI_JOB_TOKEN}${_HUGECTR_BACKEND_REPO} /repos/hugectr_inference_backend && cd /repos/hugectr_inference_backend && \
if [ "$RELEASE" == "true" ] && [ "$HUGECTR_BACKEND_VER" != "vnightly" ]; then \
git fetch --all --tags && git checkout tags/${HUGECTR_BACKEND_VER}; \
else \
git checkout ${_HUGECTR_BACKEND_BRANCH}; \
fi && \
mkdir -p build && cd build && \
cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local/hugectr \
-DTRITON_COMMON_REPO_TAG="r$TRITON_VERSION" \
-DTRITON_CORE_REPO_TAG="r$TRITON_VERSION" \
-DTRITON_BACKEND_REPO_TAG="r$TRITON_VERSION" .. && \
make -j$(nproc) && make install && \
rm -rf /hugectr/build

ENV CPATH=/usr/local/hugectr/include:$CPATH
ENV LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/hugectr/lib:$LD_LIBRARY_PATH
ENV PATH=/usr/local/hugectr/bin:$PATH
rm -rf /repos/hugectr_inference_backend; \
fi

RUN ln -s /usr/local/hugectr/backends/hugectr /opt/tritonserver/backends/
