[Ready to be reviewed] Support Customized HCTR Repo in the unified containers #85

Merged
merged 35 commits into from
Jan 26, 2022
Changes from 30 commits
c2c9f34
Update dockerfile.ctr
zehuanw Dec 12, 2021
75d85ad
Update dockerfile.ctr
zehuanw Dec 12, 2021
2074b53
Update dockerfile.ctr
zehuanw Dec 12, 2021
4acaa53
Update dockerfile.ctr
zehuanw Dec 19, 2021
3d87c5c
Update dockerfile.ctr
zehuanw Dec 19, 2021
2f6b385
Create dockerfile.ctr.test
zehuanw Dec 19, 2021
c790e24
Update dockerfile.ctr.test
zehuanw Dec 19, 2021
3e57ebd
Update dockerfile.ctr.test
zehuanw Dec 19, 2021
c093962
Update dockerfile.ctr
zehuanw Dec 19, 2021
192ab23
Update dockerfile.tf
zehuanw Dec 19, 2021
0664115
Update dockerfile.tri
zehuanw Dec 19, 2021
800d53d
Update dockerfile.tf
zehuanw Dec 19, 2021
ee145fd
Update dockerfile.ctr
zehuanw Dec 19, 2021
f366bd2
Update dockerfile.tri
zehuanw Dec 19, 2021
d13d492
Delete dockerfile.ctr.test
zehuanw Dec 20, 2021
a7d6222
Update dockerfile.tri
zehuanw Dec 20, 2021
ccbe0c6
Update dockerfile.ctr
zehuanw Dec 21, 2021
3ca9968
Update dockerfile.tf
zehuanw Dec 21, 2021
fe5098c
Update dockerfile.tri
zehuanw Dec 21, 2021
2df9f30
Update dockerfile.ctr
zehuanw Dec 21, 2021
5f0a260
Update dockerfile.tf
zehuanw Dec 21, 2021
e400922
refine dockerfile
shijieliu Dec 28, 2021
eab7b12
refine dockerfile
shijieliu Dec 28, 2021
8b492c6
refine dockerfile
shijieliu Dec 29, 2021
e0ee69e
solve hugectr env issue
shijieliu Dec 30, 2021
f38afe9
disable install arrow with orc
shijieliu Dec 30, 2021
d8bf047
Merge pull request #1 from shijieliu/alex
zehuanw Dec 31, 2021
c11f0f6
replace DEV_MODE with HCTR_DEV_MODE
shijieliu Dec 31, 2021
c9ec460
replace HCTR_DEV_MODE with HUGECTR_DEV_MODE
shijieliu Dec 31, 2021
44784d9
Merge pull request #2 from shijieliu/alex
zehuanw Dec 31, 2021
74b8695
resolve confilicts
shijieliu Jan 19, 2022
b3bb9f8
add flag to skip nvt in dockerfile.tf
shijieliu Jan 20, 2022
7024ff0
Merge pull request #3 from shijieliu/main
zehuanw Jan 21, 2022
ab88433
add hdfs dependency
shijieliu Jan 26, 2022
6187d65
Merge pull request #4 from shijieliu/main
shijieliu Jan 26, 2022
44 changes: 28 additions & 16 deletions docker/dockerfile.ctr
@@ -69,7 +69,7 @@ RUN git clone --branch apache-arrow-4.0.1 --recurse-submodules https://github.co
  -DCMAKE_LIBRARY_PATH=${CUDA_CUDA_LIBRARY} \
  -DARROW_FLIGHT=ON \
  -DARROW_GANDIVA=OFF \
- -DARROW_ORC=ON \
+ -DARROW_ORC=OFF \
  -DARROW_WITH_BZ2=ON \
  -DARROW_WITH_ZLIB=ON \
  -DARROW_WITH_ZSTD=ON \
@@ -89,7 +89,7 @@ RUN git clone --branch apache-arrow-4.0.1 --recurse-submodules https://github.co
  pushd python && \
  export PYARROW_WITH_PARQUET=ON && \
  export PYARROW_WITH_CUDA=ON && \
- export PYARROW_WITH_ORC=ON && \
+ export PYARROW_WITH_ORC=OFF && \
  export PYARROW_WITH_DATASET=ON && \
  python setup.py build_ext --build-type=release bdist_wheel && \
  pip install dist/*.whl && \
@@ -215,6 +215,12 @@ FROM phase3 AS phase4
  ARG RELEASE=false
  ARG HUGECTR_VER=vnightly

+ # Arguments "_XXXX" are only valid when $HUGECTR_DEV_MODE==false
+ ARG HUGECTR_DEV_MODE=false
+ ARG _HUGECTR_BRANCH=master
+ ARG _HUGECTR_REPO="github.com/NVIDIA-Merlin/HugeCTR.git"
+ ARG _CI_JOB_TOKEN=""
+
  RUN pip3 install --no-cache-dir mpi4py ortools sklearn onnx onnxruntime

  ENV OMPI_MCA_plm_rsh_agent=sh
@@ -232,20 +238,26 @@ RUN rm -rf /usr/lib/x86_64-linux-gnu/libibverbs.so && \
  ln -s /usr/lib/x86_64-linux-gnu/libibverbs.so.1.14.36.0 /usr/lib/x86_64-linux-gnu/libibverbs.so

  # Install hugectr
- RUN git clone https://github.com/NVIDIA-Merlin/HugeCTR.git /hugectr && \
- cd /hugectr && if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ]; then git fetch --all --tags && git checkout tags/${HUGECTR_VER}; else git checkout master; fi && \
- git submodule update --init --recursive && \
- mkdir build && cd build && \
- LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs/:$LD_LIBRARY_PATH && \
- export PATH=$PATH:/usr/local/cuda-${CUDA_SHORT_VERSION}/compat/ && \
- cmake -DCMAKE_CXX_COMPILER=/usr/bin/g++ -DCMAKE_C_COMPILER=/usr/bin/gcc -DCMAKE_BUILD_TYPE=Release -DSM="60;61;70;75;80" \
- -DENABLE_MULTINODES=ON .. && \
- make -j$(nproc) && make install && \
- chmod +x /usr/local/hugectr/bin/* && \
- chmod +x /usr/local/hugectr/lib/* && \
- cd /hugectr/onnx_converter && \
- python3 setup.py install && \
- rm -rf /hugectr/build
+ RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \
+ git clone https://${_CI_JOB_TOKEN}${_HUGECTR_REPO} /hugectr && cd /hugectr; \
+ if [ "$RELEASE" == "true" ] && [ "$HUGECTR_VER" != "vnightly" ]; then \
+ git fetch --all --tags && git checkout tags/${HUGECTR_VER}; \
+ else \
+ git checkout ${_HUGECTR_BRANCH}; \
+ fi; \
+ git submodule update --init --recursive && \
+ mkdir build && cd build && \
+ LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs/:$LD_LIBRARY_PATH && \
+ export PATH=$PATH:/usr/local/cuda-${CUDA_SHORT_VERSION}/compat/ && \
+ cmake -DCMAKE_CXX_COMPILER=/usr/bin/g++ -DCMAKE_C_COMPILER=/usr/bin/gcc -DCMAKE_BUILD_TYPE=Release -DSM="60;61;70;75;80" \
+ -DENABLE_MULTINODES=ON .. && \
+ make -j$(nproc) && make install && \
+ chmod +x /usr/local/hugectr/bin/* && \
+ chmod +x /usr/local/hugectr/lib/* && \
+ cd /hugectr/onnx_converter && \
+ python3 setup.py install && \
+ rm -rf /hugectr; \
+ fi

  ENV PATH=/usr/local/hugectr/bin:$PATH
  ENV LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH
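
The new `RUN` step in docker/dockerfile.ctr forms its clone URL by placing `_CI_JOB_TOKEN` and `_HUGECTR_REPO` directly after `https://`, so a non-empty token argument has to carry its own trailing `@`. A minimal sketch of that concatenation follows. The `clone_url` helper, the `gitlab-ci-token:<token>@` credential shape, and the `gitlab.example.com` host are illustrative assumptions and do not appear in this PR.

```shell
#!/bin/sh
# Hypothetical helper mirroring the Dockerfile expression
#   https://${_CI_JOB_TOKEN}${_HUGECTR_REPO}
# $1 = credential prefix (may be empty), $2 = host/path of the repository.
clone_url() {
    printf 'https://%s%s\n' "$1" "$2"
}

# Default build args from the diff: empty token, public repository.
clone_url "" "github.com/NVIDIA-Merlin/HugeCTR.git"

# Assumed private mirror: the prefix must end with '@' to yield a valid URL.
clone_url "gitlab-ci-token:SECRET@" "gitlab.example.com/team/hugectr.git"
```

Values passed with `--build-arg` can be recovered from the image with `docker history`, so a token supplied this way is best treated as short-lived.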
26 changes: 18 additions & 8 deletions docker/dockerfile.tf
@@ -95,19 +95,29 @@ ARG HUGECTR_VER=vnightly
  ARG SM="60;61;70;75;80"
  ARG USE_NVTX=OFF

+ # Arguments "_XXXX" are only valid when $HUGECTR_DEV_MODE==false
+ ARG HUGECTR_DEV_MODE=false
+ ARG _HUGECTR_BRANCH=master
+ ARG _HUGECTR_REPO="github.com/NVIDIA-Merlin/HugeCTR.git"
+ ARG _CI_JOB_TOKEN=""
+
  RUN mkdir -p /usr/local/nvidia/lib64 && \
  ln -s /usr/local/cuda/lib64/libcusolver.so /usr/local/nvidia/lib64/libcusolver.so.10

  RUN ln -s /usr/lib/x86_64-linux-gnu/libibverbs.so.1 /usr/lib/x86_64-linux-gnu/libibverbs.so

- RUN git clone https://github.com/NVIDIA-Merlin/HugeCTR.git build-env && \
- pushd build-env && \
- if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ] ; then git fetch --all --tags && git checkout tags/${HUGECTR_VER}; else echo ${HUGECTR_VER} && git checkout master; fi && \
- cd sparse_operation_kit && \
- python setup.py install && \
- popd && \
- rm -rf build-env && \
- rm -rf /var/tmp/HugeCTR
+ RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \
+ git clone https://${_CI_JOB_TOKEN}${_HUGECTR_REPO} build-env && pushd build-env && git fetch --all; \
+ if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ]; then \
+ git fetch --all --tags && git checkout tags/${HUGECTR_VER}; \
+ else \
+ git checkout ${_HUGECTR_BRANCH}; \
+ fi; \
+ cd sparse_operation_kit && \
+ python setup.py install && \
+ popd && \
+ rm -rf build-env; \
+ fi

  RUN pip install pybind11
  RUN pip install numba numpy --upgrade
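
All three Dockerfiles now share the same checkout rule: a release build with a concrete version checks out `tags/${HUGECTR_VER}`, and every other combination checks out `${_HUGECTR_BRANCH}`. The sketch below isolates that rule so it can be tried outside a container build. The `select_ref` helper and the sample values `v3.3` and `my-fix` are hypothetical, and the sketch uses the POSIX `=` operator rather than the `==` used in the Dockerfiles.

```shell
#!/bin/sh
# Hypothetical helper reproducing the checkout decision from the RUN steps.
# $1 = RELEASE, $2 = HUGECTR_VER, $3 = _HUGECTR_BRANCH
select_ref() {
    if [ "$1" = "true" ] && [ "$2" != "vnightly" ]; then
        echo "tags/$2"   # release build with a concrete version: pin the tag
    else
        echo "$3"        # otherwise: follow the requested branch
    fi
}

select_ref true  v3.3     master   # release build
select_ref true  vnightly master   # RELEASE=true but nightly version
select_ref false v3.3     my-fix   # non-release build ignores the version
```

Under these assumptions, `_HUGECTR_BRANCH` only takes effect when the build is not a tagged release.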
50 changes: 35 additions & 15 deletions docker/dockerfile.tri
@@ -80,7 +80,7 @@ RUN git clone --branch apache-arrow-4.0.1 --recurse-submodules https://github.co
  -DCMAKE_LIBRARY_PATH=${CUDA_CUDA_LIBRARY} \
  -DARROW_FLIGHT=ON \
  -DARROW_GANDIVA=OFF \
- -DARROW_ORC=ON \
+ -DARROW_ORC=OFF \
  -DARROW_WITH_BZ2=ON \
  -DARROW_WITH_ZLIB=ON \
  -DARROW_WITH_ZSTD=ON \
@@ -100,7 +100,7 @@ RUN git clone --branch apache-arrow-4.0.1 --recurse-submodules https://github.co
  pushd python && \
  export PYARROW_WITH_PARQUET=ON && \
  export PYARROW_WITH_CUDA=ON && \
- export PYARROW_WITH_ORC=ON && \
+ export PYARROW_WITH_ORC=OFF && \
  export PYARROW_WITH_DATASET=ON && \
  python setup.py build_ext --build-type=release bdist_wheel && \
  pip install dist/*.whl && \
@@ -215,31 +215,51 @@ RUN apt-get update -y && \

  ENV CPATH=/usr/local/include:$CPATH

+ # Arguments "_XXXX" are only valid when $HUGECTR_DEV_MODE==false
+ ARG HUGECTR_DEV_MODE=false
+ ARG _HUGECTR_BRANCH=master
+ ARG _HUGECTR_REPO="github.com/NVIDIA-Merlin/HugeCTR.git"
+ ARG _HUGECTR_BACKEND_BRANCH=main
+ ARG _HUGECTR_BACKEND_REPO="github.com/triton-inference-server/hugectr_backend"
+
+ ARG _CI_JOB_TOKEN=""
+
  # Install HugeCTR
  RUN apt update -y && apt install rapidjson-dev -y
- RUN git clone https://github.com/NVIDIA-Merlin/HugeCTR.git /hugectr && \
- cd /hugectr && if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ]; then git fetch --all --tags && git checkout tags/${HUGECTR_VER}; else git checkout master; fi && \
+ RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \
+ git clone https://${_CI_JOB_TOKEN}${_HUGECTR_REPO} /hugectr && cd /hugectr && git fetch --all; \
+ if [ "$RELEASE" == "true" ] && [ ${HUGECTR_VER} != "vnightly" ]; then \
+ git fetch --all --tags && git checkout tags/${HUGECTR_VER}; \
+ else \
+ git checkout ${_HUGECTR_BRANCH}; \
+ fi; \
  git submodule update --init --recursive && \
  mkdir -p build && cd build &&\
  cmake -DCMAKE_BUILD_TYPE=Release -DSM=$SM -DENABLE_INFERENCE=ON .. && \
  make -j$(nproc) && make install && \
- chmod +x /usr/local/hugectr/bin/* &&\
- export CPATH=/usr/local/hugectr/include:$CPATH && \
- export LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH && \
- git clone https://github.com/triton-inference-server/hugectr_backend /repos/hugectr_inference_backend && \
- cd /repos/hugectr_inference_backend && if [ "$RELEASE" == "true" ] && [ ${HUGECTR_BACKEND_VER} != "vnightly" ] ; then git fetch --all --tags && git checkout tags/${HUGECTR_BACKEND_VER}; else git checkout main; fi && \
+ chmod +x /usr/local/hugectr/bin/*; \
+ fi
+
+ ENV CPATH=/usr/local/hugectr/include:$CPATH
+ ENV LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH
+ ENV LD_LIBRARY_PATH=/usr/local/hugectr/lib:$LD_LIBRARY_PATH
+ ENV PATH=/usr/local/hugectr/bin:$PATH
+
+ RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \
+ git clone https://${_CI_JOB_TOKEN}${_HUGECTR_BACKEND_REPO} /repos/hugectr_inference_backend && cd /repos/hugectr_inference_backend && \
+ if [ "$RELEASE" == "true" ] && [ "$HUGECTR_BACKEND_VER" != "vnightly" ]; then \
+ git fetch --all --tags && git checkout tags/${HUGECTR_BACKEND_VER}; \
+ else \
+ git checkout ${_HUGECTR_BACKEND_BRANCH}; \
+ fi && \
  mkdir -p build && cd build && \
  cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local/hugectr \
  -DTRITON_COMMON_REPO_TAG="r$TRITON_VERSION" \
  -DTRITON_CORE_REPO_TAG="r$TRITON_VERSION" \
  -DTRITON_BACKEND_REPO_TAG="r$TRITON_VERSION" .. && \
  make -j$(nproc) && make install && \
- rm -rf /hugectr/build
-
- ENV CPATH=/usr/local/hugectr/include:$CPATH
- ENV LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH
- ENV LD_LIBRARY_PATH=/usr/local/hugectr/lib:$LD_LIBRARY_PATH
- ENV PATH=/usr/local/hugectr/bin:$PATH
+ rm -rf /repos/hugectr_inference_backend; \
+ fi

  RUN ln -s /usr/local/hugectr/backends/hugectr /opt/tritonserver/backends/

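
With these changes, `HUGECTR_DEV_MODE=true` skips the HugeCTR clone and build steps entirely, and the `_`-prefixed arguments are consulted only when it is `false`. The following sketch prints the two kinds of `docker build` invocation instead of running them. The `build_cmd` wrapper, the fork `github.com/example-user/HugeCTR.git`, the branch `my-feature`, and the `.` build context are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints the docker build command line.
# $1 = Dockerfile path, $2 = HUGECTR_DEV_MODE, $3 = repo (host/path), $4 = branch
build_cmd() {
    printf 'docker build -f %s --build-arg HUGECTR_DEV_MODE=%s' "$1" "$2"
    if [ "$2" = "false" ]; then
        # The underscore-prefixed args only matter when dev mode is off.
        printf ' --build-arg _HUGECTR_REPO=%s --build-arg _HUGECTR_BRANCH=%s' "$3" "$4"
    fi
    printf ' .\n'
}

# Dev mode: image with dependencies only, HugeCTR is not cloned or built.
build_cmd docker/dockerfile.tri true

# Custom repository and branch (assumed fork), built into the image.
build_cmd docker/dockerfile.tri false github.com/example-user/HugeCTR.git my-feature
```

The repository value omits the `https://` scheme because the Dockerfile prepends it before the optional token.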