Rel 22.03 #124

Merged
merged 28 commits into from
Mar 4, 2022
9 changes: 4 additions & 5 deletions ci/test_container.sh
@@ -1,5 +1,4 @@
#!/bin/bash
set -e

container=$1
devices=$2
@@ -46,16 +45,16 @@ fi
# Integration tests #
#####################

## Test NVTabular
### Not shared storage in blossom yet
# Test NVTabular
## Not shared storage in blossom yet
regex="merlin(.)*-inference"
if [[ "$container" =~ $regex ]]; then
/nvtabular/ci/test_integration.sh $container $devices --report 1
fi
## Test HugeCTR
# Test HugeCTR
# Waiting to sync integration tests with them

## Test Transformers4Rec
# Test Transformers4Rec
if [ "$container" != "merlin-training" ]; then
/transformers4rec/ci/test_integration.sh $container $devices
fi
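
The container-name checks in this script decide which integration suites run. A minimal sketch of how the bash `=~` test behaves with the regex from the script follows; the helper name `runs_nvtabular_tests` and the container names are illustrative assumptions, not a list of published images.

```bash
#!/bin/bash
# Regex copied from ci/test_container.sh.
regex="merlin(.)*-inference"

# Hypothetical helper: prints "yes" if the NVTabular integration tests
# would run for the given container name, "no" otherwise.
runs_nvtabular_tests() {
    local container=$1
    if [[ "$container" =~ $regex ]]; then
        echo "yes"
    else
        echo "no"
    fi
}

# Illustrative names only.
for name in merlin-inference merlin-tensorflow-inference merlin-training; do
    echo "$name: $(runs_nvtabular_tests "$name")"
done
```

Because the regex is unanchored, any name that contains a match qualifies, not only names that start with `merlin`.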
35 changes: 16 additions & 19 deletions docker/inference/dockerfile.ctr
100755 → 100644
@@ -1,5 +1,5 @@
# syntax=docker/dockerfile:1.2
ARG TRITON_VERSION=22.01
ARG TRITON_VERSION=22.02
Review comment (Member): Do we want to make this a range? >= ?

Reply (Contributor Author): This is a variable. It is not a comparison; it is an assignment.
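
As the reply notes, `ARG TRITON_VERSION=22.02` assigns a default value rather than expressing a version constraint. The sketch below mimics that behaviour in plain shell with `${VAR:-default}`; in a real build the override would be passed with `docker build --build-arg TRITON_VERSION=...`. The function name `resolve_image` is hypothetical.

```bash
# Hypothetical helper mirroring the two ARG lines of dockerfile.ctr:
#   ARG TRITON_VERSION=22.02
#   ARG IMAGE=nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-py3-min
resolve_image() {
    # $1 stands in for the --build-arg value; omit it to get the default.
    version="${1:-22.02}"
    echo "nvcr.io/nvidia/tritonserver:${version}-py3-min"
}

resolve_image          # uses the default version
resolve_image 22.01    # explicit override
```

A single value is required because the result is interpolated into one concrete image tag; a range such as `>=22.02` could not be resolved to a tag by `FROM`.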

ARG IMAGE=nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-py3-min
FROM ${IMAGE}

@@ -8,6 +8,7 @@ ARG CORE_VER=main
ARG RMM_VER=v21.12.00
ARG CUDF_VER=v21.12.02
ARG NVTAB_VER=main
ARG NVTAB_BACKEND_VER=main
ARG MODELS_VER=main
ARG HUGECTR_VER=master
ARG HUGECTR_BACKEND_VER=main
@@ -29,12 +30,13 @@ RUN apt update -y --fix-missing && \
apt-get install -y --no-install-recommends \
clang-format \
libboost-serialization-dev \
libcurl4-openssl-dev \
libssl-dev \
libtbb-dev \
protobuf-compiler \
python3-dev \
python3-pip \
rapidjson-dev \
zlib1g-dev && \
rapidjson-dev &&\
apt-get autoremove -y && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
@@ -44,7 +46,7 @@ RUN ln -s /usr/bin/python3 /usr/bin/python
# Install multiple packages
RUN pip install cupy-cuda115 nvidia-pyindex pybind11 pytest protobuf transformers==4.12 tensorflow-metadata
RUN pip install betterproto cachetools graphviz nvtx scipy sklearn
RUN pip install numba --no-deps
RUN pip install pandas numba==0.55.1 numpy==1.21.5
Review comment (Member): Do we need to fix versions here?

Reply (Contributor Author): For now we need to change them before installing the Merlin software. We need to update the Merlin software prerequisites.

RUN pip install tritonclient[all] grpcio-channelz
RUN pip install dask==2021.11.2 distributed==2021.11.2 dask[dataframe]==2021.11.2 dask-cuda
RUN pip install git+https://github.com/rapidsai/asvdb.git@main
@@ -59,7 +61,7 @@ RUN git clone --branch v1.9.2 https://github.com/gabime/spdlog.git build-env &&
mkdir build && cd build && cmake .. && make -j && make install && \
popd && \
rm -rf build-env

# Install arrow
ENV ARROW_HOME=/usr/local
RUN git clone --branch apache-arrow-5.0.0 --recurse-submodules https://github.com/apache/arrow.git build-env && \
@@ -156,7 +158,7 @@ ENV PYTHONPATH=/models:$PYTHONPATH
# Install NVTabular Triton Backend
ARG TRITON_VERSION
RUN git clone https://github.com/NVIDIA-Merlin/nvtabular_triton_backend.git build-env && \
cd build-env && git checkout ${NVTAB_VER} && cd .. && \
cd build-env && git checkout ${NVTAB_BACKEND_VER} && cd .. && \
pushd build-env && \
mkdir build && \
cd build && \
@@ -165,7 +167,7 @@ RUN git clone https://github.com/NVIDIA-Merlin/nvtabular_triton_backend.git buil
-D TRITON_CORE_REPO_TAG="r$TRITON_VERSION" \
-D TRITON_BACKEND_REPO_TAG="r$TRITON_VERSION" .. \
&& make -j && \
mkdir /opt/tritonserver/backends/nvtabular && \
mkdir -p /opt/tritonserver/backends/nvtabular && \
cp libtriton_nvtabular.so /opt/tritonserver/backends/nvtabular/ && \
popd && \
rm -rf build-env
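
The hunk above switches `mkdir` to `mkdir -p` for the backend directory. The PR does not state the motivation; one plausible reason is that `mkdir -p` tolerates both a missing parent and an already existing directory. A small sketch in a temporary directory (the paths are stand-ins, not the real `/opt/tritonserver` tree):

```bash
# Throwaway root so the demo never touches /opt/tritonserver (assumption for illustration).
demo_root=$(mktemp -d)
backend_dir="$demo_root/backends/nvtabular"

# Plain mkdir fails here because the parent directory "backends" is missing.
if mkdir "$backend_dir" 2>/dev/null; then plain_missing_parent="ok"; else plain_missing_parent="failed"; fi

# mkdir -p creates the missing parent as well.
if mkdir -p "$backend_dir"; then with_p_first="ok"; else with_p_first="failed"; fi

# Repeating mkdir -p on an existing directory still succeeds ...
if mkdir -p "$backend_dir"; then with_p_again="ok"; else with_p_again="failed"; fi

# ... whereas plain mkdir reports an error for an existing directory.
if mkdir "$backend_dir" 2>/dev/null; then plain_existing="ok"; else plain_existing="failed"; fi

echo "mkdir, missing parent:   $plain_missing_parent"
echo "mkdir -p, first call:    $with_p_first"
echo "mkdir -p, second call:   $with_p_again"
echo "mkdir, existing dir:     $plain_existing"

rm -rf "$demo_root"
```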
@@ -201,21 +203,16 @@ RUN apt-get update -y && \
./configure && make -j$(nproc) && make install && \
rm -rf /var/tmp/librdkafka

# Install Java
RUN mkdir -p /var/tmp && cd /var/tmp && wget https://download.java.net/java/GA/jdk16.0.2/d4a915d82b4c4fbb9bde534da945d746/7/GPL/openjdk-16.0.2_linux-x64_bin.tar.gz && \
ARG INSTALL_HDFS=false
RUN if [ "$INSTALL_HDFS" == "true" ]; then \
mkdir -p /var/tmp && cd /var/tmp && wget https://download.java.net/java/GA/jdk16.0.2/d4a915d82b4c4fbb9bde534da945d746/7/GPL/openjdk-16.0.2_linux-x64_bin.tar.gz && \
mkdir -p /usr/java && tar -zxvf ./openjdk-16.0.2_linux-x64_bin.tar.gz -C /usr/java && \
rm -rf ./openjdk-16.0.2_linux-x64_bin.tar.gz

#Intall libhdfs client
RUN mkdir -p /var/tmp && cd /var/tmp && wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz && \
rm -rf ./openjdk-16.0.2_linux-x64_bin.tar.gz && \
mkdir -p /var/tmp && cd /var/tmp && wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz && \
tar -zxvf ./hadoop-3.3.1.tar.gz && rm -rf hadoop-3.3.1.tar.gz && \
cp ./hadoop-3.3.1/lib/native/libhdfs.so.0.0.0 /usr/local/lib/ && cp hadoop-3.3.1/include/hdfs.h /usr/local/include/ && \
mv ./hadoop-3.3.1 /usr/local/hadoop && cd /usr/local/lib/ && ln -s libhdfs.so.0.0.0 libhdfs.so && \
rm /usr/local/hadoop/share/hadoop/common/lib/jackson-databind-2.10.5.1.jar && rm /usr/local/hadoop/share/hadoop/hdfs/lib/jackson-databind-2.10.5.1.jar && \
wget https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.13.1/jackson-databind-2.13.1.jar && \
cp jackson-databind-2.13.1.jar /usr/local/hadoop/share/hadoop/hdfs/lib/ && \
mv jackson-databind-2.13.1.jar /usr/local/hadoop/share/hadoop/common/lib/ && \
rm /usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar && rm /usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar
mv ./hadoop-3.3.1 /usr/local/hadoop && cd /usr/local/lib/ && ln -s libhdfs.so.0.0.0 libhdfs.so; \
fi

ENV JAVA_HOME=/usr/java/jdk-16.0.2
ENV PATH=$JAVA_HOME/bin:$PATH
8 changes: 6 additions & 2 deletions docker/inference/dockerfile.tf
@@ -1,5 +1,5 @@
# syntax=docker/dockerfile:1.2
ARG TRITON_VERSION=22.01
ARG TRITON_VERSION=22.02
ARG IMAGE=nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-tf2-python-py3
FROM ${IMAGE}

@@ -8,6 +8,7 @@ ARG CORE_VER=main
ARG RMM_VER=v21.12.00
ARG CUDF_VER=v21.12.02
ARG NVTAB_VER=main
ARG NVTAB_BACKEND_VER=main
ARG MODELS_VER=main
ARG HUGECTR_VER=master
ARG HUGECTR_BACKEND_VER=main
@@ -29,8 +30,11 @@ RUN apt update -y --fix-missing && \
apt-get install -y --no-install-recommends \
clang-format \
libboost-serialization-dev \
libexpat1-dev \
libsasl2-2 \
libssl-dev \
libtbb-dev \
policykit-1 \
protobuf-compiler \
rapidjson-dev \
zlib1g-dev && \
@@ -155,7 +159,7 @@ ENV PYTHONPATH=/models:$PYTHONPATH
# Install NVTabular Triton Backend
ARG TRITON_VERSION
RUN git clone https://github.com/NVIDIA-Merlin/nvtabular_triton_backend.git build-env && \
cd build-env && git checkout ${NVTAB_VER} && cd .. && \
cd build-env && git checkout ${NVTAB_BACKEND_VER} && cd .. && \
pushd build-env && \
mkdir build && \
cd build && \
8 changes: 6 additions & 2 deletions docker/inference/dockerfile.torch
@@ -1,5 +1,5 @@
# syntax=docker/dockerfile:1.2
ARG TRITON_VERSION=22.01
ARG TRITON_VERSION=22.02
ARG IMAGE=nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-pyt-python-py3
FROM ${IMAGE}

@@ -8,6 +8,7 @@ ARG CORE_VER=main
ARG RMM_VER=v21.12.00
ARG CUDF_VER=v21.12.02
ARG NVTAB_VER=main
ARG NVTAB_BACKEND_VER=main
ARG MODELS_VER=main
ARG HUGECTR_VER=master
ARG HUGECTR_BACKEND_VER=main
@@ -29,8 +30,11 @@ RUN apt update -y --fix-missing && \
apt-get install -y --no-install-recommends \
clang-format \
libboost-serialization-dev \
libexpat1-dev \
libsasl2-2 \
libssl-dev \
libtbb-dev \
policykit-1 \
protobuf-compiler \
rapidjson-dev \
zlib1g-dev && \
@@ -155,7 +159,7 @@ ENV PYTHONPATH=/models:$PYTHONPATH
# Install NVTabular Triton Backend
ARG TRITON_VERSION
RUN git clone https://github.com/NVIDIA-Merlin/nvtabular_triton_backend.git build-env && \
cd build-env && git checkout ${NVTAB_VER} && cd .. && \
cd build-env && git checkout ${NVTAB_BACKEND_VER} && cd .. && \
pushd build-env && \
mkdir build && \
cd build && \
33 changes: 20 additions & 13 deletions docker/training/dockerfile.ctr
100755 → 100644
@@ -1,5 +1,5 @@
# syntax=docker/dockerfile:1
ARG IMAGE=nvcr.io/nvidia/tensorflow:22.01-tf2-py3
ARG IMAGE=nvcr.io/nvidia/tensorflow:22.02-tf2-py3
FROM ${IMAGE}

# Args
@@ -26,6 +26,7 @@ RUN apt update -y --fix-missing && \
clang-format \
graphviz \
libaio-dev \
libexpat1-dev \
libtbb-dev \
protobuf-compiler && \
apt-get autoremove -y && \
@@ -39,9 +40,17 @@ RUN apt remove --purge cmake -y && wget http://www.cmake.org/files/v3.21/cmake-3
# Install multiple packages
RUN pip install nvidia-pyindex mpi4py onnx onnxruntime
RUN pip install betterproto graphviz pybind11 pytest
RUN pip install --upgrade ipython
RUN pip install numba==0.55.1 numpy==1.21.5 --no-deps
RUN pip install --ignore-installed llvmlite==0.38.0 --no-deps
RUN pip install tritonclient[all] grpcio-channelz
RUN pip install git+https://github.com/rapidsai/asvdb.git@main

# Install Merlin Core
RUN git clone https://github.com/NVIDIA-Merlin/core.git /core/ && \
cd /core/ && git checkout ${CORE_VER} && pip install --no-deps -e .
ENV PYTHONPATH=/core:$PYTHONPATH

# Install NVTabular
ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION='python'
RUN git clone https://github.com/NVIDIA-Merlin/NVTabular.git /nvtabular/ && \
@@ -88,21 +97,16 @@ RUN apt-get update -y && \
./configure && make -j$(nproc) && make install && \
rm -rf /var/tmp/librdkafka

#Install Java
RUN mkdir -p /var/tmp && cd /var/tmp && wget https://download.java.net/java/GA/jdk16.0.2/d4a915d82b4c4fbb9bde534da945d746/7/GPL/openjdk-16.0.2_linux-x64_bin.tar.gz && \
ARG INSTALL_HDFS=false
RUN if [ "$INSTALL_HDFS" == "true" ]; then \
mkdir -p /var/tmp && cd /var/tmp && wget https://download.java.net/java/GA/jdk16.0.2/d4a915d82b4c4fbb9bde534da945d746/7/GPL/openjdk-16.0.2_linux-x64_bin.tar.gz && \
mkdir -p /usr/java && tar -zxvf ./openjdk-16.0.2_linux-x64_bin.tar.gz -C /usr/java && \
rm -rf ./openjdk-16.0.2_linux-x64_bin.tar.gz

#Intall libhdfs client
RUN mkdir -p /var/tmp && cd /var/tmp && wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz && \
rm -rf ./openjdk-16.0.2_linux-x64_bin.tar.gz && \
mkdir -p /var/tmp && cd /var/tmp && wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz && \
tar -zxvf ./hadoop-3.3.1.tar.gz && rm -rf hadoop-3.3.1.tar.gz && \
cp ./hadoop-3.3.1/lib/native/libhdfs.so.0.0.0 /usr/local/lib/ && cp hadoop-3.3.1/include/hdfs.h /usr/local/include/ && \
mv ./hadoop-3.3.1 /usr/local/hadoop && cd /usr/local/lib/ && ln -s libhdfs.so.0.0.0 libhdfs.so && \
rm /usr/local/hadoop/share/hadoop/common/lib/jackson-databind-2.10.5.1.jar && rm /usr/local/hadoop/share/hadoop/hdfs/lib/jackson-databind-2.10.5.1.jar && \
wget https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.13.1/jackson-databind-2.13.1.jar && \
cp jackson-databind-2.13.1.jar /usr/local/hadoop/share/hadoop/hdfs/lib/ && \
mv jackson-databind-2.13.1.jar /usr/local/hadoop/share/hadoop/common/lib/ && \
rm /usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar && rm /usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar
mv ./hadoop-3.3.1 /usr/local/hadoop && cd /usr/local/lib/ && ln -s libhdfs.so.0.0.0 libhdfs.so ; \
fi

ENV JAVA_HOME=/usr/java/jdk-16.0.2
Review comment (Member): Do we even need Java if we're not installing HDFS?

Reply (Contributor Author): HugeCTR wanted to keep it optional. They will be using it for development.
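
The exchange above concerns the new `INSTALL_HDFS` build argument: Java and the libhdfs client are installed only when it is `true`, while `JAVA_HOME` and `PATH` are still set unconditionally. A rough shell sketch of that gating follows, using a hypothetical function `hdfs_steps` that only prints step names instead of downloading anything.

```bash
# Hypothetical stand-in for the gated RUN step; it prints step names only.
hdfs_steps() {
    install_hdfs="${1:-false}"   # same default as ARG INSTALL_HDFS=false
    # The Dockerfile writes "==", which bash's [ accepts; "=" is the POSIX spelling.
    if [ "$install_hdfs" = "true" ]; then
        echo "install-java"
        echo "install-libhdfs"
    fi
    # Mirrors the ENV JAVA_HOME line, which sits outside the conditional.
    echo "set-JAVA_HOME"
}

hdfs_steps         # default build: only the ENV step
hdfs_steps true    # opt-in build: Java, libhdfs, then the ENV step
```

With the default, `JAVA_HOME` points at a directory that the build never creates, which is the situation the reviewer's question is about.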

ENV PATH=$JAVA_HOME/bin:$PATH
@@ -169,7 +173,10 @@ ENV PYTHONPATH=/hugectr/onnx_converter:$PYTHONPATH
RUN rm /usr/local/cuda/lib64/stubs/libcuda.so.1

# Clean up
RUN pip install numba==0.53.1 numpy==1.22.2 --no-deps
RUN rm -rf /repos
RUN rm -rf /usr/local/share/jupyter/lab/staging/node_modules/marked
RUN rm -rf /usr/local/share/jupyter/lab/staging/node_modules/node-fetch

HEALTHCHECK NONE
CMD ["/bin/bash"]
7 changes: 6 additions & 1 deletion docker/training/dockerfile.tf
@@ -1,5 +1,5 @@
# syntax=docker/dockerfile:1
ARG IMAGE=nvcr.io/nvidia/tensorflow:22.01-tf2-py3
ARG IMAGE=nvcr.io/nvidia/tensorflow:22.02-tf2-py3
FROM ${IMAGE}

# Args
@@ -21,6 +21,8 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN apt update -y --fix-missing && \
apt install -y --no-install-recommends software-properties-common && \
apt-get install -y --no-install-recommends \
libexpat1-dev \
libsasl2-2 \
graphviz \
protobuf-compiler && \
apt-get autoremove -y && \
@@ -29,6 +31,7 @@ RUN apt update -y --fix-missing && \

# Install multiple packages
RUN pip install betterproto graphviz pybind11 pydot pytest mpi4py
RUN pip install --upgrade ipython
RUN pip install nvidia-pyindex
RUN pip install tritonclient[all] grpcio-channelz
RUN pip install numba==0.55.1
@@ -87,6 +90,8 @@ RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \

# Clean up
RUN rm -rf /repos
RUN rm -rf /usr/local/share/jupyter/lab/staging/node_modules/marked
RUN rm -rf /usr/local/share/jupyter/lab/staging/node_modules/node-fetch

HEALTHCHECK NONE
CMD ["/bin/bash"]
9 changes: 8 additions & 1 deletion docker/training/dockerfile.torch
@@ -1,4 +1,4 @@
ARG IMAGE=nvcr.io/nvidia/pytorch:22.01-py3
ARG IMAGE=nvcr.io/nvidia/pytorch:22.02-py3
FROM ${IMAGE}

# Args
@@ -20,6 +20,8 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN apt update -y --fix-missing && \
apt install -y --no-install-recommends software-properties-common && \
apt install -y --no-install-recommends \
libexpat1-dev \
libsasl2-2 \
graphviz && \
apt autoremove -y && \
apt clean && \
@@ -29,6 +31,8 @@ RUN apt update -y --fix-missing && \
RUN python -m pip install --upgrade pip
RUN pip install --upgrade setuptools
RUN pip install betterproto graphviz transformers==4.12 tensorflow-metadata torchmetrics
RUN pip install --upgrade ipython
RUN pip install --upgrade Pillow
RUN pip install nvidia-pyindex
RUN pip install tritonclient[all] grpcio-channelz
RUN pip install --no-deps fastai fastcore fastprogress fastdownload
@@ -59,7 +63,10 @@ RUN git clone https://github.com/NVIDIA-Merlin/Models.git /models/ && \
ENV PYTHONPATH=/models:$PYTHONPATH

# Clean up
RUN pip install numba==0.53.1 numpy==1.22.2 --no-deps
RUN rm -rf /repos
RUN rm -rf /opt/conda/share/jupyter/lab/staging/node_modules/marked
RUN rm -rf /opt/conda/share/jupyter/lab/staging/node_modules/node-fetch

HEALTHCHECK NONE
CMD ["/bin/bash"]