Add sparse operation kit and distributed embeddings to tf image (#354)
* add entrypoint to all containers

* remove -e for pip installs

* add keyring to fix bad key nvidia issue

* trying to move env call to see if it helps fix the install of distributed embeddings after sok install

* fix extra CVE

* fixing CI routes

* adding sok and distributed embeds to tf image

* Update docker/dockerfile.tf

Co-authored-by: Ben Frederickson <github@benfrederickson.com>

* Update docker/dockerfile.tf

Co-authored-by: Ben Frederickson <github@benfrederickson.com>

Co-authored-by: Ben Frederickson <github@benfrederickson.com>
jperez999 and benfred authored Jun 1, 2022
1 parent f581cc2 commit dc9b37f
Showing 1 changed file with 39 additions and 1 deletion.
40 changes: 39 additions & 1 deletion docker/dockerfile.tf
@@ -15,7 +15,7 @@ FROM ${BASE_IMAGE} as base
COPY --chown=1000:1000 --from=triton /opt/tritonserver/backends/tensorflow2 backends/tensorflow2/

# Tensorflow dependencies (only)
RUN pip install tensorflow-gpu \
RUN pip install tensorflow-gpu \
&& pip uninstall tensorflow-gpu keras -y

# DLFW Tensorflow packages
@@ -25,6 +25,44 @@ COPY --chown=1000:1000 --from=dlfw /usr/local/lib/tensorflow/ /usr/local/lib/ten
COPY --chown=1000:1000 --from=dlfw /usr/local/lib/python3.8/dist-packages/horovod /usr/local/lib/python3.8/dist-packages/horovod/
COPY --chown=1000:1000 --from=dlfw /usr/local/bin/horovodrun /usr/local/bin/horovodrun

# Install cmake
RUN wget http://www.cmake.org/files/v3.21/cmake-3.21.1.tar.gz && \
tar xf cmake-3.21.1.tar.gz && cd cmake-3.21.1 && ./configure && make && make install


# Install HugeCTR
ENV LD_LIBRARY_PATH=/usr/local/hugectr/lib:$LD_LIBRARY_PATH \
LIBRARY_PATH=/usr/local/hugectr/lib:$LIBRARY_PATH \
PYTHONPATH=/usr/local/hugectr/lib:$PYTHONPATH

# Arguments "_XXXX" are only valid when $HUGECTR_DEV_MODE==false
ARG HUGECTR_DEV_MODE=false
ARG _HUGECTR_REPO="github.com/NVIDIA-Merlin/HugeCTR.git"
ARG _CI_JOB_TOKEN=""

RUN mkdir -p /usr/local/nvidia/lib64 && \
ln -s /usr/local/cuda/lib64/libcusolver.so /usr/local/nvidia/lib64/libcusolver.so.10

RUN ln -s /usr/lib/x86_64-linux-gnu/libibverbs.so.1 /usr/lib/x86_64-linux-gnu/libibverbs.so

RUN if [ "$HUGECTR_DEV_MODE" == "false" ]; then \
git clone https://${_CI_JOB_TOKEN}${_HUGECTR_REPO} build-env && \
pushd build-env && \
git checkout ${HUGECTR_VER} && \
cd sparse_operation_kit && \
python -m pip install . && \
popd && \
rm -rf build-env; \
fi

# Install distributed-embeddings
ARG INSTALL_DISTRIBUTED_EMBEDDINGS=true
RUN if [ "$INSTALL_DISTRIBUTED_EMBEDDINGS" == "true" ]; then \
git clone https://github.com/NVIDIA-Merlin/distributed-embeddings.git /distributed_embeddings/ && \
cd /distributed_embeddings && git checkout ${TFDE_VER} && \
make pip_pkg && pip install artifacts/*.whl && make clean; \
fi
HEALTHCHECK NONE
CMD ["/bin/bash"]
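
Both conditional `RUN` steps in this hunk are gated by build arguments (`HUGECTR_DEV_MODE` and `INSTALL_DISTRIBUTED_EMBEDDINGS`), and they check out `${HUGECTR_VER}` and `${TFDE_VER}`. A minimal sketch of how a build invocation might be assembled follows. It assumes the command is run from the repository root, that `HUGECTR_VER` and `TFDE_VER` are declared as `ARG`s elsewhere in `docker/dockerfile.tf` (they are referenced but not declared in this hunk), and that every other build argument keeps its default. The helper name `build_cmd` and the image tag `merlin-tf:local` are hypothetical.

```shell
#!/bin/sh
# Sketch only: prints a docker build command instead of running it.
# build_cmd and the tag "merlin-tf:local" are hypothetical names used for illustration.
# Usage: build_cmd <hugectr-ref> <tfde-ref> [install-distributed-embeddings: true|false]
build_cmd() {
  hugectr_ver="${1:?need a HugeCTR git ref}"
  tfde_ver="${2:?need a distributed-embeddings git ref}"
  install_de="${3:-true}"   # mirrors the ARG default shown in the diff

  # Assumption: HUGECTR_VER and TFDE_VER are ARGs declared earlier in the Dockerfile.
  echo "docker build -f docker/dockerfile.tf" \
    "--build-arg HUGECTR_VER=${hugectr_ver}" \
    "--build-arg TFDE_VER=${tfde_ver}" \
    "--build-arg INSTALL_DISTRIBUTED_EMBEDDINGS=${install_de}" \
    "-t merlin-tf:local ."
}
```

Calling `build_cmd <hugectr-ref> <tfde-ref> false` would print a command that skips the distributed-embeddings step, since the `RUN` block only executes when the argument equals `true`.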
