HugeCTR Backend is a GPU-accelerated recommender model deployment framework designed to use GPU memory effectively to accelerate inference by decoupling the parameter server, the embedding cache, and the model weights. HugeCTR Backend supports concurrent model inference across multiple GPUs and embedding cache sharing between multiple model instances. For additional information, see the HugeCTR Inference User Guide.
You can either install the HugeCTR backend easily using the HugeCTR backend Docker image in NGC, or, if you are an advanced user, build the HugeCTR backend from scratch based on your own specific requirements. We support the following compute capabilities for inference deployment (a quick way to check which GPU you have is shown after the table):
| Compute Capability | GPU |
|---|---|
| 70 | NVIDIA V100 (Volta) |
| 75 | NVIDIA T4 (Turing) |
| 80 | NVIDIA A100 (Ampere) |
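If you are not sure which GPU is present on the host, you can check with nvidia-smi. Note that the compute_cap query field is only available on relatively recent drivers; on older drivers, match the device name against the table above instead:

nvidia-smi -L                                          # list the GPUs visible to the driver
nvidia-smi --query-gpu=name,compute_cap --format=csv   # name and compute capability (recent drivers only)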
The following prerequisites must be met before installing or building HugeCTR from scratch; a quick way to check several of them is shown after this list:
- Docker version 19 or higher
- cuBLAS version 10.1
- CMake version 3.17.0
- cuDNN version 7.5
- RMM version 0.16
- GCC version 7.4.0
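A quick sanity check for several of these requirements, assuming the tools are already on your PATH (nvcc reports the CUDA toolkit release, which is what provides cuBLAS):

docker --version   # expect 19 or higher
cmake --version    # expect 3.17.0
gcc --version      # expect 7.4.0
nvcc --version     # the CUDA toolkit release that provides cuBLAS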
All NVIDIA Merlin components are available as open-source projects. However, a more convenient way to make use of these components is with the Merlin NGC containers. Containers allow you to package your software application, libraries, dependencies, and runtime compilers in a self-contained environment. When installing the HugeCTR backend from NGC containers, the application environment remains portable, consistent, reproducible, and agnostic to the underlying host system software configuration.
HugeCTR backend Docker images are available in the NVIDIA container repository at https://ngc.nvidia.com/catalog/containers/nvidia:hugectr.
You can pull and launch the container by running the following command:
docker run --runtime=nvidia --rm -it nvcr.io/nvidia/hugectr:v3.0-inference  # start in interactive mode
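In practice you will usually also want to mount a model repository from the host into the container. A minimal sketch, where /path/to/model_repository is a placeholder for your own directory:

docker run --runtime=nvidia --rm -it \
    -v /path/to/model_repository:/models \
    nvcr.io/nvidia/hugectr:v3.0-inference   # /models is then visible inside the container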
Since building the HugeCTR backend is based on a HugeCTR installation, the first step is to compile HugeCTR, generate a shared library (libhugectr_inference.so), and install it in the specified folder correctly. By default, all the HugeCTR libraries and header files are installed in the /usr/local/hugectr folder. Before building HugeCTR from scratch, you should download the HugeCTR repository and the third-party modules that it relies on by running the following commands:
git clone https://github.com/NVIDIA/HugeCTR.git
cd HugeCTR
git submodule update --init --recursive
You can build HugeCTR from scratch using the following options:
- CMAKE_BUILD_TYPE: You can use this option to build HugeCTR in Debug or Release mode. When built with Debug, HugeCTR prints more verbose logs and executes GPU tasks synchronously.
- ENABLE_INFERENCE: You can use this option to build HugeCTR in inference mode, which is designed for the inference framework. In this mode, an inference shared library is built for the HugeCTR backend. Only inference-related interfaces can be used, which means models cannot be trained in this mode. This option is set to OFF by default.
Here is an example of how you can build HugeCTR using these build options:
$ mkdir -p build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_INFERENCE=ON ..
$ make -j
$ make install
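After make install finishes, it is worth confirming that the inference shared library landed under the default prefix; the lib and include subdirectories shown here are assumptions based on the default /usr/local/hugectr layout:

$ ls /usr/local/hugectr/lib | grep hugectr_inference   # expect libhugectr_inference.so
$ ls /usr/local/hugectr/include                        # HugeCTR header files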
Before building the HugeCTR backend from scratch, you should download the HugeCTR backend repository by running the following commands:
git clone https://github.com/triton-inference-server/hugectr_backend.git
cd hugectr_backend
Use cmake to build and install the backend in a specified folder. Remember to specify the absolute path of the local directory that the HugeCTR backend is installed into for the --backend-directory argument when launching the Triton Server; see the example after the build commands below.
$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
$ make install
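For reference, launching Triton against this locally built backend could look roughly like the following; the install/backends layout and the /models repository path are assumptions about your setup rather than fixed paths:

$ tritonserver --model-repository=/models \
               --backend-directory=`pwd`/install/backends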
The following required Triton repositories will be pulled and used in the build. By default, the "main" branch/tag is used for each repository, but the listed CMake argument can be used to override it, as shown in the example after this list:
- triton-inference-server/backend: -DTRITON_BACKEND_REPO_TAG=[tag]
- triton-inference-server/core: -DTRITON_CORE_REPO_TAG=[tag]
- triton-inference-server/common: -DTRITON_COMMON_REPO_TAG=[tag]
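For example, to pin all three repositories to the same release branch as your Triton build (r21.06 below is only a placeholder tag; substitute the branch that matches your server), the cmake step from above could be invoked as:

$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
        -DTRITON_BACKEND_REPO_TAG=r21.06 \
        -DTRITON_CORE_REPO_TAG=r21.06 \
        -DTRITON_COMMON_REPO_TAG=r21.06 ..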