docs: Add Docker deployment for Bentos (#4812)
Add Docker deployment for Bentos

Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
Sherlock113 authored Jun 18, 2024
1 parent e1efdc6 commit 81b2df8
Showing 2 changed files with 85 additions and 2 deletions.
2 changes: 2 additions & 0 deletions docs/source/guides/build-options.rst
@@ -373,6 +373,8 @@ To add it in your ``bentofile.yaml``:
the ``debian`` and ``alpine`` distro support ``conda``. Learn more in
the ``docker`` section below.

.. _docker-configuration:

``docker``
^^^^^^^^^^

85 changes: 83 additions & 2 deletions docs/source/guides/gpu-inference.rst
@@ -55,8 +55,30 @@ If you want to use multiple GPUs for distributed operations (multiple GPUs for t
- PyTorch: `DataParallel <https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html>`_ and `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`_
- TensorFlow: `Distributed training <https://www.tensorflow.org/guide/distributed_training>`_

Deployment on BentoCloud
^^^^^^^^^^^^^^^^^^^^^^^^
GPU deployment
--------------

To make sure a Bento uses GPUs when deployed, set the required CUDA version in the ``docker`` field of ``bentofile.yaml``. BentoML installs that CUDA version in the Docker image it builds. For example:

.. code-block:: yaml

    service: "service:GPUSVC"
    labels:
      owner: bentoml-team
      stage: demo
    include:
      - "*.py"
    python:
      requirements_txt: "./requirements.txt"
    docker:
      cuda_version: "12.1.1" # Set your CUDA version
      distro: debian
      python_version: "3.11.7"

If the desired CUDA version is not natively supported by BentoML, you can customize the installation of the CUDA driver and libraries through the ``system_packages``, ``setup_script``, or ``base_image`` options under the :ref:`docker-configuration` field.

BentoCloud
^^^^^^^^^^

When deploying on BentoCloud, specify ``resources`` with ``gpu`` or ``gpu_type`` in the ``@bentoml.service`` decorator to allow BentoCloud to allocate the necessary GPU resources:

Expand Down Expand Up @@ -86,6 +108,65 @@ To list available GPU types on your BentoCloud account, run:
gpu.l4.1 * 4000m 16Gi 1 nvidia-l4
gpu.a100.1 * 6000m 43Gi 1 nvidia-tesla-a100
After your Service is ready, deploy it to BentoCloud by running ``bentoml deploy .``. See :doc:`/bentocloud/how-tos/create-deployments` for details.

Docker
^^^^^^

To run Docker containers with NVIDIA GPUs, install the NVIDIA Container Toolkit. NVIDIA provides `detailed instructions <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker>`_ for installing both ``Docker CE`` and ``nvidia-docker``.

After you build a Docker image for your Bento with ``bentoml containerize``, you can run it on all available GPUs like this:

.. code-block:: bash

    docker run --gpus all -p 3000:3000 bento_image:latest

You can use the ``--device`` option to specify GPUs:

.. code-block:: bash

    docker run --gpus all --device /dev/nvidia0 \
        --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools \
        --device /dev/nvidia-modeset --device /dev/nvidiactl <docker-args>

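
As an alternative to passing device nodes, Docker's ``--gpus`` flag can itself select GPUs by index, for example ``--gpus device=0``. The following is a minimal Python sketch that assembles such a command as an argument list. The function name ``build_docker_run_command`` and its parameters are hypothetical helpers introduced for illustration, not part of BentoML or Docker; the image name and port are taken from the example above.

```python
import shlex
from typing import List, Optional, Sequence


def build_docker_run_command(
    image: str,
    gpu_ids: Optional[Sequence[int]] = None,
    port: int = 3000,
) -> List[str]:
    """Return an argv list for `docker run` with GPU access (hypothetical helper).

    gpu_ids=None exposes every GPU (`--gpus all`); otherwise only the
    listed GPU indices are exposed (`--gpus device=...`).
    """
    if gpu_ids is None:
        gpus = "all"
    else:
        if not gpu_ids:
            raise ValueError("gpu_ids must not be empty; use None for all GPUs")
        gpus = "device=" + ",".join(str(i) for i in gpu_ids)
        if len(gpu_ids) > 1:
            # Docker parses the --gpus value as CSV, so a comma-separated
            # device list is wrapped in literal double quotes.
            gpus = f'"{gpus}"'
    return ["docker", "run", "--gpus", gpus, "-p", f"{port}:{port}", image]


if __name__ == "__main__":
    # Print a shell-ready command that exposes only GPUs 0 and 1.
    cmd = build_docker_run_command("bento_image:latest", gpu_ids=[0, 1])
    print(shlex.join(cmd))
```

The returned list can be passed directly to ``subprocess.run``, which avoids shell quoting issues with the embedded double quotes.
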

To check whether a BentoML Service or Bento is using a GPU, use the ``nvidia-smi`` tool. Run it in a separate terminal while your BentoML Service is handling requests.

.. code-block:: bash

    # Refresh the output every second
    watch -n 1 nvidia-smi

Example output:

.. code-block:: bash

    Every 1.0s: nvidia-smi                    ps49pl48tek0: Mon Jun 17 13:09:46 2024

    Mon Jun 17 13:09:46 2024
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA A100-SXM4-80GB          On  | 00000000:00:05.0 Off |                    0 |
    | N/A   30C    P0              60W / 400W |   3493MiB / 81920MiB |      0%      Default |
    |                                         |                      |             Disabled |
    +-----------------------------------------+----------------------+----------------------+

    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |    0   N/A  N/A      1813      G   /usr/lib/xorg/Xorg                           70MiB |
    |    0   N/A  N/A      1946      G   /usr/bin/gnome-shell                         78MiB |
    |    0   N/A  N/A     11197      C   /Home/Documents/BentoML/demo/bin/python    3328MiB |
    +---------------------------------------------------------------------------------------+

For more information, see `the Docker documentation <https://docs.docker.com/config/containers/resource_constraints/#gpu>`_.
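
For scripted monitoring, ``nvidia-smi`` can also emit machine-readable CSV through its ``--query-gpu`` and ``--format`` options. The following is a minimal Python sketch, assuming ``nvidia-smi`` is on the ``PATH``; the function names ``parse_gpu_csv`` and ``query_gpus`` are hypothetical helpers, and the sample line in the demo is built from the values in the example output above rather than captured from a real run.

```python
import csv
import io
import subprocess
from typing import Dict, List

# Fields requested from nvidia-smi; with "nounits", memory values are in MiB
# and utilization is a percentage.
QUERY_FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Turn `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:  # skip blank lines
            continue
        rows.append(dict(zip(QUERY_FIELDS, (value.strip() for value in record))))
    return rows


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Illustrative sample line only; call query_gpus() on a GPU machine instead.
    sample = "0, NVIDIA A100-SXM4-80GB, 3493, 81920, 0\n"
    for gpu in parse_gpu_csv(sample):
        print(gpu["index"], gpu["name"], gpu["memory.used"], "MiB used")
```

Calling ``query_gpus()`` in a loop while the Service handles requests gives roughly the same view as ``watch -n 1 nvidia-smi``, in a form that is easier to log or alert on.
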

Limit GPU visibility
--------------------
