Implementation of End-to-End YOLO Detection and Segmentation Models for DeepStream
This repository offers an optimized implementation of End-to-End YOLO models for DeepStream, improving inference efficiency by integrating Non-Maximum Suppression (NMS) directly into the YOLO models. The approach supports dynamic batch sizes and dynamic input shapes, so a single engine adapts to varying numbers of sources and resolutions.
Note: This feature is available only in the Python application.
DeepStream Version | dGPU/X86 | Jetson |
---|---|---|
7.1 | ✅ | ✅ |
7.0 | ✅ | ✅ |
6.4 | ✅ | ✅ |
6.3 | ✅ | ✅ |
6.2 | ✅ | ✅ |
6.1 | ❌ | ❌ |
Make sure you have an NVIDIA GPU installed on your system and that the latest drivers are properly configured. Download and install the GPU drivers from the official NVIDIA website: NVIDIA Drivers Download
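Once the drivers are installed, you can confirm they are loaded by running the following on the host; it should list your GPU and the driver version:
nvidia-smi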
- Windows 11
- Enable WSL2 in Windows and install Linux (Ubuntu) from the Microsoft Store.
- DeepStream On WSL
- Linux (x86_64)
- NVIDIA Jetson ARM
Docker is required for creating and managing containers, simplifying development and deployment.
To install Docker on Ubuntu, use the convenience script:
Docker Installation Guide for Ubuntu
After the installation, add your user to the docker group to run Docker commands without sudo:
sudo usermod -aG docker $USER
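The group change takes effect at your next login. To apply it immediately in the current session and verify that Docker runs without sudo, you can start a shell with the new group active and launch Docker's standard test image:
newgrp docker
docker run hello-world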
The NVIDIA Container Toolkit allows Docker containers to use the NVIDIA GPU to accelerate your applications. To install the toolkit, follow the official guide:
NVIDIA Container Toolkit Installation Guide
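For reference, after the packages are installed, the official guide configures Docker to use the NVIDIA runtime with the following commands:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker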
After installation, verify that the setup is correct by running a GPU-enabled container:
docker run --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Dataset | Model | Feature | Dynamic Shape | Dynamic Batch Size | NMS-Free | Efficient NMS / RoiAlign |
---|---|---|---|---|---|---|
COCO | YOLO11 | Detection | ✅ | ✅ | 🚫 | ✅ |
COCO | YOLOv10 | Detection | ✅ | ✅ | ✅ | 🚫 |
COCO | YOLOv9 | Detection | ✅ | ✅ | ✅ | ✅ |
COCO | YOLOv8 | Detection | ✅ | ✅ | 🚫 | ✅ |
COCO | YOLOv7 | Detection | ✅ | ✅ | 🚫 | ✅ |
COCO | YOLO11 | Segmentation | ✅ | ✅ | 🚫 | ✅ |
COCO | YOLOv9 | Segmentation | ✅ | ✅ | 🚫 | ✅ |
COCO | YOLOv8 | Segmentation | ✅ | ✅ | 🚫 | ✅ |
COCO | YOLOv7 | Segmentation | ✅ | ✅ | 🚫 | ✅ |
WIDER FACE | YOLO11 | Detection | ✅ | ✅ | 🚫 | ✅ |
WIDER FACE | YOLOv10 | Detection | ✅ | ✅ | ✅ | 🚫 |
WIDER FACE | YOLOv8 | Detection | ✅ | ✅ | ✅ | ✅ |
Feature | Description |
---|---|
Dynamic Shapes | TensorRT enables building engines for network resolutions different from the original exported ONNX (see the sketch after this table). |
Dynamic Batch Size | Dynamically adjusts the batch size to maximize model performance according to the number of DeepStream sources. |
NMS-Free | Models natively produce final detections without an NMS step; available for some YOLOv9 models and all YOLOv10 detection models. |
TensorRT Plugins | TensorRT EfficientNMS plugin for detection models, and EfficientNMSX / ROIAlign plugins for segmentation models. |
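As a rough illustration of how dynamic shapes and batch sizes work, a TensorRT engine can be built with shape ranges using trtexec. This is a sketch, not this repository's build procedure; the input tensor name images and the shape ranges are assumptions that must match your exported ONNX model:
# Build an engine accepting batch sizes 1-8 and resolutions from 416x416 to 1280x1280.
trtexec --onnx=yolo_model.onnx \
    --minShapes=images:1x3x416x416 \
    --optShapes=images:4x3x640x640 \
    --maxShapes=images:8x3x1280x1280 \
    --saveEngine=yolo_model.engine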
Setting up this project involves the steps outlined below:
git clone https://github.com/levipereira/deepstream-yolo-e2e.git
cd deepstream-yolo-e2e
git submodule update --init --recursive
In this example, we will use DeepStream 7.1.
Start the Docker container from the deepstream-yolo-e2e directory:
Note: If you are not using the computer's display output, especially on Jetson devices, remove the -e DISPLAY=$DISPLAY and -v /tmp/.X11-unix/:/tmp/.X11-unix options, as they may cause errors.
For dGPU (x86_64) with audio device passthrough:
xhost +
docker run \
-it \
--privileged \
--rm \
--net=host \
--ipc=host \
--gpus all \
-e DISPLAY=$DISPLAY \
-e CUDA_CACHE_DISABLE=0 \
--device /dev/snd \
-v /tmp/.X11-unix/:/tmp/.X11-unix \
-v `pwd`:/apps/deepstream-yolo-e2e \
-w /apps/deepstream-yolo-e2e \
nvcr.io/nvidia/deepstream:7.1-triton-multiarch
For dGPU (x86_64) without audio passthrough:
xhost +
docker run \
-it \
--privileged \
--rm \
--net=host \
--ipc=host \
--gpus all \
-e DISPLAY=$DISPLAY \
-v /tmp/.X11-unix/:/tmp/.X11-unix \
-v `pwd`:/apps/deepstream-yolo-e2e \
-w /apps/deepstream-yolo-e2e \
nvcr.io/nvidia/deepstream:7.1-triton-multiarch
For Jetson devices (note --runtime nvidia in place of --gpus all):
xhost +
docker run \
-it \
--privileged \
--rm \
--net=host \
--ipc=host \
--runtime nvidia \
-e DISPLAY=$DISPLAY \
-v /tmp/.X11-unix/:/tmp/.X11-unix \
-v `pwd`:/apps/deepstream-yolo-e2e \
-w /apps/deepstream-yolo-e2e \
nvcr.io/nvidia/deepstream:7.1-triton-multiarch
cd /apps/deepstream-yolo-e2e
bash /apps/deepstream-yolo-e2e/one_hit_install.sh
cd /apps/deepstream-yolo-e2e
./deepstream.py
The model with the highest performance and accuracy is YOLOv9-QAT (ReLU). This quantized model delivers exceptional results and supports multiple sources, depending on your GPU capabilities.
You can find it in the model selection menu:
- COCO > Detection > Balanced > YOLOv9 QAT (ReLU)
The application can be configured using the config/python_app/config.ini file. Below are the key settings you can modify (an illustrative example follows the list):
- MUXER_OUTPUT_WIDTH and MUXER_OUTPUT_HEIGHT: The width and height of the output video stream produced by the muxer.
- TILED_OUTPUT_WIDTH and TILED_OUTPUT_HEIGHT: The dimensions of the tiled output format.
- OUTPUT_DIRECTORY: The directory where output files will be saved.
- OUTPUT_PREFIX: The prefix for the output file names.
- RTSP_PORT: The port used for the RTSP stream. The default value is 8554.
- RTSP_FACTORY: The path for the RTSP stream. For example, setting it to /live allows streaming under this path.
- RTSP_UDPSYNC: The internal port used by DeepStream to connect to the RTSP server. The default value is 8255.
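For illustration, the settings might look like this in config/python_app/config.ini (the values below are hypothetical examples, and the actual file may group keys under section headers):
MUXER_OUTPUT_WIDTH=1920
MUXER_OUTPUT_HEIGHT=1080
TILED_OUTPUT_WIDTH=1280
TILED_OUTPUT_HEIGHT=720
OUTPUT_DIRECTORY=./output
OUTPUT_PREFIX=deepstream_out
RTSP_PORT=8554
RTSP_FACTORY=/live
RTSP_UDPSYNC=8255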
RTSP URL Format
When constructing the RTSP URL, it will always follow this format:
rtsp://<server_ip>:<RTSP_PORT><RTSP_FACTORY>
For example, if you use the default settings, the URL would be:
rtsp://<server_ip>:8554/live
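Any RTSP-capable client can consume the stream. For example, with FFmpeg's ffplay (assuming FFmpeg is installed on the client machine and the application is running):
ffplay rtsp://<server_ip>:8554/live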
- YOLO11: Official repository for the YOLO11 model.
- YOLOv10: The official repository for YOLOv10.
- YOLOv9: Access the official YOLOv9 repository.
- YOLOv8: Official repository for the YOLOv8 model.
- YOLOv7: Explore the YOLOv7 repository.
- YOLO-FACE: Dedicated repository for the YOLO-FACE model.
- Export YOLO 11/v10/v8: Repository for exporting YOLO11, YOLOv10, and YOLOv8 models with End2End support.
- YOLOv9 QAT: Official repository for quantization-aware training (QAT) of the YOLOv9 model.
- NvDsInferYolo: Explore the official repository for the NvDsInferYolo parsing function.
- YOLO E2E: This repository focuses on exporting YOLO models with End2End capabilities.
- TensorRT Plugin EfficientNMSX: Official repository for the EfficientNMSX plugin implemented in TensorRT.