Export, detect and validation with TensorRT engine file #5699
Conversation
I just noticed that there are already some PRs discussing TensorRT support. Output of TensorRT validation:

val: data=data/coco.yaml, weights=['weights/yolov5s.trt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.6, task=val, device=0, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.0-99-gc61bd93 torch 1.10.0a0+git36449ea CUDA:0 (NVIDIA GeForce RTX 2060, 5935MiB)
Loading weights/yolov5s.trt for TensorRT inference...
[11/18/2021-16:16:20] [TRT] [I] [MemUsageChange] Init CUDA: CPU +318, GPU +0, now: CPU 423, GPU 800 (MiB)
[11/18/2021-16:16:20] [TRT] [I] Loaded engine size: 36 MiB
[11/18/2021-16:16:20] [TRT] [I] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 460 MiB, GPU 800 MiB
[11/18/2021-16:16:20] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +497, GPU +214, now: CPU 966, GPU 1050 (MiB)
[11/18/2021-16:16:20] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +169, GPU +204, now: CPU 1135, GPU 1254 (MiB)
[11/18/2021-16:16:20] [TRT] [I] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1134 MiB, GPU 1236 MiB
[11/18/2021-16:16:20] [TRT] [I] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1098 MiB, GPU 1262 MiB
[11/18/2021-16:16:20] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1098, GPU 1272 (MiB)
[11/18/2021-16:16:20] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1098, GPU 1280 (MiB)
[11/18/2021-16:16:20] [TRT] [I] [MemUsageSnapshot] ExecutionContext creation end: CPU 1098 MiB, GPU 1334 MiB
Forcing --batch-size 1 square inference shape(1,3,640,640) for non-PyTorch backends
val: Scanning '/datasets/11_mscoco/YOLO/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|██████████| 5000/5000 [00:00<?, ?it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 5000/5000 [02:14<00:00, 37.18it/s]
all 5000 36335 0.658 0.505 0.552 0.358
Speed: 0.5ms pre-process, 10.2ms inference, 8.4ms NMS per image at shape (1, 3, 640, 640)
Evaluating pycocotools mAP... saving runs/val/exp10/yolov5s_predictions.json...
loading annotations into memory...
Done (t=0.48s)
creating index...
index created!
Loading and preparing results...
DONE (t=6.54s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=98.43s).
Accumulating evaluation results...
DONE (t=20.10s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.369
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.560
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.397
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.217
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.423
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.475
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.515
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.567
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.381
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.631
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.713
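For reference, the command that produces the configuration printed at the top of this log would look roughly like the following (flag names inferred from the printed arguments, not copied from the PR itself):

```bash
python val.py --data data/coco.yaml --weights weights/yolov5s.trt \
              --batch-size 32 --imgsz 640 --device 0 --save-json
```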
@imyhxy awesome, this looks really promising! Yes, there is another TRT PR, but it did not insert inference code into DetectMultiBackend properly; this PR seems like it does. I will review today or tomorrow.
Hi @glenn-jocher, I just looked into the mentioned pull request 5700 and updated my implementation. Now we get rid of the pycuda dependency.

PyTorch FP16 model inference with batch size 1:

Model Summary: 213 layers, 7225885 parameters, 0 gradients
val: Scanning '/home/user/datasets/11_mscoco/YOLO/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|██████████| 5000/5000 [00:00<?, ?it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 5000/5000 [03:15<00:00, 25.59it/s]
all 5000 36335 0.668 0.505 0.555 0.359
Speed: 0.3ms pre-process, 22.0ms inference, 2.3ms NMS per image at shape (1, 3, 640, 640)

TensorRT FP16 inference with batch size 1:

val: data=data/coco.yaml, weights=['weights/yolov5s_fp16.engine'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, task=val, device=, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True, dnn=False
YOLOv5 🚀 v6.0-104-g038e141 torch 1.10.0a0+git36449ea CUDA:0 (NVIDIA GeForce RTX 2060, 5935MiB)
Loading weights/yolov5s_fp16.engine for TensorRT inference...
[11/19/2021-16:13:10] [TRT] [I] [MemUsageChange] Init CUDA: CPU +320, GPU +0, now: CPU 413, GPU 493 (MiB)
[11/19/2021-16:13:10] [TRT] [I] Loaded engine size: 17 MiB
[11/19/2021-16:13:10] [TRT] [I] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 431 MiB, GPU 493 MiB
[11/19/2021-16:13:10] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +496, GPU +217, now: CPU 936, GPU 728 (MiB)
[11/19/2021-16:13:11] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +169, GPU +203, now: CPU 1105, GPU 931 (MiB)
[11/19/2021-16:13:11] [TRT] [I] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1105 MiB, GPU 913 MiB
[11/19/2021-16:13:12] [TRT] [I] [MemUsageSnapshot] ExecutionContext creation begin: CPU 2219 MiB, GPU 1433 MiB
[11/19/2021-16:13:12] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2219, GPU 1443 (MiB)
[11/19/2021-16:13:12] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2219, GPU 1451 (MiB)
[11/19/2021-16:13:12] [TRT] [I] [MemUsageSnapshot] ExecutionContext creation end: CPU 2219 MiB, GPU 1485 MiB
val: Scanning '/home/user/datasets/11_mscoco/YOLO/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|██████████| 5000/5000 [00:00<?, ?it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 5000/5000 [01:43<00:00, 48.47it/s]
all 5000 36335 0.647 0.511 0.552 0.358
Speed: 0.4ms pre-process, 4.5ms inference, 2.2ms NMS per image at shape (1, 3, 640, 640)
Hi @imyhxy, well done! I have tried your code to export to TensorRT and it works fine. The model on TensorRT with half precision (FP16) is running almost 4x faster than native PyTorch with FP16. I don't have that much experience with TensorRT. I wanted to ask you: is it possible to export the model to TensorRT INT8 instead of FP16 and then load it with Python? Thanks!
@Auth0rM0rgan yes, but INT8 quantization drops the mAP massively with the current YOLOv5 model, so I didn't implement that yet.
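For context, a minimal sketch of how post-training INT8 quantization would be enabled in the TensorRT builder, using the standard TensorRT Python API (the ONNX path and `my_calibrator` are hypothetical placeholders, not part of this PR):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
parser.parse_from_file('yolov5s.onnx')  # hypothetical ONNX export of the model

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
# Post-training INT8 needs a calibrator (a subclass of trt.IInt8EntropyCalibrator2)
# fed with representative preprocessed images; without good calibration or QAT,
# accuracy typically drops sharply, as noted above.
config.int8_calibrator = my_calibrator  # hypothetical calibrator instance

engine = builder.build_engine(network, config)
```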
@Auth0rM0rgan you see a 4x speedup with TRT export and inference compared with the base PyTorch GPU inference?
@imyhxy awesome, thanks for the updates!
@glenn-jocher Yes, I exported yolov5m-objects365 and it sped up almost 4x with the same image size and hyperparameters. It took ~0.008 s per frame with base PyTorch GPU and 0.002 s with TensorRT, which is amazing! BTW, my GPU is an RTX 3090.
@Auth0rM0rgan wow, thanks for confirming! @imyhxy is this PR ready to merge?
@glenn-jocher yes! I have done some tests; it works fine.
Great! /rebase
@imyhxy do you know how to install tensorrt in Colab? I tried running this PR but cannot import tensorrt, even after …
/rebase
@glenn-jocher Hi, the …
@glenn-jocher Hi, I wrote a script to install the TensorRT package on Colab. I am not sure whether it works or not, because my account can only get a Tesla K80 GPU to run on, whose …
@imyhxy thanks! I just requested access to the notebook.
@glenn-jocher Morning here 🌞 My mistake, I didn't set the share permission right. New link
@imyhxy thanks! I got the notebook working, but detect.py inference is very slow with yolov5s.engine (270 ms). I think perhaps it's running on CPU? Is that possible? BTW, I hosted the Colab TRT install file here to allow for a public notebook example.

EDIT: Never mind, I re-ran with a V100 Colab instance and speeds improved to 3 ms. Perhaps this was just an A100 issue.
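As a side note, one common way to get TensorRT into Colab around this time was via NVIDIA's pip index rather than hosting the packages; this is an assumption about an alternative route, not necessarily what the linked notebook does:

```bash
pip install nvidia-pyindex     # adds NVIDIA's package index
pip install nvidia-tensorrt    # installs the TensorRT Python wheels from that index
python -c "import tensorrt as trt; print(trt.__version__)"   # sanity check
```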
@imyhxy is there a reason you read from the new ONNX buffer instead of reading from the ONNX file? I see that there is a …

EDIT: If we read from the ONNX file we could simplify the PR by leaving the export_onnx() function the way it currently is.
@glenn-jocher Hi🌞
EDIT: I have checked that the …

EDIT: Updated script. By the way, I am not sure whether hosting the TensorRT and cuDNN packages violates the license or not.
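For reference, the two parsing paths being compared look roughly like this with the standard TensorRT Python API (the file name is illustrative):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Option A: parse straight from the .onnx file on disk (lets export_onnx() stay unchanged)
ok = parser.parse_from_file('yolov5s.onnx')

# Option B: parse from an in-memory buffer, e.g. one returned by the ONNX export step
# with open('yolov5s.onnx', 'rb') as f:
#     ok = parser.parse(f.read())

if not ok:
    for i in range(parser.num_errors):
        print(parser.get_error(i))
    raise RuntimeError('failed to parse ONNX model')
```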
/rebase
@imyhxy it seems we can't apply the deprecation warning fix because then export fails, so I'm reverting my previous change here.
@imyhxy thanks, got it on all points! If you can confirm the license problems then I should remove the hosting. Everything else looks good; the only thing I noticed is the different handling in detect.py and val.py. Mainly this affects --half ops: in detect.py --half only applies to pt files, but in val.py we have pt or engine. Do the TRT models accept FP16 input images? Do the models need to be exported as FP16 models in this case? I'll run some tests on the PR and update here.
@glenn-jocher Hi, see Line 84 in e312b0f.

And the TensorRT plan file does accept, and only accepts, FP16 images when it was exported with --half (see the sketch after the license excerpt below).

For the license part, I am not a native English speaker, and that makes it hard for me to understand the license document. According to the following section, it seems like we can't just redistribute the original files. FYI, I also attach the license document here.

1.2. Distribution Requirements
These are the distribution requirements for you to exercise the distribution grant:
1. Your application must have material additional functionality, beyond the included portions of the SDK.
2. The distributable portions of the SDK shall only be accessed by your application.
3. The following notice shall be included in modifications and derivative works of sample source code distributed: “This software contains source code provided by NVIDIA Corporation.”
4. Unless a developer tool is identified in this Agreement as distributable, it is delivered for your internal use only.
5. The terms under which you distribute your application must be consistent with the terms of this Agreement, including (without limitation) terms relating to the license grant and license restrictions and protection of NVIDIA’s intellectual property rights. Additionally, you agree that you will protect the privacy, security and legal rights of your application users.
6. You agree to notify NVIDIA in writing of any known or suspected distribution or use of the SDK not in compliance with the requirements of this Agreement, and to enforce the terms of your agreements with respect to distributed SDK.
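To illustrate the --half point above, a minimal sketch (not the exact PR code) of how a backend can detect an FP16-I/O engine and cast its input accordingly, assuming the pre-TensorRT-8.5 binding API; `engine` and `im` are illustrative names:

```python
import numpy as np
import tensorrt as trt
import torch

# `engine` is a deserialized trt.ICudaEngine, `im` a CUDA image tensor (illustrative names)
input_dtype = trt.nptype(engine.get_binding_dtype(0))  # dtype of the first (input) binding
if input_dtype == np.float16:
    im = im.half()   # engine was built with FP16 I/O, so it must be fed FP16 images
else:
    im = im.float()  # otherwise keep FP32 input
```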
@imyhxy I'm not a native English speaker either; it's my second language after Spanish, but I'm pretty good at it and I don't really understand the language there either! Probably best just to remove the hosting to stay safe; I'll update the Colab notebook appendix section I made with the official URL. You are right about --half, I didn't notice your detect.py updates!
@imyhxy PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐
@imyhxy BTW I took your notebook and squeezed it into a single cell at the end of the Appendix section. All of YOLOv5 currently has only one notebook for everything: https://github.com/ultralytics/yolov5/blob/master/tutorial.ipynb
Hi, I was doing QAT using TensorRT's tools, and now after exporting the ONNX model I need the extra nvinfer_plugin.so. How do I enable it?
@wanghr323 Seems you have not set up the TensorRT environment properly. Make sure your …
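On the nvinfer_plugin question above: a minimal sketch of how TensorRT's plugin library is typically registered from Python before deserializing an engine that uses plugin layers (standard tensorrt API; the library path and engine file name are assumptions):

```python
import ctypes
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)

# Load the plugin library explicitly if it is not on the default search path (assumed name)
ctypes.CDLL('libnvinfer_plugin.so', mode=ctypes.RTLD_GLOBAL)

# Register all built-in TensorRT plugins with the plugin registry
trt.init_libnvinfer_plugins(logger, '')

runtime = trt.Runtime(logger)
with open('model_qat.engine', 'rb') as f:  # hypothetical engine built from the QAT ONNX
    engine = runtime.deserialize_cuda_engine(f.read())
```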
@imyhxy When converting the PyTorch model to a TensorRT engine, the …

With line 1, the generated TensorRT network will have the following input/output binding layers:
…
Without line 1, the result is:
…
I noticed … I don't know what TensorRT exactly does when BuilderFlag is set to FP16. Thanks!
@passerbythesun Hi there. Line 1 will only affect the … When line 2 is involved, the layers of the engine always run in half precision (except for ops that don't support half precision). You can add …
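Since the exact lines referenced above were lost in rendering, here is a generic sketch of the two knobs being contrasted — forcing FP16 network I/O versus only setting the FP16 builder flag — using the standard TensorRT Python API (illustrative, not necessarily the code from this thread):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
parser.parse_from_file('yolov5s.onnx')  # illustrative path

config = builder.create_builder_config()

# BuilderFlag.FP16 only allows layers to *run* in half precision; the input/output
# bindings can still be reported as FP32, since TensorRT picks per-layer precision itself.
config.set_flag(trt.BuilderFlag.FP16)

# Forcing the network's I/O tensors to FP16 is what changes the binding dtypes
# that detect.py / val.py see when they query the engine.
network.get_input(0).dtype = trt.float16
network.get_output(0).dtype = trt.float16

engine = builder.build_engine(network, config)
```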
When I run the TensorRT export, it's not dependent on the TensorRT/cuDNN version, is that right?
@ingbeeedd
…5699)

* Export and detect with TensorRT engine file
* Resolve `isort`
* Make validation works with TensorRT engine
* feat: update export docstring
* feat: change suffix from *.trt to *.engine
* feat: get rid of pycuda
* feat: make compatiable with val.py
* feat: support detect with fp16 engine
* Add Lite to Edge TPU string
* Remove *.trt comment
* Revert to standard success logger.info string
* Fix Deprecation Warning
  ```
  export.py:310: DeprecationWarning: Use build_serialized_network instead.
    with builder.build_engine(network, config) as engine, open(f, 'wb') as t:
  ```
* Revert deprecation warning fix
  @imyhxy it seems we can't apply the deprecation warning fix because then export fails, so I'm reverting my previous change here.
* Update export.py
* Update export.py
* Update common.py
* export onnx to file before building TensorRT engine file
* feat: triger ONNX export failed early
* feat: load ONNX model from file

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
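For context on the deprecation warning quoted in the commit list, the non-deprecated path it points to looks roughly like this (TensorRT ≥ 8 Python API; `builder`, `network` and `config` are assumed to be prepared as in the sketches above):

```python
import tensorrt as trt

# `builder`, `network` and `config` prepared as in the earlier sketches
serialized = builder.build_serialized_network(network, config)  # replaces builder.build_engine()
with open('yolov5s.engine', 'wb') as f:
    f.write(serialized)  # IHostMemory supports the buffer protocol, so it can be written directly

# Later, the plan file is deserialized for inference:
runtime = trt.Runtime(trt.Logger(trt.Logger.INFO))
with open('yolov5s.engine', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())
```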
@imyhxy can you implement similar mAP calculation code for .trt models of YOLOv7?
Hi there,

I have added support for exporting and detecting with a TensorRT plan file to YOLOv5. The requirement is that you should install the tensorrt and pycuda Python packages.

Export: …
Output of export: …
Detect: …
Output of detect: …

Note: you have to manually specify the image size (both width and height) in the detection phase.
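The export and detection commands in this description were lost in rendering; as a rough sketch, the steps could be invoked like this (flag names follow the merged version of the PR, where the suffix became *.engine and pycuda was dropped, so they may differ from the original description):

```bash
# Export: PyTorch checkpoint -> ONNX -> TensorRT engine (FP16)
python export.py --weights yolov5s.pt --include engine --imgsz 640 --device 0 --half

# Detect: the image size must be given explicitly to match the fixed engine shape
python detect.py --weights yolov5s.engine --imgsz 640 --source data/images --device 0
```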
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Added support for NVIDIA TensorRT inference in YOLOv5.
📊 Key Changes

- .engine file support in models/common.py, export.py, and val.py.
- Engines are built from .onnx using the export_engine() function in export.py.
- TensorRT inference in detect.py and val.py.
- … in export.py.
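A hypothetical minimal usage of the new backend path described above (names follow models/common.py in this repo; exact constructor arguments may differ by version):

```python
import torch
from models.common import DetectMultiBackend

device = torch.device('cuda:0')
model = DetectMultiBackend('yolov5s.engine', device=device)  # loads the TensorRT plan file
im = torch.zeros(1, 3, 640, 640, device=device)              # fixed shape baked into the engine
pred = model(im)                                             # raw predictions; NMS is applied separately
```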
🎯 Purpose & Impact