New Plugin EfficentNMSX #3920
base: release/10.0
Conversation
Signed-off-by: Levi Pereira <levi.pereira@gmail.com>
@samurdhikaru
Hi @levipereira, thank you for your work. Can you provide a demo of using the output indexes? As far as I know, TensorRT has an issue with data-dependent shapes (DDS). You can look at this link. Can the EfficientNMSX plugin resolve this issue?
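For background on the DDS point: EfficientNMS-style plugins sidestep data-dependent output shapes by always emitting tensors sized to a fixed max_output_boxes, together with a num_detections count that marks how many slots are valid; the rest is padding. A minimal pure-Python sketch of that padding contract (names and shapes are illustrative, not the plugin's exact interface):

```python
def pad_to_max(dets, max_det):
    """Return (padded, num_dets): a fixed-size detection list plus a
    count of valid entries, so the output shape never depends on data."""
    num = min(len(dets), max_det)
    padded = dets[:num] + [[0.0, 0.0, 0.0, 0.0]] * (max_det - num)
    return padded, num

# Two surviving boxes, padded out to a fixed max_det of 4.
boxes = [[0, 0, 10, 10], [5, 5, 20, 20]]
padded, num_dets = pad_to_max(boxes, max_det=4)
print(num_dets)     # 2
print(len(padded))  # 4
```

Consumers then read only the first num_dets rows, which is why the engine can be built with static output shapes.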
@demuxin I have implemented EfficientNMSX in all segmentation models.
Hi @levipereira, I have some problems working with your plugin.
Then I get an error:
How can I fix this problem?
Moreover, this problem occurs both in the case of
It’s possible that the library isn’t being updated correctly in /usr/lib/x86_64-linux-gnu/. I’ve already compiled the library for x86 environments, so you can skip the compilation steps and use the pre-compiled libraries available in my repository. You can easily update and use the models by executing the following script: GitHub Repository: deepstream-yolo-e2e - TensorRT Plugin - patch_libnvinfer.sh
@levipereira Is it possible to somehow build "trtexec" binaries for x86_64 and aarch64 (Jetson) for TensorRT 8.5? I really need these files to convert from ONNX to engine, which I will run from my C++ code.
You only need the
Now I get a new error:
[08/02/2024-16:18:30] [E] Error[4]: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IElementWiseLayer /model/model.22/Sub: broadcast dimensions must be conformable)
I just followed these steps using the tip at: https://github.com/levipereira/deepstream-yolo-e2e
@levipereira I was able to successfully create an engine for the yolov8x-seg-trt.onnx model, also at resolution 640x640 (1x3x640x640). But the conversion only works for this resolution, which is very strange... Does your plugin have fragments hardcoded to work only with 640x640?
I haven't checked yet, but you can try disabling dynamic shapes and check it again.
@levipereira I attempted to build TensorRT release 8.6 and TensorRT release 10.0, and also built the EfficientNMSX-related code on the 10.1 and 10.2 branches of NVIDIA TensorRT. When building and testing TensorRT on the 3060 Ti and 4090, exporting the ONNX model with the EfficientNMSX plugin using the built
@andrew-93, you might want to try building TensorRT on a different device. I’ve tested it on the 2080 Ti, 3060 Ti, and 4090, and only the 2080 Ti is not working correctly.
@laugh12321 I would recommend trying different workspace sizes, such as 4GB and then 20GB, to see if the issue persists. It's possible that this problem might also occur with the Efficient_NMS plugin, so checking this could provide additional insights. I have an RTX 2060/2070 and I'll test as soon as possible.
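One detail worth noting when trying different workspace sizes: trtexec's `--memPoolSize` reads a bare number as MiB (suffixes like `G` are also accepted), which is why the command in the log below, passing a raw byte count of 21474836480, is reported as a workspace of 2.14748e+10 MiB. A small illustrative helper (hypothetical, not part of trtexec) that builds the flag from GiB:

```python
def mem_pool_arg(workspace_gib: int) -> str:
    """Build trtexec's --memPoolSize flag; the bare value is read as MiB,
    so GiB must be converted by multiplying by 1024, not by 2**30."""
    return f"--memPoolSize=workspace:{workspace_gib * 1024}"

print(mem_pool_arg(4))   # --memPoolSize=workspace:4096
print(mem_pool_arg(20))  # --memPoolSize=workspace:20480
```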
This is related to dynamic shapes in YOLOv8; I reused their base code. As a workaround, exporting the ONNX model with the desired input shapes (i.e., disabling dynamic shapes) will make it work.
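A sketch of that workaround: with torch.onnx.export, freezing the shapes simply means passing no dynamic_axes mapping, so every dimension is pinned to the sample input's concrete shape. The helper below is hypothetical and only illustrates which kwargs change between the two modes:

```python
def export_kwargs(dynamic: bool) -> dict:
    """Kwargs for torch.onnx.export. Omitting dynamic_axes freezes every
    dimension to the concrete shape of the sample input tensor."""
    kwargs = {"input_names": ["images"], "opset_version": 16}
    if dynamic:
        # Dynamic export: batch/height/width become symbolic dims.
        kwargs["dynamic_axes"] = {"images": {0: "batch", 2: "height", 3: "width"}}
    return kwargs

print("dynamic_axes" in export_kwargs(False))  # False -> static shapes
print("dynamic_axes" in export_kwargs(True))   # True  -> dynamic shapes
```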
[08/07/2024-07:53:41] [I] === Model Options ===
[08/07/2024-07:53:41] [I] Format: ONNX
[08/07/2024-07:53:41] [I] Model: D:\laugh\Projects\TensorRT-YOLO\demo\obb\models\yolov8s-obb.onnx
[08/07/2024-07:53:41] [I] Output:
[08/07/2024-07:53:41] [I] === Build Options ===
[08/07/2024-07:53:41] [I] Memory Pools: workspace: 2.14748e+10 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[08/07/2024-07:53:41] [I] avgTiming: 8
[08/07/2024-07:53:41] [I] Precision: FP32+FP16
[08/07/2024-07:53:41] [I] LayerPrecisions:
[08/07/2024-07:53:41] [I] Layer Device Types:
[08/07/2024-07:53:41] [I] Calibration:
[08/07/2024-07:53:41] [I] Refit: Disabled
[08/07/2024-07:53:41] [I] Strip weights: Disabled
[08/07/2024-07:53:41] [I] Version Compatible: Disabled
[08/07/2024-07:53:41] [I] ONNX Plugin InstanceNorm: Disabled
[08/07/2024-07:53:41] [I] TensorRT runtime: full
[08/07/2024-07:53:41] [I] Lean DLL Path:
[08/07/2024-07:53:41] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[08/07/2024-07:53:41] [I] Exclude Lean Runtime: Disabled
[08/07/2024-07:53:41] [I] Sparsity: Disabled
[08/07/2024-07:53:41] [I] Safe mode: Disabled
[08/07/2024-07:53:41] [I] Build DLA standalone loadable: Disabled
[08/07/2024-07:53:41] [I] Allow GPU fallback for DLA: Disabled
[08/07/2024-07:53:41] [I] DirectIO mode: Disabled
[08/07/2024-07:53:41] [I] Restricted mode: Disabled
[08/07/2024-07:53:41] [I] Skip inference: Disabled
[08/07/2024-07:53:41] [I] Save engine: D:\laugh\Projects\TensorRT-YOLO\demo\obb\models\yolov8s-obb.engine
[08/07/2024-07:53:41] [I] Load engine:
[08/07/2024-07:53:41] [I] Profiling verbosity: 0
[08/07/2024-07:53:41] [I] Tactic sources: Using default tactic sources
[08/07/2024-07:53:41] [I] timingCacheMode: local
[08/07/2024-07:53:41] [I] timingCacheFile:
[08/07/2024-07:53:41] [I] Enable Compilation Cache: Enabled
[08/07/2024-07:53:41] [I] errorOnTimingCacheMiss: Disabled
[08/07/2024-07:53:41] [I] Preview Features: Use default preview flags.
[08/07/2024-07:53:41] [I] MaxAuxStreams: -1
[08/07/2024-07:53:41] [I] BuilderOptimizationLevel: -1
[08/07/2024-07:53:41] [I] Calibration Profile Index: 0
[08/07/2024-07:53:41] [I] Weight Streaming: Disabled
[08/07/2024-07:53:41] [I] Runtime Platform: Same As Build
[08/07/2024-07:53:41] [I] Debug Tensors:
[08/07/2024-07:53:41] [I] Input(s)s format: fp32:CHW
[08/07/2024-07:53:41] [I] Output(s)s format: fp32:CHW
[08/07/2024-07:53:41] [I] Input build shapes: model
[08/07/2024-07:53:41] [I] Input calibration shapes: model
[08/07/2024-07:53:41] [I] === System Options ===
[08/07/2024-07:53:41] [I] Device: 0
[08/07/2024-07:53:41] [I] DLACore:
[08/07/2024-07:53:41] [I] Plugins:
[08/07/2024-07:53:41] [I] setPluginsToSerialize:
[08/07/2024-07:53:41] [I] dynamicPlugins:
[08/07/2024-07:53:41] [I] ignoreParsedPluginLibs: 0
[08/07/2024-07:53:41] [I]
[08/07/2024-07:53:41] [I] === Inference Options ===
[08/07/2024-07:53:41] [I] Batch: Explicit
[08/07/2024-07:53:41] [I] Input inference shapes: model
[08/07/2024-07:53:41] [I] Iterations: 10
[08/07/2024-07:53:41] [I] Duration: 3s (+ 200ms warm up)
[08/07/2024-07:53:41] [I] Sleep time: 0ms
[08/07/2024-07:53:41] [I] Idle time: 0ms
[08/07/2024-07:53:41] [I] Inference Streams: 1
[08/07/2024-07:53:41] [I] ExposeDMA: Disabled
[08/07/2024-07:53:41] [I] Data transfers: Enabled
[08/07/2024-07:53:41] [I] Spin-wait: Disabled
[08/07/2024-07:53:41] [I] Multithreading: Disabled
[08/07/2024-07:53:41] [I] CUDA Graph: Disabled
[08/07/2024-07:53:41] [I] Separate profiling: Disabled
[08/07/2024-07:53:41] [I] Time Deserialize: Disabled
[08/07/2024-07:53:41] [I] Time Refit: Disabled
[08/07/2024-07:53:41] [I] NVTX verbosity: 0
[08/07/2024-07:53:41] [I] Persistent Cache Ratio: 0
[08/07/2024-07:53:41] [I] Optimization Profile Index: 0
[08/07/2024-07:53:41] [I] Weight Streaming Budget: 100.000000%
[08/07/2024-07:53:41] [I] Inputs:
[08/07/2024-07:53:41] [I] Debug Tensor Save Destinations:
[08/07/2024-07:53:41] [I] === Reporting Options ===
[08/07/2024-07:53:41] [I] Verbose: Disabled
[08/07/2024-07:53:41] [I] Averages: 10 inferences
[08/07/2024-07:53:41] [I] Percentiles: 90,95,99
[08/07/2024-07:53:41] [I] Dump refittable layers:Disabled
[08/07/2024-07:53:41] [I] Dump output: Disabled
[08/07/2024-07:53:41] [I] Profile: Disabled
[08/07/2024-07:53:41] [I] Export timing to JSON file:
[08/07/2024-07:53:41] [I] Export output to JSON file:
[08/07/2024-07:53:41] [I] Export profile to JSON file:
[08/07/2024-07:53:41] [I]
[08/07/2024-07:53:41] [I] === Device Information ===
[08/07/2024-07:53:41] [I] Available Devices:
[08/07/2024-07:53:41] [I] Device 0: "NVIDIA GeForce RTX 2080 Ti" UUID: GPU-c0370922-f0e4-5f2b-5f7a-16ae5ab03013
[08/07/2024-07:53:41] [I] Selected Device: NVIDIA GeForce RTX 2080 Ti
[08/07/2024-07:53:41] [I] Selected Device ID: 0
[08/07/2024-07:53:41] [I] Selected Device UUID: GPU-c0370922-f0e4-5f2b-5f7a-16ae5ab03013
[08/07/2024-07:53:41] [I] Compute Capability: 7.5
[08/07/2024-07:53:41] [I] SMs: 68
[08/07/2024-07:53:41] [I] Device Global Memory: 22527 MiB
[08/07/2024-07:53:41] [I] Shared Memory per SM: 64 KiB
[08/07/2024-07:53:41] [I] Memory Bus Width: 352 bits (ECC disabled)
[08/07/2024-07:53:41] [I] Application Compute Clock Rate: 1.755 GHz
[08/07/2024-07:53:41] [I] Application Memory Clock Rate: 7 GHz
[08/07/2024-07:53:41] [I]
[08/07/2024-07:53:41] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[08/07/2024-07:53:41] [I]
[08/07/2024-07:53:41] [I] TensorRT version: 10.2.0
[08/07/2024-07:53:41] [I] Loading standard plugins
[08/07/2024-07:53:42] [I] [TRT] [MemUsageChange] Init CUDA: CPU +402, GPU +0, now: CPU 8696, GPU 1275 (MiB)
[08/07/2024-07:53:43] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1300, GPU +180, now: CPU 10309, GPU 1455 (MiB)
[08/07/2024-07:53:43] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[08/07/2024-07:53:43] [I] Start parsing network model.
[08/07/2024-07:53:43] [I] [TRT] ----------------------------------------------------------------
[08/07/2024-07:53:43] [I] [TRT] Input filename: D:\laugh\Projects\TensorRT-YOLO\demo\obb\models\yolov8s-obb.onnx
[08/07/2024-07:53:43] [I] [TRT] ONNX IR version: 0.0.6
[08/07/2024-07:53:43] [I] [TRT] Opset version: 11
[08/07/2024-07:53:43] [I] [TRT] Producer name: pytorch
[08/07/2024-07:53:43] [I] [TRT] Producer version: 2.4.0
[08/07/2024-07:53:43] [I] [TRT] Domain:
[08/07/2024-07:53:43] [I] [TRT] Model version: 0
[08/07/2024-07:53:43] [I] [TRT] Doc string:
[08/07/2024-07:53:43] [I] [TRT] ----------------------------------------------------------------
[08/07/2024-07:53:43] [I] [TRT] No checker registered for op: EfficientNMSX_TRT. Attempting to check as plugin.
[08/07/2024-07:53:43] [I] [TRT] No importer registered for op: EfficientNMSX_TRT. Attempting to import as plugin.
[08/07/2024-07:53:43] [I] [TRT] Searching for plugin: EfficientNMSX_TRT, plugin_version: 1, plugin_namespace:
[08/07/2024-07:53:43] [I] [TRT] Successfully created plugin: EfficientNMSX_TRT
[08/07/2024-07:53:43] [I] Finished parsing network model. Parse time: 0.0738637
[08/07/2024-07:53:43] [I] [TRT] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[08/07/2024-07:53:43] [I] [TRT] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[08/07/2024-07:53:43] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/07/2024-07:57:22] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception Assertion pluginUtils::isSuccess(status) failed.
[08/07/2024-07:57:22] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception Assertion pluginUtils::isSuccess(status) failed.
[08/07/2024-07:57:22] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node /model.22/EfficientNMSX_TRT.)
[08/07/2024-07:57:22] [E] Engine could not be created from network
[08/07/2024-07:57:22] [E] Building engine failed
[08/07/2024-07:57:22] [E] Failed to create engine from model or file.
[08/07/2024-07:57:22] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100200] # D:\laugh\Downloads\TensorRT-v10.2.0.19_build_by_CUDA-v12.4_cuDNN-v8.9.7.29\v10.2.0.19\bin\trtexec.exe --onnx=D:\laugh\Projects\TensorRT-YOLO\demo\obb\models\yolov8s-obb.onnx --saveEngine=D:\laugh\Projects\TensorRT-YOLO\demo\obb\models\yolov8s-obb.engine --fp16 --memPoolSize=workspace:21474836480
I solved this problem. Now you can set any (even non-square) input resolution. For example, if you need to go pt --> onnx --> engine with input resolution width 1024, height 512: pt --> onnx, then onnx --> engine.
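The exact commands are omitted above, but one constraint worth keeping in mind for non-square exports is that both dimensions must be divisible by the model's maximum stride (32 for YOLOv8-family models). A hypothetical sanity check:

```python
def valid_input_shape(width: int, height: int, stride: int = 32) -> bool:
    """YOLO-style detectors require input dims divisible by the max stride,
    so feature maps at every downsampling level have integer sizes."""
    return width % stride == 0 and height % stride == 0

print(valid_input_shape(1024, 512))  # True  -> usable for export
print(valid_input_shape(1000, 500))  # False -> round to multiples of 32
```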
@samurdhikaru @johnnynunez
Continuing the discussion initiated in PR #3859
I removed the YoloNMS plugin and created a new plugin named EfficientNMSX (with 'X' representing the index) within the structure of the EfficientNMSPlugin. The changes involved creating a new plugin using the current index generation logic and simply adding a new layer that returns detection indices.
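To illustrate what the extra detection-indices output enables (e.g. for segmentation): downstream code can gather per-anchor data, such as mask coefficients, for exactly the boxes NMS kept. The sketch below is pure Python with illustrative names and shapes, not the plugin's exact output contract:

```python
def gather_by_indices(per_anchor_data, det_indices, num_dets):
    """per_anchor_data: one entry per candidate box (e.g. mask coefficients).
    det_indices: anchor indices of the boxes NMS kept, padded to max_det.
    num_dets: number of valid detections; entries past it are padding."""
    return [per_anchor_data[i] for i in det_indices[:num_dets]]

# Toy example: 5 anchors with 2-dim "mask coefficients";
# NMS kept anchors 3 and 0 (indices padded out to max_det = 4).
coeffs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
kept = gather_by_indices(coeffs, [3, 0, 0, 0], num_dets=2)
print(kept)  # [[0.7, 0.8], [0.1, 0.2]]
```

Without the indices output, the NMS layer would have to carry every per-anchor tensor through itself; returning indices keeps the plugin small and lets the graph gather whatever extras it needs.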
Since the changes were minimal, I did not implement the IPluginV3 interface, as that would have required a complete overhaul of the entire plugin structure.
I conducted all tests on both the EfficientNMS_TRT and EfficientNMSX_TRT plugins, and both functioned correctly.
I would appreciate your suggestion regarding the IPluginV3 implementation. Should we update the entire EfficientNMSPlugin now, or should we continue with the current approach and make the switch to IPluginV3 all at once during a future upgrade?
These changes were also implemented on release/8.6 (for testing purposes only):
https://github.com/levipereira/TensorRT/tree/release/8.6