Questions about gpu batched nms #192

YoungjaeDev · 2022-06-22T13:51:55Z

The cluster-mode in the config is set to 4. Did you implement the post-processing yourself?
In other words, it seems that you are not using the clustering provided by DeepStream and wrote that code yourself.
Can you tell me exactly which part of the code does this?
Thank you

marcoslucianops · 2022-06-22T14:41:36Z

I added the GPU Batched NMS, so the CPU NMS (cluster-mode=2) is no longer needed. You can see the comparison in #142. Setting cluster-mode=4 disables the clustering done by DeepStream. I changed the outputs to fit the TensorRT BatchedNMS plugin, then created logic to sort the outputs, and used the TensorRT createBatchedNMSPlugin function to create the NMS layer.
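
For readers unfamiliar with what an NMS layer computes, here is a minimal CPU sketch in Python of score filtering, sorting, and greedy IoU suppression. It only illustrates the general technique; it is not the code in this repository and not the TensorRT plugin. The function names `iou` and `nms` and the `(x1, y1, x2, y2, score)` box layout are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, score_threshold, iou_threshold):
    """Greedy NMS over boxes shaped (x1, y1, x2, y2, score)."""
    # Drop low-score boxes, then sort the rest by descending score.
    candidates = sorted(
        (b for b in boxes if b[4] >= score_threshold),
        key=lambda b: b[4],
        reverse=True,
    )
    kept = []
    for box in candidates:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


boxes = [
    (0, 0, 10, 10, 0.9),
    (1, 1, 11, 11, 0.8),    # overlaps heavily with the first box
    (20, 20, 30, 30, 0.7),
    (40, 40, 50, 50, 0.1),  # below the score threshold
]
result = nms(boxes, score_threshold=0.25, iou_threshold=0.45)
```

With these inputs, the second box is suppressed by the first and the fourth is dropped by the score threshold.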

YoungjaeDev · 2022-06-22T22:57:13Z

@marcoslucianops

Do you have experience with this in TensorRT 7?

nemosupremo · 2022-06-22T23:31:04Z

@marcoslucianops

It seems GPU Batched NMS gives a 66% performance improvement, which is amazing. A drawback is that the TensorRT engine needs to be rebuilt whenever the iou/score/topk values change, and per-class config options can't be used.

Is it possible to support both modes (use CPU when cluster-mode=2 and GPU when cluster-mode=4)?

marcoslucianops · 2022-06-23T00:03:05Z

@youngjae-avikus, in what specifically?

@nemosupremo, you can use the per-class config (class-attrs-0, class-attrs-1, etc.). The score-threshold works as the minimum score, then the pre-cluster-threshold filters the scores per class (the same goes for topk, but the value in the config_nms.txt file should be the maximum topk value). cluster-mode=2 only uses the nms-iou-threshold value. It's possible to change the code to use each one according to the cluster-mode but, in my opinion, it's not necessary because the improvement from GPU Batched NMS is so large.

Note: Using pre-cluster-threshold and topk in [class-attrs] section will increase the CPU usage and may decrease the performance.
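
The two-stage filtering described above can be sketched as follows. This is a hypothetical Python illustration, not DeepStream code: the function `filter_detections`, the `(class_id, score)` tuple layout, the `>=` comparison, and the fallback for classes without their own entry are all assumptions.

```python
def filter_detections(detections, global_score_threshold, per_class_threshold):
    """Filter (class_id, score) tuples with a global and a per-class minimum."""
    kept = []
    for class_id, score in detections:
        # Stage 1: global minimum, playing the role of score-threshold
        # in config_nms.txt.
        if score < global_score_threshold:
            continue
        # Stage 2: per-class minimum, playing the role of
        # pre-cluster-threshold under [class-attrs-N].
        # Assumption: classes without an entry use the global threshold.
        if score < per_class_threshold.get(class_id, global_score_threshold):
            continue
        kept.append((class_id, score))
    return kept


detections = [(0, 0.25), (0, 0.15), (1, 0.35), (1, 0.45), (3, 0.55)]
kept = filter_detections(detections, 0.2, {0: 0.2, 1: 0.4, 3: 0.5})
```

A per-class threshold below the global one has no effect in this sketch, because stage 1 has already removed those detections.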

YoungjaeDev · 2022-06-23T00:11:35Z

@marcoslucianops

About the TensorRT BatchedNMS plugin.

nemosupremo · 2022-06-23T00:21:24Z

@marcoslucianops

So if I have 3 classes (0, 1, 3), in my config_infer_primary.txt I will have:

[class-attrs-0]
pre-cluster-threshold=0.2
nms-iou-threshold=.213
[class-attrs-1]
pre-cluster-threshold=0.4
nms-iou-threshold=.4
[class-attrs-3]
pre-cluster-threshold=0.5
nms-iou-threshold=.5

Then in my config_nms I would have to do something like:

[property]
iou-threshold=min(nms-iou-threshold)
score-threshold=min(pre-cluster-threshold)
topk=300

Correct?

marcoslucianops · 2022-06-23T00:39:46Z

@youngjae-avikus, the same function exists in TensorRT 7 (createBatchedNMSPlugin()), but it's easy to use from the plugins too.

@nemosupremo, the nms-iou-threshold only works with cluster-mode=2, which is disabled (cluster-mode=4) because of the GPU Batched NMS. You should use only the pre-cluster-threshold key.

nemosupremo · 2022-06-23T01:05:01Z

@marcoslucianops

So with this setup, my iou-threshold is identical for every class; but my class confidence can vary as long as it is greater than the score-threshold in config_nms. Ok.

marcoslucianops · 2022-06-23T14:29:38Z

@nemosupremo, yes
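
The relationship confirmed above (a single global score-threshold that is no higher than any per-class pre-cluster-threshold) can be sketched in Python. This is only an illustration: the helper `min_pre_cluster_threshold` is hypothetical, and reading the config with `configparser` is an assumption that holds for simple `key=value` files like the fragment shown in this thread.

```python
import configparser

# Per-class sections as posted earlier in the thread (nms-iou-threshold
# omitted, since it is not used with cluster-mode=4).
CONFIG_TEXT = """
[class-attrs-0]
pre-cluster-threshold=0.2
[class-attrs-1]
pre-cluster-threshold=0.4
[class-attrs-3]
pre-cluster-threshold=0.5
"""


def min_pre_cluster_threshold(config_text):
    """Return the smallest per-class pre-cluster-threshold, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    values = [
        parser.getfloat(section, "pre-cluster-threshold")
        for section in parser.sections()
        if section.startswith("class-attrs-")
        and parser.has_option(section, "pre-cluster-threshold")
    ]
    return min(values) if values else None


# Candidate value for score-threshold in config_nms.txt.
score_threshold = min_pre_cluster_threshold(CONFIG_TEXT)
```

Using the minimum keeps the global filter from discarding detections that a per-class threshold would still accept.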

YoungjaeDev · 2022-06-24T01:47:13Z

I want to enable the class-agnostic NMS option. Can I control it through the TensorRT NMS plugin as it is currently coded?

@marcoslucianops

marcoslucianops · 2022-06-27T11:20:57Z

@youngjae-avikus, I'm not familiar with agnostic nms, but I think you need to change the yoloLayer outputs to fit the batchedNMSPlugin input with shareLocation = false and the output shape. You probably need to change the logic to add all classes to the output bbox instead of the maxProb class.
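
For context, class-agnostic NMS usually means that a box can be suppressed by an overlapping higher-scoring box of any class, whereas per-class NMS only compares boxes of the same class. The Python sketch below shows that difference on the CPU. It is a generic illustration under assumed names (`iou`, `nms`, a `class_agnostic` flag, and a `(x1, y1, x2, y2, score, class_id)` layout) and says nothing about how the TensorRT plugin or the yoloLayer would need to be configured.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(dets, iou_threshold, class_agnostic):
    """Greedy NMS over dets shaped (x1, y1, x2, y2, score, class_id)."""
    kept = []
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        # In per-class mode, only kept boxes of the same class can suppress.
        suppressed = any(
            (class_agnostic or det[5] == k[5]) and iou(det, k) > iou_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


# Two heavily overlapping boxes that belong to different classes.
dets = [
    (0, 0, 10, 10, 0.9, 0),
    (1, 1, 11, 11, 0.8, 1),
]
per_class = nms(dets, 0.45, class_agnostic=False)
agnostic = nms(dets, 0.45, class_agnostic=True)
```

Per-class mode keeps both boxes here, while class-agnostic mode keeps only the higher-scoring one.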

YoungjaeDev · 2022-06-28T23:29:36Z

@marcoslucianops

Thank you. I'll try it when I have time.
Please don't close the issue for a while.

adimukewar · 2022-07-14T03:45:42Z

Is the topk filter applied before or after the GPU NMS implementation?
When I increase the topk, higher-confidence bounding boxes appear. Also, the total number of objects detected is the same in both scenarios.

marcoslucianops · 2022-07-14T15:02:12Z

@adimukewar, the topK is applied to limit the outputs before the NMS (yoloLayer) and during the NMS (GPU Batched NMS).
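
The two places where a limit is applied can be sketched as below. This is a hypothetical Python outline of the ordering only: `apply_topk` and `run_pipeline` are made-up names, the same `topk` is reused for both limits as a simplification, and the NMS step is passed in as a function so the sketch does not assume any particular implementation.

```python
def apply_topk(detections, topk):
    """Keep the topk highest-scoring detections; each one is (label, score)."""
    return sorted(detections, key=lambda d: d[1], reverse=True)[:topk]


def run_pipeline(detections, topk, nms_fn):
    # First limit: before NMS (in the thread, this happens in the yoloLayer).
    candidates = apply_topk(detections, topk)
    # NMS itself, supplied by the caller.
    survivors = nms_fn(candidates)
    # Second limit: on the NMS output (in the thread, inside GPU Batched NMS).
    return apply_topk(survivors, topk)


detections = [("a", 0.3), ("b", 0.9), ("c", 0.6), ("d", 0.5)]
# Identity function stands in for NMS so that only the limits are visible.
output = run_pipeline(detections, topk=2, nms_fn=lambda dets: dets)
```

Because the first limit runs before suppression, lower-scoring candidates cut there never reach the NMS step.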

marcoslucianops · 2022-08-15T13:10:55Z

New optimized NMS #142

marcoslucianops closed this as completed Aug 15, 2022
