Questions about gpu batched nms #192

Closed
YoungjaeDev opened this issue Jun 22, 2022 · 15 comments

Comments

@YoungjaeDev
Contributor

The cluster-mode in the config is set to 4. Did you improve the post-processing by writing the code yourself?
In other words, it seems that you are not using the clustering provided by DeepStream and implemented it in your own code instead.
Can you tell me exactly which part of the code that is?
Thank you

@marcoslucianops
Owner

I added the GPU Batched NMS, so the CPU NMS (cluster-mode=2) is no longer needed. You can see the comparison in #142. Setting cluster-mode=4 disables the clustering done by DeepStream. I changed the outputs to fit the TensorRT BatchedNMS plugin, then created logic to sort the outputs, and used the TensorRT createBatchedNMSPlugin function to create the NMS layer.

@YoungjaeDev
Contributor Author

@marcoslucianops

Did you have that experience with TensorRT 7?

@nemosupremo

@marcoslucianops

It seems GPU Batched NMS gives a 66% performance improvement, which is amazing, but the drawback is that the TensorRT engine needs to be rebuilt whenever the iou/score/topk values change, and per-class config options can't be used.

Is it possible to support both modes (use CPU when cluster-mode=2 and GPU when cluster-mode=4)?

@marcoslucianops
Owner

marcoslucianops commented Jun 23, 2022

@youngjae-avikus, in what specifically?

@nemosupremo, you can use the per-class config (class-attrs-0, class-attrs-1, etc.). The score-threshold works as the minimum score, and the pre-cluster-threshold then filters the scores for each class (the same goes for topk, but the value in the config_nms.txt file should be the maximum topk you use). The cluster-mode=2 only uses the nms-iou-threshold value. It's possible to change the code to use either one according to the cluster-mode but, in my opinion, it's not necessary because the improvement from GPU Batched NMS is so large.

Note: Using pre-cluster-threshold and topk in [class-attrs] section will increase the CPU usage and may decrease the performance.
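
To make the two-stage filtering described here concrete, the following is a minimal sketch in plain Python. It is not the DeepStream or TensorRT code: the function name, the sample detections, and the fallback for classes without their own section are assumptions made for illustration. Only the key names (score-threshold, pre-cluster-threshold) come from this thread.

```python
# Hypothetical values for illustration; only the key names come from the thread.
GLOBAL_SCORE_THRESHOLD = 0.2                      # score-threshold in config_nms.txt
PRE_CLUSTER_THRESHOLD = {0: 0.2, 1: 0.4, 3: 0.5}  # pre-cluster-threshold per [class-attrs-N]


def filter_detections(detections):
    """Apply the global minimum score first, then the per-class threshold.

    Each detection is a (class_id, score) tuple.
    """
    kept = []
    for class_id, score in detections:
        if score < GLOBAL_SCORE_THRESHOLD:
            continue  # would be dropped by the engine-level minimum score
        # Assumption: a class without its own section falls back to the global value.
        if score < PRE_CLUSTER_THRESHOLD.get(class_id, GLOBAL_SCORE_THRESHOLD):
            continue  # dropped by the per-class filter applied afterwards
        kept.append((class_id, score))
    return kept


detections = [(0, 0.25), (1, 0.25), (1, 0.45), (3, 0.45), (3, 0.60), (0, 0.10)]
kept = filter_detections(detections)
print(kept)
```

The sketch also shows why the global value has to be the lowest one in use: a per-class threshold below the global minimum would never see the detections it is meant to admit.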

@YoungjaeDev
Contributor Author

@nemosupremo

nemosupremo commented Jun 23, 2022

@marcoslucianops

So if I have 3 classes (0, 1, 3), in my config_infer_primary.txt I will have:

[class-attrs-0]
pre-cluster-threshold=0.2
nms-iou-threshold=.213
[class-attrs-1]
pre-cluster-threshold=0.4
nms-iou-threshold=.4
[class-attrs-3]
pre-cluster-threshold=0.5
nms-iou-threshold=.5

Then in my config_nms I would have to do something like:

[property]
iou-threshold=min(nms-iou-threshold)
score-threshold=min(pre-cluster-threshold)
topk=300

Correct?

@marcoslucianops
Owner

@youngjae-avikus, There's the same function for TensorRT 7 (createBatchedNMSPlugin()) but it's easy to use from the plugins too.

@nemosupremo, the nms-iou-threshold only works with cluster-mode=2, which is disabled (cluster-mode=4) because of the GPU BatchedNMS. You should use only the pre-cluster-threshold key.

@nemosupremo

@marcoslucianops

So with this setup, my iou-threshold is identical for every class; but my class confidence can vary as long as it is greater than the score-threshold in config_nms. Ok.

@marcoslucianops
Owner

@nemosupremo, yes
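
As a rough illustration of the setup confirmed above, the sketch below derives the single score-threshold for config_nms.txt as the minimum of the per-class pre-cluster-threshold values. It assumes the config text parses as INI with Python's configparser; the helper name and the inline sample config are hypothetical. The nms-iou-threshold keys are left out because, as stated earlier in the thread, they have no effect with cluster-mode=4.

```python
import configparser

# Sample [class-attrs-N] sections modelled on the ones posted in this thread.
INFER_CONFIG = """
[class-attrs-0]
pre-cluster-threshold=0.2
[class-attrs-1]
pre-cluster-threshold=0.4
[class-attrs-3]
pre-cluster-threshold=0.5
"""


def global_score_threshold(config_text):
    """Return the lowest per-class pre-cluster-threshold found in the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    thresholds = [
        parser.getfloat(section, "pre-cluster-threshold")
        for section in parser.sections()
        if section.startswith("class-attrs-")
        and parser.has_option(section, "pre-cluster-threshold")
    ]
    if not thresholds:
        raise ValueError("no per-class pre-cluster-threshold found")
    return min(thresholds)


score_threshold = global_score_threshold(INFER_CONFIG)
print(f"score-threshold={score_threshold}")
```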

@YoungjaeDev
Contributor Author

I want to enable the class-agnostic NMS option. Can I control it through the TensorRT NMS plugin in the code?

@marcoslucianops

@marcoslucianops
Owner

@youngjae-avikus, I'm not familiar with agnostic nms, but I think you need to change the yoloLayer outputs to fit the batchedNMSPlugin input with shareLocation = false and the output shape. You probably need to change the logic to add all classes to the output bbox instead of the maxProb class.
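
For readers unfamiliar with the term, here is a small plain-Python sketch of the difference between per-class and class-agnostic NMS. It only illustrates the concept with a greedy NMS; it is not the TensorRT plugin and says nothing about how the plugin must be configured. The function names, the `class_agnostic` flag, and the sample boxes are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold, class_agnostic=False):
    """Greedy NMS over (box, score, class_id) tuples, highest score first.

    Per-class mode only lets boxes of the same class suppress each other;
    class-agnostic mode lets any kept box suppress any overlapping box.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, class_id = det
        suppressed = any(
            (class_agnostic or class_id == k[2]) and iou(box, k[0]) > iou_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


detections = [
    ((0, 0, 10, 10), 0.9, 0),
    ((1, 1, 11, 11), 0.8, 1),    # overlaps the first box but has a different class
    ((50, 50, 60, 60), 0.7, 1),
]
per_class = nms(detections, iou_threshold=0.5)
agnostic = nms(detections, iou_threshold=0.5, class_agnostic=True)
print(len(per_class), len(agnostic))
```

In per-class mode the overlapping box of class 1 survives because only boxes of the same class compete; in class-agnostic mode it is suppressed by the higher-scoring box of class 0.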

@YoungjaeDev
Contributor Author

@marcoslucianops

Thank you. I'll try it when I have time.
Please don't close the issue for a while

@adimukewar

Is the topk filter applied before or after the GPU NMS implementation?
When I increase the topk, higher-confidence bounding boxes appear. Also, the total number of objects detected is the same in both scenarios.

@marcoslucianops
Owner

@adimukewar, the topK is applied to limit the outputs before the NMS (yoloLayer) and during the NMS (GPU Batched NMS).
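
The two places where a top-k limit can act are sketched below in plain Python. This is a simplified model, not the yoloLayer or the TensorRT plugin: it assumes the same `top_k` value is used both to cut the candidate list before NMS and to cap the number of boxes kept during NMS, and the function names and sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detect(candidates, top_k, iou_threshold):
    """Limit (box, score) candidates to top_k before NMS and cap the kept boxes at top_k."""
    # Stage 1: only the top_k highest-scoring candidates reach the NMS.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    # Stage 2: greedy NMS that stops once top_k boxes have been kept.
    kept = []
    for box, score in ranked:
        if len(kept) == top_k:
            break
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


candidates = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),     # near-duplicate of the first box
    ((50, 50, 60, 60), 0.7),
    ((51, 51, 61, 61), 0.6),   # near-duplicate of the third box
]
small = detect(candidates, top_k=2, iou_threshold=0.5)
large = detect(candidates, top_k=4, iou_threshold=0.5)
print(len(small), len(large))
```

With `top_k=2` both slots before NMS are taken by near-duplicates of one object, so the second object never reaches the NMS. In this model, a small pre-NMS limit can therefore change the final result even when the post-NMS cap is never hit.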

@marcoslucianops
Owner

New optimized NMS #142
