add gfl_trt #124
Conversation
It would be a good idea to add a unit test in https://github.com/open-mmlab/mmdeploy/blob/master/tests/test_codebase/test_mmdet/test_mmdet_models.py.
Please fix the lint.
@VVsssssk please provide some help about the unit test.
@@ -0,0 +1,203 @@
import torch
Could add the copyright at the top line:
# Copyright (c) OpenMMLab. All rights reserved.
Hi, do I need to resubmit to add the copyright `# Copyright (c) OpenMMLab. All rights reserved.`?
Also, I have a little doubt about the speed test. I performed the speed test of the TensorRT model with `--device cpu` and `--device cuda` respectively. Why is the FPS with cpu very high, much higher than with cuda? Only the `--device cpu/cuda` parameter was changed during testing. My laptop CPU is an R9-5900HS and the GPU is a 3060, but I found that `nvidia-smi` shows GPU usage in both cases.
cpu: python tools/test.py configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py ../MMDetection/configs/gfl/gfl_r50_fpn_1x_coco.py --model test_work/test_GFL/end2end.engine --speed-test --device cpu
cuda: python tools/test.py configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py ../MMDetection/configs/gfl/gfl_r50_fpn_1x_coco.py --model test_work/test_GFL/end2end.engine --speed-test --device cuda
@Richard-mei Please add the copyright in your commit. BTW, TensorRT is for CUDA; passing `cpu` to `--device` should raise an error. @VVsssssk Could you check if there is a bug here?
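A minimal sketch of the kind of early device check being suggested here; the function and argument names are hypothetical, not mmdeploy's actual API:

```python
# Hypothetical guard (not mmdeploy's actual code): reject non-CUDA devices for
# a CUDA-only backend such as TensorRT instead of silently running on the GPU.
def check_backend_device(backend: str, device: str) -> None:
    cuda_only_backends = {'tensorrt'}
    if backend in cuda_only_backends and not device.startswith('cuda'):
        raise ValueError(
            f'Backend "{backend}" only supports CUDA devices, '
            f'but got --device {device}.')


check_backend_device('tensorrt', 'cuda')   # ok
# check_backend_device('tensorrt', 'cpu')  # would raise ValueError
```

With such a check, the `--device cpu` speed test above would fail fast rather than report a misleading FPS number.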
Codecov Report
@@            Coverage Diff             @@
##           master     #124      +/-   ##
==========================================
+ Coverage   66.21%   66.93%   +0.72%
==========================================
  Files         175      190      +15
  Lines        5958     6273     +315
  Branches      936      976      +40
==========================================
+ Hits         3945     4199     +254
- Misses       1730     1767      +37
- Partials      283      307      +24
for reg_conv in self.reg_convs:
    reg_feat = reg_conv(reg_feat)
cls_score = self.gfl_cls(cls_feat)
bbox_pred = scale(self.gfl_reg(reg_feat)).float().permute(
Could permute in `gfl_head__get_bbox` before `batched_integral`, so we can remove `gfl_head__forward_single`.
Hi, I understand what you said and did a comparison test, modifying it like this in `gfl_head__get_bbox`: `bbox_pred = batched_integral(self.integral, bbox_pred.permute(0, 2, 3, 1)) * stride[0]`. But I'm not sure whether this change has any effect on the inference speed, as I found the laptop test results to be inconsistent. Maybe it doesn't matter.
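For reference, a self-contained sketch of what that channels-last integral step computes: a softmax over the reg_max + 1 distribution bins followed by an expectation over the bin values. This is only an illustration of the idea; the PR's actual `batched_integral` helper and shapes may differ.

```python
import torch
import torch.nn.functional as F


def batched_integral_sketch(bbox_pred: torch.Tensor, reg_max: int = 16) -> torch.Tensor:
    """Expected box offsets from a discretized distribution (GFL-style).

    bbox_pred: (N, H, W, 4 * (reg_max + 1)), i.e. channels-last after the permute.
    Returns:   (N, H * W, 4) expected offsets, one per box side.
    """
    n = bbox_pred.shape[0]
    x = bbox_pred.reshape(n, -1, reg_max + 1)           # (N, H*W*4, reg_max+1)
    x = F.softmax(x, dim=-1)                            # distribution over the bins
    project = torch.linspace(0, reg_max, reg_max + 1)   # bin values 0 .. reg_max
    x = (x * project).sum(-1)                           # expectation per side
    return x.reshape(n, -1, 4)


# Usage mirroring the line quoted above: permute channels last, integrate,
# then scale by the level stride (8 here, picked arbitrarily).
pred = torch.randn(2, 68, 20, 20)                       # 4 * (16 + 1) = 68 channels
offsets = batched_integral_sketch(pred.permute(0, 2, 3, 1)) * 8
print(offsets.shape)                                    # torch.Size([2, 400, 4])
```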
I hope the rewrite does not change the behavior of forward. We might reuse any module from mmdetection (or another repo); if we change the behavior of forward, another head that shares the same forward (if one exists) might get an unexpected result.
OK, I see what you mean, thank you.
Have you ever tried TensorRT 7 with GFL and found an accuracy drop?
@twmht
@twmht @Richard-mei The rounding mode of IResizeLayer in TensorRT 7 might give unexpected results on some shapes, which might affect the accuracy. Here is an example:

import tensorrt as trt
import torch
import numpy as np


def main():
    input_size = [1, 1, 1, 33]
    print("create trt model")
    log_level = trt.Logger.ERROR
    logger = trt.Logger(log_level)
    builder = trt.Builder(logger)
    # build network
    EXPLICIT_BATCH = 1 << (int)(
        trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(EXPLICIT_BATCH)
    input_name = 'input'
    output_name = 'output'
    input_trt = network.add_input(name=input_name,
                                  shape=input_size,
                                  dtype=trt.float32)
    layer = network.add_resize(input_trt)
    layer.shape = tuple(input_size[:3] + [input_size[3] * 2])
    layer.resize_mode = trt.ResizeMode.NEAREST
    output = layer.get_output(0)
    output.name = output_name
    network.mark_output(output)
    # builder config
    max_workspace_size = 1 << 30
    fp16_mode = False
    builder.max_workspace_size = max_workspace_size
    builder.fp16_mode = fp16_mode
    config = builder.create_builder_config()
    config.max_workspace_size = max_workspace_size
    profile = builder.create_optimization_profile()
    # set shape
    input_shape = input_size
    profile.set_shape(input_name, input_shape, input_shape, input_shape)
    config.add_optimization_profile(profile)
    if fp16_mode:
        config.set_flag(trt.BuilderFlag.FP16)
    # build engine
    engine = builder.build_engine(network, config)
    context = engine.create_execution_context()
    print("inference")
    input_torch = torch.zeros(input_size, dtype=torch.float32).cuda().contiguous()
    input_torch[:, :, :, ::2] = 1
    bindings = [None] * 2
    # set input
    idx = engine.get_binding_index(input_name)
    context.set_binding_shape(idx, tuple(input_torch.shape))
    bindings[idx] = input_torch.data_ptr()
    # set output
    idx = engine.get_binding_index(output_name)
    shape = tuple(context.get_binding_shape(idx))
    output_torch = torch.empty(shape, dtype=torch.float32).cuda()
    bindings[idx] = output_torch.data_ptr()
    context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)
    print("input:")
    print(input_torch.view(-1)[:20])
    print("output:")
    print(output_torch.view(-1)[:20])


if __name__ == "__main__":
    main()

TensorRT 8 has updated the layer, adding …
@@ -2,6 +2,7 @@
from .base_dense_head import (base_dense_head__get_bbox,
                              base_dense_head__get_bboxes__ncnn)
from .fovea_head import fovea_head__get_bboxes
from .gfl_head import gfl_head__forward_single, gfl_head__get_bbox
`gfl_head__forward_single` should be removed.
remove '**_forward_single'
LGTM
Hi @Richard-mei! First of all, we want to express our gratitude for your significant PR in the MMDeploy project. Your contribution is highly appreciated, and we are grateful for your efforts in helping improve this open-source project during your personal time. We believe that many developers will benefit from your PR.
We would also like to invite you to join our Special Interest Group (SIG) private channel on Discord, where you can share your experiences, ideas, and build connections with like-minded peers. To join the SIG channel, simply message the moderator, OpenMMLab, on Discord, or briefly share your open-source contributions in the #introductions channel and we will assist you. Look forward to seeing you there! Join us: https://discord.gg/UjgXkPWNqA
If you have a WeChat account, welcome to join our community on WeChat. You can add our assistant: openmmlabwx. Please add "mmsig + Github ID" as a remark when adding friends :)
Add `GFocalHead` support in mmdeploy: rewrite `GFLHead.get_bboxes` and `GFLHead.forward_single`, and replace `F.linear` with `F.conv2d` in the `Integral` module.
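For context on the `F.linear` to `F.conv2d` swap, here is a sketch showing that the two formulations of the `Integral` expectation agree numerically: the expectation over the distribution bins can be written either as a linear projection over the last dimension (as mmdet's `Integral` does) or as a 1x1 convolution over the channel dimension. The shapes and names below are illustrative, not the PR's exact rewrite.

```python
import torch
import torch.nn.functional as F

reg_max = 16
project = torch.linspace(0, reg_max, reg_max + 1)              # bin values, shape (17,)

# Dummy softmax-normalized distributions: (N, H*W*4, reg_max+1).
probs = torch.softmax(torch.randn(2, 400 * 4, reg_max + 1), dim=-1)

# 1) Linear formulation: project over the last (bin) dimension.
out_linear = F.linear(probs, project.view(1, -1)).squeeze(-1)  # (2, 1600)

# 2) Conv formulation: treat the bins as input channels of a 1x1 conv.
probs_4d = probs.transpose(1, 2).reshape(2, reg_max + 1, 400, 4)
weight = project.view(1, reg_max + 1, 1, 1)                    # (out=1, in=17, 1, 1)
out_conv = F.conv2d(probs_4d, weight).reshape(2, -1)           # (2, 1600)

print(torch.allclose(out_linear, out_conv, atol=1e-6))         # expected: True
```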