add gfl_trt #124

Merged
merged 12 commits into from Feb 28, 2022
Conversation

Richard-mei (Contributor)

add GFocalHead support in mmdeploy.
rewrite GFLHead.get_bboxes and GFLHead.forward_single, and replace F.linear with F.conv2d in the Integral module.
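The F.linear-to-F.conv2d swap works because the Integral module's projection is a fixed dot product over the reg_max + 1 distribution bins, which a 1x1 convolution can express in a more deployment-friendly way. A minimal equivalence sketch (standalone and illustrative; the reg_max value and reshapes are assumptions, not the PR's verbatim code):

import torch
import torch.nn.functional as F

reg_max = 16
project = torch.linspace(0, reg_max, reg_max + 1)   # fixed projection, shape (17,)

# softmax distribution over the bins for 8 bbox sides
prob = torch.randn(8, reg_max + 1).softmax(dim=1)   # (8, 17)

# original formulation: expectation via F.linear
out_linear = F.linear(prob, project.unsqueeze(0))   # (8, 1)

# conv2d formulation: bins become channels, samples become spatial positions
inp = prob.t().reshape(1, reg_max + 1, -1, 1)       # (1, 17, 8, 1)
weight = project.reshape(1, reg_max + 1, 1, 1)      # (1, 17, 1, 1)
out_conv = F.conv2d(inp, weight).reshape(-1, 1)     # back to (8, 1)

assert torch.allclose(out_linear, out_conv, atol=1e-6)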

RunningLeon requested a review from VVsssssk February 7, 2022 03:05
lvhan028 requested a review from grimoire February 7, 2022 06:43
RunningLeon requested a review from VVsssssk February 9, 2022 02:33
grimoire (Member) commented Feb 9, 2022

Please fix the lint. You can run pre-commit run --all-files to run all the checks.

@VVsssssk please provide some help with the unit tests.

Richard-mei (Contributor, Author)

Please fix the lint. You can run pre-commit run --all-files to run all the checks.

seed isort known_third_party.............................................Passed
isort....................................................................Passed
yapf.....................................................................Passed
Trim Trailing Whitespace.................................................Passed
Check Yaml...............................................................Passed
Fix End of Files.........................................................Passed
Fix requirements.txt.....................................................Passed
Fix double quoted strings................................................Passed
Check for merge conflicts................................................Passed
Fix python encoding pragma...............................................Passed
Mixed line ending........................................................Passed
Markdownlint.............................................................Passed
codespell................................................................Passed
docformatter.............................................................Passed  

@@ -0,0 +1,203 @@
import torch
Collaborator

Could add the copyright at the top line:

# Copyright (c) OpenMMLab. All rights reserved.

Contributor Author

Hi, do I need to resubmit to add the copyright: # Copyright (c) OpenMMLab. All rights reserved.?
Also, I have a little doubt about the speed test. I ran the speed test of the TensorRT model with cuda and cpu respectively. Why is the FPS with cpu very high, much higher than with cuda? Only the --device cpu/cuda parameter was changed between runs. My laptop CPU is an R9-5900HS and the GPU is a 3060. But I found that nvidia-smi shows GPU usage in both cases.
cpu: python tools/test.py configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py ../MMDetection/configs/gfl/gfl_r50_fpn_1x_coco.py --model test_work/test_GFL/end2end.engine --speed-test --device cpu
[screenshot: Screenshot_select-area_20220210191630]
cuda: python tools/test.py configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py ../MMDetection/configs/gfl/gfl_r50_fpn_1x_coco.py --model test_work/test_GFL/end2end.engine --speed-test --device cuda
[screenshot: Screenshot_select-area_20220210191906]


Collaborator

@Richard-mei Please add the copyright in your commit. BTW, TensorRT is for CUDA; passing cpu to --device should raise an error. @VVsssssk Could you check if there is a bug here?

@codecov

codecov bot commented Feb 10, 2022

Codecov Report

Merging #124 (6bdbbc0) into master (230596b) will increase coverage by 0.72%.
The diff coverage is 63.91%.

❗ Current head 6bdbbc0 differs from the pull request's most recent head 6ac7909. Consider uploading reports for the commit 6ac7909 to get more accurate results.


@@            Coverage Diff             @@
##           master     #124      +/-   ##
==========================================
+ Coverage   66.21%   66.93%   +0.72%     
==========================================
  Files         175      190      +15     
  Lines        5958     6273     +315     
  Branches      936      976      +40     
==========================================
+ Hits         3945     4199     +254     
- Misses       1730     1767      +37     
- Partials      283      307      +24     
Flag Coverage Δ
unittests 66.93% <63.91%> (+0.72%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...ploy/codebase/mmdet/models/dense_heads/gfl_head.py 63.54% <63.54%> (ø)
...ploy/codebase/mmdet/models/dense_heads/__init__.py 100.00% <100.00%> (ø)
mmdeploy/backend/sdk/__init__.py 58.82% <0.00%> (-7.85%) ⬇️
mmdeploy/codebase/mmocr/deploy/mmocr.py 92.68% <0.00%> (-4.62%) ⬇️
mmdeploy/backend/tensorrt/init_plugins.py 31.25% <0.00%> (-0.33%) ⬇️
mmdeploy/codebase/mmedit/deploy/mmediting.py 97.14% <0.00%> (-0.08%) ⬇️
mmdeploy/pytorch/ops/__init__.py 100.00% <0.00%> (ø)
mmdeploy/backend/onnxruntime/wrapper.py 89.58% <0.00%> (ø)
mmdeploy/codebase/mmdet/models/necks.py 50.00% <0.00%> (ø)
mmdeploy/codebase/mmdet/models/backbones.py 15.78% <0.00%> (ø)
... and 25 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 230596b...6ac7909.

for reg_conv in self.reg_convs:
    reg_feat = reg_conv(reg_feat)
cls_score = self.gfl_cls(cls_feat)
bbox_pred = scale(self.gfl_reg(reg_feat)).float().permute(
Member

We could permute in gfl_head__get_bbox before batched_integral, so we can remove gfl_head__forward_single.

Contributor Author

Hi, I understand what you said and did a comparison test. I modified it like this: bbox_pred = batched_integral(self.integral, bbox_pred.permute(0, 2, 3, 1)) * stride[0] in gfl_head__get_bbox. But I'm not sure whether this change will have any effect on the inference speed, as I found the laptop test results to be inconsistent. Maybe it doesn't matter.
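For readers following the thread, here is a minimal sketch of what that rewrite amounts to. batched_integral is reconstructed from the names used in this conversation (integral.reg_max and integral.project are the Integral module's attributes in mmdet), so treat it as illustrative rather than the PR's verbatim code:

import torch.nn.functional as F

def batched_integral(integral, x):
    # x: (B, H, W, 4 * (reg_max + 1)), channels-last after the permute,
    # so each group of (reg_max + 1) logits describes one bbox side.
    batch_size = x.shape[0]
    prob = F.softmax(x.reshape(batch_size, -1, integral.reg_max + 1), dim=2)
    # expectation over the discretized offsets: one scalar per bbox side
    out = F.linear(prob, integral.project.type_as(prob).unsqueeze(0))
    return out.reshape(batch_size, -1, 4)

# usage inside gfl_head__get_bbox, as in the comment above:
# bbox_pred = batched_integral(self.integral, bbox_pred.permute(0, 2, 3, 1)) * stride[0]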

Member

I hope the rewrite does not change the behavior of forward. We might reuse any module in mmdetection (or another repo); if we change the behavior of forward, another head that shares the same forward (if such a head exists) might get unexpected results.

Contributor Author

OK, I see what you mean, thank you.

twmht commented Feb 21, 2022

@Richard-mei

Have you ever tried TensorRT 7 with GFL and found an accuracy drop?

Richard-mei (Contributor, Author) commented Feb 21, 2022

@twmht
Sorry, I just checked and found that I am using TensorRT 7, but I didn't pay attention to the precision.

grimoire (Member)

@twmht @Richard-mei The rounding mode of IResizeLayer in TensorRT 7 might give unexpected results on some shapes, which might affect the accuracy. Here is an example:

import tensorrt as trt
import torch


def main():

    input_size = [1, 1, 1, 33]

    print("create trt model")
    log_level = trt.Logger.ERROR
    logger = trt.Logger(log_level)
    builder = trt.Builder(logger)

    # build network
    EXPLICIT_BATCH = 1 << (int)(
        trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(EXPLICIT_BATCH)
    input_name = 'input'
    output_name = 'output'
    input_trt = network.add_input(name=input_name,
                                  shape=input_size,
                                  dtype=trt.float32)

    layer = network.add_resize(input_trt)
    layer.shape = tuple(input_size[:3] + [input_size[3] * 2])
    layer.resize_mode = trt.ResizeMode.NEAREST

    output = layer.get_output(0)
    output.name = output_name
    network.mark_output(output)

    # builder config
    max_workspace_size = 1 << 30
    fp16_mode = False

    builder.max_workspace_size = max_workspace_size
    builder.fp16_mode = fp16_mode

    config = builder.create_builder_config()
    config.max_workspace_size = max_workspace_size
    profile = builder.create_optimization_profile()

    # set shape
    input_shape = input_size
    profile.set_shape(input_name, input_shape, input_shape, input_shape)
    config.add_optimization_profile(profile)
    if fp16_mode:
        config.set_flag(trt.BuilderFlag.FP16)

    # build engine
    engine = builder.build_engine(network, config)
    context = engine.create_execution_context()

    print("inference")
    input_torch = torch.zeros(input_size, dtype=torch.float32).cuda().contiguous()
    input_torch[:,:,:,::2] = 1

    bindings = [None] * 2

    # set input
    idx = engine.get_binding_index(input_name)
    context.set_binding_shape(idx, tuple(input_torch.shape))
    bindings[idx] = input_torch.data_ptr()

    # set output
    idx = engine.get_binding_index(output_name)
    shape = tuple(context.get_binding_shape(idx))
    output_torch = torch.empty(shape, dtype=torch.float32).cuda()
    bindings[idx] = output_torch.data_ptr()

    context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)

    print("input:")
    print(input_torch.view(-1)[:20])

    print("output:")
    print(output_torch.view(-1)[:20])


if __name__ == "__main__":
    main()

TensorRT 8 has updated the layer, adding ResizeRoundMode to set the rounding mode.
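Something like the following should pin the behavior, extending the example above (a sketch against the TensorRT 8 Python API; verify the attribute names against your TRT version):

# TensorRT 8: choose the nearest-neighbor rounding explicitly
layer = network.add_resize(input_trt)
layer.shape = tuple(input_size[:3] + [input_size[3] * 2])
layer.resize_mode = trt.ResizeMode.NEAREST
# ResizeRoundMode offers FLOOR / CEIL / HALF_UP / HALF_DOWN
layer.nearest_rounding = trt.ResizeRoundMode.FLOOR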

RunningLeon requested review from RunningLeon and removed the request for RunningLeon February 22, 2022 02:24
@@ -2,6 +2,7 @@
from .base_dense_head import (base_dense_head__get_bbox,
                              base_dense_head__get_bboxes__ncnn)
from .fovea_head import fovea_head__get_bboxes
from .gfl_head import gfl_head__forward_single, gfl_head__get_bbox
Member

gfl_head__forward_single should be removed.

remove '**_forward_single'
grimoire requested a review from RunningLeon February 28, 2022 02:19
RunningLeon (Collaborator) left a comment

LGTM

VVsssssk merged commit ba5351e into open-mmlab:master Feb 28, 2022
@OpenMMLab-Assistant003

Hi @Richard-mei! First of all, we want to express our gratitude for your significant PR in the MMDeploy project. Your contribution is highly appreciated, and we are grateful for your efforts in helping improve this open-source project during your personal time. We believe that many developers will benefit from your PR.

We would also like to invite you to join our Special Interest Group (SIG) private channel on Discord, where you can share your experiences and ideas and build connections with like-minded peers. To join the SIG channel, simply message the moderator, OpenMMLab, on Discord, or briefly share your open-source contributions in the #introductions channel and we will assist you. We look forward to seeing you there! Join us: https://discord.gg/UjgXkPWNqA

If you have a WeChat account, welcome to join our community on WeChat. You can add our assistant: openmmlabwx. Please add "mmsig + GitHub ID" as a remark when adding friends. :)
Thank you again for your contribution ❤ @Richard-mei
