
[Open Source Task] PIR-TensorRT converter full-coverage upgrade #69178

Open · lizexu123 opened this issue Nov 5, 2024 · 69 comments

lizexu123 commented Nov 5, 2024

1. Background and task list

TensorRT (hereafter TRT) offers strong performance and broad model compatibility. Paddle-TensorRT is a PIR-based mechanism for lowering Paddle ops to TensorRT ops: the original PIR ops pass through a Marker Pass and a Partition Pass to identify subgraphs that can enter TRT, the TensorRT Converter then turns each marked subgraph into a TensorRTEngine Op, and most of the execution logic ultimately runs inside TensorRT.
Paddle 3.0 switches fully to PIR-TensorRT, at which point all models will run in PIR mode. Currently 62 PIR ops cannot yet enter TensorRT; to take full advantage of TensorRT's performance, the remaining converters need to be developed so that these PIR ops are mapped onto operators TensorRT supports.
Note: PIR-TensorRT only works on Linux, and by default requires a TensorRT version greater than 8.0.

⭐️ PR submission template ⭐️:

  • // ------- PR title --------
[Paddle TensorRT No.1] Add pd_op.topk converter
  • // ------- PR description --------
PR types
Others

PR changes
New features

Description
Added the pd_op.topk Marker and Converter

The converters to be completed in this round are listed below, along with overall progress:

| No. | Operator | Assignee / Status / PR | Priority / Difficulty |
| --- | --- | --- | --- |
| 1 | pd_op.softplus | @lizexu123 #69315 | p1 ⭐ |
| 2 | pd_op.elu | @enkilee #69278 | p0 ⭐ |
| 3 | pd_op.selu | @lichen580 @fangfangssj #69663 #70429 | p0 ⭐ |
| 4 | pd_op.stanh | @PolaKuma #69539 | p0 ⭐ |
| 5 | pd_op.thresholded_relu | @PolaKuma #69563 | p0 ⭐ |
| 6 | pd_op.affine_channel pd_op.affine_channel_ | @xz-alex @lizexu123 #70507 | p0 ⭐⭐ |
| 7 | pd_op.numel | @ooooo-create #69259 @lizexu123 | p0 ⭐ |
| 8 | pd_op.anchor_generator | @PolaKuma #69905 | p0 ⭐⭐ |
| 9 | pd_op.argmin | @ooooo-create #69261 | p0 ⭐⭐ |
| 10 | pd_op.argsort | @ooooo-create #69261 | p0 ⭐⭐ |
| 11 | pd_op.bitwise_and | @DrRyanHuang @PolaKuma #69599 | p0 ⭐ |
| 12 | pd_op.bitwise_not | @DrRyanHuang @PolaKuma #69599 | p0 ⭐ |
| 13 | pd_op.c_allreduce_sum | no implementation needed | ⭐ |
| 14 | pd_op.c_allreduce_sum | no implementation needed | ⭐ |
| 15 | pd_op.c_allreduce_sum | no implementation needed | ⭐ |
| 16 | pd_op.c_allreduce_sum | no implementation needed | ⭐ |
| 17 | pd_op.celu | @ooooo-create #69359 | p0 ⭐ |
| 18 | pd_op.conv3d | @ZHOU05030 #69757 | p0 ⭐⭐ ⭐ |
| 19 | pd_op.cumsum pd_op.cumsum_ | @Hanyonggong #69330 | p0 ⭐⭐ ⭐⭐ |
| 20 | pd_op.deformable_conv | @yangrongxinuser | p0 ⭐⭐ |
| 21 | pd_op.dequantize_linear pd_op.dequantize_linear_ | @fangfangssj #70345 | p0 ⭐⭐ |
| 22 | pd_op.einsum | @ZHOU05030 @xz-alex | p1 ⭐ |
| 23 | pd_op.elementwise_pow | @KDZZZZZZ @PolaKuma #69580 | p0 ⭐ |
| 24 | pd_op.minimum | @winffke @PolaKuma #69736 | p0 ⭐ |
| 25 | pd_op.maximum | @yangrongxinuser #69835 @winffke | p0 ⭐ |
| 26 | pd_op.logical_or pd_op.logical_or_ | @mori0umi #69706 | p1 ⭐ |
| 27 | pd_op.logical_or | @Sylence8 #69788 | p1 ⭐⭐ ⭐ |
| 28 | pd_op.logical_xor | @kineast #69958 #70170 | p1 ⭐⭐ ⭐ |
| 29 | pd_op.logical_and | @mori0umi #69706 | p1 ⭐⭐ ⭐ |
| 30 | pd_op.less_equal pd_op.less_equal_ | @PolaKuma #69751 | p1 ⭐⭐ ⭐ |
| 31 | pd_op.greater_equal pd_op.greater_equal_ | @wwwuyan #69770 | p1 ⭐⭐ ⭐ |
| 32 | pd_op.fused_embedding_eltwise_layernorm | @PolaKuma | p1 ⭐⭐ ⭐ |
| 33 | pd_op.equal_ pd_op.equal | @mori0umi #69594 | p1 ⭐⭐ |
| 34 | pd_op.not_equal pd_op.not_equal_ | @mori0umi #69594 | p1 ⭐⭐ |
| 35 | pd_op.full_batch_size_like | @SCUcookie #70012 | p1 ⭐⭐ |
| 36 | pd_op.flip | @PolaKuma #69724 | p1 ⭐⭐ |
| 37 | pd_op.fused_token_prune | @zeroader | p1 ⭐⭐ ⭐ |
| 38 | pd_op.gather | @ZHOU05030 #69714 | p1 ⭐ |
| 39 | pd_op.group_norm | @wwwuyan | p1 ⭐⭐ |
| 40 | pd_op.index_select | @wwwuyan #69762 | p1 ⭐ |
| 41 | pd_op.instance_norm | @ooooo-create @YuanRisheng | p1 ⭐ |
| 42 | pd_op.isnan | @ooooo-create #69435 | p1 ⭐ |
| 43 | pd_op.leaky_relu pd_op.leaky_relu_ | @ooooo-create #69379 | p1 ⭐⭐ ⭐ |
| 44 | pd_op.logsigmoid | @ooooo-create #69379 | p1 ⭐ |
| 45 | pd_op.embedding | @ooooo-create #69435 | p1 ⭐ |
| 46 | pd_op.multihead_matmul | @ooooo-create | p1 ⭐⭐ ⭐⭐ |
| 47 | pd_op.p_norm | @SCUcookie #69929 | p1 ⭐ |
| 48 | pd_op.pad | @PolaKuma #69673 | p1 ⭐ |
| 49 | pd_op.pad3d | @ooooo-create @PolaKuma #70556 | p1 ⭐⭐ ⭐ |
| 50 | pd_op.prelu | @ooooo-create #69394 | p1 ⭐⭐ |
| 51 | pd_op.pool3d | @fangfangssj #69663 | p1 ⭐⭐ ⭐ |
| 52 | pd_op.rnn | @ZHOU05030 #69900 | p1 ⭐⭐ ⭐⭐⭐ |
| 53 | pd_op.roi_align | @wwwuyan #69866 | p1 ⭐⭐⭐ ⭐⭐⭐ |
| 54 | pd_op.shuffle_channel | @kineast #69718 @winffke | p1 ⭐ |
| 55 | pd_op.take_along_axis | @kineast #69871 | p1 ⭐ |
| 56 | pd_op.tanh_shrink | @jincheng23 @PolaKuma #69693 | p1 ⭐ |
| 57 | pd_op.temporal_shift | @PolaKuma #69848 | p1 ⭐⭐ ⭐⭐ |
| 58 | pd_op.unbind | @mori0umi #70066 | p1 ⭐⭐ |
| 59 | pd_op.yolo_box_head | @Sylence8 #69773 #69788 | p1 |
| 60 | pd_op.yolo_box | @kineast | p1 ⭐⭐⭐⭐ ⭐⭐ |
| 61 | pd_op.conv3d_transpose | @mori0umi @ZHOU05030 | p1 ⭐⭐ ⭐ |
| 62 | pd_op.index_put | @lizexu123 @winffke @fangfangssj #69889 | p1 ⭐⭐ ⭐ |
| 63 | pd_op.mish | @mori0umi #69705 | p1 ⭐ |
| 64 | pd_op.pow | @inaomIIsfarell @winffke @fangfangssj #69889 | p1 ⭐ |
| 65 | pd_op.bitwise_or | @mori0umi @PolaKuma #69599 @yangrongxinuser | p1 ⭐ |
| 66 | pd_op.logical_not | @lizexu123 #70267 | p1 ⭐⭐⭐⭐ |
| 67 | pd_op.relu6 | @fangfangssj #70433 #70429 | p1 ⭐ |
| 68 | pd_op.gelu | @fangfangssj @lizexu123 #70168 | p1 ⭐⭐⭐ |
| 69 | pd_op.exp | @nizne9 @fangfangssj #70535 | p1 ⭐ |
| 70 | pd_op.abs pd_op.abs_ | @nizne9 @fangfangssj #70535 | p1 ⭐ |
| 71 | pd_op.sin | @nizne9 @fangfangssj #70535 | p1 ⭐ |
| 72 | pd_op.cos | @nizne9 @fangfangssj #70535 | p1 ⭐ |
| 73 | pd_op.fused_bias_dropout_residual_layer_norm | @PolaKuma | p1 ⭐⭐⭐⭐ |
| 74 | pd_op.sinh | @fangfangssj #70535 | p1 ⭐ |
| 75 | pd_op.cosh | @fangfangssj #70535 | p1 ⭐ |
| 76 | pd_op.asinh | @fangfangssj #70535 | p1 ⭐ |
| 77 | pd_op.acosh | @fangfangssj #70535 | p1 ⭐ |
| 78 | pd_op.atanh | @fangfangssj #70535 | p1 ⭐ |
| 79 | pd_op.ceil | @fangfangssj #70535 | p1 ⭐ |
| 80 | pd_op.rsqrt | @fangfangssj #70535 | p1 ⭐ |
| 81 | pd_op.reciprocal | @fangfangssj #70535 | p1 ⭐ |
| 82 | pd_op.erf | @fangfangssj #70535 | p1 ⭐ |
| 83 | pd_op.erf | @fangfangssj #70535 | p1 ⭐ |
| 84 | pd_op.sign | @fangfangssj #70535 | p1 ⭐ |
| 85 | pd_op.round | @fangfangssj #70535 | p1 ⭐ |

2. Task details

2.1 Marker Pass

Purpose: marks whether a PIR op can enter a TensorRT subgraph.

2.1.1 Ops that enter TensorRT unconditionally

paddle/fluid/pir/transforms/tensorrt/trt_op_marker_pass.cc decides whether an op can be translated to TensorRT. The PIR op definitions live in build/paddle/fluid/pir/dialect/operator/ir/pd_op.h, with the concrete implementations in the corresponding pd_op.xx.cc; a few special ops are defined in /paddle/fluid/pir/dialect/operator/ir/manual_op.h. For ops that enter TensorRT without restrictions, see pr#68395: define the op-matching rule with the DEFINE_GENERAL_PATTERN macro, then register the pattern instance into the pattern set ps with ADD_PATTERN. Note that PIR-TensorRT supports dynamic_shape only.

DEFINE_GENERAL_PATTERN(Shape, paddle::dialect::ShapeOp)
ADD_PATTERN(Shape)

2.1.2 Ops that enter TensorRT with restrictions

The old-IR file /paddle/fluid/inference/tensorrt/op_teller.cc, which decides whether an op may enter TensorRT, is a useful reference. Take pd_op.cast as an example:

class CastOpPattern : public pir::OpRewritePattern<paddle::dialect::CastOp> {
 public:
  using pir::OpRewritePattern<paddle::dialect::CastOp>::OpRewritePattern;
  bool MatchAndRewrite(paddle::dialect::CastOp op,
                       pir::PatternRewriter &rewriter) const override {
    // Already marked as able to run in TRT: nothing to do.
    if (op->HasAttribute(kCanRunTrtAttr) &&
        op->attribute<pir::BoolAttribute>(kCanRunTrtAttr).data()) {
      return false;
    }
    if (!op->HasAttribute("dtype")) {
      VLOG(3) << "the cast op does not have attr dtype ";
      return false;
    }
    // BOOL input/output for cast requires TRT >= 8.4; reject on older TRT.
    auto dtype =
        op->attribute<paddle::dialect::DataTypeAttribute>("dtype").data();
    if (dtype == phi::DataType::BOOL) {
#if IS_TRT_VERSION_LT(8400)
      VLOG(3)
          << "the cast op supports inputs and outputs of BOOL by trt8.4 above ";
      return false;
#endif
    }
    // All checks passed: mark the op as TRT-capable.
    op->set_attribute(kCanRunTrtAttr, rewriter.bool_attr(true));
    return true;
  }
};

ps.Add(std::make_unique<CastOpPattern>(context));

2.2 Converter

Purpose: maps the marked, TensorRT-capable subgraphs onto TensorRT layers, then deletes each subgraph and replaces it with a TensorRTEngineOp carrying the TensorRT engine.
The converter workflow is illustrated below with pd_op.shape.
1. Edit trt_op_marker_pass.cc so that pd_op.shape is allowed into TensorRT:

DEFINE_GENERAL_PATTERN(Shape, paddle::dialect::ShapeOp)
ADD_PATTERN(Shape)

2. Search for shape on the Paddle website and follow the source link: the shape op is defined in python/paddle/tensor/attribute.py, so the pd_op.shape converter goes into the corresponding python/paddle/tensorrt/impls/attribute.

3. The old-IR converter can serve as a reference (path: /paddle/fluid/inference/tensorrt/convert/shape_op.cc); all old-IR op converters live under /paddle/fluid/inference/tensorrt/convert.

@converter_registry.register("pd_op.shape", trt_version="8.x")
def shape_converter(network, paddle_op, inputs):
    input_tensor = inputs[0]
    shape_layer = network.add_shape(input_tensor)
    return shape_layer.get_output(0)
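
For ops that map one-to-one onto a single TensorRT layer, the converter is usually only a few lines. As an illustrative sketch (not the merged implementation; it assumes pd_op.elu exposes an alpha attribute, matching paddle.nn.functional.elu):

import tensorrt as trt

@converter_registry.register("pd_op.elu", trt_version="8.x")
def elu_converter(network, paddle_op, inputs):
    # ELU maps directly onto TensorRT's activation layer.
    input_tensor = inputs[0]
    alpha = paddle_op.attrs()["alpha"]
    elu_layer = network.add_activation(input_tensor, trt.ActivationType.ELU)
    elu_layer.alpha = alpha
    return elu_layer.get_output(0)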

4. Some ops receive inputs as pir::Value (produced at runtime by another op rather than stored as attributes); converters must handle this case as well. pd_op.roll illustrates it, see pr#69117:

import tensorrt as trt

# trt_shape, trt_sub, trt_sum, trt_mul, trt_floor_div, trt_concat,
# add_1D_constant_layer and get_shape_tensor_element are the shared
# helpers in python/paddle/tensorrt/converter_utils.py.


@converter_registry.register("pd_op.roll", trt_version="8.x")
def roll_converter(network, paddle_op, inputs):
    input_tensor = inputs[0]
    axis = paddle_op.attrs()["axis"]

    # "shift" is either a compile-time constant (pd_op.full_int_array)
    # or a runtime pir::Value; in the latter case it arrives as an ITensor.
    shifts_op = paddle_op.operands()[1].source().get_defining_op()
    if shifts_op.name() == "pd_op.full_int_array":
        shifts = shifts_op.attrs()["value"]
    else:
        shifts = inputs[1]

    axis_size = len(axis)
    input_shape_tensor = trt_shape(network, input_tensor)

    for i in range(axis_size):
        axi = axis[i]
        if isinstance(shifts, trt.ITensor):
            shift = get_shape_tensor_element(network, shifts, i)
            input_shift = shift
        else:
            shift = shifts[i]
            input_shift = add_1D_constant_layer(network, shift)
        input_axis = get_shape_tensor_element(network, input_shape_tensor, axi)

        # 1. start = (input_axis - shift) mod input_axis
        input1 = trt_sub(network, input_axis, input_shift)
        tmp_div_res = trt_floor_div(network, input1, input_axis)
        tmp_prod_res = trt_mul(network, tmp_div_res, input_axis)
        start = trt_sub(network, input1, tmp_prod_res)
        # 2. avoid start < 0: start = (start + input_axis) mod input_axis
        start = trt_sum(network, start, input_axis)
        tmp_div_res1 = trt_floor_div(network, start, input_axis)
        tmp_prod_res1 = trt_mul(network, tmp_div_res1, input_axis)
        start = trt_sub(network, start, tmp_prod_res1)
        zero_tensor = add_1D_constant_layer(network, 0)
        step = add_1D_constant_layer(network, 1)
        # 3. build index_tensor0: indices [start, input_axis)
        sub_quotient = trt_sub(network, input_axis, start)
        quotient_tensor = trt_floor_div(network, sub_quotient, step)
        start1 = get_shape_tensor_element(network, start, 0, is_scalar=True)
        fill_layer0 = network.add_fill(shape=(), op=trt.FillOperation.LINSPACE)
        fill_layer0.set_input(0, quotient_tensor)
        fill_layer0.set_input(1, start1)
        fill_layer0.set_input(2, step)
        index_tensor0 = fill_layer0.get_output(0)
        # 4. build index_tensor1: indices [0, start)
        sub_quotient_tensor = trt_sub(network, start, zero_tensor)
        quotient_tensor = trt_floor_div(network, sub_quotient_tensor, step)
        start2 = add_1D_constant_layer(network, 0, is_scalar=True)
        fill_layer1 = network.add_fill(shape=(), op=trt.FillOperation.LINSPACE)
        fill_layer1.set_input(0, quotient_tensor)
        fill_layer1.set_input(1, start2)
        fill_layer1.set_input(2, step)
        index_tensor1 = fill_layer1.get_output(0)
        itensors = [index_tensor0, index_tensor1]
        concat_input_tensor = trt_concat(network, itensors)
        # 5. gather along the current axis with the rotated index order
        if i == 0:
            layer = network.add_gather(
                input=input_tensor, indices=concat_input_tensor, axis=axi
            )
        else:
            layer = network.add_gather(
                input=layer.get_output(0), indices=concat_input_tensor, axis=axi
            )

    return layer.get_output(0)
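
To see what the index arithmetic above computes, here is a plain numpy sketch of the same single-axis rotation (illustrative only, not part of the converter):

import numpy as np

def roll_one_axis(x, shift, axi):
    # start = (axis_len - shift) mod axis_len; Python's % keeps it non-negative
    axis_len = x.shape[axi]
    start = (axis_len - shift) % axis_len
    # gather indices [start, axis_len) followed by [0, start), which is
    # exactly the concatenation of index_tensor0 and index_tensor1 above
    indices = np.concatenate([np.arange(start, axis_len), np.arange(0, start)])
    return np.take(x, indices, axis=axi)

x = np.arange(10)
print(roll_one_axis(x, 1, 0))  # [9 0 1 2 3 4 5 6 7 8], same as np.roll(x, 1)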

2.3 Unit tests

The pd_op.shape unit test belongs in test_converter_attribute.py. Inputs must specify min_shape and max_shape; the one exception is a shape_tensor input, which is only listed in feed_list and needs no min_shape/max_shape.
case 1: check that the converter's float32 output matches native PIR inference within tolerance. Note: to avoid insufficient CI test coverage, we also need to add a unit test that exercises the Marker:

class TestMulticlassNMS3Marker(TensorRTBaseTest):
    def setUp(self):
        self.python_api = multiclass_nms3
        self.api_args = {
            "bboxes": np.random.randn(2, 5, 4, 1).astype(np.float32),
            "scores": np.random.randn(2, 4, 5, 1).astype(np.float32),
        }
        self.program_config = {"feed_list": ["bboxes", "scores"]}
        self.target_marker_op = "pd_op.multiclass_nms3"

    def test_trt_result(self):
        self.check_marker(expected_result=False)


class TestShapeTRTPattern(TensorRTBaseTest):
    def setUp(self):
        self.python_api = paddle.shape
        self.api_args = {
            "x": np.random.randn(2, 3).astype("float32"),
        }
        self.program_config = {"feed_list": ["x"]}
        self.min_shape = {"x": [1, 3]}
        self.max_shape = {"x": [5, 3]}

    def test_trt_result(self):
        self.check_trt_result()

case 2: check that the converter's int64 output matches native PIR inference within tolerance.

class TestShapeTRTCase1Pattern(TensorRTBaseTest):
    def setUp(self):
        self.python_api = paddle.shape
        self.api_args = {
            "x": np.random.randn(2, 3).astype("int64"),
        }
        self.program_config = {"feed_list": ["x"]}
        self.min_shape = {"x": [1, 3]}
        self.max_shape = {"x": [5, 3]}

    def test_trt_result(self):
        self.check_trt_result()

case 3: the case where pd_op.roll's shift input is a pir::Value (this test lives in test_converter_manipulation.py).

class TestRollCase3TRTPattern(TensorRTBaseTest):
    def setUp(self):
        self.python_api = paddle.roll
        self.api_args = {
            "x": np.random.random([3, 4, 10]).astype("float32"),
            "shift": np.array([1]).astype("int64"),
            "axis": 1,
        }
        self.program_config = {"feed_list": ["x", "shift"]}
        self.min_shape = {"x": [1, 4, 10]}
        self.max_shape = {"x": [5, 4, 10]}

    def test_trt_result(self):
        self.check_trt_result()

3. Code review and merging

For questions or code review, reach out to these GitHub accounts: YuanRisheng, lizexu123, Hanyonggong, anderson101866.

4. Reference PRs

pr#68686 (Marker: entering TensorRT with restrictions)
pr#69117 (pd_op.roll with a pir::Value input)
pr#68546 (Marker: entering TensorRT unconditionally)
Note: when building Paddle with PIR-TensorRT, turn on WITH_PYTHON=ON, WITH_GPU=ON, WITH_TENSORRT=ON and WITH_PIP_TENSORRT=ON; you may also set -DTENSORRT_ROOT=/usr/local/TensorRT-8.6.1.6 (adjust to your environment). There is no need to install the Python TensorRT package separately: when the paddlepaddle_gpu-0.0.0-cp39-cp39-linux_x86_64.whl package is installed, the Python TensorRT matching the TensorRT version in your environment is installed automatically.
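
For reference, a typical configure invocation with these switches might look like the following (a sketch only; the TensorRT path is environment-specific):

cmake .. -DWITH_PYTHON=ON -DWITH_GPU=ON -DWITH_TENSORRT=ON \
         -DWITH_PIP_TENSORRT=ON \
         -DTENSORRT_ROOT=/usr/local/TensorRT-8.6.1.6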

Dashboard

| Task direction | Tasks | Submitted / Claimed | Submission rate | Completed | Completion rate |
| --- | --- | --- | --- | --- | --- |
| PIR-TensorRT converter full-coverage upgrade | 85 | 71 / 81 | 83.53% | 21 | 24.71% |

Statistics

In no particular order: @lizexu123 (3) @enkilee (1) @PolaKuma (8) @ooooo-create (3) @yangrongxinuser (1) @mori0umi (3) @kineast (1) @wwwuyan (1)

@lizexu123 added the status/new-issue and type/others labels Nov 5, 2024
@lizexu123 changed the title from "[Open Source Task] New IR-TensorRT converter full-coverage upgrade" to "[Open Source Task] PIR-TensorRT converter full-coverage upgrade" Nov 6, 2024
@luotao1 moved this to In Progress in Call for Contributions Nov 6, 2024

@enkilee commented Nov 7, 2024:
[Sign-up]: 2

@ooooo-create:
[Sign-up]: 7

@ooooo-create commented Nov 9, 2024:
[Sign-up]: 9, 10

@lizexu123 (author):
[Sign-up]: 1

@Hanyonggong:
[Sign-up]: 19

@ooooo-create commented Nov 13, 2024:
[Sign-up]: 17, 41-45, 50

@DrRyanHuang:
[Sign-up]: 11, 12

@inaomIIsfarell:
[Sign-up]: 64

@winffke commented Nov 14, 2024:
[Sign-up]: 24

@KDZZZZZZ:
[Sign-up]: 23

@yangrongxinuser:
[Sign-up]: 25

@ZHOU05030:
[Sign-up]: 18

PaddlePaddle deleted a comment from Hanyonggong Nov 15, 2024

@PolaKuma:
[Sign-up]: 36

@fangfangssj:
[Sign-up]: 21, 51

@PolaKuma:
[Sign-up]: 56

@PolaKuma:
> @PolaKuma PR task number [65] does not exist

It shows up on my side (screenshot attached).

@PolaKuma:
[Sign-up]: 24

@PolaKuma:
[Sign-up]: 57

@mori0umi:
[Sign-up]: 29

@kineast commented Nov 27, 2024:
[Sign-up]: 28

@wwwuyan commented Nov 27, 2024:
[Sign-up]: 31

@Sylence8:
[Sign-up]: 27

@winffke commented Nov 28, 2024:
[Sign-up]: 25, 64

@ZHOU05030:
[Sign-up]: 52

@winffke commented Nov 28, 2024:
[Sign-up]: 62

@winffke commented Nov 28, 2024:
[Sign-up]: 54

@PolaKuma:
[Sign-up]: 8

@fangfangssj:
[Sign-up]: 62, 64

@PolaKuma:
[Sign-up]: 32

@zeroader commented Dec 1, 2024:
[Sign-up]: 37

@ZHOU05030:
[Sign-up]: 61

@YuanRisheng:
[Sign-up]: 41

@lizexu123 (author):
[Sign-up]: 6

@lizexu123 (author):
[Sign-up]: 66

@fangfangssj:
[Sign-up]: 67, 68

@xz-alex commented Dec 25, 2024:
[Sign-up]: 22

@nizne9 commented Dec 26, 2024:
[Sign-up]: 69, 70, 71, 72

@fangfangssj:
[Sign-up]: 74, 75, 76, 77, 78

@PolaKuma:
[Sign-up]: 49

@lizexu123 (author):
[Sign-up]: 7

@PolaKuma:
[Sign-up]: 73
