[Inference] Add the quant_linear_fuse_pass #58637
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
@RichardWooSJTU could you please help take a look?
A: Yes, it does have an impact. For models containing conv ops, the quant/dequant ops are also deleted, but no other pass performs the follow-up fusion on them.
A: Your pass has already deleted matmul_v2 and fused it into quant_linear.
A: Every one of them is required; they just come with default values. You need to understand the meaning and role of each attr.
Got it~
return elementwise_add_out_var;
}
}
I would still suggest matching the complete pattern here, including the quant/dequant ops.
OK.
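For reference, a rough sketch of what such a fuller pattern could look like with Paddle's GraphPatternDetector (the node names, op types, and links below are assumptions for illustration, not necessarily the pattern this PR ends up using):

// Hypothetical sketch of a pattern covering the quantize_linear /
// dequantize_linear ops in front of matmul_v2; details are assumptions.
PDNode* BuildQuantDequantMatmulPattern(PDPattern* pattern,
                                       const std::string& name_scope) {
  auto* quant_op = pattern->NewNode(name_scope + "/quantize_linear")
                       ->assert_is_op("quantize_linear");
  auto* quant_out = pattern->NewNode(name_scope + "/quant_out")
                        ->assert_is_op_output("quantize_linear")
                        ->assert_is_op_input("dequantize_linear");
  auto* dequant_op = pattern->NewNode(name_scope + "/dequantize_linear")
                         ->assert_is_op("dequantize_linear");
  auto* dequant_out = pattern->NewNode(name_scope + "/dequant_out")
                          ->assert_is_op_output("dequantize_linear")
                          ->assert_is_op_input("matmul_v2", "X");
  auto* matmul_op = pattern->NewNode(name_scope + "/matmul_v2")
                        ->assert_is_op("matmul_v2");

  // Wire up the quant -> dequant -> matmul_v2 chain.
  quant_op->LinksTo({quant_out});
  dequant_op->LinksFrom({quant_out});
  dequant_op->LinksTo({dequant_out});
  matmul_op->LinksFrom({dequant_out});
  return dequant_out;
}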
QuantLinearFusePass::QuantLinearFusePass() {
  AddOpCompat(OpCompat("matmul_v2"))
Should this match matmul or matmul_v2?
The op shown in the computation graph is matmul_v2, and the result also shows it is matched successfully, so it should be matmul_v2, right?
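As a side note, a fuller version of the constructor shown in the snippet above might look roughly like this (the constraints are modeled on other Paddle fuse passes; the exact ones in this PR may differ):

// Sketch: declare which matmul_v2 inputs/outputs/attrs the pass accepts.
// The specific constraints here are illustrative assumptions.
QuantLinearFusePass::QuantLinearFusePass() {
  AddOpCompat(OpCompat("matmul_v2"))
      .AddInput("X")
      .IsTensor()
      .End()
      .AddInput("Y")
      .IsTensor()
      .End()
      .AddOutput("Out")
      .IsTensor()
      .End()
      .AddAttr("trans_x")  // matmul_v2 transpose flags
      .IsBoolEQ(false)
      .End()
      .AddAttr("trans_y")
      .IsBoolEQ(false)
      .End();
}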
Force-pushed from 63409ad to 82c1f0d.
@RichardWooSJTU could you please review it again?
Force-pushed from 7160228 to 3a4606f.
Hi, have you tested the performance improvement this pass brings on any concrete model?
// Invert each per-channel weight dequant scale to recover the quant scale.
for (int i = 0; i < weight_tensor->dims()[1]; ++i) {
  scale_weights[i] = 1.0f / weight_scale_data[i];
}
I think using a for loop here to obtain the weight's quant scale will slow the pass down, but since only the weight's dequant scale is currently available, this seems to be the only way to get the quant scale. @zhoutianzi66
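If this loop ever shows up as a bottleneck, here is a sketch of an equivalent form that compilers vectorize readily (assuming weight_scale_data points to a contiguous float buffer and scale_weights is a std::vector<float> of matching length; both names come from the snippet above):

#include <algorithm>

// Equivalent to the loop above: invert each per-channel dequant scale to
// recover the quant scale. The buffer layout is an assumption.
const int n = static_cast<int>(weight_tensor->dims()[1]);
std::transform(weight_scale_data,
               weight_scale_data + n,
               scale_weights.begin(),
               [](float s) { return 1.0f / s; });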
LGTM for @unittest.skipIf
LGTM for @unittest.skipIf
* add the quant_linear_fuse_pass
PR types
New features
PR changes
APIs
Description
Fuses the quant, dequant, and weight-dequant ops, together with matmul_v2 and elementwise_add, in the computation graph shown below into a single quant_linear op, so that int8 inference can be performed.
Result after fusion:
Current implementation approach:
First, directly reuse the logic of delete_quant_dequant_linear_op_pass and delete_weight_dequant_linear_op_pass to remove the quant, dequant, and weight-dequant ops, then run my own implementation to fuse the remaining matmul_v2 and elementwise_add ops into quant_linear (see the sketch below).
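A minimal sketch of that two-stage flow, assuming the deletion passes are fetched from Paddle's pass registry and FuseQuantLinear is a hypothetical helper for stage two (this is an illustration, not the PR's actual code):

// Hypothetical sketch of the two-stage flow described above; the registry
// usage and the FuseQuantLinear helper are assumptions for illustration.
void QuantLinearFusePass::ApplyImpl(ir::Graph* graph) const {
  // Stage 1: reuse the existing deletion passes to strip the quant/dequant
  // and weight-dequant ops (their scales must be recorded before removal).
  framework::ir::PassRegistry::Instance()
      .Get("delete_quant_dequant_linear_op_pass")
      ->Apply(graph);
  framework::ir::PassRegistry::Instance()
      .Get("delete_weight_dequant_linear_op_pass")
      ->Apply(graph);

  // Stage 2: fuse each remaining matmul_v2 + elementwise_add pair into a
  // single quant_linear op.
  FuseQuantLinear(graph);  // hypothetical helper
}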
What I would like to ask:
I'm not quite sure which attributes are required. I see that the quant_linear op's implementation has these attributes; do all of them need to be added?
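To illustrate the question, setting the attributes when constructing the fused op might look like the sketch below (the input/attr names and values are guesses modeled on the fc op plus quant attrs, not a verified list from the quant_linear op definition):

// Sketch only: names and values are assumptions for illustration and should
// be checked against the actual quant_linear op definition.
framework::OpDesc desc;
desc.SetType("quant_linear");
desc.SetInput("x", {input_name});     // hypothetical variable names
desc.SetInput("w", {weight_name});
desc.SetInput("bias", {bias_name});
desc.SetOutput("out", {output_name});
desc.SetAttr("in_num_col_dims", 1);
desc.SetAttr("activation_type", std::string());
desc.SetAttr("scale_in", input_scale);         // recorded from deleted quant op
desc.SetAttr("scale_weights", scale_weights);  // per-channel, inverted above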