[PaddleInference] compile optimization of weight_only_linear #56706
Conversation
Your PR was submitted successfully. Thank you for contributing to the open-source project!
One question: could these cta_shapes fail to compile under certain SM architectures?
Previously the full matrix of combinations should have been supported on every architecture; I can verify that again.
archs = [70, 75, 80]
Better not to hard-code the archs; make them a parameter of the script.
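The suggestion above could look something like the following sketch, which replaces the hard-coded `archs = [70, 75, 80]` with a command-line flag. The flag name `--cuda_arch` and the semicolon-separated format are assumptions for illustration, not the PR's actual interface:

```python
import argparse

def parse_archs(arch_str):
    # "70;75;80" -> [70, 75, 80]
    return [int(s) for s in arch_str.split(";") if s]

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="generate fpA_intB_gemm instantiation files")
    parser.add_argument(
        "--cuda_arch",
        default="70;75;80",
        help="semicolon-separated list of target SM architectures")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    archs = parse_archs(args.cuda_arch)
    print(archs)
```

This way the build system can forward only the architectures it is actually targeting, and the default preserves the current 70/75/80 behavior.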
I considered that, along with compiling only for the archs that are actually needed. However, there is one place in gemm_dispatch that explicitly references 70, 75, and 80; I'll see whether there is a way to rewrite it to match.
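One way to decouple gemm_dispatch from the hard-coded 70/75/80 is to have the generator also emit a small header of per-arch guard macros that the C++ dispatch code can `#ifdef` on. The thread mentions an `autogen/arch_define.h`; the sketch below assumes that role for it, and the macro name `USE_FPAINTB_GEMM_WITH_SM<arch>` is a hypothetical placeholder:

```python
def gen_arch_define(archs):
    # Emit one guard macro per generated architecture so the C++
    # dispatch code can conditionally compile per-arch branches
    # instead of hard-coding 70, 75, 80.
    lines = ["#pragma once"]
    for arch in archs:
        lines.append("#define USE_FPAINTB_GEMM_WITH_SM%d" % arch)
    return "\n".join(lines) + "\n"
```

The dispatch code would then wrap each `sm70`/`sm75`/`sm80` branch in the corresponding `#ifdef`, so skipping an arch in the generator also removes it from dispatch.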
Sorry to inform you that 8b27b5a's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
…ddle#56706) * separately-compiled fpA_intB_gemm
The header file paddle/phi/kernels/fusion/cutlass/cutlass_kernels/fpA_intB_gemm/autogen/arch_define.h doesn't exist.
PR types
Function optimization
PR changes
Others
Description
Split the compilation of the weight_only gemm kernels into separate translation units to increase compilation parallelism and speed up the build.
Compilation time improves from roughly 20 minutes to roughly 1 minute.
Pcard-74871
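The separate-compilation idea described in this PR can be sketched as a generator that emits one `.cu` file per (arch, cta_shape) combination, so each explicit template instantiation becomes an independent nvcc job. The template header name, function name, and file naming scheme below are illustrative placeholders, not the PR's actual code:

```python
# Instantiation stub emitted into each generated .cu file; the header
# and function names are hypothetical, for illustration only.
CU_TEMPLATE = """// autogenerated -- do not edit
#include "fpA_intB_gemm_template.h"
template void dispatch_gemm_sm{arch}_cta{m}x{n}x{k}();
"""

def gen_source_files(archs, cta_shapes):
    # One .cu file per (arch, cta_shape) pair -> one compiler job each,
    # so the build system can run them all in parallel instead of
    # instantiating every combination in a single translation unit.
    files = {}
    for arch in archs:
        for m, n, k in cta_shapes:
            name = "fpA_intB_gemm_sm%d_%dx%dx%d.cu" % (arch, m, n, k)
            files[name] = CU_TEMPLATE.format(arch=arch, m=m, n=n, k=k)
    return files
```

With the instantiations spread across many small files, a parallel build (`make -j` / ninja) compiles them concurrently, which is where the ~20 min to ~1 min improvement comes from.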