[CINN][Add Backend Pass Comment No.10] Add comment for replace_cross_thread_reduction #70227
base: develop
Your PR was submitted successfully. Thank you for contributing to this open-source project!
/**
* A pass that optimizes cross-thread reduction operations on GPU by replacing them with more efficient implementations.
*
* [Detailed application scenario]
The square-bracketed items are prompts from the template; just delete them from the final comment.
* Replace cross thread reduction to external call.
*/
void ReplaceCrossThreadReduction(ir::LoweredFunc fn);
Put the comment above the function.
* TODO:
* - Support more reduction operations (e.g., custom reduction functions)
* - Add dynamic selection of reduction methods based on input size
* - Optimize shared memory allocation for better bank conflicts avoidance
* - Add support for multi-warp reductions within a block
There is no need to add TODOs that the original function does not have; this section is optional.
Sorry to inform you that f48074a's CIs passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.
* This pass is applicable in scenarios where multiple GPU threads need to perform reduction operations (like sum, max, min)
* across thread boundaries. These scenarios are common in deep learning workloads, particularly in operations like:
* - Computing sum/mean across feature dimensions
* - Global pooling operations
* - Softmax normalization
* - Gradient aggregation in distributed training
This description is too high-level; it should describe the scenario in terms of backend IR optimization.
*
*
* When applied, this pass will:
* 1. Identify reduction operations in GPU-bound loops
This pass mainly handles cross_thread reductions.
* for (i = 0; i < 1024; i++) {
*   if (i < n) {
*     sum += data[i];
*   }
* }
The IR example should show the GPU thread binding.
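To make the point about thread binding concrete, here is a minimal host-side C++ sketch of what a cross-thread reduction computes. It mirrors the quoted loop (1024 iterations guarded by `i < n`), with the loop index `t` standing in for a GPU thread index such as threadIdx.x. The helper name `SimulatedBlockReduceSum` and the tree-reduction strategy are assumptions made for illustration only; this is not CINN's implementation, and the external call that the pass actually emits is not shown in this thread. The sketch also assumes `block_dim` is a power of two.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CINN API): simulates a block-wide sum where
// each index t plays the role of one GPU thread (e.g. threadIdx.x).
// Assumes block_dim is a power of two.
float SimulatedBlockReduceSum(const std::vector<float>& data, std::size_t n,
                              std::size_t block_dim) {
  if (block_dim == 0) return 0.0f;

  // Step 1: every "thread" t writes its own element into a shared buffer.
  // The `t < n` guard matches the `if (i < n)` predicate in the quoted loop;
  // threads outside the valid range contribute the identity value 0.
  std::vector<float> shared(block_dim, 0.0f);
  for (std::size_t t = 0; t < block_dim; ++t) {
    if (t < n && t < data.size()) shared[t] = data[t];
  }

  // Step 2: tree reduction. Each outer iteration corresponds to one
  // synchronized step on a GPU; the inner loop covers the active threads.
  for (std::size_t stride = block_dim / 2; stride > 0; stride /= 2) {
    for (std::size_t t = 0; t < stride; ++t) {
      shared[t] += shared[t + stride];
    }
  }

  // Thread 0 ends up holding the result for the whole block.
  return shared[0];
}
```

The difference from the sequential `sum += data[i]` form is that no single thread ever sees all the elements, so the partial values have to be combined across threads. That combining step is what makes the reduction "cross-thread", and it only becomes visible in an IR example once the loop is shown bound to a thread axis.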
PR-CI-Codestyle-Check needs to pass.
@KDZZZZZZ PR-CI-Codestyle-Check has not passed yet; please update the PR again.
test=document_fix
PR types
CINN
PR changes
Others
Description
Added comments for the replace_cross_thread_reduction pass.