add fused_codegeex_qkv_reshape #9927
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
CI failed when running job: cpu-module. PR label automerge has been removed
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9927/
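In the attention part of codegeex, reshaping q, k, and v requires 3 calls to the to-contiguous operation on every iteration: https://github.com/Oneflow-Inc/one-codegeex/blob/main/codegeex/oneflow/codegeex_model.py#L112-L138
This PR is expected to remove these 3 to-contiguous operations and reduce eager-mode scheduling overhead. nsys screenshots to be provided.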
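The difference between the per-tensor path and a fused path can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: NumPy stands in for OneFlow, the `[seq, batch, heads, head_dim]` layout and the non-contiguous inputs (slices of one packed buffer) are assumed, and `reshape_separately` and `fused_qkv_reshape_sketch` are hypothetical names.

```python
import numpy as np


def reshape_separately(q, k, v):
    """Baseline pattern: one contiguous copy plus one reshape per tensor."""
    outs = []
    for t in (q, k, v):
        sq, b, heads, hn = t.shape
        t = np.ascontiguousarray(t)  # stand-in for Tensor.contiguous()
        outs.append(t.reshape(sq, b * heads, hn))  # stand-in for Tensor.view()
    return tuple(outs)


def fused_qkv_reshape_sketch(q, k, v):
    """Hypothetical fused variant: one call that writes q, k and v
    directly into their final [seq, batch * heads, head_dim] layout."""
    sq, b, heads, hn = q.shape
    out = np.empty((3, sq, b * heads, hn), dtype=q.dtype)
    # Reshaping the contiguous output buffer yields a view, so the
    # assignments below write into `out` without an intermediate tensor.
    out_5d = out.reshape(3, sq, b, heads, hn)
    for i, t in enumerate((q, k, v)):
        out_5d[i] = t
    return out[0], out[1], out[2]


# Assumed inputs: q, k, v are non-contiguous slices of one packed buffer.
sq, b, heads, hn = 2, 3, 4, 5
packed = np.arange(sq * b * heads * 3 * hn, dtype=np.float32)
packed = packed.reshape(sq, b, heads, 3 * hn)
q, k, v = packed[..., :hn], packed[..., hn:2 * hn], packed[..., 2 * hn:]

baseline = reshape_separately(q, k, v)
fused = fused_qkv_reshape_sketch(q, k, v)
assert all(np.array_equal(a, f) for a, f in zip(baseline, fused))
```

In eager mode each `contiguous` and `view` in the baseline is dispatched as its own op, so the baseline issues six dispatches per attention block while a fused op issues one. The Python loop in the sketch only mimics what a single kernel would do.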
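For a self-attention block taken at the same point in the timeline, the nsys trace of the original codegeex:
The nsys trace with this PR:
As the traces show, fused_codegeex_qkv_reshape avoids the scheduling overhead of the three to-contiguous calls (memcpy d2d here) as well as the scheduling overhead of the separate view calls. CUDA kernel time also drops from 48 us to 39 us.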