Implement the grad and enhance the cache of norm_convolution fusion ops. #36168
Conversation
Thanks for your contribution!
    handle, args_.filter_desc.desc(), args_.out_desc.desc(),
    args_.conv_desc.desc(), args_.in_desc.desc(), dgrad_algo_,
    &workspace_size));
return RoundUp(workspace_size, 512);
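The queried workspace size is rounded up to a 512-byte boundary before being returned. A minimal sketch of such a `RoundUp` helper (the name comes from the snippet above; this particular implementation is an assumption, not necessarily Paddle's):

```cpp
#include <cstddef>

// Sketch of a RoundUp helper like the one called above: round n up to
// the nearest multiple of align using integer arithmetic. The actual
// Paddle implementation may differ in detail.
inline std::size_t RoundUp(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}
```

Rounding the allocation up to an alignment boundary lets the returned buffer satisfy cuDNN's workspace alignment requirements regardless of the raw size the query reports.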
If this were also cached, the overhead here could be eliminated as well: cache the `dweight_workspace_size` returned by `wgrad_op->GetWorkspaceSizeInBytes(ctx.cudnn_handle())` together with the `dgrad_workspace_size` obtained here, and take the maximum of the two.
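The suggestion amounts to querying (or caching) both gradient workspace sizes once and allocating a single buffer of the larger size. A hedged sketch, with variable names taken from the comment and the surrounding allocation code assumed:

```cpp
#include <algorithm>
#include <cstddef>

// Sketch of the reviewer's suggestion: given the two backward workspace
// sizes (dgrad = data gradient, dweight = filter gradient), one buffer
// sized to their maximum can serve both cuDNN calls, since the backward
// passes run sequentially and can share the allocation.
std::size_t FusedGradWorkspaceBytes(std::size_t dgrad_workspace_size,
                                    std::size_t dweight_workspace_size) {
  return std::max(dgrad_workspace_size, dweight_workspace_size);
}
```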
This part of the overhead is small, and the objects being cached are different. The fused approach needs to cache the entire CudnnFusionOp, mainly to keep the FusedOpsPlan alive. We can verify later whether this part affects performance, and add further caching if it does.
Brilliant work!
PR types
Performance optimization

PR changes
OPs

Describe
Following the cudnnFusedOpsPlan_t-based fused-computation style of CudnnNormConvolution, this PR implements CudnnNormConvolutionGrad. It also adds a CudnnFusionOpCache that caches the generated CudnnFusionOp, avoiding the large CPU overhead of calling cudnnMakeFusedOpsPlan on every invocation.
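The caching idea can be sketched as a key-to-op map that builds the expensive plan only on the first lookup. The class and member names below are hypothetical illustrations, not the actual Paddle interface:

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for CudnnFusionOp: the real class owns a
// cudnnFusedOpsPlan_t, whose creation via cudnnMakeFusedOpsPlan is the
// CPU-expensive step this PR avoids repeating.
struct FusionOp {
  explicit FusionOp(int id) : plan_id(id) {}
  int plan_id;
};

// Sketch of a CudnnFusionOpCache-style cache: key each op by a string
// encoding shapes/dtypes, and build it only on the first lookup.
class FusionOpCache {
 public:
  std::shared_ptr<FusionOp> GetOrCreate(
      const std::string& key,
      const std::function<std::shared_ptr<FusionOp>()>& creator) {
    auto it = ops_.find(key);
    if (it != ops_.end()) return it->second;  // cache hit: no plan rebuild
    auto op = creator();                      // cache miss: pay the cost once
    ops_.emplace(key, op);
    return op;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<FusionOp>> ops_;
};
```

With this shape, repeated forward/backward runs on the same input configuration hit the cache and skip plan construction entirely; only a configuration change (new key) pays the cudnnMakeFusedOpsPlan cost.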