Optimized Res-block Fusion without SE (#1678)
* misc changes to cudnn backend
  - replace all cudaMemcpyAsync used for loading weights with cudaMemcpy as source (in CPU memory) could be deleted before the async version of the function actually does the copy.
  - minor naming/style changes.
  - add comment explaining what the policy map layer does and how the layout conversion from CHW to HWC works.
* fix typo in comment
* clang-format
* address review comment
* Add 320 and 352 channel support for fused SE layer
  - just add template instantiations.
  - verified that it works and provides a (very) slight speedup.
* Update fp16_kernels.cu
* Simpler kernel for res-block fusion without SE
  - use constant block size of 64, splitting channel dimension also into multiple blocks as needed.
  - This allows arbitrarily large filter counts without running out of register file.
* minor refactoring
  - allow using res block fusing opt for alternate layers (that don't have SE) even on GPUs that don't have enough shared memory.
* minor functional fix
* a few more fixes to get correct output
  hopefully functionally correct now.
* fix cudnn backend build
  - missed the fact that it also uses Res block fusion :-/
* fix build errors
* some more fixes
* minor cleanup
* remove --use_fast_math
  - as it doesn't improve performance.
  - some minor cleanup
* fix indentation
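Two of the items above are index arithmetic that is easy to show in isolation: splitting the channel dimension across blocks of a constant size of 64, and converting a tensor from CHW to HWC layout. The following is a minimal host-side C++ sketch, not the code from this commit. The names `kBlockSize`, `ChannelBlocks`, and `ChwToHwc` are hypothetical, and the mapping of one thread to one channel is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical name. The commit message states a constant block size of 64;
// how threads map onto channels is an assumption of this sketch.
constexpr int kBlockSize = 64;

// Number of blocks needed to cover `channels` if each block of kBlockSize
// threads handles kBlockSize channels (ceiling division). Because the count
// grows with the channel dimension, the per-block work stays fixed.
inline int ChannelBlocks(int channels) {
  assert(channels > 0);
  return (channels + kBlockSize - 1) / kBlockSize;
}

// CPU reference for a CHW -> HWC layout conversion.
// CHW index: (c * H + h) * W + w
// HWC index: (h * W + w) * C + c
inline std::vector<float> ChwToHwc(const std::vector<float>& chw,
                                   int C, int H, int W) {
  assert(chw.size() == static_cast<std::size_t>(C) * H * W);
  std::vector<float> hwc(chw.size());
  for (int c = 0; c < C; ++c) {
    for (int h = 0; h < H; ++h) {
      for (int w = 0; w < W; ++w) {
        hwc[(static_cast<std::size_t>(h) * W + w) * C + c] =
            chw[(static_cast<std::size_t>(c) * H + h) * W + w];
      }
    }
  }
  return hwc;
}
```

Under these assumptions, the 320 and 352 channel counts mentioned above would need 5 and 6 blocks along the channel dimension respectively; the real kernel may distribute work differently.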
Showing 8 changed files with 333 additions and 126 deletions.