Commit
add launch bound to limit the registers usage for volta architecture (#…
…38113)

From --ptxas-options=-v, SegmentOpsKernel uses 66 registers per thread. There are two ways to resolve this problem:

1. Reduce the threads-per-block launch configuration.
2. Add __launch_bounds__ to give the nvcc compiler the information it needs to reduce register usage.

This PR chooses the __launch_bounds__ solution because changing gpu_launch_config may affect other ops.
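A minimal sketch of the __launch_bounds__ approach follows. It is not the actual SegmentOpsKernel change. The kernel ScaleKernel, the helper names, and the 1024-thread block size are assumptions made for illustration; the commit message does not state the block size. The arithmetic shows why 66 registers per thread can be a problem on Volta: a block may use at most 65536 32-bit registers, so a 1024-thread block leaves at most 64 per thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: kernels are launched with up to 1024 threads per block.
constexpr int kMaxThreadsPerBlock = 1024;
// Volta (compute capability 7.0) allows 65536 32-bit registers per block.
constexpr int kRegistersPerBlock = 65536;

// Largest per-thread register count that still lets a full block launch.
constexpr int MaxRegistersPerThread(int regs_per_block, int threads_per_block) {
  return regs_per_block / threads_per_block;
}

// 66 registers per thread would exceed this budget for a 1024-thread block.
static_assert(MaxRegistersPerThread(kRegistersPerBlock, kMaxThreadsPerBlock) == 64,
              "a 1024-thread block leaves 64 registers per thread");

// Hypothetical kernel. __launch_bounds__ tells nvcc the largest block size
// the kernel will be launched with, so it can cap register usage to fit.
__global__ void __launch_bounds__(kMaxThreadsPerBlock)
ScaleKernel(const float* in, float* out, float alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = alpha * in[i];
  }
}

// Queries the compiled kernel's resource usage at runtime.
// Returns false if the query fails (for example, when no GPU is present).
bool PrintKernelAttributes() {
  cudaFuncAttributes attr;
  cudaError_t err = cudaFuncGetAttributes(&attr, ScaleKernel);
  if (err != cudaSuccess) {
    std::printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
    return false;
  }
  std::printf("registers per thread: %d, max threads per block: %d\n",
              attr.numRegs, attr.maxThreadsPerBlock);
  return true;
}
```

Calling PrintKernelAttributes() from a host main, or compiling with --ptxas-options=-v, is one way to check the register count before and after adding the launch bound. The trade-off is that the compiler may spill registers to local memory to meet the bound.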