LARS op optimization with cudaLaunchCooperativeKernel method (#35652)
* A first attempt at using cudaLaunchCooperativeKernel
* Fix bugs
* Completely replace the LARS CUDA kernel
* Fix bugs
* Fix code according to comments
* Fix code according to review comments
* Add some function overloads
* Relocate the power operation
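The commit message does not describe the kernel itself, so the following is only an illustrative sketch of why a cooperative launch suits LARS. The update needs the global L2 norms of the parameter and the gradient before any element can be updated, which otherwise means several kernel launches. With `cudaLaunchCooperativeKernel`, all blocks are guaranteed to be resident at once, so a single kernel can reduce partial norms, wait at `grid.sync()`, and then apply the update. Assumptions: the update follows the standard LARS momentum form with an `epsilon` in the denominator (not necessarily the exact form used in Paddle's `lars_momentum` op); `LarsMomentumCoopKernel`, `LaunchLarsCoop` and the `block_sums` scratch buffer are hypothetical names, not code from PR #35652. Older CUDA toolkits required relocatable device code (`-rdc=true`) for `grid.sync()`.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

constexpr int kThreads = 256;  // block size; must be a power of two for the tree reduction

// Hypothetical single-tensor LARS momentum kernel (illustrative, not Paddle's code).
__global__ void LarsMomentumCoopKernel(float* param, const float* grad, float* velocity,
                                       float* block_sums,  // scratch: 2 * gridDim.x floats
                                       int n, float lr, float mu, float lars_coeff,
                                       float weight_decay, float epsilon) {
  cg::grid_group grid = cg::this_grid();
  __shared__ float s_p[kThreads];
  __shared__ float s_g[kThreads];
  const int tid = blockIdx.x * blockDim.x + threadIdx.x;
  const int stride = gridDim.x * blockDim.x;

  // Phase 1: grid-stride partial sums of param^2 and grad^2.
  float p_sq = 0.f, g_sq = 0.f;
  for (int i = tid; i < n; i += stride) {
    p_sq += param[i] * param[i];
    g_sq += grad[i] * grad[i];
  }
  s_p[threadIdx.x] = p_sq;
  s_g[threadIdx.x] = g_sq;
  __syncthreads();
  // Shared-memory tree reduction within the block.
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s) {
      s_p[threadIdx.x] += s_p[threadIdx.x + s];
      s_g[threadIdx.x] += s_g[threadIdx.x + s];
    }
    __syncthreads();
  }
  if (threadIdx.x == 0) {
    block_sums[blockIdx.x] = s_p[0];
    block_sums[gridDim.x + blockIdx.x] = s_g[0];
  }

  // Grid-wide barrier: only legal because the kernel is launched cooperatively.
  grid.sync();

  // Phase 2: every thread folds the per-block sums into the global norms.
  float p_total = 0.f, g_total = 0.f;
  for (int b = 0; b < (int)gridDim.x; ++b) {
    p_total += block_sums[b];
    g_total += block_sums[gridDim.x + b];
  }
  const float p_norm = sqrtf(p_total);
  const float g_norm = sqrtf(g_total);
  float local_lr = lr;  // fall back to the plain learning rate if either norm is zero
  if (p_norm > 0.f && g_norm > 0.f) {
    local_lr = lr * lars_coeff * p_norm / (g_norm + weight_decay * p_norm + epsilon);
  }

  // Phase 3: momentum update; each element is owned by exactly one thread.
  for (int i = tid; i < n; i += stride) {
    const float v = mu * velocity[i] + local_lr * (grad[i] + weight_decay * param[i]);
    velocity[i] = v;
    param[i] -= v;
  }
}

// Hypothetical host helper: caps the grid so every block is co-resident, then launches.
cudaError_t LaunchLarsCoop(float* d_param, const float* d_grad, float* d_velocity, int n,
                           float lr, float mu, float lars_coeff, float weight_decay,
                           float epsilon) {
  int dev = 0, coop = 0, num_sms = 0, blocks_per_sm = 0;
  cudaGetDevice(&dev);
  cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
  if (!coop) return cudaErrorNotSupported;
  cudaDeviceGetAttribute(&num_sms, cudaDevAttrMultiProcessorCount, dev);
  cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm, LarsMomentumCoopKernel,
                                                kThreads, 0);
  int grid = (n + kThreads - 1) / kThreads;
  if (grid > num_sms * blocks_per_sm) grid = num_sms * blocks_per_sm;  // co-residency cap
  if (grid < 1) grid = 1;

  float* d_block_sums = nullptr;
  cudaError_t err = cudaMalloc(&d_block_sums, 2 * grid * sizeof(float));
  if (err != cudaSuccess) return err;
  void* args[] = {&d_param, &d_grad, &d_velocity, &d_block_sums, &n,
                  &lr, &mu, &lars_coeff, &weight_decay, &epsilon};
  err = cudaLaunchCooperativeKernel((void*)LarsMomentumCoopKernel, dim3(grid),
                                    dim3(kThreads), args, 0, 0);
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  cudaFree(d_block_sums);
  return err;
}
```

The grid is capped at the occupancy-derived limit because a cooperative launch with more blocks than can be co-resident fails (`cudaErrorCooperativeLaunchTooLarge`) instead of running. Letting every thread re-sum the per-block partials is cheap for grids of a few hundred blocks; an alternative is to have one block finish the reduction and add a second `grid.sync()`.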