[Metal] Reduce number of threads for reduction layers (apache#8206)
Reduced the default number of threads in reduction kernels for Metal. The default code generation produced thread blocks of size 32x32x1, so the number of threads per threadgroup was 1024 (32 * 32 * 1). Sometimes the device does not have enough resources for a threadgroup that large, and we get an exception saying that the block size is greater than the value of maxTotalThreadsPerThreadgroup. To prevent this, we decrease the default number of threads. With this fix, every model should work with the default codegen, and auto-tuning or auto-scheduling will select the optimal number of threads.
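The arithmetic behind the failure can be sketched as follows. This is a minimal illustration, not TVM code: `threads_per_threadgroup`, `fits`, and `shrink_to_fit` are hypothetical helpers, and the limits used (1024, 512, 256) are assumed example values for maxTotalThreadsPerThreadgroup. The halving strategy in `shrink_to_fit` is also an assumption for illustration; the commit message does not state which new default was chosen or how.

```python
def threads_per_threadgroup(block):
    """Total number of threads in a threadgroup of shape (x, y, z)."""
    x, y, z = block
    return x * y * z


def fits(block, max_total_threads):
    """True if the block does not exceed the limit (e.g. maxTotalThreadsPerThreadgroup)."""
    return threads_per_threadgroup(block) <= max_total_threads


def shrink_to_fit(block, max_total_threads):
    """Hypothetical helper: halve the largest dimension until the block fits.

    Illustration only; this is not how TVM picks its reduced default.
    """
    if max_total_threads < 1:
        raise ValueError("max_total_threads must be at least 1")
    x, y, z = block
    while x * y * z > max_total_threads:
        # Halve the largest dimension (ties prefer x, then y, then z).
        if x >= y and x >= z:
            x = max(1, x // 2)
        elif y >= z:
            y = max(1, y // 2)
        else:
            z = max(1, z // 2)
    return (x, y, z)


old_default = (32, 32, 1)
print(threads_per_threadgroup(old_default))  # 1024
print(fits(old_default, 1024))               # True
print(fits(old_default, 512))                # False: a 512-thread limit rejects the old default
print(shrink_to_fit(old_default, 512))       # (16, 32, 1)
```

Because the limit in Metal can depend on the resources a particular kernel uses, a block shape that fits one kernel may not fit another. This is why a smaller default is safer, with tuning left to pick larger shapes where they are allowed.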