This repository has been archived by the owner on Apr 28, 2023. It is now read-only.
do not demote from shared before promoting to register
When a reference group promoted to registers covered exactly the same accesses as another group promoted to shared memory, the latter was demoted to save shared memory space. This had adverse effects: the global->shared copy is performed at the beginning of the block, whereas the global->register copy is placed deeper in the tree, where the latency of the loads cannot be hidden. Keep the group promoted to shared memory and perform a shared->register copy instead.

An alternative solution would be to decrease the promotion scope depth for register promotion. This would require ensuring that loops whose indices appear in subscripts of "register" arrays are fully unrolled, so that the elements of those arrays are effectively mapped to registers. Since unrolling is expensive in compilation time and is exposed to the autotuner, we would prefer to also expose the register promotion depth in the future.
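The difference in copy placement described above can be illustrated with a small host-side C++ sketch. It is not the code the mapper generates: the function names, the tile size `kTile`, and the computation are hypothetical, and plain arrays stand in for shared memory and registers. It only shows where each copy sits relative to the loop nest.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTile = 4;  // hypothetical tile size, not taken from the commit

// Old behavior (sketch): the shared-memory group is demoted, so the "register"
// copy reads directly from global memory deep inside the loop nest.
float blockDemoted(const std::vector<float>& globalW,
                   const std::vector<float>& globalX, std::size_t rows) {
  assert(globalW.size() >= kTile && globalX.size() >= rows * kTile);
  float acc = 0.0f;
  for (std::size_t i = 0; i < rows; ++i) {
    std::array<float, kTile> reg;  // stands in for registers
    for (std::size_t k = 0; k < kTile; ++k)
      reg[k] = globalW[k];  // global->register, repeated in every iteration
    for (std::size_t k = 0; k < kTile; ++k)
      acc += reg[k] * globalX[i * kTile + k];
  }
  return acc;
}

// New behavior (sketch): global->shared once at the beginning of the block,
// then shared->register deeper in the loop nest.
float blockStaged(const std::vector<float>& globalW,
                  const std::vector<float>& globalX, std::size_t rows) {
  assert(globalW.size() >= kTile && globalX.size() >= rows * kTile);
  std::array<float, kTile> shared;  // stands in for shared memory
  for (std::size_t k = 0; k < kTile; ++k)
    shared[k] = globalW[k];  // global->shared at block start
  // On a GPU, a block-wide barrier would separate this copy from its uses.
  float acc = 0.0f;
  for (std::size_t i = 0; i < rows; ++i) {
    std::array<float, kTile> reg;  // stands in for registers
    for (std::size_t k = 0; k < kTile; ++k)
      reg[k] = shared[k];  // shared->register
    for (std::size_t k = 0; k < kTile; ++k)
      acc += reg[k] * globalX[i * kTile + k];
  }
  return acc;
}
```

Both functions compute the same value; only the source of the register copy differs. The fixed-size `reg` array with compile-time loop bounds mirrors the requirement that loops indexing "register" arrays be fully unrolled.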