This repository has been archived by the owner on Apr 28, 2023. It is now read-only.
If a tensor reference group is promoted to shared memory at some scope, it may be interesting to promote it to registers at some deeper scope. There are two possibilities:
- promote to registers instead of promoting to shared (freeing the shared memory for other uses or for increased occupancy);
- promote to registers from shared, hiding global access latency and/or having more coalescing when copying from global to shared.
#161 and #217 attempted this behavior: first by demoting from shared memory, then by promoting from shared to private memory. Demotion from shared was mostly harmful, principally because the promotion to registers happened at too deep a scope and was rarely beneficial by itself. The effect may be different with a tunable promotion depth, so we can start by putting this behavior behind a flag.
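To make the proposal concrete, here is a minimal sketch of how such a flag and a tunable depth could drive the choice between the two possibilities. Every name in it (`RegisterPromotion`, `PromotedGroup`, `Placement`, `decide`) is hypothetical and made up for illustration. None of it is existing code from this repository, and the three-way mode is an assumption, since the issue only asks for "a flag".

```cpp
#include <cstddef>
#include <string>

// Hypothetical three-way setting (assumption: the issue only says "a flag").
enum class RegisterPromotion {
  Off,              // keep the group in shared memory only
  InsteadOfShared,  // possibility 1: registers replace the shared copy
  FromShared        // possibility 2: registers are filled from the shared copy
};

// Hypothetical description of a tensor reference group that has already
// been promoted to shared memory at some schedule depth.
struct PromotedGroup {
  std::string tensor;
  std::size_t sharedDepth;
};

// Hypothetical result: which memory levels hold a copy of the group.
struct Placement {
  bool useShared;
  bool useRegisters;
  std::size_t registerDepth;  // meaningful only when useRegisters is true
};

// Decide the placement for one group. registerDepth is the tunable
// promotion depth mentioned above; register promotion is only considered
// at a scope strictly deeper than the shared-memory scope.
Placement decide(const PromotedGroup& group, RegisterPromotion mode,
                 std::size_t registerDepth) {
  if (mode == RegisterPromotion::Off || registerDepth <= group.sharedDepth) {
    return {true, false, 0};
  }
  if (mode == RegisterPromotion::InsteadOfShared) {
    return {false, true, registerDepth};
  }
  return {true, true, registerDepth};
}
```

With a group promoted to shared memory at depth 2, `decide(group, RegisterPromotion::FromShared, 4)` keeps the shared copy and adds a register copy at depth 4, while a requested depth of 2 or less falls back to shared memory only. Whether either mode pays off in practice would still need to be measured, given the results of the earlier attempts.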