In some kernels, tunings with the same name are not implemented the same way in the Base and RAJA variants. Either make the implementations match or add tunings so that Base and RAJA variants can be compared apples to apples.
Affected kernels/algorithms:
Reducers - Base reducers do a block reduction, then one atomic per block to finalize the reduction; RAJA reducers do a block reduction, then the last block finalizes the reduction. See "Add RAJA GPU block atomic Tuning for Reduction Kernels" #393.
Reducers - Base reducers' block atomics go into a contiguous buffer, so they are subject to false sharing; RAJA reducers' block atomics go into different buffers, so they may avoid false sharing.
Other things affecting performance:
HALOEXCHANGE_FUSED - RAJA variants use dynamic scratch memory. Lower hipLimitStackSize or set the environment variable HSA_SCRATCH_SINGLE_LIMIT=240000000 (MI250X) to avoid dynamic scratch memory allocation.
Reducers - RAJA variants don't always inline. Use the compiler flags from hipcc (-mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false) or increase the inline threshold (-fgpu-inline-threshold=100000).
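To make the first Reducers difference concrete, here is a minimal host-side sketch of the two finalization strategies. It is a CPU analogy only, not the RAJAPerf or RAJA implementation: each std::thread stands in for a GPU block, and the names block_partial, reduce_atomic_per_block, and reduce_last_block_finalizes are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: stands in for the in-block reduction of one "block".
static long block_partial(const std::vector<int>& data, std::size_t block,
                          std::size_t block_size) {
  const std::size_t begin = block * block_size;
  const std::size_t end = std::min(begin + block_size, data.size());
  return std::accumulate(data.begin() + begin, data.begin() + end, 0L);
}

// Strategy described for the Base variants: block reduction, then one
// atomic update per block into the final result.
long reduce_atomic_per_block(const std::vector<int>& data, std::size_t block_size) {
  const std::size_t num_blocks = (data.size() + block_size - 1) / block_size;
  std::atomic<long> result{0};
  std::vector<std::thread> blocks;
  for (std::size_t b = 0; b < num_blocks; ++b) {
    blocks.emplace_back([&, b] {
      result.fetch_add(block_partial(data, b, block_size),
                       std::memory_order_relaxed);
    });
  }
  for (auto& t : blocks) t.join();
  return result.load();
}

// Strategy described for the RAJA variants: block reduction, then the last
// block to finish combines all per-block partial results.
long reduce_last_block_finalizes(const std::vector<int>& data, std::size_t block_size) {
  const std::size_t num_blocks = (data.size() + block_size - 1) / block_size;
  std::vector<long> partials(num_blocks, 0);
  std::atomic<std::size_t> blocks_done{0};
  long result = 0;
  std::vector<std::thread> blocks;
  for (std::size_t b = 0; b < num_blocks; ++b) {
    blocks.emplace_back([&, b] {
      partials[b] = block_partial(data, b, block_size);
      // acq_rel publishes this block's partial and lets the last block
      // observe the partials written by every earlier block.
      if (blocks_done.fetch_add(1, std::memory_order_acq_rel) == num_blocks - 1) {
        result = std::accumulate(partials.begin(), partials.end(), 0L);
      }
    });
  }
  for (auto& t : blocks) t.join();
  return result;
}
```

Both functions return the same sum for the same input. They differ in how many atomic updates touch the final result and in which block does the combining, which is why a same-named tuning can behave differently in the two variants.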
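The false-sharing point can be illustrated the same way. The sketch below is again a CPU analogy under stated assumptions, not code from either variant. It contrasts atomics packed into a contiguous buffer with atomics that each sit on their own cache line. The 64-byte line size is an assumption (GPU cache line sizes differ), the names PackedSlot, PaddedSlot, and hammer_slots are hypothetical, and C++17 is assumed so that std::vector honors the over-aligned type.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache line size for illustration only.
constexpr std::size_t kAssumedLineSize = 64;

// Adjacent slots share a cache line, so concurrent atomics can falsely share.
struct PackedSlot {
  std::atomic<long> value{0};
};

// Each slot occupies its own (assumed) cache line.
struct alignas(kAssumedLineSize) PaddedSlot {
  std::atomic<long> value{0};
};

// One thread per slot performs adds_per_slot atomic increments on its own
// slot; returns the total across all slots.
template <typename Slot>
long hammer_slots(std::size_t num_slots, long adds_per_slot) {
  std::vector<Slot> slots(num_slots);
  std::vector<std::thread> workers;
  for (std::size_t s = 0; s < num_slots; ++s) {
    workers.emplace_back([&slots, s, adds_per_slot] {
      for (long i = 0; i < adds_per_slot; ++i) {
        slots[s].value.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& w : workers) w.join();
  long total = 0;
  for (auto& slot : slots) total += slot.value.load();
  return total;
}
```

Calling hammer_slots<PackedSlot> and hammer_slots<PaddedSlot> with the same arguments gives the same total. Any timing difference between them comes from the memory layout alone, which is the effect to control for when comparing Base and RAJA reducers.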