victor-eds changed the title from "Allow sub-group transpose and shuffles with more than one contiguos row per thread" to "Allow sub-group transpose and shuffles with more than one contiguous row per thread" on Nov 19, 2024.
Add support for layout conversion shuffles in which rows managed by a
single thread are contiguous in the output matrix.
Step 2/2 to #2749
---------
Signed-off-by: victor-eds <victor.perez@codeplay.com>
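The kind of shuffle the commit above describes can be illustrated with a plain-Python simulation. This is a hypothetical sketch under stated assumptions, not the lowering used by the backend: `shuffle` models the semantics of a sub-group shuffle (lane `w` reads the value supplied by lane `src_lanes[w]`), and the tile shape, the rotation scheme, and all names are illustrative choices. Before the conversion lane `c` owns column `c` of an `(S * R) x S` tile; afterwards lane `w` owns the `R` contiguous rows `w*R .. w*R + R - 1`. With `R = 1` this reduces to the single-row case the conversions previously assumed.

```python
from typing import List


def shuffle(values: List[int], src_lanes: List[int]) -> List[int]:
    """Hypothetical model of a sub-group shuffle: lane w receives values[src_lanes[w]]."""
    return [values[src] for src in src_lanes]


def transpose_via_shuffles(cols: List[List[int]], rows_per_lane: int) -> List[List[int]]:
    """Simulate a sub-group transpose where each lane ends up with contiguous rows.

    Input: cols[c][r] == A[r][c], i.e. lane c owns column c of an (S * R) x S tile A.
    Output: out[w][k * S + c] == A[w * R + k][c], i.e. lane w owns rows
    w*R .. w*R + R - 1, stored row-major.
    """
    S = len(cols)      # sub-group size (number of lanes)
    R = rows_per_lane  # contiguous rows owned by each lane after the conversion
    out = [[0] * (R * S) for _ in range(S)]
    for k in range(R):      # one independent S x S sub-block per owned row
        for i in range(S):  # S rotation steps per sub-block
            # Lane c sends its register ((c - i) % S) * R + k, which holds the
            # element that lane (c - i) % S needs from lane c in this step.
            sent = [cols[c][((c - i) % S) * R + k] for c in range(S)]
            # Lane w reads from lane (w + i) % S.
            src = [(w + i) % S for w in range(S)]
            received = shuffle(sent, src)
            for w in range(S):
                c = (w + i) % S  # column this value belongs to
                out[w][k * S + c] = received[w]
    return out


# Usage: a 4-lane sub-group where each lane ends up owning 2 contiguous rows.
S, R = 4, 2
A = [[r * S + c for c in range(S)] for r in range(S * R)]
cols = [[A[r][c] for r in range(S * R)] for c in range(S)]  # lane c holds column c
out = transpose_via_shuffles(cols, R)
print(out[1])  # [8, 9, 10, 11, 12, 13, 14, 15]
```

The rotation makes every step a full permutation of the sub-group, so the conversion takes `R * S` shuffles. Note that at each step a lane sends a lane-dependent register, which a real lowering would have to handle explicitly.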
As of now, the transpose and shuffle layout conversions assume each work-item will own a single contiguous row in the matrix, i.e., the layout will look something like:
However, allowing more than one element per work-item in the Y dimension enables further optimizations, so we want to support layouts like:
This will allow more advanced approaches in `-tritonintelgpu-optimize-reduction-locality`.
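The difference between the two kinds of layout can be illustrated with a small, hypothetical Python model. It assumes a sub-group of `sub_group_size` work-items covering a tile, with each work-item owning `rows_per_work_item` contiguous rows (the commit message describes the rows managed by a single thread as contiguous). The function names and sizes are illustrative assumptions; `T<n>` marks the work-item holding each element.

```python
def rows_owned(work_item: int, rows_per_work_item: int) -> range:
    """Rows of the tile held by `work_item` when rows are assigned contiguously."""
    start = work_item * rows_per_work_item
    return range(start, start + rows_per_work_item)


def owner_of_row(row: int, rows_per_work_item: int) -> int:
    """Inverse mapping: the work-item that holds `row`."""
    return row // rows_per_work_item


def render_layout(sub_group_size: int, rows_per_work_item: int, num_cols: int) -> str:
    """One text line per tile row; each entry names the work-item holding that element."""
    lines = []
    for row in range(sub_group_size * rows_per_work_item):
        owner = owner_of_row(row, rows_per_work_item)
        lines.append(" ".join([f"T{owner}"] * num_cols))
    return "\n".join(lines)


# Single row per work-item (R = 1) vs. two contiguous rows per work-item (R = 2).
print(render_layout(4, 1, 4))
print()
print(render_layout(4, 2, 4))
```

With `rows_per_work_item = 1` every work-item owns exactly one row, which is what the transpose and shuffle conversions currently assume; larger values give the multi-row layouts this issue asks to support.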