[GPU] Add option for no gpu.block_dim in GPUDistributeScfFor #17214
Conversation
I will create an issue later about the
@@ -38,13 +42,30 @@ struct DistributeLoop final : public OpRewritePattern<scf::ForOp> {
    if (!numDimAttr)
      return failure();

    auto funcOp = forOp->getParentOfType<FunctionOpInterface>();
These checks are only necessary, and only expected to fail, when `useBlockDims` is false?
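A minimal sketch of the gating being asked about here, assuming the pattern carries a `useBlockDims` option and some helper that recovers the static workgroup size from the surrounding function (both names are illustrative, not necessarily what the patch uses); this would sit inside the pattern's `matchAndRewrite`:

```cpp
// Illustrative sketch only: perform the workgroup-size lookups behind the
// option, so the extra checks can only fail when constants (rather than
// gpu.block_dim ops) are requested.
if (!useBlockDims) {
  auto funcOp = forOp->getParentOfType<FunctionOpInterface>();
  if (!funcOp)
    return rewriter.notifyMatchFailure(forOp, "loop not nested in a function");
  // `getStaticWorkgroupSize` is a stand-in for however the pass recovers the
  // workgroup size attached to the surrounding function.
  std::optional<SmallVector<int64_t>> wgSize = getStaticWorkgroupSize(funcOp);
  if (!wgSize)
    return rewriter.notifyMatchFailure(forOp, "missing static workgroup size");
}
```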
// CHECK: %[[YLB:.+]] = affine.apply affine_map<()[s0, s1, s2] -> (s0 * s1 + s2)>()[%[[ID]], %[[STEP]], %[[LB]]]
// CHECK: %[[YSTEP:.+]] = affine.apply affine_map<()[s0, s1] -> (s0 * s1)>()[%[[DIM]], %[[STEP]]]
// CHECK: scf.for %[[IV:.+]] = %[[YLB]] to %[[UB]] step %[[YSTEP]] {
// CHECK: memref.store %{{.+}}, %{{.+}}[%[[IV]]] : memref<?xf32>

// NO-BLOCK-DIM-LABEL: func.func @distribute_to_y
These tests duplicate the ones above in the sense that they don't check anything additional? It's fine to just keep one of them and remove the others; that saves us the effort of updating all of them later.
Force-pushed from 9075fed to 3d10c31
…g#17214)

This PR adds an option to generate `arith.constant` ops with workgroup sizes in place of `gpu.block_dim` ops in `GPUDistributeScfFor`. The `gpu.block_dim` ops are not currently handled properly on ROCM backends, so this option is needed to support `GPUDistributeScfFor` on the LLVMGPU path.

Signed-off-by: Lubo Litchev <lubol@google.com>
This PR adds an option to generate `arith.constant` ops with workgroup sizes in place of `gpu.block_dim` ops in `GPUDistributeScfFor`. The `gpu.block_dim` ops are not currently handled properly on ROCM backends, so this option is needed to support `GPUDistributeScfFor` on the LLVMGPU path.
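A rough sketch of what the option amounts to inside the pattern; the flag name `useBlockDims`, the `workgroupSize` array, and the exact builder calls are assumptions for illustration, not the literal IREE code:

```cpp
// Hypothetical sketch: choose how the per-dimension block size reaches the
// distributed loop (names and builder arguments are assumed, not verbatim).
Value dim;
if (useBlockDims) {
  // Default behavior: query the block dimension at runtime via gpu.block_dim.
  dim = rewriter.create<gpu::BlockDimOp>(loc, gpu::Dimension::y);
} else {
  // New option: materialize the statically known workgroup size as a
  // constant, avoiding gpu.block_dim on backends (e.g. ROCm) where it is not
  // handled properly yet.
  dim = rewriter.create<arith::ConstantIndexOp>(loc, workgroupSize[1]);
}
```

With the option enabled, the `NO-BLOCK-DIM` test above would presumably see an `arith.constant` feeding the `affine.apply` for the loop step where the default path sees a `gpu.block_dim` result.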