

@charithaintc charithaintc commented Jul 11, 2025

Reapply attempt for #148291.
Fix for the build failure reported in https://lab.llvm.org/buildbot/#/builders/116/builds/15477


This crash is caused by a mismatch between the distributed type returned by `getDistributedType` and the intended distributed type for the `forOp` results.

Solution diff: 20c2cf6

Example:

```mlir
func.func @warp_scf_for_broadcasted_result(%arg0: index) -> vector<1xf32> {
  %c128 = arith.constant 128 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %2 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<1xf32>) {
    %ini = "some_def"() : () -> (vector<1xf32>)
    %0 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini) -> (vector<1xf32>) {
      %1 = "some_op"(%arg4) : (vector<1xf32>) -> (vector<1xf32>)
      scf.yield %1 : vector<1xf32>
    }
    gpu.yield %0 : vector<1xf32>
  }
  return %2 : vector<1xf32>
}
```

In this case the distributed type for the `forOp` result is `vector<1xf32>` (the result is not distributed; it is broadcast to all lanes instead). However, `getDistributedType` returns a NULL type here.

Therefore, if the distributed type can be recovered from the `warpOp`, we should always do that first before falling back to `getDistributedType`.
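The fallback order described above can be sketched as follows. This is a minimal Python model, not the actual C++ pattern in MLIR's vector distribution code; the helper names (`pick_distributed_type`, `warp_result_types`) are hypothetical and only illustrate the decision logic.

```python
def pick_distributed_type(yielded_value, warp_result_types, get_distributed_type):
    """Choose the distributed type for a value yielded out of the warp region.

    warp_result_types: mapping from yielded value to the distributed type
        already recorded on the warp op's results (models recovery from warpOp).
    get_distributed_type: fallback that computes the type from the layout;
        it may fail and return None (models getDistributedType returning NULL).
    """
    # Prefer the type already established on the warp op. For broadcasted
    # results (e.g. vector<1xf32> returned identically to all lanes) this
    # is the only source of truth; the layout-based computation fails.
    recovered = warp_result_types.get(yielded_value)
    if recovered is not None:
        return recovered
    # Otherwise fall back to computing the distributed type from the layout.
    return get_distributed_type(yielded_value)


# Broadcasted result: layout-based computation fails, warp op type is used.
print(pick_distributed_type("for_result", {"for_result": "vector<1xf32>"},
                            lambda v: None))          # vector<1xf32>
# Regular distributed result: fall back to the layout-based computation.
print(pick_distributed_type("other", {},
                            lambda v: "vector<4xf32>"))  # vector<4xf32>
```

In the real fix the recovery corresponds to reading the type off the matching `warpOp` result before consulting `getDistributedType`.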

@charithaintc
Contributor Author

Hi @kurapov-peter, @Jianhui-Li, can you please review the diff here:
20c2cf6


@kurapov-peter kurapov-peter left a comment


I'd cover all the indexing logic with assertions; it's pretty brittle.

@charithaintc charithaintc merged commit 244ebef into llvm:main Jul 14, 2025
7 of 9 checks passed
