Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DirectX] The dx.dot2/3/4 and dx.dot2add intrinsics leave vectors around post-scalarization #134569

Open
bogner opened this issue Apr 7, 2025 · 1 comment · May be fixed by #134570
Open

[DirectX] The dx.dot2/3/4 and dx.dot2add intrinsics leave vectors around post-scalarization #134569

bogner opened this issue Apr 7, 2025 · 1 comment · May be fixed by #134570
Assignees

Comments

@bogner
Copy link
Contributor

bogner commented Apr 7, 2025

The dx.dot2, dot3, dot4, and dot2add intrinsics are defined as taking vector arguments and have special handling to scalarize those arguments in DXILOpLowering. This causes issues, as the vectors don't get entirely cleaned up before we generate DXIL.

From llvm-beanz/offload-test-suite#61:

# .---command stdout------------
# | Function: main: error: Instructions must be of an allowed type.
# | note: at '%17 = insertelement <3 x float> undef, float %16, i32 0' in block '#0' of function 'main'.
# | Function: main: error: Instructions must be of an allowed type.
# | note: at '%18 = insertelement <3 x float> %17, float %15, i32 1' in block '#0' of function 'main'.
# | Function: main: error: Instructions must be of an allowed type.
# | note: at '%19 = insertelement <3 x float> %18, float %14, i32 2' in block '#0' of function 'main'.
# | Function: main: error: Instructions must be of an allowed type.
# | note: at '%20 = extractelement <3 x float> %19, i32 0' in block '#0' of function 'main'.
# | Function: main: error: Instructions must be of an allowed type.
# | note: at '%21 = extractelement <3 x float> %19, i32 1' in block '#0' of function 'main'.
# | Function: main: error: Instructions must be of an allowed type.
# | note: at '%22 = extractelement <3 x float> %19, i32 2' in block '#0' of function 'main'.
# | Function: main: error: Instructions must be of an allowed type.
# | note: at '%23 = extractelement <3 x float> %19, i32 0' in block '#0' of function 'main'.
# | Function: main: error: Instructions must be of an allowed type.
# | note: at '%24 = extractelement <3 x float> %19, i32 1' in block '#0' of function 'main'.
# | Function: main: error: Instructions must be of an allowed type.
# | note: at '%25 = extractelement <3 x float> %19, i32 2' in block '#0' of function 'main'.
# | Validation failed.

It would be simpler to make these intrinsics match their DXIL ops more closely and simply take scalar arguments. This will allow the scalarizer to do its job earlier in the pipeline.

Alternatively, we could improve the vectorArgExpansion logic in DXILOpLowering to clean up any remaining vector artifacts, but this seems like it would be redundant with the logic in the scalarizer.

@bogner bogner self-assigned this Apr 7, 2025
bogner added a commit to bogner/llvm-project that referenced this issue Apr 7, 2025
The `dx.dot2`, `dot3`, and `dot4` intrinsics exist purely to lower
`dx.fdot`, and they map exactly to the DXIL ops of the same name. Using
vectors for their arguments adds unnecessary complexity and causes us to
have vector operations that are not trivial to lower post-scalarizer.

Similarly, the `dx.dot2add` intrinsic is overly generic for something
that only needs to lower to a single `dot2AddHalf` DXIL op. Update its
signature to match the operation it lowers to.

Fixes llvm#134569.
@bogner bogner moved this to Needs Review in HLSL Support Apr 7, 2025
bogner added a commit to bogner/llvm-project that referenced this issue Apr 7, 2025
The `dx.dot2`, `dot3`, and `dot4` intrinsics exist purely to lower
`dx.fdot`, and they map exactly to the DXIL ops of the same name. Using
vectors for their arguments adds unnecessary complexity and causes us to
have vector operations that are not trivial to lower post-scalarizer.

Similarly, the `dx.dot2add` intrinsic is overly generic for something
that only needs to lower to a single `dot2AddHalf` DXIL op. Update its
signature to match the operation it lowers to.

Fixes llvm#134569.
@farzonl
Copy link
Member

farzonl commented Apr 7, 2025

Matching the dot2/3/4 intrinsics to the dxil op would mean doing custom scalarization in IntrinsicExpansion pass. I had a change that did that and I thought the team came to the conclusion we didn't want to do custom scalarizations like that.

Also for the dot2add case that would be interesting because it forces the scalarization into clang codegen.

bogner added a commit to bogner/llvm-project that referenced this issue Apr 8, 2025
The `dx.dot2`, `dot3`, and `dot4` intrinsics exist purely to lower
`dx.fdot`, and they map exactly to the DXIL ops of the same name. Using
vectors for their arguments adds unnecessary complexity and causes us to
have vector operations that are not trivial to lower post-scalarizer.

Similarly, the `dx.dot2add` intrinsic is overly generic for something
that only needs to lower to a single `dot2AddHalf` DXIL op. Update its
signature to match the operation it lowers to.

Fixes llvm#134569.
bogner added a commit to bogner/llvm-project that referenced this issue Apr 8, 2025
The `dx.dot2`, `dot3`, and `dot4` intrinsics exist purely to lower
`dx.fdot`, and they map exactly to the DXIL ops of the same name. Using
vectors for their arguments adds unnecessary complexity and causes us to
have vector operations that are not trivial to lower post-scalarizer.

Similarly, the `dx.dot2add` intrinsic is overly generic for something
that only needs to lower to a single `dot2AddHalf` DXIL op. Update its
signature to match the operation it lowers to.

Fixes llvm#134569.
bogner added a commit to bogner/llvm-project that referenced this issue Apr 9, 2025
The `dx.dot2`, `dot3`, and `dot4` intrinsics exist purely to lower
`dx.fdot`, and they map exactly to the DXIL ops of the same name. Using
vectors for their arguments adds unnecessary complexity and causes us to
have vector operations that are not trivial to lower post-scalarizer.

Similarly, the `dx.dot2add` intrinsic is overly generic for something
that only needs to lower to a single `dot2AddHalf` DXIL op. Update its
signature to match the operation it lowers to.

Fixes llvm#134569.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Needs Review
Development

Successfully merging a pull request may close this issue.

2 participants