-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64] Use INDEX for constant Neon step vectors #113424
[AArch64] Use INDEX for constant Neon step vectors #113424
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: Ricardo Jesus (rj-jesus) ChangesWhen compiling for an SVE target we can use INDEX to generate constant fixed-length step vectors, e.g.:
Currently:
With INDEX:
The logic for this was already in GCC implemented this recently. https://godbolt.org/z/6sP3Kj3db Full diff: https://github.com/llvm/llvm-project/pull/113424.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 4aa123b42d1966..e016a905c934bf 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -14512,7 +14512,9 @@ SDValue AArch64TargetLowering::LowerBUILD_VECTOR(SDValue Op,
SelectionDAG &DAG) const {
EVT VT = Op.getValueType();
- if (useSVEForFixedLengthVectorVT(VT, !Subtarget->isNeonAvailable()))
+ bool OverrideNEON = !Subtarget->isNeonAvailable() ||
+ cast<BuildVectorSDNode>(Op)->isConstantSequence();
+ if (useSVEForFixedLengthVectorVT(VT, OverrideNEON))
return LowerFixedLengthBuildVectorToSVE(Op, DAG);
// Try to build a simple constant vector.
diff --git a/llvm/test/CodeGen/AArch64/active_lane_mask.ll b/llvm/test/CodeGen/AArch64/active_lane_mask.ll
index bd5d076d1ba82e..025bbf749fc71b 100644
--- a/llvm/test/CodeGen/AArch64/active_lane_mask.ll
+++ b/llvm/test/CodeGen/AArch64/active_lane_mask.ll
@@ -430,10 +430,9 @@ define <2 x i1> @lane_mask_v2i1_i64(i64 %index, i64 %TC) {
define <16 x i1> @lane_mask_v16i1_i8(i8 %index, i8 %TC) {
; CHECK-LABEL: lane_mask_v16i1_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: adrp x8, .LCPI24_0
-; CHECK-NEXT: dup v0.16b, w0
-; CHECK-NEXT: ldr q1, [x8, :lo12:.LCPI24_0]
-; CHECK-NEXT: uqadd v0.16b, v0.16b, v1.16b
+; CHECK-NEXT: index z0.b, #0, #1
+; CHECK-NEXT: dup v1.16b, w0
+; CHECK-NEXT: uqadd v0.16b, v1.16b, v0.16b
; CHECK-NEXT: dup v1.16b, w1
; CHECK-NEXT: cmhi v0.16b, v1.16b, v0.16b
; CHECK-NEXT: ret
@@ -444,10 +443,9 @@ define <16 x i1> @lane_mask_v16i1_i8(i8 %index, i8 %TC) {
define <8 x i1> @lane_mask_v8i1_i8(i8 %index, i8 %TC) {
; CHECK-LABEL: lane_mask_v8i1_i8:
; CHECK: // %bb.0:
-; CHECK-NEXT: dup v0.8b, w0
-; CHECK-NEXT: adrp x8, .LCPI25_0
-; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI25_0]
-; CHECK-NEXT: uqadd v0.8b, v0.8b, v1.8b
+; CHECK-NEXT: index z0.b, #0, #1
+; CHECK-NEXT: dup v1.8b, w0
+; CHECK-NEXT: uqadd v0.8b, v1.8b, v0.8b
; CHECK-NEXT: dup v1.8b, w1
; CHECK-NEXT: cmhi v0.8b, v1.8b, v0.8b
; CHECK-NEXT: ret
@@ -459,9 +457,8 @@ define <4 x i1> @lane_mask_v4i1_i8(i8 %index, i8 %TC) {
; CHECK-LABEL: lane_mask_v4i1_i8:
; CHECK: // %bb.0:
; CHECK-NEXT: dup v0.4h, w0
-; CHECK-NEXT: adrp x8, .LCPI26_0
+; CHECK-NEXT: index z1.h, #0, #1
; CHECK-NEXT: movi d2, #0xff00ff00ff00ff
-; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI26_0]
; CHECK-NEXT: dup v3.4h, w1
; CHECK-NEXT: bic v0.4h, #255, lsl #8
; CHECK-NEXT: bic v3.4h, #255, lsl #8
@@ -478,8 +475,7 @@ define <2 x i1> @lane_mask_v2i1_i8(i8 %index, i8 %TC) {
; CHECK: // %bb.0:
; CHECK-NEXT: movi d0, #0x0000ff000000ff
; CHECK-NEXT: dup v1.2s, w0
-; CHECK-NEXT: adrp x8, .LCPI27_0
-; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI27_0]
+; CHECK-NEXT: index z2.s, #0, #1
; CHECK-NEXT: dup v3.2s, w1
; CHECK-NEXT: and v1.8b, v1.8b, v0.8b
; CHECK-NEXT: add v1.2s, v1.2s, v2.2s
|
Related issue #110410 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a more specific/dedicated test because the existing active_lane_mask tests should all ideally result in while
instructions when SVE is available, so we'll be missing test coverage when that happens.
When compiling for an SVE target we can use INDEX to generate constant fixed-length step vectors. The logic for this was already in `LowerBUILD_VECTOR`, though it was hidden under `!Subtarget->isNeonAvailable()`. This patch refactors this to enable the corresponding code path unconditionally for constant step vectors (as long as we can use SVE for them).
b3e3195
to
44a1040
Compare
Thanks, done. A couple of questions:
We generate:
This doesn't seem specific to this patch so I think we can address it separately. |
I have some work that I need to extract into a pull request to better enforce this by making the decision explicit (i.e. lowering code can choose exactly the variant it wants) rather than the current use count method we use during instruction selection.
|
Sounds great, thanks very much!
Should I then enforce the sequence to be within the immediate range, or would you rather handle this later as part of your follow-up patch? |
You can leave it with me. |
Perfect, thanks. |
When compiling for an SVE target we can use INDEX to generate constant fixed-length step vectors, e.g.: ``` uint32x4_t foo() { return (uint32x4_t){0, 1, 2, 3}; } ``` Currently: ``` foo(): adrp x8, .LCPI1_0 ldr q0, [x8, :lo12:.LCPI1_0] ret ``` With INDEX: ``` foo(): index z0.s, #0, llvm#1 ret ``` The logic for this was already in `LowerBUILD_VECTOR`, though it was hidden under a check for `!Subtarget->isNeonAvailable()`. This patch refactors this to enable the corresponding code path unconditionally for constant step vectors (as long as we can use SVE for them).
When compiling for an SVE target we can use INDEX to generate constant fixed-length step vectors, e.g.:
Currently:
With INDEX:
The logic for this was already in
LowerBUILD_VECTOR
, though it was hidden under a check for!Subtarget->isNeonAvailable()
. This patch refactors this to enable the corresponding code path unconditionally for constant step vectors (as long as we can use SVE for them).GCC implemented this recently.
https://godbolt.org/z/6sP3Kj3db
Resolves #110410