[RISCV] Convert AVLs with vlenb to VLMAX where possible #97800

Merged
merged 7 commits into llvm:main from lukel97:avl-vlenb-vlmax
Jul 11, 2024

Conversation

lukel97
Contributor

@lukel97 lukel97 commented Jul 5, 2024

Given an AVL that's computed from vlenb, if it's equal to VLMAX then we can replace it with the VLMAX sentinel value.

The main motivation is to be able to express an EVL of VLMAX in VP intrinsics whilst emitting vsetvli a0, zero, so that we can replace llvm.riscv.masked.strided.{load,store} with their VP counterparts.

This is done in RISCVVectorPeephole (previously RISCVFoldMasks, renamed to account for the fact that it no longer just folds masks) instead of SelectionDAG, since there are multiple places where VP nodes are lowered that would each have needed to be handled.

This also avoids doing it in RISCVInsertVSETVLI as it's much harder to look up the value of the AVL, and in RISCVVectorPeephole we can take advantage of DeadMachineInstrElim to remove any leftover PseudoReadVLENBs.
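
To illustrate the kind of input this enables (a hypothetical IR example, not taken from this patch; the function name and types are ours), a VP add whose EVL equals the full element count of <vscale x 4 x i32> (i.e. VLMAX for SEW=32, LMUL=2) can now be selected as vsetvli a0, zero instead of materialising vlenb:

declare i32 @llvm.vscale.i32()
declare <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)

define <vscale x 4 x i32> @vlmax_evl_add(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> %m) {
  ; For <vscale x 4 x i32>, VLMAX = VLEN * LMUL / SEW = vscale * 4,
  ; so an EVL of vscale << 2 is exactly VLMAX.
  %vscale = call i32 @llvm.vscale.i32()
  %evl = shl i32 %vscale, 2
  %v = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> %m, i32 %evl)
  ret <vscale x 4 x i32> %v
}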

@llvmbot
Member

llvmbot commented Jul 5, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

Given an AVL that's computed from vlenb, if it's equal to VLMAX then we can replace it with the VLMAX sentinel value.

The main motivation is to be able to express an EVL of VLMAX in VP intrinsics whilst emitting vsetvli a0, zero, so that we can replace llvm.riscv.masked.strided.{load,store} with their VP counterparts.

This is done in RISCVFoldMasks instead of SelectionDAG, since there are multiple places where VP nodes are lowered that would each have needed to be handled.

This also avoids doing it in RISCVInsertVSETVLI as it's much harder to look up the value of the AVL, and in RISCVFoldMasks we can take advantage of DeadMachineInstrElim to remove any leftover PseudoReadVLENBs.


Full diff: https://github.com/llvm/llvm-project/pull/97800.diff

7 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVFoldMasks.cpp (+56)
  • (modified) llvm/test/CodeGen/RISCV/rvv/insert-subvector.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vadd-vp.ll (+5-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmax-vp.ll (+5-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmaxu-vp.ll (+5-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmin-vp.ll (+5-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vminu-vp.ll (+5-8)
diff --git a/llvm/lib/Target/RISCV/RISCVFoldMasks.cpp b/llvm/lib/Target/RISCV/RISCVFoldMasks.cpp
index 2089f5dda6fe5..f65127beaaa2b 100644
--- a/llvm/lib/Target/RISCV/RISCVFoldMasks.cpp
+++ b/llvm/lib/Target/RISCV/RISCVFoldMasks.cpp
@@ -14,6 +14,12 @@
 // ->
 // PseudoVMV_V_V %false, %true, %vl, %sew
 //
+// It also converts AVLs to VLMAX where possible
+// %vl = VLENB * something
+// PseudoVADD_V_V %a, %b, %vl
+// ->
+// PseudoVADD_V_V %a, %b, -1
+//
 //===---------------------------------------------------------------------===//
 
 #include "RISCV.h"
@@ -47,6 +53,7 @@ class RISCVFoldMasks : public MachineFunctionPass {
   StringRef getPassName() const override { return "RISC-V Fold Masks"; }
 
 private:
+  bool convertToVLMAX(MachineInstr &MI) const;
   bool convertToUnmasked(MachineInstr &MI) const;
   bool convertVMergeToVMv(MachineInstr &MI) const;
 
@@ -62,6 +69,54 @@ char RISCVFoldMasks::ID = 0;
 
 INITIALIZE_PASS(RISCVFoldMasks, DEBUG_TYPE, "RISC-V Fold Masks", false, false)
 
+// If an AVL is a VLENB that's possibly scaled to be equal to VLMAX, convert it
+// to the VLMAX sentinel value.
+bool RISCVFoldMasks::convertToVLMAX(MachineInstr &MI) const {
+  if (!RISCVII::hasVLOp(MI.getDesc().TSFlags) ||
+      !RISCVII::hasSEWOp(MI.getDesc().TSFlags))
+    return false;
+  MachineOperand &VL = MI.getOperand(RISCVII::getVLOpNum(MI.getDesc()));
+  if (!VL.isReg())
+    return false;
+  MachineInstr *Def = MRI->getVRegDef(VL.getReg());
+  if (!Def)
+    return false;
+
+  // Fixed-point value, denominator=8
+  unsigned ScaleFixed = 8;
+  // Check if the VLENB was scaled for a possible slli/srli
+  if (Def->getOpcode() == RISCV::SLLI) {
+    ScaleFixed <<= Def->getOperand(2).getImm();
+    Def = MRI->getVRegDef(Def->getOperand(1).getReg());
+  } else if (Def->getOpcode() == RISCV::SRLI) {
+    ScaleFixed >>= Def->getOperand(2).getImm();
+    Def = MRI->getVRegDef(Def->getOperand(1).getReg());
+  }
+
+  if (!Def || Def->getOpcode() != RISCV::PseudoReadVLENB)
+    return false;
+
+  auto LMUL = RISCVVType::decodeVLMUL(RISCVII::getLMul(MI.getDesc().TSFlags));
+  // Fixed-point value, denominator=8
+  unsigned LMULFixed = LMUL.second ? (8 / LMUL.first) : 8 * LMUL.first;
+  unsigned SEW =
+      1 << MI.getOperand(RISCVII::getSEWOpNum(MI.getDesc())).getImm();
+
+  // AVL = (VLENB * Scale)
+  //
+  // VLMAX = (VLENB * 8 * LMUL) / SEW
+  //
+  // AVL == VLMAX
+  // -> VLENB * Scale == (VLENB * 8 * LMUL) / SEW
+  // -> Scale == (8 * LMUL) / SEW
+  if (ScaleFixed != 8 * LMULFixed / SEW)
+    return false;
+
+  VL.ChangeToImmediate(RISCV::VLMaxSentinel);
+
+  return true;
+}
+
 bool RISCVFoldMasks::isAllOnesMask(const MachineInstr *MaskDef) const {
   assert(MaskDef && MaskDef->isCopy() &&
          MaskDef->getOperand(0).getReg() == RISCV::V0);
@@ -213,6 +268,7 @@ bool RISCVFoldMasks::runOnMachineFunction(MachineFunction &MF) {
 
   for (MachineBasicBlock &MBB : MF) {
     for (MachineInstr &MI : MBB) {
+      Changed |= convertToVLMAX(MI);
       Changed |= convertToUnmasked(MI);
       Changed |= convertVMergeToVMv(MI);
     }
diff --git a/llvm/test/CodeGen/RISCV/rvv/insert-subvector.ll b/llvm/test/CodeGen/RISCV/rvv/insert-subvector.ll
index 0cd4f423a9df6..8d917f286720a 100644
--- a/llvm/test/CodeGen/RISCV/rvv/insert-subvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/insert-subvector.ll
@@ -305,9 +305,9 @@ define <vscale x 16 x i8> @insert_nxv16i8_nxv1i8_7(<vscale x 16 x i8> %vec, <vsc
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    srli a1, a0, 3
-; CHECK-NEXT:    sub a1, a0, a1
-; CHECK-NEXT:    vsetvli zero, a0, e8, m1, ta, ma
-; CHECK-NEXT:    vslideup.vx v8, v10, a1
+; CHECK-NEXT:    sub a0, a0, a1
+; CHECK-NEXT:    vsetvli a1, zero, e8, m1, ta, ma
+; CHECK-NEXT:    vslideup.vx v8, v10, a0
 ; CHECK-NEXT:    ret
   %v = call <vscale x 16 x i8> @llvm.vector.insert.nxv1i8.nxv16i8(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec, i64 7)
   ret <vscale x 16 x i8> %v
@@ -318,9 +318,9 @@ define <vscale x 16 x i8> @insert_nxv16i8_nxv1i8_15(<vscale x 16 x i8> %vec, <vs
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    csrr a0, vlenb
 ; CHECK-NEXT:    srli a1, a0, 3
-; CHECK-NEXT:    sub a1, a0, a1
-; CHECK-NEXT:    vsetvli zero, a0, e8, m1, ta, ma
-; CHECK-NEXT:    vslideup.vx v9, v10, a1
+; CHECK-NEXT:    sub a0, a0, a1
+; CHECK-NEXT:    vsetvli a1, zero, e8, m1, ta, ma
+; CHECK-NEXT:    vslideup.vx v9, v10, a0
 ; CHECK-NEXT:    ret
   %v = call <vscale x 16 x i8> @llvm.vector.insert.nxv1i8.nxv16i8(<vscale x 16 x i8> %vec, <vscale x 1 x i8> %subvec, i64 15)
   ret <vscale x 16 x i8> %v
diff --git a/llvm/test/CodeGen/RISCV/rvv/vadd-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vadd-vp.ll
index ede395f4df8e1..2a4fbb248cd9c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vadd-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vadd-vp.ll
@@ -1436,20 +1436,17 @@ define <vscale x 32 x i32> @vadd_vi_nxv32i32_evl_nx8(<vscale x 32 x i32> %va, <v
 define <vscale x 32 x i32> @vadd_vi_nxv32i32_evl_nx16(<vscale x 32 x i32> %va, <vscale x 32 x i1> %m) {
 ; RV32-LABEL: vadd_vi_nxv32i32_evl_nx16:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    csrr a0, vlenb
-; RV32-NEXT:    slli a0, a0, 1
-; RV32-NEXT:    vsetvli zero, a0, e32, m8, ta, ma
+; RV32-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vadd.vi v8, v8, -1, v0.t
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: vadd_vi_nxv32i32_evl_nx16:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    csrr a0, vlenb
-; RV64-NEXT:    srli a1, a0, 2
-; RV64-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
-; RV64-NEXT:    vslidedown.vx v24, v0, a1
-; RV64-NEXT:    slli a0, a0, 1
-; RV64-NEXT:    vsetvli zero, a0, e32, m8, ta, ma
+; RV64-NEXT:    srli a0, a0, 2
+; RV64-NEXT:    vsetvli a1, zero, e8, mf2, ta, ma
+; RV64-NEXT:    vslidedown.vx v24, v0, a0
+; RV64-NEXT:    vsetvli a0, zero, e32, m8, ta, ma
 ; RV64-NEXT:    vadd.vi v8, v8, -1, v0.t
 ; RV64-NEXT:    vmv1r.v v0, v24
 ; RV64-NEXT:    vsetivli zero, 0, e32, m8, ta, ma
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmax-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vmax-vp.ll
index c15caa31bb098..5fdfb332da7cf 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmax-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmax-vp.ll
@@ -1073,20 +1073,17 @@ define <vscale x 32 x i32> @vmax_vx_nxv32i32_evl_nx8(<vscale x 32 x i32> %va, i3
 define <vscale x 32 x i32> @vmax_vx_nxv32i32_evl_nx16(<vscale x 32 x i32> %va, i32 %b, <vscale x 32 x i1> %m) {
 ; RV32-LABEL: vmax_vx_nxv32i32_evl_nx16:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    slli a1, a1, 1
-; RV32-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
+; RV32-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vmax.vx v8, v8, a0, v0.t
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: vmax_vx_nxv32i32_evl_nx16:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    csrr a1, vlenb
-; RV64-NEXT:    srli a2, a1, 2
-; RV64-NEXT:    vsetvli a3, zero, e8, mf2, ta, ma
-; RV64-NEXT:    vslidedown.vx v24, v0, a2
-; RV64-NEXT:    slli a1, a1, 1
-; RV64-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
+; RV64-NEXT:    srli a1, a1, 2
+; RV64-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
+; RV64-NEXT:    vslidedown.vx v24, v0, a1
+; RV64-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
 ; RV64-NEXT:    vmax.vx v8, v8, a0, v0.t
 ; RV64-NEXT:    vmv1r.v v0, v24
 ; RV64-NEXT:    vsetivli zero, 0, e32, m8, ta, ma
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmaxu-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vmaxu-vp.ll
index df494f8af7387..7d678950b7a3c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmaxu-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmaxu-vp.ll
@@ -1072,20 +1072,17 @@ define <vscale x 32 x i32> @vmaxu_vx_nxv32i32_evl_nx8(<vscale x 32 x i32> %va, i
 define <vscale x 32 x i32> @vmaxu_vx_nxv32i32_evl_nx16(<vscale x 32 x i32> %va, i32 %b, <vscale x 32 x i1> %m) {
 ; RV32-LABEL: vmaxu_vx_nxv32i32_evl_nx16:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    slli a1, a1, 1
-; RV32-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
+; RV32-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vmaxu.vx v8, v8, a0, v0.t
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: vmaxu_vx_nxv32i32_evl_nx16:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    csrr a1, vlenb
-; RV64-NEXT:    srli a2, a1, 2
-; RV64-NEXT:    vsetvli a3, zero, e8, mf2, ta, ma
-; RV64-NEXT:    vslidedown.vx v24, v0, a2
-; RV64-NEXT:    slli a1, a1, 1
-; RV64-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
+; RV64-NEXT:    srli a1, a1, 2
+; RV64-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
+; RV64-NEXT:    vslidedown.vx v24, v0, a1
+; RV64-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
 ; RV64-NEXT:    vmaxu.vx v8, v8, a0, v0.t
 ; RV64-NEXT:    vmv1r.v v0, v24
 ; RV64-NEXT:    vsetivli zero, 0, e32, m8, ta, ma
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmin-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vmin-vp.ll
index 794a21c7c6aba..98a288ed68b9a 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmin-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmin-vp.ll
@@ -1073,20 +1073,17 @@ define <vscale x 32 x i32> @vmin_vx_nxv32i32_evl_nx8(<vscale x 32 x i32> %va, i3
 define <vscale x 32 x i32> @vmin_vx_nxv32i32_evl_nx16(<vscale x 32 x i32> %va, i32 %b, <vscale x 32 x i1> %m) {
 ; RV32-LABEL: vmin_vx_nxv32i32_evl_nx16:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    slli a1, a1, 1
-; RV32-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
+; RV32-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vmin.vx v8, v8, a0, v0.t
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: vmin_vx_nxv32i32_evl_nx16:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    csrr a1, vlenb
-; RV64-NEXT:    srli a2, a1, 2
-; RV64-NEXT:    vsetvli a3, zero, e8, mf2, ta, ma
-; RV64-NEXT:    vslidedown.vx v24, v0, a2
-; RV64-NEXT:    slli a1, a1, 1
-; RV64-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
+; RV64-NEXT:    srli a1, a1, 2
+; RV64-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
+; RV64-NEXT:    vslidedown.vx v24, v0, a1
+; RV64-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
 ; RV64-NEXT:    vmin.vx v8, v8, a0, v0.t
 ; RV64-NEXT:    vmv1r.v v0, v24
 ; RV64-NEXT:    vsetivli zero, 0, e32, m8, ta, ma
diff --git a/llvm/test/CodeGen/RISCV/rvv/vminu-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vminu-vp.ll
index d54de281a7fd2..34b554b7ff514 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vminu-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vminu-vp.ll
@@ -1072,20 +1072,17 @@ define <vscale x 32 x i32> @vminu_vx_nxv32i32_evl_nx8(<vscale x 32 x i32> %va, i
 define <vscale x 32 x i32> @vminu_vx_nxv32i32_evl_nx16(<vscale x 32 x i32> %va, i32 %b, <vscale x 32 x i1> %m) {
 ; RV32-LABEL: vminu_vx_nxv32i32_evl_nx16:
 ; RV32:       # %bb.0:
-; RV32-NEXT:    csrr a1, vlenb
-; RV32-NEXT:    slli a1, a1, 1
-; RV32-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
+; RV32-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
 ; RV32-NEXT:    vminu.vx v8, v8, a0, v0.t
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: vminu_vx_nxv32i32_evl_nx16:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    csrr a1, vlenb
-; RV64-NEXT:    srli a2, a1, 2
-; RV64-NEXT:    vsetvli a3, zero, e8, mf2, ta, ma
-; RV64-NEXT:    vslidedown.vx v24, v0, a2
-; RV64-NEXT:    slli a1, a1, 1
-; RV64-NEXT:    vsetvli zero, a1, e32, m8, ta, ma
+; RV64-NEXT:    srli a1, a1, 2
+; RV64-NEXT:    vsetvli a2, zero, e8, mf2, ta, ma
+; RV64-NEXT:    vslidedown.vx v24, v0, a1
+; RV64-NEXT:    vsetvli a1, zero, e32, m8, ta, ma
 ; RV64-NEXT:    vminu.vx v8, v8, a0, v0.t
 ; RV64-NEXT:    vmv1r.v v0, v24
 ; RV64-NEXT:    vsetivli zero, 0, e32, m8, ta, ma
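
As a worked instance of the fixed-point check in convertToVLMAX above (a stand-alone model for illustration; avlIsVLMAX is a hypothetical helper mirroring the comparison, not code from this patch):

#include <cassert>

// AVL == VLMAX iff ScaleFixed == 8 * LMULFixed / SEW, with both sides
// kept as fixed-point values over a denominator of 8 so that fractional
// LMULs and srli-scaled AVLs stay in integer arithmetic.
static bool avlIsVLMAX(unsigned ScaleFixed, unsigned LMULFixed, unsigned SEW) {
  return ScaleFixed == 8 * LMULFixed / SEW;
}

int main() {
  // vadd_vi_nxv32i32_evl_nx16: AVL = vlenb << 1, e32, m8.
  // ScaleFixed = 8 << 1 = 16 and 8 * (8 * 8) / 32 = 16, so fold to VLMAX.
  assert(avlIsVLMAX(/*ScaleFixed=*/8u << 1, /*LMULFixed=*/8 * 8, /*SEW=*/32));
  // insert_nxv16i8_nxv1i8_7: AVL = vlenb, e8, m1: 8 == 8 * 8 / 8.
  assert(avlIsVLMAX(/*ScaleFixed=*/8, /*LMULFixed=*/8, /*SEW=*/8));
  // Not VLMAX: AVL = vlenb with e32, m1, since VLMAX is only vlenb / 4.
  assert(!avlIsVLMAX(/*ScaleFixed=*/8, /*LMULFixed=*/8, /*SEW=*/32));
  return 0;
}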

@wangpc-pp
Contributor

I think we can rename RISCVFoldMasks to PostISelRVVOptimization or whatever.

@lukel97
Contributor Author

lukel97 commented Jul 5, 2024

I think we can rename RISCVFoldMasks to PostISelRVVOptimization or whatever.

Maybe something like RISCVVectorPeephole?

@wangpc-pp
Contributor

I think we can rename RISCVFoldMasks to PostISelRVVOptimization or whatever.

Maybe something like RISCVVectorPeephole?

Makes sense to me.

@lukel97 lukel97 force-pushed the avl-vlenb-vlmax branch from 06cecab to 82e8c8b on July 6, 2024 at 06:08
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Jul 9, 2024
RISCVGatherScatterLowering is the main user of riscv_masked_strided_{load,store}, which we can remove if we replace them with their VP equivalents.

Submitting early as a draft to show the regressions in the test diff that llvm#97800 and llvm#97798 (or the CGP version) are needed to fix.
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Jul 9, 2024
This combine is a duplication of the transform in RISCVGatherScatterLowering but at the SelectionDAG level, so similarly to llvm#98111 we can replace the use of riscv_masked_strided_load with a VP strided load.

Unlike llvm#98111 we don't require llvm#97800 or llvm#97798 since it only operates on fixed vectors with a non-zero stride.
lukel97 added a commit that referenced this pull request Jul 9, 2024
This combine is a duplication of the transform in
RISCVGatherScatterLowering but at the SelectionDAG level, so similarly
to #98111 we can replace the use of riscv_masked_strided_load with a VP
strided load.

Unlike #98111 we don't require #97800 or #97798 since it only operates
on fixed vectors with a non-zero stride.
@lukel97
Contributor Author

lukel97 commented Jul 11, 2024

Ping

Collaborator

@topperc topperc left a comment

LGTM

lukel97 added 7 commits July 11, 2024 13:05
Given an AVL that's computed from vlenb, if it's equal to VLMAX then we can replace it with the VLMAX sentinel value.

The main motivation is to be able to express an EVL of VLMAX in VP intrinsics whilst emitting vsetvli a0, zero, so that we can replace llvm.riscv.masked.strided.{load,store} with their VP counterparts.

This is done in RISCVFoldMasks instead of SelectionDAG, since there are multiple places where VP nodes are lowered that would each have needed to be handled.

This also avoids doing it in RISCVInsertVSETVLI as it's much harder to look up the value of the AVL, and in RISCVFoldMasks we can take advantage of DeadMachineInstrElim to remove any leftover PseudoReadVLENBs.
Also assert that 64 * LMUL / SEW > 0, so that if the scale is shifted down to 0 the equality check is always false.
@lukel97 lukel97 merged commit c74ba57 into llvm:main Jul 11, 2024
5 of 6 checks passed
@llvm-ci
Collaborator

llvm-ci commented Jul 11, 2024

LLVM Buildbot has detected a new failure on builder flang-aarch64-dylib running on linaro-flang-aarch64-dylib while building llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/50/builds/963

Here is the relevant piece of the build log for reference:

Step 5 (build-unified-tree) failure: build (failure)
...
495.502 [2330/1/4239] Building CXX object tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArchByHSA.cpp.o
495.674 [2329/1/4240] Building CXX object tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArch.cpp.o
496.298 [2328/1/4241] Building CXX object tools/clang/tools/amdgpu-arch/CMakeFiles/amdgpu-arch.dir/AMDGPUArchByHIP.cpp.o
496.460 [2327/1/4242] Building CXX object tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o
496.634 [2326/1/4243] Building FIRAttr.h.inc...
496.786 [2325/1/4244] Building FIRDialect.cpp.inc...
496.881 [2324/1/4245] Building FIRDialect.h.inc...
497.011 [2323/1/4246] Building CGOps.h.inc...
497.103 [2322/1/4247] Building FIREnumAttr.cpp.inc...
507.309 [2321/1/4248] Building CXX object tools/flang/lib/Optimizer/Dialect/CUF/CMakeFiles/obj.CUFDialect.dir/CUFOps.cpp.o
FAILED: tools/flang/lib/Optimizer/Dialect/CUF/CMakeFiles/obj.CUFDialect.dir/CUFOps.cpp.o 
/usr/local/bin/c++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/flang/lib/Optimizer/Dialect/CUF -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/flang/lib/Optimizer/Dialect/CUF -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/flang/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/flang/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/llvm/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/mlir/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-dylib/build/tools/clang/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/llvm/../clang/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT tools/flang/lib/Optimizer/Dialect/CUF/CMakeFiles/obj.CUFDialect.dir/CUFOps.cpp.o -MF tools/flang/lib/Optimizer/Dialect/CUF/CMakeFiles/obj.CUFDialect.dir/CUFOps.cpp.o.d -o tools/flang/lib/Optimizer/Dialect/CUF/CMakeFiles/obj.CUFDialect.dir/CUFOps.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/flang/lib/Optimizer/Dialect/CUF/CUFOps.cpp
In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/flang/lib/Optimizer/Dialect/CUF/CUFOps.cpp:13:
In file included from /home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/flang/include/flang/Optimizer/Dialect/CUF/CUFOps.h:14:
/home/tcwg-buildbot/worker/flang-aarch64-dylib/llvm-project/flang/include/flang/Optimizer/Dialect/FIRType.h:71:10: fatal error: 'flang/Optimizer/Dialect/FIROpsTypes.h.inc' file not found
   71 | #include "flang/Optimizer/Dialect/FIROpsTypes.h.inc"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
ninja: build stopped: subcommand failed.

aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
This combine is a duplication of the transform in
RISCVGatherScatterLowering but at the SelectionDAG level, so similarly
to llvm#98111 we can replace the use of riscv_masked_strided_load with a VP
strided load.

Unlike llvm#98111 we don't require llvm#97800 or llvm#97798 since it only operates
on fixed vectors with a non-zero stride.
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
Given an AVL that's computed from vlenb, if it's equal to VLMAX then we
can replace it with the VLMAX sentinel value.

The main motivation is to be able to express an EVL of VLMAX in VP
intrinsics whilst emitting vsetvli a0, zero, so that we can replace
llvm.riscv.masked.strided.{load,store} with their VP counterparts.

This is done in RISCVVectorPeephole (previously RISCVFoldMasks, renamed
to account for the fact that it no longer just folds masks) instead of
SelectionDAG, since there are multiple places where VP nodes are
lowered that would each have needed to be handled.

This also avoids doing it in RISCVInsertVSETVLI as it's much harder to
look up the value of the AVL, and in RISCVVectorPeephole we can take
advantage of DeadMachineInstrElim to remove any leftover
PseudoReadVLENBs.
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Jul 16, 2024
RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after llvm#98131 and llvm#98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path.

riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes in the fixed or scalable element count to the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after llvm#97800.
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Jul 24, 2024
RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after llvm#98131 and llvm#98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path.

riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes in the fixed or scalable element count to the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after llvm#97800.
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Jul 24, 2024
RISCVGatherScatterLowering is the last user of riscv_masked_strided_{load,store} after llvm#98131 and llvm#98112; this patch changes it to emit the VP equivalent instead. This allows us to remove the masked_strided intrinsics so we only have one lowering path.

riscv_masked_strided_{load,store} didn't have AVL operands and were always VLMAX, so this passes in the fixed or scalable element count to the EVL instead, which RISCVVectorPeephole should now convert to VLMAX after llvm#97800.
lukel97 added a commit that referenced this pull request Jul 24, 2024
…98111)

RISCVGatherScatterLowering is the last user of
riscv_masked_strided_{load,store} after #98131 and #98112; this patch
changes it to emit the VP equivalent instead. This allows us to remove
the masked_strided intrinsics so we only have one lowering path.

riscv_masked_strided_{load,store} didn't have AVL operands and were
always VLMAX, so this passes in the fixed or scalable element count to
the EVL instead, which RISCVVectorPeephole should now convert to VLMAX
after #97800.
For loads we also use a vp_select to get passthru (mask undisturbed)
behaviour.
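
For reference, a hedged sketch of the replacement shape (illustrative
element type, stride, and function name; the intrinsics and their
signatures are the upstream VP ones):

declare i32 @llvm.vscale.i32()
declare <vscale x 2 x i64> @llvm.experimental.vp.strided.load.nxv2i64.p0.i64(ptr, i64, <vscale x 2 x i1>, i32)
declare <vscale x 2 x i64> @llvm.vp.select.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>, i32)

define <vscale x 2 x i64> @strided_gather(ptr %p, <vscale x 2 x i1> %m, <vscale x 2 x i64> %passthru) {
  ; Whole-vector element count: vscale * 2 == VLMAX for <vscale x 2 x i64>,
  ; which RISCVVectorPeephole can fold to the VLMAX sentinel.
  %vscale = call i32 @llvm.vscale.i32()
  %evl = shl i32 %vscale, 1
  %v = call <vscale x 2 x i64> @llvm.experimental.vp.strided.load.nxv2i64.p0.i64(ptr %p, i64 16, <vscale x 2 x i1> %m, i32 %evl)
  ; vp.select keeps masked-off lanes from the passthru (mask undisturbed).
  %sel = call <vscale x 2 x i64> @llvm.vp.select.nxv2i64(<vscale x 2 x i1> %m, <vscale x 2 x i64> %v, <vscale x 2 x i64> %passthru, i32 %evl)
  ret <vscale x 2 x i64> %sel
}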
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…98111)

Summary:
RISCVGatherScatterLowering is the last user of
riscv_masked_strided_{load,store} after #98131 and #98112; this patch
changes it to emit the VP equivalent instead. This allows us to remove
the masked_strided intrinsics so we only have one lowering path.

riscv_masked_strided_{load,store} didn't have AVL operands and were
always VLMAX, so this passes in the fixed or scalable element count to
the EVL instead, which RISCVVectorPeephole should now convert to VLMAX
after #97800.
For loads we also use a vp_select to get passthru (mask undisturbed)
behaviour.

Differential Revision: https://phabricator.intern.facebook.com/D60250735
Harini0924 pushed a commit to Harini0924/llvm-project that referenced this pull request Aug 1, 2024
…lvm#98111)

RISCVGatherScatterLowering is the last user of
riscv_masked_strided_{load,store} after llvm#98131 and llvm#98112; this patch
changes it to emit the VP equivalent instead. This allows us to remove
the masked_strided intrinsics so we only have one lowering path.

riscv_masked_strided_{load,store} didn't have AVL operands and were
always VLMAX, so this passes in the fixed or scalable element count to
the EVL instead, which RISCVVectorPeephole should now convert to VLMAX
after llvm#97800.
For loads we also use a vp_select to get passthru (mask undisturbed)
behaviour.