[RISCV] enable VTYPE before whole RVVReg move #117866
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Piyou Chen (BeMg)

Changes

Address #114518

This patch inserts a VSETVLIX0 instruction before any RVVReg move that is not affected by a VSETVL instruction in the same basic block. The goal is to avoid the illegal-instruction trap caused by a disabled vtype. It may cause regressions due to redundant VSETVL insertions.

Patch is 1.10 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117866.diff

173 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 052b4a61298223..5433916202ec28 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -922,6 +922,7 @@ class RISCVInsertVSETVLI : public MachineFunctionPass {
VSETVLIInfo getInfoForVSETVLI(const MachineInstr &MI) const;
VSETVLIInfo computeInfoForInstr(const MachineInstr &MI) const;
void forwardVSETVLIAVL(VSETVLIInfo &Info) const;
+ void enableVTYPEBeforeMove(MachineBasicBlock &MBB);
};
} // end anonymous namespace
@@ -1768,6 +1769,56 @@ void RISCVInsertVSETVLI::insertReadVL(MachineBasicBlock &MBB) {
}
}
+static bool isRVVCopy(const MachineInstr &MI) {
+ static const TargetRegisterClass *RVVRegClasses[] = {
+ &RISCV::VRRegClass, &RISCV::VRM2RegClass, &RISCV::VRM4RegClass,
+ &RISCV::VRM8RegClass, &RISCV::VRN2M1RegClass, &RISCV::VRN2M2RegClass,
+ &RISCV::VRN2M4RegClass, &RISCV::VRN3M1RegClass, &RISCV::VRN3M2RegClass,
+ &RISCV::VRN4M1RegClass, &RISCV::VRN4M2RegClass, &RISCV::VRN5M1RegClass,
+ &RISCV::VRN6M1RegClass, &RISCV::VRN7M1RegClass, &RISCV::VRN8M1RegClass};
+
+ if (MI.getOpcode() != TargetOpcode::COPY)
+ return false;
+
+ Register DstReg = MI.getOperand(0).getReg();
+ Register SrcReg = MI.getOperand(1).getReg();
+ for (const auto &RegClass : RVVRegClasses) {
+ if (RegClass->contains(DstReg, SrcReg)) {
+ return true;
+ }
+ }
+ return false;
+}
+
+void RISCVInsertVSETVLI::enableVTYPEBeforeMove(MachineBasicBlock &MBB) {
+ bool NeedVSETVL = true;
+
+ if (!BlockInfo[MBB.getNumber()].Pred.isUnknown() &&
+ BlockInfo[MBB.getNumber()].Pred.isValid())
+ NeedVSETVL = false;
+
+ for (auto &MI : MBB) {
+ if (isVectorConfigInstr(MI) || RISCVII::hasSEWOp(MI.getDesc().TSFlags))
+ NeedVSETVL = false;
+
+ if (MI.isCall() || MI.isInlineAsm())
+ NeedVSETVL = true;
+
+ if (NeedVSETVL && isRVVCopy(MI)) {
+ auto VSETVL0MI =
+ BuildMI(MBB, &MI, MI.getDebugLoc(), TII->get(RISCV::PseudoVSETVLIX0))
+ .addReg(RISCV::X0, RegState::Define | RegState::Dead)
+ .addReg(RISCV::X0, RegState::Kill)
+ .addImm(RISCVVType::encodeVTYPE(RISCVII::VLMUL::LMUL_1, 32, false,
+ false))
+ .addReg(RISCV::VL, RegState::Implicit);
+ if (LIS)
+ LIS->InsertMachineInstrInMaps(*VSETVL0MI);
+ NeedVSETVL = false;
+ }
+ }
+}
+
bool RISCVInsertVSETVLI::runOnMachineFunction(MachineFunction &MF) {
// Skip if the vector extension is not enabled.
ST = &MF.getSubtarget<RISCVSubtarget>();
@@ -1798,12 +1849,6 @@ bool RISCVInsertVSETVLI::runOnMachineFunction(MachineFunction &MF) {
}
- // If we didn't find any instructions that need VSETVLI, we're done.
- if (!HaveVectorOp) {
- BlockInfo.clear();
- return false;
- }
-
// Phase 2 - determine the exit VL/VTYPE from each block. We add all
// blocks to the list here, but will also add any that need to be revisited
// during Phase 2 processing.
@@ -1842,6 +1887,9 @@ bool RISCVInsertVSETVLI::runOnMachineFunction(MachineFunction &MF) {
for (MachineBasicBlock &MBB : MF)
insertReadVL(MBB);
+ for (MachineBasicBlock &MBB : MF)
+ enableVTYPEBeforeMove(MBB);
+
BlockInfo.clear();
return HaveVectorOp;
}
diff --git a/llvm/test/CodeGen/RISCV/inline-asm-v-constraint.ll b/llvm/test/CodeGen/RISCV/inline-asm-v-constraint.ll
index c04e4fea7b2c29..6b566e2df0d798 100644
--- a/llvm/test/CodeGen/RISCV/inline-asm-v-constraint.ll
+++ b/llvm/test/CodeGen/RISCV/inline-asm-v-constraint.ll
@@ -45,6 +45,7 @@ define <vscale x 1 x i8> @constraint_vd(<vscale x 1 x i8> %0, <vscale x 1 x i8>
define <vscale x 1 x i1> @constraint_vm(<vscale x 1 x i1> %0, <vscale x 1 x i1> %1) nounwind {
; RV32I-LABEL: constraint_vm:
; RV32I: # %bb.0:
+; RV32I-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV32I-NEXT: vmv1r.v v9, v0
; RV32I-NEXT: vmv1r.v v0, v8
; RV32I-NEXT: #APP
@@ -54,6 +55,7 @@ define <vscale x 1 x i1> @constraint_vm(<vscale x 1 x i1> %0, <vscale x 1 x i1>
;
; RV64I-LABEL: constraint_vm:
; RV64I: # %bb.0:
+; RV64I-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV64I-NEXT: vmv1r.v v9, v0
; RV64I-NEXT: vmv1r.v v0, v8
; RV64I-NEXT: #APP
diff --git a/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll b/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
index 163d9145bc3623..685e29ef6d9179 100644
--- a/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
@@ -567,6 +567,7 @@ define <vscale x 16 x i64> @vp_abs_nxv16i64(<vscale x 16 x i64> %va, <vscale x 1
; CHECK-NEXT: slli a1, a1, 4
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v24, v0
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: slli a1, a1, 3
diff --git a/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll b/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
index 66a1178cddb66c..3d0a5cc77ef679 100644
--- a/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
@@ -3075,6 +3075,7 @@ define <vscale x 64 x i16> @vp_bitreverse_nxv64i16(<vscale x 64 x i16> %va, <vsc
; CHECK-NEXT: slli a1, a1, 4
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v24, v0
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: slli a1, a1, 3
@@ -3158,6 +3159,7 @@ define <vscale x 64 x i16> @vp_bitreverse_nxv64i16(<vscale x 64 x i16> %va, <vsc
;
; CHECK-ZVBB-LABEL: vp_bitreverse_nxv64i16:
; CHECK-ZVBB: # %bb.0:
+; CHECK-ZVBB-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-ZVBB-NEXT: vmv1r.v v24, v0
; CHECK-ZVBB-NEXT: csrr a1, vlenb
; CHECK-ZVBB-NEXT: srli a2, a1, 1
diff --git a/llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll b/llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll
index 1c95ec8fafd4f1..19f30a7ce438aa 100644
--- a/llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll
@@ -1584,6 +1584,7 @@ define <vscale x 64 x i16> @vp_bswap_nxv64i16(<vscale x 64 x i16> %va, <vscale x
; CHECK-NEXT: slli a1, a1, 4
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v24, v0
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: slli a1, a1, 3
@@ -1631,6 +1632,7 @@ define <vscale x 64 x i16> @vp_bswap_nxv64i16(<vscale x 64 x i16> %va, <vscale x
;
; CHECK-ZVKB-LABEL: vp_bswap_nxv64i16:
; CHECK-ZVKB: # %bb.0:
+; CHECK-ZVKB-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-ZVKB-NEXT: vmv1r.v v24, v0
; CHECK-ZVKB-NEXT: csrr a1, vlenb
; CHECK-ZVKB-NEXT: srli a2, a1, 1
diff --git a/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll b/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
index a4e5ab661c5285..e85a7af56cc497 100644
--- a/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll
@@ -336,6 +336,7 @@ define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_i32(<vsca
; RV32-NEXT: add a1, a3, a1
; RV32-NEXT: li a3, 2
; RV32-NEXT: vs8r.v v16, (a1)
+; RV32-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV32-NEXT: vmv8r.v v8, v0
; RV32-NEXT: vmv8r.v v16, v24
; RV32-NEXT: call ext2
@@ -374,6 +375,7 @@ define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_i32(<vsca
; RV64-NEXT: add a1, a3, a1
; RV64-NEXT: li a3, 2
; RV64-NEXT: vs8r.v v16, (a1)
+; RV64-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV64-NEXT: vmv8r.v v8, v0
; RV64-NEXT: vmv8r.v v16, v24
; RV64-NEXT: call ext2
@@ -451,6 +453,7 @@ define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 128
; RV32-NEXT: vl8r.v v8, (a1) # Unknown-size Folded Reload
+; RV32-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV32-NEXT: vmv8r.v v16, v0
; RV32-NEXT: call ext3
; RV32-NEXT: addi sp, s0, -144
@@ -523,6 +526,7 @@ define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_
; RV64-NEXT: add a1, sp, a1
; RV64-NEXT: addi a1, a1, 128
; RV64-NEXT: vl8r.v v8, (a1) # Unknown-size Folded Reload
+; RV64-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV64-NEXT: vmv8r.v v16, v0
; RV64-NEXT: call ext3
; RV64-NEXT: addi sp, s0, -144
diff --git a/llvm/test/CodeGen/RISCV/rvv/calling-conv.ll b/llvm/test/CodeGen/RISCV/rvv/calling-conv.ll
index 9b27116fef7cae..05873a4e83aa29 100644
--- a/llvm/test/CodeGen/RISCV/rvv/calling-conv.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/calling-conv.ll
@@ -103,6 +103,7 @@ define target("riscv.vector.tuple", <vscale x 16 x i8>, 2) @caller_tuple_return(
; RV32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
; RV32-NEXT: .cfi_offset ra, -4
; RV32-NEXT: call callee_tuple_return
+; RV32-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV32-NEXT: vmv2r.v v6, v8
; RV32-NEXT: vmv2r.v v8, v10
; RV32-NEXT: vmv2r.v v10, v6
@@ -119,6 +120,7 @@ define target("riscv.vector.tuple", <vscale x 16 x i8>, 2) @caller_tuple_return(
; RV64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64-NEXT: .cfi_offset ra, -8
; RV64-NEXT: call callee_tuple_return
+; RV64-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV64-NEXT: vmv2r.v v6, v8
; RV64-NEXT: vmv2r.v v8, v10
; RV64-NEXT: vmv2r.v v10, v6
@@ -144,6 +146,7 @@ define void @caller_tuple_argument(target("riscv.vector.tuple", <vscale x 16 x i
; RV32-NEXT: .cfi_def_cfa_offset 16
; RV32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
; RV32-NEXT: .cfi_offset ra, -4
+; RV32-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV32-NEXT: vmv2r.v v6, v8
; RV32-NEXT: vmv2r.v v8, v10
; RV32-NEXT: vmv2r.v v10, v6
@@ -160,6 +163,7 @@ define void @caller_tuple_argument(target("riscv.vector.tuple", <vscale x 16 x i
; RV64-NEXT: .cfi_def_cfa_offset 16
; RV64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64-NEXT: .cfi_offset ra, -8
+; RV64-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV64-NEXT: vmv2r.v v6, v8
; RV64-NEXT: vmv2r.v v8, v10
; RV64-NEXT: vmv2r.v v10, v6
diff --git a/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll b/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
index 7d0b0118a72725..b5cf302605f885 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
@@ -117,6 +117,7 @@ declare <vscale x 4 x bfloat> @llvm.vp.ceil.nxv4bf16(<vscale x 4 x bfloat>, <vsc
define <vscale x 4 x bfloat> @vp_ceil_vv_nxv4bf16(<vscale x 4 x bfloat> %va, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv4bf16:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v9, v0
; CHECK-NEXT: vsetvli a1, zero, e16, m1, ta, ma
; CHECK-NEXT: vfwcvtbf16.f.f.v v10, v8
@@ -169,6 +170,7 @@ declare <vscale x 8 x bfloat> @llvm.vp.ceil.nxv8bf16(<vscale x 8 x bfloat>, <vsc
define <vscale x 8 x bfloat> @vp_ceil_vv_nxv8bf16(<vscale x 8 x bfloat> %va, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv8bf16:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v10, v0
; CHECK-NEXT: vsetvli a1, zero, e16, m2, ta, ma
; CHECK-NEXT: vfwcvtbf16.f.f.v v12, v8
@@ -221,6 +223,7 @@ declare <vscale x 16 x bfloat> @llvm.vp.ceil.nxv16bf16(<vscale x 16 x bfloat>, <
define <vscale x 16 x bfloat> @vp_ceil_vv_nxv16bf16(<vscale x 16 x bfloat> %va, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv16bf16:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v12, v0
; CHECK-NEXT: vsetvli a1, zero, e16, m4, ta, ma
; CHECK-NEXT: vfwcvtbf16.f.f.v v16, v8
@@ -279,6 +282,7 @@ define <vscale x 32 x bfloat> @vp_ceil_vv_nxv32bf16(<vscale x 32 x bfloat> %va,
; CHECK-NEXT: slli a1, a1, 3
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v7, v0
; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: vsetvli a1, zero, e16, m4, ta, ma
@@ -582,6 +586,7 @@ define <vscale x 4 x half> @vp_ceil_vv_nxv4f16(<vscale x 4 x half> %va, <vscale
;
; ZVFHMIN-LABEL: vp_ceil_vv_nxv4f16:
; ZVFHMIN: # %bb.0:
+; ZVFHMIN-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; ZVFHMIN-NEXT: vmv1r.v v9, v0
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, m1, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8
@@ -649,6 +654,7 @@ declare <vscale x 8 x half> @llvm.vp.ceil.nxv8f16(<vscale x 8 x half>, <vscale x
define <vscale x 8 x half> @vp_ceil_vv_nxv8f16(<vscale x 8 x half> %va, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vp_ceil_vv_nxv8f16:
; ZVFH: # %bb.0:
+; ZVFH-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; ZVFH-NEXT: vmv1r.v v10, v0
; ZVFH-NEXT: lui a1, %hi(.LCPI18_0)
; ZVFH-NEXT: flh fa5, %lo(.LCPI18_0)(a1)
@@ -668,6 +674,7 @@ define <vscale x 8 x half> @vp_ceil_vv_nxv8f16(<vscale x 8 x half> %va, <vscale
;
; ZVFHMIN-LABEL: vp_ceil_vv_nxv8f16:
; ZVFHMIN: # %bb.0:
+; ZVFHMIN-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; ZVFHMIN-NEXT: vmv1r.v v10, v0
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, m2, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v12, v8
@@ -735,6 +742,7 @@ declare <vscale x 16 x half> @llvm.vp.ceil.nxv16f16(<vscale x 16 x half>, <vscal
define <vscale x 16 x half> @vp_ceil_vv_nxv16f16(<vscale x 16 x half> %va, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vp_ceil_vv_nxv16f16:
; ZVFH: # %bb.0:
+; ZVFH-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; ZVFH-NEXT: vmv1r.v v12, v0
; ZVFH-NEXT: lui a1, %hi(.LCPI20_0)
; ZVFH-NEXT: flh fa5, %lo(.LCPI20_0)(a1)
@@ -754,6 +762,7 @@ define <vscale x 16 x half> @vp_ceil_vv_nxv16f16(<vscale x 16 x half> %va, <vsca
;
; ZVFHMIN-LABEL: vp_ceil_vv_nxv16f16:
; ZVFHMIN: # %bb.0:
+; ZVFHMIN-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; ZVFHMIN-NEXT: vmv1r.v v12, v0
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, m4, ta, ma
; ZVFHMIN-NEXT: vfwcvt.f.f.v v16, v8
@@ -821,6 +830,7 @@ declare <vscale x 32 x half> @llvm.vp.ceil.nxv32f16(<vscale x 32 x half>, <vscal
define <vscale x 32 x half> @vp_ceil_vv_nxv32f16(<vscale x 32 x half> %va, <vscale x 32 x i1> %m, i32 zeroext %evl) {
; ZVFH-LABEL: vp_ceil_vv_nxv32f16:
; ZVFH: # %bb.0:
+; ZVFH-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; ZVFH-NEXT: vmv1r.v v16, v0
; ZVFH-NEXT: lui a1, %hi(.LCPI22_0)
; ZVFH-NEXT: flh fa5, %lo(.LCPI22_0)(a1)
@@ -846,6 +856,7 @@ define <vscale x 32 x half> @vp_ceil_vv_nxv32f16(<vscale x 32 x half> %va, <vsca
; ZVFHMIN-NEXT: slli a1, a1, 3
; ZVFHMIN-NEXT: sub sp, sp, a1
; ZVFHMIN-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; ZVFHMIN-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; ZVFHMIN-NEXT: vmv1r.v v7, v0
; ZVFHMIN-NEXT: csrr a2, vlenb
; ZVFHMIN-NEXT: vsetvli a1, zero, e16, m4, ta, ma
@@ -1068,6 +1079,7 @@ declare <vscale x 4 x float> @llvm.vp.ceil.nxv4f32(<vscale x 4 x float>, <vscale
define <vscale x 4 x float> @vp_ceil_vv_nxv4f32(<vscale x 4 x float> %va, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv4f32:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v10, v0
; CHECK-NEXT: vsetvli zero, a0, e32, m2, ta, ma
; CHECK-NEXT: vfabs.v v12, v8, v0.t
@@ -1112,6 +1124,7 @@ declare <vscale x 8 x float> @llvm.vp.ceil.nxv8f32(<vscale x 8 x float>, <vscale
define <vscale x 8 x float> @vp_ceil_vv_nxv8f32(<vscale x 8 x float> %va, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv8f32:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v12, v0
; CHECK-NEXT: vsetvli zero, a0, e32, m4, ta, ma
; CHECK-NEXT: vfabs.v v16, v8, v0.t
@@ -1156,6 +1169,7 @@ declare <vscale x 16 x float> @llvm.vp.ceil.nxv16f32(<vscale x 16 x float>, <vsc
define <vscale x 16 x float> @vp_ceil_vv_nxv16f32(<vscale x 16 x float> %va, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv16f32:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v16, v0
; CHECK-NEXT: vsetvli zero, a0, e32, m8, ta, ma
; CHECK-NEXT: vfabs.v v24, v8, v0.t
@@ -1242,6 +1256,7 @@ declare <vscale x 2 x double> @llvm.vp.ceil.nxv2f64(<vscale x 2 x double>, <vsca
define <vscale x 2 x double> @vp_ceil_vv_nxv2f64(<vscale x 2 x double> %va, <vscale x 2 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv2f64:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v10, v0
; CHECK-NEXT: lui a1, %hi(.LCPI36_0)
; CHECK-NEXT: fld fa5, %lo(.LCPI36_0)(a1)
@@ -1286,6 +1301,7 @@ declare <vscale x 4 x double> @llvm.vp.ceil.nxv4f64(<vscale x 4 x double>, <vsca
define <vscale x 4 x double> @vp_ceil_vv_nxv4f64(<vscale x 4 x double> %va, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv4f64:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v12, v0
; CHECK-NEXT: lui a1, %hi(.LCPI38_0)
; CHECK-NEXT: fld fa5, %lo(.LCPI38_0)(a1)
@@ -1330,6 +1346,7 @@ declare <vscale x 7 x double> @llvm.vp.ceil.nxv7f64(<vscale x 7 x double>, <vsca
define <vscale x 7 x double> @vp_ceil_vv_nxv7f64(<vscale x 7 x double> %va, <vscale x 7 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv7f64:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v16, v0
; CHECK-NEXT: lui a1, %hi(.LCPI40_0)
; CHECK-NEXT: fld fa5, %lo(.LCPI40_0)(a1)
@@ -1374,6 +1391,7 @@ declare <vscale x 8 x double> @llvm.vp.ceil.nxv8f64(<vscale x 8 x double>, <vsca
define <vscale x 8 x double> @vp_ceil_vv_nxv8f64(<vscale x 8 x double> %va, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_ceil_vv_nxv8f64:
; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v16, v0
; CHECK-NEXT: lui a1, %hi(.LCPI42_0)
; CHECK-NEXT: fld fa5, %lo(.LCPI42_0)(a1)
@@ -1425,6 +1443,7 @@ define <vscale x 16 x double> @vp_ceil_vv_nxv16f64(<vscale x 16 x double> %va, <
; CHECK-NEXT: slli a1, a1, 3
; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; CHECK-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; CHECK-NEXT: vmv1r.v v7, v0
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: lui a2, %hi(.LCPI44_0)
diff --git a/llvm/test/CodeGen/RISCV/rvv/compressstore.ll b/llvm/test/CodeGen/RISCV/rvv/compressstore.ll
index bfb2d0a3accc44..d1679b6e2d7fdf 100644
--- a/llvm/test/CodeGen/RISCV/rvv/compressstore.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/compressstore.ll
@@ -197,6 +197,7 @@ entry:
define void @test_compresstore_v256i8(ptr %p, <256 x i1> %mask, <256 x i8> %data) {
; RV64-LABEL: test_compresstore_v256i8:
; RV64: # %bb.0: # %entry
+; RV64-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV64-NEXT: vmv1r.v v7, v8
; RV64-NEXT: li a2, 128
; RV64-NEXT: vsetivli zero, 1, e64, m1, ta, ma
@@ -230,6 +231,7 @@ define void @test_compresstore_v256i8(ptr %p, <256 x i1> %mask, <256 x i8> %data
; RV32-NEXT: slli a2, a2, 3
; RV32-NEXT: sub sp, sp, a2
; RV32-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 8 * vlenb
+; RV32-NEXT: vsetvli zero, zero, e32, m1, tu, mu
; RV32-NEXT: vmv8r.v v24, v16
; RV32-NEXT: li a2, 128
; RV32-NEXT: vsetivli zero, 1, e64, m1, ta, ma
diff --git a/llvm/test/CodeGen/RISCV/rvv/constant-folding-crash....
[truncated]
(muttering quietly: why can't this be the default hardware behavior...)
Register DstReg = MI.getOperand(0).getReg();
Register SrcReg = MI.getOperand(1).getReg();
for (const auto &RegClass : RVVRegClasses) {
  if (RegClass->contains(DstReg, SrcReg)) {
You can use isRVVRegClass here?
IIRC isRVVRegClass is for virtual registers, and the insertvsetvl pass runs after RVV register allocation.
Oh, thanks for the reminder! Would getMinimalPhysRegClass plus a TSFlags check be better than a loop?
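For reference, a rough sketch of what that suggestion might look like. This is only an illustration: it assumes the static helper RISCVRegisterInfo::isRVVRegClass (which checks the vector-class bit in a register class's TSFlags) can be applied to the class returned by getMinimalPhysRegClass, and it would live in RISCVInsertVSETVLI.cpp, which already has the needed includes.

// Sketch only: replaces the hand-written register-class array in isRVVCopy.
// Assumes RISCVRegisterInfo::isRVVRegClass(const TargetRegisterClass *)
// reports whether a class's TSFlags mark it as an RVV register class.
static bool isRVVCopy(const MachineInstr &MI, const TargetRegisterInfo &TRI) {
  if (MI.getOpcode() != TargetOpcode::COPY)
    return false;

  Register DstReg = MI.getOperand(0).getReg();
  Register SrcReg = MI.getOperand(1).getReg();
  // This pass runs after RVV register allocation, so both operands should be
  // physical registers; bail out if not.
  if (!DstReg.isPhysical() || !SrcReg.isPhysical())
    return false;

  // Map each physical register back to its smallest register class and ask
  // whether that class is one of the RVV vector classes.
  return RISCVRegisterInfo::isRVVRegClass(TRI.getMinimalPhysRegClass(DstReg)) &&
         RISCVRegisterInfo::isRVVRegClass(TRI.getMinimalPhysRegClass(SrcReg));
}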
Do you have any performance data available with this patch? My primary concern with the entire "treat this as a compiler problem" approach is that we may have some lurking performance problem exposed by the mitigation. I would strongly like to see measurements of this on representative hardware - in particular, hardware like the BP3, which doesn't have the faulting behavior.

For the code itself, this is working in terms of COPY instructions. I'm not sure this is (or isn't) safe. Are we absolutely sure we don't have any code anywhere that moves a COPY between vector register allocation and lowering? I have not done that due diligence, but it seems like a strong claim to make.

To be clear, the code structure bits are addressable. I suggest using this patch as a test vehicle to check the performance impact before bothering to get the patch "just right".
For most workloads this probably doesn't show up much. The call would have also invalidated the vector registers, so there's nothing to copy unless we also just did a whole register load and immediately copied its result to another register. I think the case that's more likely to show performance issues is when there are vector arguments/returns. That's going to require intrinsic code. So what should we measure performance on?
I mean, anything is better than nothing? The claim that this doesn't affect most workloads seems reasonable, but I'd like to see performance data that actually shows that. Once we have an established baseline, we can refine to workloads of concern.
@@ -922,6 +922,7 @@ class RISCVInsertVSETVLI : public MachineFunctionPass {
  VSETVLIInfo getInfoForVSETVLI(const MachineInstr &MI) const;
  VSETVLIInfo computeInfoForInstr(const MachineInstr &MI) const;
  void forwardVSETVLIAVL(VSETVLIInfo &Info) const;
  void enableVTYPEBeforeMove(MachineBasicBlock &MBB);
"enable" is an odd word choice here.
Updated.
if (NeedVSETVL && isRVVCopy(MI)) {
  auto VSETVL0MI =
      BuildMI(MBB, &MI, MI.getDebugLoc(), TII->get(RISCV::PseudoVSETVLIX0))
          .addReg(RISCV::X0, RegState::Define | RegState::Dead)
You can't use x0, x0 when vtype is vill. The vtype is invalid and thus vlmax is invalid. You can only use x0, x0 when vlmax isn't being reduced since reducing vlmax may need to change VL.
If vlmax is invalid due to vill, I don't think we can know that VL won't need to be reduced for the new vlmax.
Does RISCV::PseudoVSETIVLI, which emits vsetivli zero, 0, e8, m1, tu, mu, work for the whole register move's vtype?
That should work.
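For concreteness, a minimal sketch of what that insertion could look like inside enableVTYPEBeforeMove, mirroring the PseudoVSETVLIX0 sequence quoted above. The operand layout for PseudoVSETIVLI (dead X0 def, immediate AVL, encoded vtype) follows the existing vsetvli insertion code in this pass and should be treated as an assumption rather than the final patch.

// Sketch only: emit "vsetivli zero, 0, e8, m1, tu, mu" before the whole
// register move. With an immediate AVL there is no dependence on the
// (possibly invalid) vlmax, so this avoids the x0,x0 problem noted above.
auto VSETIVLIMI =
    BuildMI(MBB, &MI, MI.getDebugLoc(), TII->get(RISCV::PseudoVSETIVLI))
        .addReg(RISCV::X0, RegState::Define | RegState::Dead) // discard VL
        .addImm(0)                                            // AVL = 0
        .addImm(RISCVVType::encodeVTYPE(RISCVII::VLMUL::LMUL_1, /*SEW=*/8,
                                        /*TailAgnostic=*/false,
                                        /*MaskAgnostic=*/false));
if (LIS)
  LIS->InsertMachineInstrInMaps(*VSETIVLIMI);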
I created another patch (#118252) that performs the same task during the COPY pseudo expansion hook, so that it handles any COPY pseudo generated after the insert-vsetvli pass. The current patch could then serve as an optimization that reduces redundant VSETVL instructions by inserting them based on the block analysis.
This is an alternative to llvm#117866 that works by demanding a valid vtype instead of using a separate pass. The main advantage of this is that it allows coalesceVSETVLIs to just reuse an existing vsetvli later in the block. To do this we need to first transfer the vsetvli info to some arbitrary valid state in transferBefore when we encounter a vector copy. Then we add a new vill demanded field that will happily accept any other known vtype, which allows us to coalesce these where possible. Note we also need to check for vector copies in computeVLVTYPEChanges, otherwise the pass will completely skip over functions that only have vector copies and nothing else. This is one part of a fix for llvm#114518. We still need to check whether there are other cases where vector copies/whole register moves are inserted after vsetvli insertion.
#118283 is the better option when we try to resolve this in the insert-vsetvli pass.
Address #114518
This patch inserts a VSETVLIX0 instruction before any RVVReg move that is not affected by a VSETVL instruction in the same basic block. The goal is to avoid the illegal-instruction trap caused by a disabled vtype.
It may cause regressions due to redundant VSETVL insertions.