Skip to content

Conversation

SamTebbs33
Copy link
Collaborator

The VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set.

When printing a reduce.add(mul(ext, ext)) bundle, for example, if the extends are the same then the 3rd element of the set will be the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set.

Fixes #156464

@llvmbot
Copy link
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-vectorizers

Author: Sam Tebbs (SamTebbs33)

Changes

The VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set.

When printing a reduce.add(mul(ext, ext)) bundle, for example, if the extends are the same then the 3rd element of the set will be the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set.

Fixes #156464


Full diff: https://github.com/llvm/llvm-project/pull/156976.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+6-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+23-6)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll (+47)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index bd1bee3a88887..f71cc09e3f990 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -29,6 +29,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SmallBitVector.h"
 #include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/ADT/ilist.h"
@@ -3003,8 +3004,11 @@ class VPExpressionRecipe : public VPSingleDefRecipe {
                            {Ext0, Ext1, Mul, Red}) {}
 
   ~VPExpressionRecipe() override {
-    for (auto *R : reverse(ExpressionRecipes))
-      delete R;
+    SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
+    for (auto *R : reverse(ExpressionRecipes)) {
+      if (ExpressionRecipesSeen.insert(R).second)
+        delete R;
+    }
     for (VPValue *T : LiveInPlaceholders)
       delete T;
   }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5f3503d0ce57a..1c743703a1a85 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2740,9 +2740,8 @@ VPExpressionRecipe::VPExpressionRecipe(
     ExpressionTypes ExpressionType,
     ArrayRef<VPSingleDefRecipe *> ExpressionRecipes)
     : VPSingleDefRecipe(VPDef::VPExpressionSC, {}, {}),
-      ExpressionRecipes(SetVector<VPSingleDefRecipe *>(
-                            ExpressionRecipes.begin(), ExpressionRecipes.end())
-                            .takeVector()),
+      ExpressionRecipes(SmallVector<VPSingleDefRecipe *>(
+          ExpressionRecipes.begin(), ExpressionRecipes.end())),
       ExpressionType(ExpressionType) {
   assert(!ExpressionRecipes.empty() && "Nothing to combine?");
   assert(
@@ -2776,25 +2775,43 @@ VPExpressionRecipe::VPExpressionRecipe(
       R->removeFromParent();
   }
 
+  // Keep track of how many instances of each recipe occur in the recipe list
+  SmallMapVector<VPSingleDefRecipe *, unsigned, 4> ExpressionRecipeCounts;
+  for (auto *R : ExpressionRecipes) {
+    auto *F = ExpressionRecipeCounts.find(R);
+    if (F == ExpressionRecipeCounts.end())
+      ExpressionRecipeCounts.insert(std::make_pair(R, 1));
+    else
+      F->second++;
+  }
+
   // Internalize all external operands to the expression recipes. To do so,
   // create new temporary VPValues for all operands defined by a recipe outside
   // the expression. The original operands are added as operands of the
   // VPExpressionRecipe itself.
   for (auto *R : ExpressionRecipes) {
+    auto *F = ExpressionRecipeCounts.find(R);
+    F->second--;
     for (const auto &[Idx, Op] : enumerate(R->operands())) {
       auto *Def = Op->getDefiningRecipe();
       if (Def && ExpressionRecipesAsSetOfUsers.contains(Def))
         continue;
       addOperand(Op);
-      LiveInPlaceholders.push_back(new VPValue());
-      R->setOperand(Idx, LiveInPlaceholders.back());
+      auto *Tmp = new VPValue();
+      Tmp->setUnderlyingValue(Op->getUnderlyingValue());
+      LiveInPlaceholders.push_back(Tmp);
+      // Only modify this recipe's operands if it's the last time it occurs in
+      // the recipe list
+      if (F->second == 0)
+        R->setOperand(Idx, Tmp);
     }
   }
 }
 
 void VPExpressionRecipe::decompose() {
   for (auto *R : ExpressionRecipes)
-    R->insertBefore(this);
+    if (!R->getParent())
+      R->insertBefore(this);
 
   for (const auto &[Idx, Op] : enumerate(operands()))
     LiveInPlaceholders[Idx]->replaceAllUsesWith(Op);
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
index 2ffb8203d49dd..29167324c7a09 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
@@ -651,3 +651,50 @@ exit:
   %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
   ret i64 %r.0.lcssa
 }
+
+define i64 @print_mulacc_duplicate_extends(ptr nocapture readonly %x, ptr nocapture readonly %y, i32 %n) {
+; CHECK-LABEL: 'print_mulacc_duplicate_extends'
+; CHECK:      VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<[[VF:%.+]]> = VF
+; CHECK-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
+; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
+; CHECK-NEXT: Live-in ir<%n> = original trip-count
+; CHECK-EMPTY:
+; CHECK:      vector.ph:
+; CHECK-NEXT:   EMIT vp<[[RDX_START:%.+]]> = reduction-start-vector ir<0>, ir<0>, ir<1>
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT:   vector.body:
+; CHECK-NEXT:     EMIT vp<[[IV:%.+]]> = CANONICAL-INDUCTION ir<0>, vp<[[IV_NEXT:%.+]]>
+; CHECK-NEXT:     WIDEN-REDUCTION-PHI ir<[[RDX:%.+]]> = phi vp<[[RDX_START]]>, vp<[[RDX_NEXT:%.+]]>
+; CHECK-NEXT:     vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[IV]]>, ir<1>
+; CHECK-NEXT:     CLONE ir<[[ARRAYIDX0:%.+]]> = getelementptr inbounds ir<%x>, vp<[[STEPS]]>
+; CHECK-NEXT:     vp<[[ADDR0:%.+]]> = vector-pointer ir<[[ARRAYIDX0]]>
+; CHECK-NEXT:     WIDEN ir<[[LOAD0:%.+]]> = load vp<[[ADDR0]]>
+; CHECK-NEXT:     EXPRESSION vp<[[RDX_NEXT:%.+]]> = ir<[[RDX]]> + reduce.sub (mul nsw (ir<[[LOAD0]]> sext to i64), (ir<[[LOAD0]]> sext to i64))
+; CHECK-NEXT:     EMIT vp<[[IV_NEXT]]> = add nuw vp<[[IV]]>, vp<[[VFxUF]]>
+; CHECK-NEXT:     EMIT branch-on-count vp<[[IV_NEXT]]>, vp<[[VTC]]>
+; CHECK-NEXT:   No successors
+; CHECK-NEXT: }
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ %iv.next, %loop ], [ 0, %entry ]
+  %rdx = phi i64 [ %rdx.next, %loop ], [ 0, %entry ]
+  %arrayidx = getelementptr inbounds i16, ptr %x, i32 %iv
+  %load0 = load i16, ptr %arrayidx, align 4
+  %conv0 = sext i16 %load0 to i32
+  %mul = mul nsw i32 %conv0, %conv0
+  %conv = sext i32 %mul to i64
+  %rdx.next = sub nsw i64 %rdx, %conv
+  %iv.next = add nuw nsw i32 %iv, 1
+  %exitcond = icmp eq i32 %iv.next, %n
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
+  ret i64 %r.0.lcssa
+}

@llvmbot
Copy link
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Sam Tebbs (SamTebbs33)

Changes

The VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set.

When printing a reduce.add(mul(ext, ext)) bundle, for example, if the extends are the same then the 3rd element of the set will be the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set.

Fixes #156464


Full diff: https://github.com/llvm/llvm-project/pull/156976.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+6-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+23-6)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll (+47)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index bd1bee3a88887..f71cc09e3f990 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -29,6 +29,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SmallBitVector.h"
 #include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/ADT/ilist.h"
@@ -3003,8 +3004,11 @@ class VPExpressionRecipe : public VPSingleDefRecipe {
                            {Ext0, Ext1, Mul, Red}) {}
 
   ~VPExpressionRecipe() override {
-    for (auto *R : reverse(ExpressionRecipes))
-      delete R;
+    SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
+    for (auto *R : reverse(ExpressionRecipes)) {
+      if (ExpressionRecipesSeen.insert(R).second)
+        delete R;
+    }
     for (VPValue *T : LiveInPlaceholders)
       delete T;
   }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5f3503d0ce57a..1c743703a1a85 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2740,9 +2740,8 @@ VPExpressionRecipe::VPExpressionRecipe(
     ExpressionTypes ExpressionType,
     ArrayRef<VPSingleDefRecipe *> ExpressionRecipes)
     : VPSingleDefRecipe(VPDef::VPExpressionSC, {}, {}),
-      ExpressionRecipes(SetVector<VPSingleDefRecipe *>(
-                            ExpressionRecipes.begin(), ExpressionRecipes.end())
-                            .takeVector()),
+      ExpressionRecipes(SmallVector<VPSingleDefRecipe *>(
+          ExpressionRecipes.begin(), ExpressionRecipes.end())),
       ExpressionType(ExpressionType) {
   assert(!ExpressionRecipes.empty() && "Nothing to combine?");
   assert(
@@ -2776,25 +2775,43 @@ VPExpressionRecipe::VPExpressionRecipe(
       R->removeFromParent();
   }
 
+  // Keep track of how many instances of each recipe occur in the recipe list
+  SmallMapVector<VPSingleDefRecipe *, unsigned, 4> ExpressionRecipeCounts;
+  for (auto *R : ExpressionRecipes) {
+    auto *F = ExpressionRecipeCounts.find(R);
+    if (F == ExpressionRecipeCounts.end())
+      ExpressionRecipeCounts.insert(std::make_pair(R, 1));
+    else
+      F->second++;
+  }
+
   // Internalize all external operands to the expression recipes. To do so,
   // create new temporary VPValues for all operands defined by a recipe outside
   // the expression. The original operands are added as operands of the
   // VPExpressionRecipe itself.
   for (auto *R : ExpressionRecipes) {
+    auto *F = ExpressionRecipeCounts.find(R);
+    F->second--;
     for (const auto &[Idx, Op] : enumerate(R->operands())) {
       auto *Def = Op->getDefiningRecipe();
       if (Def && ExpressionRecipesAsSetOfUsers.contains(Def))
         continue;
       addOperand(Op);
-      LiveInPlaceholders.push_back(new VPValue());
-      R->setOperand(Idx, LiveInPlaceholders.back());
+      auto *Tmp = new VPValue();
+      Tmp->setUnderlyingValue(Op->getUnderlyingValue());
+      LiveInPlaceholders.push_back(Tmp);
+      // Only modify this recipe's operands if it's the last time it occurs in
+      // the recipe list
+      if (F->second == 0)
+        R->setOperand(Idx, Tmp);
     }
   }
 }
 
 void VPExpressionRecipe::decompose() {
   for (auto *R : ExpressionRecipes)
-    R->insertBefore(this);
+    if (!R->getParent())
+      R->insertBefore(this);
 
   for (const auto &[Idx, Op] : enumerate(operands()))
     LiveInPlaceholders[Idx]->replaceAllUsesWith(Op);
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
index 2ffb8203d49dd..29167324c7a09 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
@@ -651,3 +651,50 @@ exit:
   %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
   ret i64 %r.0.lcssa
 }
+
+define i64 @print_mulacc_duplicate_extends(ptr nocapture readonly %x, ptr nocapture readonly %y, i32 %n) {
+; CHECK-LABEL: 'print_mulacc_duplicate_extends'
+; CHECK:      VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<[[VF:%.+]]> = VF
+; CHECK-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
+; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
+; CHECK-NEXT: Live-in ir<%n> = original trip-count
+; CHECK-EMPTY:
+; CHECK:      vector.ph:
+; CHECK-NEXT:   EMIT vp<[[RDX_START:%.+]]> = reduction-start-vector ir<0>, ir<0>, ir<1>
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT:   vector.body:
+; CHECK-NEXT:     EMIT vp<[[IV:%.+]]> = CANONICAL-INDUCTION ir<0>, vp<[[IV_NEXT:%.+]]>
+; CHECK-NEXT:     WIDEN-REDUCTION-PHI ir<[[RDX:%.+]]> = phi vp<[[RDX_START]]>, vp<[[RDX_NEXT:%.+]]>
+; CHECK-NEXT:     vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[IV]]>, ir<1>
+; CHECK-NEXT:     CLONE ir<[[ARRAYIDX0:%.+]]> = getelementptr inbounds ir<%x>, vp<[[STEPS]]>
+; CHECK-NEXT:     vp<[[ADDR0:%.+]]> = vector-pointer ir<[[ARRAYIDX0]]>
+; CHECK-NEXT:     WIDEN ir<[[LOAD0:%.+]]> = load vp<[[ADDR0]]>
+; CHECK-NEXT:     EXPRESSION vp<[[RDX_NEXT:%.+]]> = ir<[[RDX]]> + reduce.sub (mul nsw (ir<[[LOAD0]]> sext to i64), (ir<[[LOAD0]]> sext to i64))
+; CHECK-NEXT:     EMIT vp<[[IV_NEXT]]> = add nuw vp<[[IV]]>, vp<[[VFxUF]]>
+; CHECK-NEXT:     EMIT branch-on-count vp<[[IV_NEXT]]>, vp<[[VTC]]>
+; CHECK-NEXT:   No successors
+; CHECK-NEXT: }
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ %iv.next, %loop ], [ 0, %entry ]
+  %rdx = phi i64 [ %rdx.next, %loop ], [ 0, %entry ]
+  %arrayidx = getelementptr inbounds i16, ptr %x, i32 %iv
+  %load0 = load i16, ptr %arrayidx, align 4
+  %conv0 = sext i16 %load0 to i32
+  %mul = mul nsw i32 %conv0, %conv0
+  %conv = sext i32 %mul to i64
+  %rdx.next = sub nsw i64 %rdx, %conv
+  %iv.next = add nuw nsw i32 %iv, 1
+  %exitcond = icmp eq i32 %iv.next, %n
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
+  ret i64 %r.0.lcssa
+}

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if ExpressionRecipes[1] is an extend or widen recipe?

  auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
                                               : ExpressionRecipes[0]);

Comment on lines 2801 to 2822
Tmp->setUnderlyingValue(Op->getUnderlyingValue());
LiveInPlaceholders.push_back(Tmp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to set the underlying value now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was needed to fix a crash early on but I don't think it's needed now so I've removed it, thanks.

@SamTebbs33
Copy link
Collaborator Author

Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if ExpressionRecipes[1] is an extend or widen recipe?

  auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
                                               : ExpressionRecipes[0]);

I think it's best to get the vector of recipes correct from the start so that we don't need to account for unexpectedness in other places. This could theoretically happen for future bundle types too and we don't want lots of checks when we can just get it right from the start of the bundle's existence.

@SamTebbs33 SamTebbs33 force-pushed the vpexpression-keep-duplicates branch from e6c89c3 to 2300506 Compare September 19, 2025 12:27
}

// Keep track of how many instances of each recipe occur in the recipe list
SmallMapVector<VPSingleDefRecipe *, unsigned, 4> ExpressionRecipeCounts;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about a slightly different case, where the same (external) input operand is shared by different expression recipes. I don't think there would be an easy way to test that at the moment, but I can see that happening with other kinds of expressions in the future.

Comment on lines 2825 to 2826
if (F->second == 0)
R->setOperand(Idx, Tmp);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you were to keep a map of Op -> new VPValue() nodes, then you can just do:

LiveInPlaceholders.push_back(new VPValue());
MyMap[Op] = LiveInPlaceholders.back();

and then have a loop at the end to replace all values of Op by their LiveInPlaceholder (i.e. new VPValue()) values for each R in ExpressionRecipes.

That would also fix the potential issue I mentioned above, where same input operand is shared by multiple recipes passed to the expression.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's nice. It also means we don't have to keep track of the recipe counts. Done.

delete R;
SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
for (auto *R : reverse(ExpressionRecipes)) {
if (ExpressionRecipesSeen.insert(R).second)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the comment for SmallVector<VPSingleDefRecipe *> ExpressionRecipes (line 2973) that this vector may contain duplicates?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not done?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I did add it but must not have staged it.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if ExpressionRecipes[1] is an extend or widen recipe?

  auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
                                               : ExpressionRecipes[0]);

I think it's best to get the vector of recipes correct from the start so that we don't need to account for unexpectedness in other places. This could theoretically happen for future bundle types too and we don't want lots of checks when we can just get it right from the start of the bundle's existence.

But it looks like handling duplicates requires quite a bit of extra complexity spread throughout; ideally there would be no need to look up the individual entries in the expression list.

At the moment this just impacts printing. IMO it would be better to just have a generic print function that does not inspect the entries directly, but instead prints something like

EXPRESSION vp<%x> = ExtendedReduction operands {
  bundled recipes...
}

@SamTebbs33
Copy link
Collaborator Author

SamTebbs33 commented Sep 22, 2025

Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if ExpressionRecipes[1] is an extend or widen recipe?

  auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
                                               : ExpressionRecipes[0]);

I think it's best to get the vector of recipes correct from the start so that we don't need to account for unexpectedness in other places. This could theoretically happen for future bundle types too and we don't want lots of checks when we can just get it right from the start of the bundle's existence.

But it looks like handling duplicates requires quite a bit of extra complexity spread throughout; ideally there would be no need to look up the individual entries in the expression list.

At the moment this just impacts printing. IMO it would be better to just have a generic print function that does not inspect the entries directly, but instead prints something like

EXPRESSION vp<%x> = ExtendedReduction operands {
  bundled recipes...
}

Sorry, it doesn't just impact printing, but anything that wants to analyse the recipes at all, such as bundled partial reductions: https://github.com/llvm/llvm-project/pull/147302/files#diff-34abe4c3cd34aa7a9664bbd204834248455635ba80b8a9ba9506d8c3e6b94d95R2878

I don't quite see why it's bad to get the recipe list correct from the very beginning? Surely that's worth a little bit of complexity in the constructor?

@sdesmalen-arm
Copy link
Collaborator

But it looks like handling duplicates requires quite a bit of extra complexity spread throughout

At the moment only the destructor needs to know that ExpressionRecipes contains duplicates when deleting them. What other complexity is required with regards to handling duplicates?

SamTebbs33 added a commit to SamTebbs33/llvm-project that referenced this pull request Sep 22, 2025
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

1. llvm#156976
2. -> This
3. llvm#147302
@fhahn
Copy link
Contributor

fhahn commented Sep 23, 2025

Sorry, it doesn't just impact printing, but anything that wants to analyse the recipes at all, such as bundled partial reductions: https://github.com/llvm/llvm-project/pull/147302/files#diff-34abe4c3cd34aa7a9664bbd204834248455635ba80b8a9ba9506d8c3e6b94d95R2878

But can't this also be avoided naturally, by just getting the needed extend from the reduction/result recipe which is the root of the pattern, instead of relying on a specific order of operands? In the future, the operands to a reduction pattern could also contain live-ins (e.g. constant), and in that case there would be be no recipe at all.

I don't quite see why it's bad to get the recipe list correct from the very beginning?

I guess it depends on what 'correct' means here. Currently it is simply the list of recipes that are bundled together.

At the moment only the destructor needs to know that ExpressionRecipes contains duplicates when deleting them. What other complexity is required with regards to handling duplicates?

Besides the descructor, decomposition and construction also need extra complexity. Granted it is not that much, but if looking up the required information from the root instruction works as well that may allow us to keep the generic code simpler.

@SamTebbs33
Copy link
Collaborator Author

Sorry, it doesn't just impact printing, but anything that wants to analyse the recipes at all, such as bundled partial reductions: https://github.com/llvm/llvm-project/pull/147302/files#diff-34abe4c3cd34aa7a9664bbd204834248455635ba80b8a9ba9506d8c3e6b94d95R2878

But can't this also be avoided naturally, by just getting the needed extend from the reduction/result recipe which is the root of the pattern, instead of relying on a specific order of operands? In the future, the operands to a reduction pattern could also contain live-ins (e.g. constant), and in that case there would be be no recipe at all.

I don't quite see why it's bad to get the recipe list correct from the very beginning?

I guess it depends on what 'correct' means here. Currently it is simply the list of recipes that are bundled together.

At the moment only the destructor needs to know that ExpressionRecipes contains duplicates when deleting them. What other complexity is required with regards to handling duplicates?

Besides the descructor, decomposition and construction also need extra complexity. Granted it is not that much, but if looking up the required information from the root instruction works as well that may allow us to keep the generic code simpler.

The problem with checking the root instruction is that you'll have to repeat the pattern matching that was already done in the transform pass. Whereas with this solution you can rely on them being in the recipe list so can fetch them from there, without re-doing any work. With Sander's placeholder caching suggestion, the code has become a lot simpler so I hope that's acceptable.

Copy link
Collaborator

@sdesmalen-arm sdesmalen-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the descructor, decomposition and construction also need extra complexity. Granted it is not that much, but if looking up the required information from the root instruction works as well that may allow us to keep the generic code simpler.

I guess if you're reasoning from the idea that we'll only ever have this limited list of VPexpressions, then I guess this adds (marginally) more logic. But if we'd add a new expression that takes 4 external operands where expression recipes 0 and 1 could be the same, 1 and 2 could be the same, or 2 or 3 could be the same, or any other combination, then it would all get very confusing to figure out what the meaning of the values in ExpressionRecipe is. To me it therefore makes more sense for a VPExpression operation to always have the expected number of external operands and to take a little bit of care when needing to map that back to the expressions when decomposing in case there was a duplicate.

delete R;
SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
for (auto *R : reverse(ExpressionRecipes)) {
if (ExpressionRecipesSeen.insert(R).second)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not done?

for (auto *R : ExpressionRecipes) {
for (const auto &[Idx, Op] : enumerate(R->operands())) {
auto *Def = Op->getDefiningRecipe();
if (Def && ExpressionRecipesAsSetOfUsers.contains(Def))
continue;
addOperand(Op);
LiveInPlaceholders.push_back(new VPValue());
R->setOperand(Idx, LiveInPlaceholders.back());
if (OperandPlaceholders.find(Op) == OperandPlaceholders.end())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to check for duplicates, you can just do:

LiveInPlaceholders.push_back(new VPValue());                                                    
OperandPlaceholders[Op] = LiveInPlaceholders.back();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but I've made a variable for the placeholder so we don't have to call back().

Comment on lines 2820 to 2826
for (auto *R : ExpressionRecipes) {
for (const auto &[Idx, Op] : enumerate(R->operands())) {
auto *Entry = OperandPlaceholders.find(Op);
if (Entry != OperandPlaceholders.end())
R->setOperand(Idx, Entry->second);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be:

Suggested change
for (auto *R : ExpressionRecipes) {
for (const auto &[Idx, Op] : enumerate(R->operands())) {
auto *Entry = OperandPlaceholders.find(Op);
if (Entry != OperandPlaceholders.end())
R->setOperand(Idx, Entry->second);
}
}
for (auto *R : ExpressionRecipes)
for (auto &KV : OperandPlaceholders)
R->replaceUsesOfWith(KV.first, KV.second);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better, thanks.

}
SmallSet<VPValue *, 4> PlaceholdersSeen;
for (VPValue *T : LiveInPlaceholders) {
if (PlaceholdersSeen.insert(T).second)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to check for duplicates, there is a placeholder value for each operand, regardless of whether they're duplicate or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I'm caching the placeholder for each operand so that duplicate operands share the same placeholder, so we do need to care about duplicates here to avoid deleting placeholders that have already been deleted. But I've changed it so that the same operands don't share a placeholder.

}
}
}

void VPExpressionRecipe::decompose() {
for (auto *R : ExpressionRecipes)
R->insertBefore(this);
if (!R->getParent())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment explaining why the !R->getParent().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

@fhahn
Copy link
Contributor

fhahn commented Sep 28, 2025

Besides the descructor, decomposition and construction also need extra complexity. Granted it is not that much, but if looking up the required information from the root instruction works as well that may allow us to keep the generic code simpler.

I guess if you're reasoning from the idea that we'll only ever have this limited list of VPexpressions, then I guess this adds (marginally) more logic. But if we'd add a new expression that takes 4 external operands where expression recipes 0 and 1 could be the same, 1 and 2 could be the same, or 2 or 3 could be the same, or any other combination, then it would all get very confusing to figure out what the meaning of the values in ExpressionRecipe is. To me it therefore makes more sense for a VPExpression operation to always have the expected number of external operands and to take a little bit of care when needing to map that back to the expressions when decomposing in case there was a duplicate.

I would like to make sure the code we end up with is flexible enough to handle future use cases w/o many changes.

One case I am wondering about is how we will handle cases where an interesting operand can be a recipe or a constant/live-in. At the moment, it looks like all extends must be recipes, but they could also be constants that can be treated as extends in some case. If we want to support them, would be back to needing to check the root instruction? E.g. scaled reductions could have constant operands (#161092) and I would assume the same also holds for some of the other reduction patterns?

@SamTebbs33
Copy link
Collaborator Author

SamTebbs33 commented Sep 29, 2025

I would like to make sure the code we end up with is flexible enough to handle future use cases w/o many changes.

One case I am wondering about is how we will handle cases where an interesting operand can be a recipe or a constant/live-in. At the moment, it looks like all extends must be recipes, but they could also be constants that can be treated as extends in some case. If we want to support them, would be back to needing to check the root instruction? E.g. scaled reductions could have constant operands (#161092) and I would assume the same also holds for some of the other reduction patterns?

I agree with making the recipe flexible for future cases, but I don't see anything in this PR that would prevent it working with non-recipe extends. I think that's more of an issue with the constructor interface, so if we do end up making changes to accommodate for non-recipe extends then I don't think this PR will be making that work any harder. Pattern-matching the root instruction would certainly be a way to make the recipe very generic and flexible but it would of course introduce a lot of extra logic that we get for free right now.

EDIT: I tried rebasing on top of #161092 and all of the LV tests still pass.

SamTebbs33 added a commit to SamTebbs33/llvm-project that referenced this pull request Sep 29, 2025
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

1. llvm#156976
2. -> This
3. llvm#147302
@sdesmalen-arm
Copy link
Collaborator

sdesmalen-arm commented Sep 29, 2025

I would like to make sure the code we end up with is flexible enough to handle future use cases w/o many changes.

One case I am wondering about is how we will handle cases where an interesting operand can be a recipe or a constant/live-in. At the moment, it looks like all extends must be recipes, but they could also be constants that can be treated as extends in some case. If we want to support them, would be back to needing to check the root instruction? E.g. scaled reductions could have constant operands (#161092) and I would assume the same also holds for some of the other reduction patterns?

If we'd add a new VPExpression kind for that case, then there's no need to change the VPExpression class or change the rest of the mechanism. That case could be supported quite easily with the help of this PR (I did an experiment with this locally to handle the case in #161092). A new constructor for that case is needed anyway and having it as a new expression kind makes it clear to handle these slightly differently (e.g. there's no need to query the opcode of the constant/live-in value).

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with making the recipe flexible for future cases, but I don't see anything in this PR that would prevent it working with non-recipe extends. I think that's more of an issue with the constructor interface, so if we do end up making changes to accommodate for non-recipe extends then I don't think this PR will be making that work any harder. Pattern-matching the root instruction would certainly be a way to make the recipe very generic and flexible but it would of course introduce a lot of extra logic that we get for free right now.

I am not sure the pattern matching would introduce a lot of extra logic. Here's what I was thinking: we can get the multiply directly from the reduction recipe (since we need to look it up anyway). We know which operand it is and we can then get the casts from its operands. In this case, it work be even simpler, as we don't need to check IsExtended to get the multiply.

@@ -2900,14 +2900,13 @@ void VPExpressionRecipe::print(raw_ostream &O, const Twine &Indent,
       << " (";
     O << "mul";
     bool IsExtended = ExpressionType == ExpressionTypes::ExtMulAccReduction;
-    auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
-                                               : ExpressionRecipes[0]);
+    auto *Mul = cast<VPWidenRecipe>(ExpressionRecipes.back()->getOperand(1));
     Mul->printFlags(O);
     if (IsExtended)
       O << "(";
     getOperand(0)->printAsOperand(O, SlotTracker);
     if (IsExtended) {
-      auto *Ext0 = cast<VPWidenCastRecipe>(ExpressionRecipes[0]);
+      auto *Ext0 = cast<VPWidenCastRecipe>(Mul->getOperand(0));
       O << " " << Instruction::getOpcodeName(Ext0->getOpcode()) << " to "
         << *Ext0->getResultType() << "), (";
     } else {
@@ -2915,7 +2914,7 @@ void VPExpressionRecipe::print(raw_ostream &O, const Twine &Indent,
     }
     getOperand(1)->printAsOperand(O, SlotTracker);
     if (IsExtended) {
-      auto *Ext1 = cast<VPWidenCastRecipe>(ExpressionRecipes[1]);
+      auto *Ext1 = cast<VPWidenCastRecipe>(Mul->getOperand(1));
       O << " " << Instruction::getOpcodeName(Ext1->getOpcode()) << " to "
         << *Ext1->getResultType() << ")";
     }

The above doesn't quite work as expected at the moment unfortunately, because we currently end up with inconsistent operands for the VPExpressionRecipe if there are duplicates, which gets fixed by doing the replacement separately as in this patch. This change is quite subtle and the mapping of external operands to LiveInPlaceholders should probably be clarified/documented somewhere.

For the case in https://github.com/llvm/llvm-project/pull/147302/files#diff-34abe4c3cd34aa7a9664bbd204834248455635ba80b8a9ba9506d8c3e6b94d95R2878 we would need an extra check if the expression is negated, and depending on that use either the second-to-last or third-to-last recipe in the list.

As we need the operand remapping change in any case, the remaining changes seem minor. I still think the only reason they make some cases slighty more convenient is because we are missing support for extended reductions with constant operands, and once we support those we will need to look up the multiply w/o relying on the number of cast recipes anyways. But if you feel that it makes life easier, keeping duplicates is fine with me for now.

EDIT: I tried rebasing on top of #161092 and all of the LV tests still pass.

That's because we don't create VPExpressionRecipes for partial reductions yet I think.

}
}

for (auto *R : ExpressionRecipes)
for (auto &Entry : OperandPlaceholders)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need the map vector here? Can you do something like

Suggested change
for (auto &Entry : OperandPlaceholders)
for (const auto &[LiveIn, Tmp] : zip(operands(), LiveInPlaceholders)) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we do actually, and I've added a test to test against the case Sander was worried about here.

}
}

for (auto *R : ExpressionRecipes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subtly changes the behavior; we could have the same external operand used multiple times, and we will have multiple entries for them in LiveInPlaceholders, but only use the first of them.

I'm not sure if there's a way around this inconsistency?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are two placeholders for a value where only one of them is used when decomposing the expression, what is the practical issue with that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is a practical issue at the moment, but as mentioned in #156976 (review) it should at least be mentioned/documented.

Or perhaps even better also store a map of live-ins and their tmp values, instead of a vector and creating unused temporary values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit has actually eliminated this behavioural change so I don't think we need to document anything, but I can do that as a follow-up if that changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing replace-uses-with will update all uses of the external operand with the first temporary value we created for it, leaving the other tmp values created for the same external operand unused

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, I missed that. I've added a comment as part of a rebase.

@@ -3013,8 +3015,11 @@ class VPExpressionRecipe : public VPSingleDefRecipe {
{Ext0, Ext1, Mul, Red}) {}

~VPExpressionRecipe() override {
for (auto *R : reverse(ExpressionRecipes))
delete R;
SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
SmallSetPtrSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

SamTebbs33 added a commit that referenced this pull request Sep 30, 2025
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

1. #156976
2. -> #160154
3. #147302
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 30, 2025
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

1. llvm/llvm-project#156976
2. -> llvm/llvm-project#160154
3. llvm/llvm-project#147302
The VPExpressionRecipe class uses a set to store its bundled recipes. If
repeated recipes are bundled then the duplicates will be lost, causing
the following recipes to not be at the expected place in the set.

When printing a reduce.add(mul(ext, ext)) bundle, if the extends are the
same then the 3rd element of the set will the the reduction, rather than
the expected mul, causing a cast error. With this change, the recipes
are at the expected index in the set.

Fixes llvm#156464
@SamTebbs33 SamTebbs33 force-pushed the vpexpression-keep-duplicates branch from e3a2221 to 1d6e918 Compare October 1, 2025 10:06
Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with inline comments, thanks

Comment on lines 2758 to 2759
ExpressionRecipes(SmallVector<VPSingleDefRecipe *>(
ExpressionRecipes.begin(), ExpressionRecipes.end())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ExpressionRecipes(SmallVector<VPSingleDefRecipe *>(
ExpressionRecipes.begin(), ExpressionRecipes.end())),
ExpressionRecipes(ExpressionRecipes),

Does this now work? I think the explicit constructor was only needed to de-duplicate

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you.

; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br [[DOT_CRIT_EDGE:label %.*]]
; CHECK: [[SCALAR_PH:.*]]:
; CHECK: [[_LR_PH:.*]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you strip the unrelated changes from the latest diff?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator Author

@SamTebbs33 SamTebbs33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br [[DOT_CRIT_EDGE:label %.*]]
; CHECK: [[SCALAR_PH:.*]]:
; CHECK: [[_LR_PH:.*]]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 2758 to 2759
ExpressionRecipes(SmallVector<VPSingleDefRecipe *>(
ExpressionRecipes.begin(), ExpressionRecipes.end())),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you.

@SamTebbs33 SamTebbs33 merged commit 664b227 into llvm:main Oct 1, 2025
9 checks passed
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
This PR adds the ExtNegatedMulAccReduction expression type for
VPExpressionRecipe so that extend-multiply-accumulate reductions with a
negated multiply can be bundled.

Stacked PRs:

1. llvm#156976
2. -> llvm#160154
3. llvm#147302
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
The VPExpressionRecipe class uses a set to store its bundled recipes. If
repeated recipes are bundled then the duplicates will be lost, causing
the following recipes to not be at the expected place in the set.

When printing a reduce.add(mul(ext, ext)) bundle, for example, if the
extends are the same then the 3rd element of the set will be the
reduction, rather than the expected mul, causing a cast error. With this
change, the recipes are at the expected index in the set.

Fixes llvm#156464
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VPExpressionRecipe discards duplicate recipes.
4 participants