Skip to content

[AIEX] Enhance combiner to support post-inc-load requiring moving multiple instructions #369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: aie-public
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
122 changes: 112 additions & 10 deletions llvm/lib/Target/AIE/AIECombinerHelper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,11 @@ static cl::opt<bool> EnableOffsetCombine(
static cl::opt<bool> EnablePostIncCombine(
"aie-postinc-combine", cl::Hidden, cl::init(true),
cl::desc("Enable combines of load and stores with post increments"));
static cl::opt<bool> EnablePostIncCombineMultipleInst(
"aie-postinc-combine-move-multiple-instr", cl::Hidden, cl::init(true),
cl::desc(
"Enable combines of load and stores with post increments requiring "
"multiple instructions to be moved"));
static cl::opt<bool> EnableGreedyAddressCombine(
"aie-greedy-address-combines", cl::Hidden, cl::init(false),
cl::desc("Enable greedy combines without checking for later uses of the "
Expand Down Expand Up @@ -213,8 +218,11 @@ MachineInstr *findPreIncMatch(MachineInstr &MemI, MachineRegisterInfo &MRI,
Register Addr = MemI.getOperand(1).getReg();
MachineInstr *AddrDef = getDefIgnoringCopies(Addr, MRI);
if (AddrDef->getOpcode() == TargetOpcode::G_PTR_ADD) {
MatchData = {AddrDef, TII.getOffsetMemOpcode(MemI.getOpcode()), &MemI,
/*ExtraInstrsToMove=*/{},
MatchData = {AddrDef,
TII.getOffsetMemOpcode(MemI.getOpcode()),
&MemI,
/*ExtraInstrsToMoveBefore=*/{},
/*ExtraInstrsToMoveAfter=*/{},
/*RemoveInstr=*/false};
return AddrDef;
}
Expand Down Expand Up @@ -277,6 +285,67 @@ bool llvm::canDelayMemOp(MachineInstr &MemI, MachineInstr &Dest,
return none_of(InstrRange, UnsafeToMovePast);
}

/// Collects all dependent instructions between SrcMI and DestMI.
/// This function iterates over all definitions in SrcMI and collects all
/// instructions that use these definitions and are within the range from
/// SrcMI to DestMI. The collected instructions are stored in the
/// DependentInstrs vector. It is expected that the SrcMI & DestMI are in the BB
///
/// \param SrcMI The source machine instruction to start the range.
/// \param DestMI The destination machine instruction to end the range.
/// \param MRI The machine register info used to query register uses.
/// \param DependentInstrs A set to store the collected dependent
/// instructions in dominating order.
void DependentInstrInRange(MachineInstr &SrcMI, MachineInstr &DestMI,
MachineRegisterInfo &MRI,
DependentInstrSet &DependentInstrs) {
if (SrcMI.getParent() != DestMI.getParent())
return;
const auto SrcMIIter = SrcMI.getIterator();
const auto DestMIIter = DestMI.getIterator();
const auto SrcToDestDistance = std::distance(SrcMIIter, DestMIIter);
for (auto &Defs : SrcMI.all_defs()) {
const Register Reg = Defs.getReg();
for (auto &Use : MRI.use_nodbg_instructions(Reg)) {
if (Use.getParent() != DestMI.getParent())
continue;
auto UseIter = Use.getIterator();
if (std::distance(SrcMIIter, UseIter) > SrcToDestDistance)
break;
DependentInstrs.insert(&Use);
DependentInstrInRange(Use, DestMI, MRI, DependentInstrs);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance for this to enter in loop? What happens if Use is a Phi node? We could have a test, something like:

a = Phi (c) ...
b = load p0
c = add a, b
p1 = padd p0

We could have a test with such a scenario.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added a test and explain why we will not get into a loop.

}
}
}

/// \return true and \a DependentInstrs set to move below Dest, if \a MemI
/// can be moved just before \a Dest in order to allow post-increment combining
bool llvm::canDelayMemOpWithExtraInstr(MachineInstr &MemI, MachineInstr &Dest,
MachineRegisterInfo &MRI,
DependentInstrSet &DependentInstrs) {
if (MemI.getParent() != Dest.getParent())
return false;
DependentInstrInRange(MemI, Dest, MRI, DependentInstrs);

if (DependentInstrs.empty())
return true;

// Check if we can move the dependent instructions after Dest in order to
// enable post-increment combining.
if (!canDelayMemOp(**DependentInstrs.begin(), Dest, MRI))
return false;

// Check if we can move the rest of the dependent instructions after the
// previous one, this inherently checks if we can move the dependent
// instruction below the Dest instruction
for (auto Iter = DependentInstrs.begin(), NextIter = std::next(Iter);
NextIter != DependentInstrs.end(); ++Iter, ++NextIter)
if (!canDelayMemOp(**NextIter, **Iter, MRI))
return false;

return true;
Copy link
Collaborator

@martien-de-jong martien-de-jong Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I don't like here is that we have a very layered view of dependences. I think the basic condition is: the destination should not be reachable from the source in the dependence graph over any of DependentInstrs. canDelayMemOp() is one way of coding a dependence edge in that graph, and the use def chains are another one.
Also, the formulation is not the most direct. I think you actually want to know canAdvanceDest(MI, DestMI) for all MI in DependentInstrs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Src -> I1 -> I2 -> I3
Src -> Dst

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I have tried to check here using the existing functions is if I can pack these dependence graphs together.

By together here, I mean next to each other.

Using the same Src -> I1 -> I2 -> I3 andSrc -> Dstexample but with others Ix in between Src/Des/I1-3, I check if it's safe to pack them as Src , I1, I2, I3 , Dst in mist, of Ix if so, I move belowI1, I2, I3athe Dst and Src just above Dst

The reason for taking this approach was that I1 -> I2 -> I3 could only be moved as a group, and checking that in a single shot without actually moving them in the "match" phase of the combiner would have been tricky.

}

/// \return true if \a Dest can be moved just after \a MemI in order to allow
/// combining
bool llvm::canAdvanceOp(MachineInstr &MemI, MachineInstr &Dest,
Expand Down Expand Up @@ -497,25 +566,43 @@ MachineInstr *findPostIncMatch(MachineInstr &MemI, MachineRegisterInfo &MRI,
})) {
continue;
}
MatchData = {&PtrInc, *CombinedOpcode, &MemI,
/*ExtraInstrsToMove=*/{},
MatchData = {&PtrInc,
*CombinedOpcode,
&MemI,
/*dxeBefore=*/{},
/*ExtraInstrsToMoveAfter=*/{},
/*RemoveInstr=*/true};
// The offset of the PtrInc might be defined after MemI, in this case we
// want to verify if it would be possible to insert the combined
// instruction at the PtrInc instead of the location of MemI. Instruction
// with side effects are also blocking: Loads, stores, calls, instructions
// with side effects cannot be moved.
// TODO: try move other instructions that block us from combining
} else if (canDelayMemOp(MemI, PtrAddInsertLoc, MRI)) {
// If Definition of the offset is a G_CONSTANT we have to move that
// instruction up
MatchData = {
&PtrInc, *CombinedOpcode, &PtrAddInsertLoc,
/*ExtraInstrsToMove=*/
/*ExtraInstrsToMoveBefore=*/
findConstantOffsetsToMove(PtrInc, PtrAddInsertLoc, MRI, Helper),
/*ExtraInstrsToMoveAfter=*/{},
/*RemoveInstr=*/true};
// When the offset of the PtrInc might be defined after MemI, we may want
// to move some instruction below the PtrInc to allow the combine.
} else if (DependentInstrSet DependentInstrToMoveBelow(Helper);
EnablePostIncCombineMultipleInst &&
canDelayMemOpWithExtraInstr(MemI, PtrInc, MRI,
DependentInstrToMoveBelow)) {
std::vector<MachineInstr *> ExtraInstrsToMoveAfter(
DependentInstrToMoveBelow.begin(), DependentInstrToMoveBelow.end());
MatchData = {&PtrInc,
*CombinedOpcode,
&PtrInc,
/*ExtraInstrsToMoveBefore=*/{},
/*ExtraInstrsToMoveAfter=*/ExtraInstrsToMoveAfter,
/*RemoveInstr=*/true};

} else {
LLVM_DEBUG(dbgs() << " Ignoring candidate " << PtrInc);
LLVM_DEBUG(dbgs() << " Ignoring candidate for PostInc " << PtrInc);
continue;
}
// Only combine postIncs if we know that the original pointer is not used
Expand Down Expand Up @@ -552,9 +639,24 @@ void llvm::applyLdStInc(MachineInstr &MI, MachineRegisterInfo &MRI,
// Debug Loc: Debug Loc of LOAD STORE: MI
B.setDebugLoc(MI.getDebugLoc());
auto NewInstr = B.buildInstr(MatchData.CombinedInstrOpcode);
for (auto *Instr : MatchData.ExtraInstrsToMove) {
Instr->moveBefore(NewInstr);
}

// Move the instructions before the NewInstr
auto MoveInstrsBefore = [&NewInstr](const auto &InstrsToMoveBefore) {
std::for_each(InstrsToMoveBefore.begin(), InstrsToMoveBefore.end(),
[&NewInstr](auto *MI) { MI->moveBefore(NewInstr); });
};
MoveInstrsBefore(MatchData.ExtraInstrsToMoveBefore);

// Move the instructions after the NewInstr
auto MoveInstrsAfter = [&NewInstr](const auto &InstrsToMoveAfter) {
const auto &NewInstrIter = NewInstr.getInstr()->getIterator();
std::for_each(InstrsToMoveAfter.begin(), InstrsToMoveAfter.end(),
[&NewInstrIter](auto *MI) {
MI->moveBefore(&*std::next(NewInstrIter));
});
};
MoveInstrsAfter(MatchData.ExtraInstrsToMoveAfter);

if (MI.mayLoad())
NewInstr.addDef(MI.getOperand(0).getReg() /* Loaded value */);
if (MatchData.RemoveInstr)
Expand Down
20 changes: 19 additions & 1 deletion llvm/lib/Target/AIE/AIECombinerHelper.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,9 @@ struct AIELoadStoreCombineMatchData {
/// before this Instruction
MachineInstr *CombinedInsertPoint;
/// Additional instructions to be moved just before Instr
std::vector<MachineInstr *> ExtraInstrsToMove;
std::vector<MachineInstr *> ExtraInstrsToMoveBefore;
/// Additional instructions to be moved just after Instr
std::vector<MachineInstr *> ExtraInstrsToMoveAfter;
/// Should Instr (the PtrAdd) be removed after the combine was applied
bool RemoveInstr;
};
Expand All @@ -39,6 +41,16 @@ struct ShuffleMaskValidity {
SmallVector<unsigned, 4> MaskExceptions;
};

// Custom comparator for std::set based on dominance relation
struct DominanceComparator {
CombinerHelper &Helper;
DominanceComparator(CombinerHelper &Helper) : Helper(Helper) {}
bool operator()(MachineInstr *A, MachineInstr *B) const {
return Helper.dominates(*B, *A);
}
};
using DependentInstrSet = std::set<MachineInstr *, DominanceComparator>;

struct FrequentIndexResult {
unsigned FrequentIdx;
unsigned NonMatchingCount;
Expand Down Expand Up @@ -132,6 +144,12 @@ bool matchShuffleToExtractBroadcast(MachineInstr &MI, MachineRegisterInfo &MRI,
/// post-increment combining
bool canDelayMemOp(MachineInstr &MemI, MachineInstr &Dest,
MachineRegisterInfo &MRI);
/// \return true if \a MemI can be moved just before \a Dest in order to allow
/// post-increment combining, and return additional instuction in \a
/// DependentInstrs to be moved just after \a Dest
bool canDelayMemOpWithExtraInstr(MachineInstr &MemI, MachineInstr &Dest,
MachineRegisterInfo &MRI,
DependentInstrSet &DependentInstrs);
/// \return true if \a Dest can be moved just after \a MemI in order to allow
/// combining
bool canAdvanceOp(MachineInstr &MemI, MachineInstr &Dest,
Expand Down
20 changes: 9 additions & 11 deletions llvm/test/CodeGen/AIE/GlobalISel/combine-loads-stores.mir
Original file line number Diff line number Diff line change
Expand Up @@ -212,20 +212,19 @@ body: |
...

---
name: load_not_to_postinc_cannot_move_offset
name: load_to_postinc_move_inst_below_PTR_ADD
body: |
bb.0:
liveins: $p0
; CHECK-LABEL: name: load_not_to_postinc_cannot_move_offset
; CHECK-LABEL: name: load_to_postinc_move_inst_below_PTR_ADD
; CHECK: liveins: $p0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(p0) = COPY $p0
; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY]](p0) :: (load (s32))
; CHECK-NEXT: $r0 = COPY [[LOAD]](s32)
; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 12
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s20) = G_TRUNC [[C]](s32)
; CHECK-NEXT: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[TRUNC]](s20)
; CHECK-NEXT: $p0 = COPY [[PTR_ADD]](p0)
; CHECK-NEXT: [[AIE_POSTINC_LOAD:%[0-9]+]]:_(s32), [[AIE_POSTINC_LOAD1:%[0-9]+]]:_(p0) = G_AIE_POSTINC_LOAD [[COPY]], [[TRUNC]](s20) :: (load (s32))
; CHECK-NEXT: $r0 = COPY [[AIE_POSTINC_LOAD]](s32)
; CHECK-NEXT: $p0 = COPY [[AIE_POSTINC_LOAD1]](p0)
%0:_(p0) = COPY $p0
%3:_(s32) = G_LOAD %0 :: (load (s32))
$r0 = COPY %3
Expand Down Expand Up @@ -421,22 +420,21 @@ body: |
...

---
name: store_not_to_postinc_cannot_move_offset
name: store_to_postinc_move_store_near_to_ptr_add
body: |
bb.0:
liveins: $p0, $r0
; CHECK-LABEL: name: store_not_to_postinc_cannot_move_offset
; CHECK-LABEL: name: store_to_postinc_move_store_near_to_ptr_add
; CHECK: liveins: $p0, $r0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(p0) = COPY $p0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $r0
; CHECK-NEXT: G_STORE [[COPY1]](s32), [[COPY]](p0) :: (store (s32))
; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(p0) = COPY $p1
; CHECK-NEXT: G_STORE [[COPY1]](s32), [[COPY2]](p0) :: (store (s32))
; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 24
; CHECK-NEXT: [[TRUNC:%[0-9]+]]:_(s20) = G_TRUNC [[C]](s32)
; CHECK-NEXT: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[TRUNC]](s20)
; CHECK-NEXT: $p0 = COPY [[PTR_ADD]](p0)
; CHECK-NEXT: [[AIE_POSTINC_STORE:%[0-9]+]]:_(p0) = G_AIE_POSTINC_STORE [[COPY1]](s32), [[COPY]], [[TRUNC]](s20) :: (store (s32))
; CHECK-NEXT: $p0 = COPY [[AIE_POSTINC_STORE]](p0)
%0:_(p0) = COPY $p0
%1:_(s32) = COPY $r0
G_STORE %1, %0 :: (store (s32))
Expand Down
Loading