Skip to content

[AIEX] Enhance combiner to support post-inc-load requiring moving multiple instructions #369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: aie-public
Choose a base branch
from

Conversation

krishnamtibrewala
Copy link
Collaborator

@krishnamtibrewala krishnamtibrewala commented Feb 24, 2025

Enhance the combiner to identify the potential to move multiple dependent instructions below G_PTR_ADD to enable post-inc combining.

Say between a dependent LOAD & PTR_ADD you have multiple instructions that use the value loaded by the LOAD instruction.

First we try to identify the dependent instructions chain, which uses the definition of LOAD, and see if we can move the dependency chain below PTR_ADD such that we open up the potential to combine LOAD & PTR_ADD to POST_INC_LOAD.

if (!canDelayMemOp(*DependentInstrs[i + 1], *DependentInstrs[i], MRI))
return false;

return true;
Copy link
Collaborator

@martien-de-jong martien-de-jong Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I don't like here is that we have a very layered view of dependences. I think the basic condition is: the destination should not be reachable from the source in the dependence graph over any of DependentInstrs. canDelayMemOp() is one way of coding a dependence edge in that graph, and the use def chains are another one.
Also, the formulation is not the most direct. I think you actually want to know canAdvanceDest(MI, DestMI) for all MI in DependentInstrs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Src -> I1 -> I2 -> I3
Src -> Dst

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I have tried to check here using the existing functions is if I can pack these dependence graphs together.

By together here, I mean next to each other.

Using the same Src -> I1 -> I2 -> I3 andSrc -> Dstexample but with others Ix in between Src/Des/I1-3, I check if it's safe to pack them as Src , I1, I2, I3 , Dst in mist, of Ix if so, I move belowI1, I2, I3athe Dst and Src just above Dst

The reason for taking this approach was that I1 -> I2 -> I3 could only be moved as a group, and checking that in a single shot without actually moving them in the "match" phase of the combiner would have been tricky.

@krishnamtibrewala krishnamtibrewala force-pushed the aie-post-inc branch 2 times, most recently from 474c3d8 to e39efb2 Compare February 24, 2025 17:31

// Check if we can move the dependent instructions after Dest in order to
// enable post-increment combining.
if (!canDelayMemOp(*DependentInstrs.front(), Dest, MRI))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is the only that matters here. Maybe you can add a test that fails on the next for.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only this wont be sufficient (kingly note the front() will give the last instruction in the dependency chain)
Why we need the next check is explained here #369 (comment)

@krishnamtibrewala
Copy link
Collaborator Author

Hi @andcarminati can you re-review this again. Thank you.

std::distance(SrcMIIter, DestMIIter))
break;
DependentInstrs.insert(&Use);
DependentInstrInRange(Use, DestMI, MRI, DependentInstrs);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance for this to enter in loop? What happens if Use is a Phi node? We could have a test, something like:

a = Phi (c) ...
b = load p0
c = add a, b
p1 = padd p0

We could have a test with such a scenario.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added a test and explain why we will not get into a loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants