
Allow inlining past loop broadcasts #3416

Draft: jacobhinkle wants to merge 2 commits into base branch mma_predicate_elimination


jacobhinkle (Collaborator)

Stacked on #3414

This PR enables us to properly inline an MmaOp when its inputs are missing broadcast dimensions. We do this by always allowing inlining past loop broadcasts and past IDs derived from them by transforms. For example:

tv0:
  logical [ iS1{i0} ]
  loop [ iS1{i0} bS5{1} ]
tv1:
  logical [ iS2{i1} ]
  loop [ bS6{1} iS2{i1} ]
tv2 = foo(tv0, tv1)
  logical [ iS3{i0} iS4{i1} ]

As long as the operation foo properly maps its arguments despite the missing logical dimensions (as MmaOp does as of #3391), we should be able to fully inline this case, because the loop broadcasts bS5 and bS6 are imaginary in the sense that they don't affect indexing.
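To make the "imaginary broadcast" argument concrete, here is a toy sketch in plain C++ loops (not nvFuser code and not a generated kernel). It assumes a hypothetical foo that multiplies its two inputs, like an outer product. It compares materializing explicit broadcast copies of tv0 and tv1 against a single fully inlined loop nest over tv2's [i0, i1]. In the inlined version, tv0 is indexed only by i and tv1 only by j, so the positions of bS5 and bS6 never enter any index expression.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for foo; any elementwise combination would do.
float foo(float a, float b) { return a * b; }

// Non-inlined version: expand tv0 along i1 (bS5) and tv1 along i0 (bS6)
// into full [i0, i1] buffers, then apply foo elementwise.
std::vector<float> computeMaterialized(const std::vector<float>& tv0,
                                       const std::vector<float>& tv1) {
  const std::size_t i0 = tv0.size(), i1 = tv1.size();
  std::vector<float> tv0_b(i0 * i1), tv1_b(i0 * i1), tv2(i0 * i1);
  for (std::size_t i = 0; i < i0; ++i) {
    for (std::size_t j = 0; j < i1; ++j) {
      tv0_b[i * i1 + j] = tv0[i];  // broadcast of tv0 along i1
      tv1_b[i * i1 + j] = tv1[j];  // broadcast of tv1 along i0
    }
  }
  for (std::size_t k = 0; k < i0 * i1; ++k) tv2[k] = foo(tv0_b[k], tv1_b[k]);
  return tv2;
}

// Fully inlined version: one loop nest over tv2's [i0, i1]. The loop
// broadcasts contribute nothing to the indices of tv0 or tv1.
std::vector<float> computeInlined(const std::vector<float>& tv0,
                                  const std::vector<float>& tv1) {
  const std::size_t i0 = tv0.size(), i1 = tv1.size();
  std::vector<float> tv2(i0 * i1);
  for (std::size_t i = 0; i < i0; ++i) {
    for (std::size_t j = 0; j < i1; ++j) {
      tv2[i * i1 + j] = foo(tv0[i], tv1[j]);
    }
  }
  return tv2;
}
```

Both functions produce the same tv2. The inlined one simply never allocates the broadcast copies, which is the effect inlining past bS5 and bS6 is meant to have.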

jacobhinkle (Collaborator Author) commented Nov 15, 2024

After this, we can actually generate a proper kernel and run it. I will rebase #3406 onto this PR and modify the test there to compile and run, so we can inspect the generated kernel in that PR. We can keep this PR for discussing the inlining changes only.

naoyam (Collaborator) commented Nov 15, 2024

Does this only apply to broadcast IDs added by TensorView::broadcast()?

jacobhinkle (Collaborator Author)

Yes, that's the intention. I am using tv->domain()->additionalIDs(), which I think contains only those broadcasts?

naoyam (Collaborator) commented Nov 15, 2024

Yes. @zasdfgbnm, when you added this, were you thinking about having non-broadcast IDs in additional_ids_?

jacobhinkle (Collaborator Author)

To be safe I'll check the IterType when skipping.
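A minimal sketch of that safety check, using hypothetical toy types in place of nvFuser's IterDomain and IterType. Only additional IDs whose IterType is Broadcast are treated as skippable, so any non-broadcast ID that ends up in additional_ids_ is still considered during inlining.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model (hypothetical): nvFuser has its own IterType enum and
// IterDomain class; these names only mirror the idea.
enum class ToyIterType { Iteration, Reduction, Broadcast };

struct ToyId {
  std::string name;
  ToyIterType iter_type;
};

// Return only the additional IDs that are genuinely broadcasts; everything
// else must still be taken into account when computing inline positions.
std::vector<ToyId> skippableAdditionalIds(const std::vector<ToyId>& additional_ids) {
  std::vector<ToyId> result;
  for (const ToyId& id : additional_ids) {
    if (id.iter_type == ToyIterType::Broadcast) {
      result.push_back(id);
    }
  }
  return result;
}
```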

Comment on lines +206 to +210
for ([[maybe_unused]] auto [expr, dir] : IRBFS::getExprsBetween(
{tv->domain()->additionalIDs().begin(),
tv->domain()->additionalIDs().end()},
{tv->getLoopDomain().begin(), tv->getLoopDomain().end()},
/*require_all_to_visited=*/false)) {
jacobhinkle (Collaborator Author)


This includes all IDs between additionalIDs() and the loop domain. However, we could have something like this:

tv->broadcast(0, 16);
tv->merge(0);

In this case, the new broadcast ID is merged with a pre-existing loop ID, so we should not ignore the result. Instead, I think we should traverse from the root domain to the loop domain; the complement (the loop IDs not reached by that traversal) will then be the "pure" loop broadcasts, which we can ignore.
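The difference between the two traversals can be sketched on a toy ID graph. This is not nvFuser's IRBFS: it assumes forward-only reachability where every output of a transform is reached as soon as any of its inputs is reached, and that transforms are listed in topological order. All ID names are hypothetical. The first function mirrors the current diff (ignore loop IDs reachable from additionalIDs()); the second mirrors the proposal (ignore loop IDs not reachable from the root domain).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy transform (split, merge, ...): maps input IDs to output IDs.
struct ToyExpr {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Forward reachability: an expr's outputs are reached if any input is
// reached. Assumes exprs are given in topological order.
std::set<std::string> reachableFrom(const std::vector<std::string>& sources,
                                    const std::vector<ToyExpr>& exprs) {
  std::set<std::string> reached(sources.begin(), sources.end());
  for (const ToyExpr& e : exprs) {
    bool any_input = false;
    for (const std::string& in : e.inputs) {
      any_input = any_input || reached.count(in) > 0;
    }
    if (any_input) reached.insert(e.outputs.begin(), e.outputs.end());
  }
  return reached;
}

// Current diff: ignore loop IDs reachable from the additional (broadcast) IDs.
std::set<std::string> ignoredFromAdditional(const std::vector<std::string>& additional,
                                            const std::vector<ToyExpr>& exprs,
                                            const std::vector<std::string>& loop) {
  const std::set<std::string> reached = reachableFrom(additional, exprs);
  std::set<std::string> ignored;
  for (const std::string& id : loop) {
    if (reached.count(id) > 0) ignored.insert(id);
  }
  return ignored;
}

// Proposal: ignore loop IDs NOT reachable from the root domain.
std::set<std::string> ignoredByRootComplement(const std::vector<std::string>& root,
                                              const std::vector<ToyExpr>& exprs,
                                              const std::vector<std::string>& loop) {
  const std::set<std::string> reached = reachableFrom(root, exprs);
  std::set<std::string> ignored;
  for (const std::string& id : loop) {
    if (reached.count(id) == 0) ignored.insert(id);
  }
  return ignored;
}
```

Model tv->broadcast(0, 16); tv->merge(0) as root {"iS0"}, additional {"bS1"}, one merge {"bS1", "iS0"} -> {"iS2"}, and loop {"iS2"}. The first function then returns {"iS2"} and would wrongly skip the merged ID, while the second returns an empty set. For tv0 in the PR description (root {"iS1"}, no transforms, loop {"iS1", "bS5"}), both return {"bS5"}.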

jacobhinkle (Collaborator Author) commented Nov 15, 2024


Actually, I suppose that would also automatically allow us to inline past regular broadcasts created by BroadcastOp, since those new broadcast IDs are not reachable from the root domain either. I believe we already inline past those IDs anyway, though.
