[LinalgExt] Add online_attention op #17536
Merged: Groverkss merged 31 commits into iree-org:main from Groverkss:new-decomposition-attention on Jun 12, 2024.
Commits (31)
9549091  Split tests (Groverkss)
872ca3e  Address comments (Groverkss)
a3e3471  save (Groverkss)
b53734d  save (Groverkss)
3f35acb  add online attention op (Groverkss)
1e5c190  Implement TilingInterface for online attention (Groverkss)
c68d28f  refactor some impl (Groverkss)
e3f5896  Add aggregate op interface for online_attention (Groverkss)
5993feb  add dtype conversions and convert to online attention pass (Groverkss)
c10ff9c  remove redundant functions (Groverkss)
1765894  Make llvmcpu backend use online attention (Groverkss)
a69bf87  Remove redundant comments (Groverkss)
283f5cc  add test for tiling (Groverkss)
6feaa37  clang-format (Groverkss)
c3fb664  add decompose test (Groverkss)
0974aee  Add docs for online_attention (Groverkss)
2e96982  bazeltocamke (Groverkss)
8835a84  remove todo (Groverkss)
791a31a  address comments (Groverkss)
f937879  Move aggregate op implementation to seperate file (Groverkss)
5d6f8cc  addreess comments (Groverkss)
b52d70d  fix compilation error (Groverkss)
4f0a7e9  Address hanhan's comments (Groverkss)
f490be5  pre-commit (Groverkss)
ac149e3  dummy reduction tile sizes for winograd (Groverkss)
ec7aff2  fix tests (Groverkss)
c249c98  fix test (Groverkss)
713c95b  BAZEL :cry: (Groverkss)
5157ed5  Revert "BAZEL :cry:" (Groverkss)
01fca7d  BAZEL BAZEL (Groverkss)
3e2f54f  BEZELL (Groverkss)
Conversations
Reviewer: Can this happen before tiling parallel dims? Is it a requirement for tiling reduction loops?
Groverkss: You can only tile reduction loops on the online_attention op. We could do this before tiling the parallel dims, but we would then need to propagate the lowering_config info in the createConvertAttentionToOnlineAttention pass. For more context, the conversion rewrites the attention op into an online_attention op plus trailing elementwise ops that normalize its partial results. The lowering_config gets preserved on the online_attention op and is used for reduction tiling. Until we have consumer fusion (and greedy fusion for multiple operands/results) fixed, I don't think we can do it. As a side note, this doesn't allow us to do further levels of parallel tiling on the elementwise and fill operations, which is not ideal.
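Schematically, the rewrite looks something like this (abbreviated IR with types and indexing maps elided; a sketch of the shape of the transformation, not the exact iree_linalg_ext assembly syntax):

```mlir
// Before: a single fused attention op.
%out = iree_linalg_ext.attention
         ins(%query, %key, %value, %scale : ...)
         outs(%init : ...) -> ...

// After: online_attention yields the unnormalized accumulator together with
// the running row-max and row-sum statistics. The reduction (K2) loop lives
// on this op, which is why reduction tiling has to happen on this form.
%acc, %max, %sum = iree_linalg_ext.online_attention
         ins(%query, %key, %value, %scale : ...)
         outs(%acc_init, %max_init, %sum_init : ...) -> ...

// A trailing elementwise op normalizes the accumulator:
// out[m, n] = acc[m, n] / sum[m], with sum broadcast along n.
%out = linalg.generic ... ins(%acc, %sum : ...) outs(%empty : ...)
```

The elementwise and fill operations mentioned above are this trailing normalization and the fills that initialize %acc_init, %max_init, and %sum_init.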
Groverkss: Ideally, I would like there to be a way to propagate the lowering_config attribute when I do a conversion like this (which would mean putting the tiling information on the type, or somewhere more persistent).
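To make that concrete, today the config rides along as a plain attribute on the op, so it survives only as long as the op does (tile sizes below are illustrative, not values from this PR):

```mlir
// Any pattern that replaces this op creates a fresh op without the attribute
// unless the pass copies it over explicitly; that manual copying is the
// propagation burden described above.
%res:3 = iree_linalg_ext.online_attention
           {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[1, 128, 0, 0, 32]]>}
           ins(%query, %key, %value, %scale : ...)
           outs(%acc, %max, %sum : ...) -> ...
```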
Reviewer: This is more me asking questions than a requirement to address before landing; I'm trying to see the whole picture of how this could be done in the CPU backend. It seems we could convert the op to the online_attention form before lowering-strategy selection, like what we've done for the softmax op. Do you think we want to keep the attention form while tiling the parallel loops? Or does it not matter once we can tile the online_attention op and fuse its producers/consumers into the for loop?
Groverkss: Ah, I understand what you mean now. I can try; I'm thinking there might be problems with fusion because the online_attention op has multiple results. Let me try and see if I can do it.
Reviewer: No need to try it and land it in this PR; the PR is already big, and this is fairly new to the CPU backends. I can pull in others to help with the CPU changes later. Are there other pending changes for attention ops?