-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SplitLogicalObjectFifos] Add support for dma tranposed on the target side #850
base: main
Are you sure you want to change the base?
Conversation
...D-AIE/iree-amd-aie/Transforms/test/split_logicalobjfifos_for_connection_reuse_transpose.mlir
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks so cleaner now! Thanks Vivian! A couple more comments to address.
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIELogicalObjFifoSplittingUtils.cpp
Outdated
Show resolved
Hide resolved
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIELogicalObjFifoSplittingUtils.cpp
Outdated
Show resolved
Hide resolved
compiler/plugins/target/AMD-AIE/iree-amd-aie/Transforms/AMDAIELogicalObjFifoSplittingUtils.cpp
Outdated
Show resolved
Hide resolved
// The L2 target side has transposed dimensions, while the L3 source side | ||
// data are continuous and doesn't have "nonSplitDim". | ||
// Hardcoded the transposed dimensions for now. | ||
const SmallVector<size_t> transposeDim = {0, 2, 1, 3}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have added a check earlier : if (l2TargetShape.size() != staticL2AsTargetSizes.size())
, I'd rather we infer even the dimensions where transpose has taken place inside the loop for (auto [s1, s2] : llvm::zip_equal(l2TargetShape, staticL2AsTargetSizes))
itself and get rid of this hardcoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, now I think there could be issues if we blindly compare values. There could be dimensions that have same values, e.g, [2, 32, 32, 32] -> [2, 32, 32, 32] after transposing dim 1 and 2. Then there's no way to tell which dma dimensions are transposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay. That makes sense. Thanks for the explanation.
break; | ||
} | ||
} | ||
if (staticL3AsSourceSizes.size() != staticL2AsTargetSizes.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added this check to deal with the case I mentioned [2, 32, 32, 32] transposed to [2, 32, 32, 32]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, can you explain how this check is helping the case of [2, 32, 32, 32] transposed to [2, 32, 32, 32]
?
I see the static sizes' ranks being compared for L3 source and L2 target, and unable to understand how the above case is checked here. Perhaps some assumption is being made here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the dma is tranposed on the target side then the source side data is continuous. The dma ops are converted from pack ops, and for matmul cases we know the dimensions of source and target memrefs are not the same. For example, we have [128, 128] on the source side, and [1, 2, 32, 32] on the target side, if we transpose on the target side, then the dma addressing has 2 dimensions on source side and 4 dimensions on target side. While on the side, if we transpose on source side, then both dma addressing have 4 dimensions.
Previous PR #812 added support to transpose dma dimensions on the target side for control code optimization. However, this new dma addressing wasn't supported in
SplitLogicalObjectFifosForConnectionReuse
pass.This PR keeps the original logic and most of the original codes while adding the support for the new dma format.