-
Notifications
You must be signed in to change notification settings - Fork 7
Refactor lower_alias_memory.cpp #2170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if (debug_print) { | ||
AllocateReuseModifier::debugPrint(exprs); | ||
} | ||
AllocateReuseModifier::modify(exprs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why, but previously when the debug print is enabled, the analysis is executed twice. That's actually not supported as the analysis assumes anything already aliased is an output that is aliased to an input (https://github.com/csarofeen/pytorch/pull/2170/files#diff-37be3f6d58243db8255519e2eb487975a21cfe8c030c937ae169e4367f5ef6e1R682)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not finished reading yet. Just posting my comments so far. They are minor comments, I don't have any major concerns so far.
|
||
//! Reuse Allocation nodes via pointer aliasing | ||
class AllocateReuseModifier { | ||
class ReusableAllocationFinder : private kir::IrVisitor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check my understanding:
The detection of reusability is based on the line number. Except for checking visibility in the scope, we are just ignoring the loops. So this approach assumes that the allocation in an outer scope is only used by the first iteration of an inner loop? This assumption sounds strange to me. For example, if I have:
float T1[2];
// compute T1[2]
for i in range(2):
something = T1[i];
// no more read of T1 below this point.
float T2[2]; // this will reuse T1??? Is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See an issue here: #1785, but I am still concerned. I don't think broadcasting resolution is the only case we should worry about. For example, I can have code:
float T1[2];
compute T1[2]
for i in range(2):
something = T1[i];
// no more read of T1 below this point.
float T2[2]; // this will reuse T1
read T2 from fusion input
something_else = reduction(T2)
fusion_output[index] = something + something_else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this case would be covered by isSerialBroadcastResolution
, but not 100% sure. Can you create a repro and file an issue? I want to focus just refactoring in this PR. There's one bug I'm fixing in #2174.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you didn't change the algorithm in this PR, so this is not a blocking comment. I am just generally worried about the algorithm and will try to think if I can break it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think the above case I mentioned is covered in
//! Careful heavy check on inner sharing candidates,
//! current enforced conditions are:
//!
//! 1. The two buffers have producer-consumer relationship
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finished reading. Thanks for refactoring! I only have some minor suggestions, and except for that, it looks good to me.
//! Realize alias of two buffers through inner alias analysis and | ||
//! keep track of the re-use. | ||
void useInnerAlias(AllocationInfoPtr from, AllocationInfoPtr to) { | ||
void useInnerAlias(AllocationInfo* from, AllocationInfo* to) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I personally feel that from
and to
are confusing. I don't know who is aliasing who, and I need to look around to figure it out everytime. Would it make sense to reorder the parameter and name it as something like (x, alias_of_x)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments. Hope it's better now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding comments looks good to me
// The alias-to Allocate may be replaced with a new | ||
// Allocate. Use the new one in that case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for supporting 2-hop aliasing? I believe we have a comment
// Avoid 2-hop aliasing for simplicity. Can support if really need in
// extreme cases.
If we don't need to support 2-hop aliasing, then we can just remove the old2new_
map from this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like that shouldn't happen. The previous version of this code modifies kir::Allocate
, but we are now replacing the pointers, so the link from AllocationInfo
to kir::Allocate
may be broken. If we never alias an aliased tensor, that should be fine. I'd rather keep the map and assert that condition. Will update the code.
Just refactored the alias analysis pass without change the underlying logic.
Changed the analysis to use the visitor and mutator to reduce duplicated traversal logic. The pass was written before
kir::IrVisistor
andkir::ExprMutator
were added.Also, previous classes such as
BufferUseDefInfo
does two separate traversals, which are now two separate classes (ScopeMap
andAllocationInfoMap
).There should be no logic change except for the fix of #2117, which was because the aliasing modification pass was done twice when the debug print was enabled. I just dumped all the alias pairs from all the C++ and Python tests, and confirmed every test has the same alias pairs before and after this PR.
I'm skeptical if this aliasing pass could be actually helpful for local tensors as they should be always statically sized and statically indexed, so nvcc should be able to have the same liveness information. We still need to do this for shared memory tensors, though.
This PR will be followed up with a fix for #2163.