Skip to content

Conversation

naoyam
Copy link
Collaborator

@naoyam naoyam commented Nov 9, 2022

Just refactored the alias analysis pass without change the underlying logic.

Changed the analysis to use the visitor and mutator to reduce duplicated traversal logic. The pass was written before kir::IrVisistor and kir::ExprMutator were added.

Also, previous classes such as BufferUseDefInfo does two separate traversals, which are now two separate classes (ScopeMap and AllocationInfoMap).

There should be no logic change except for the fix of #2117, which was because the aliasing modification pass was done twice when the debug print was enabled. I just dumped all the alias pairs from all the C++ and Python tests, and confirmed every test has the same alias pairs before and after this PR.

I'm skeptical if this aliasing pass could be actually helpful for local tensors as they should be always statically sized and statically indexed, so nvcc should be able to have the same liveness information. We still need to do this for shared memory tensors, though.

This PR will be followed up with a fix for #2163.

Comment on lines -1221 to -1224
if (debug_print) {
AllocateReuseModifier::debugPrint(exprs);
}
AllocateReuseModifier::modify(exprs);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why, but previously when the debug print is enabled, the analysis is executed twice. That's actually not supported as the analysis assumes anything already aliased is an output that is aliased to an input (https://github.com/csarofeen/pytorch/pull/2170/files#diff-37be3f6d58243db8255519e2eb487975a21cfe8c030c937ae169e4367f5ef6e1R682)

@naoyam naoyam changed the title [WIP] Refactor lower_alias_memory.cpp Refactor lower_alias_memory.cpp Nov 9, 2022
@naoyam naoyam requested a review from zasdfgbnm November 9, 2022 06:05
Copy link
Collaborator

@zasdfgbnm zasdfgbnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not finished reading yet. Just posting my comments so far. They are minor comments, I don't have any major concerns so far.


//! Reuse Allocation nodes via pointer aliasing
class AllocateReuseModifier {
class ReusableAllocationFinder : private kir::IrVisitor {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check my understanding:

The detection of reusability is based on the line number. Except for checking visibility in the scope, we are just ignoring the loops. So this approach assumes that the allocation in an outer scope is only used by the first iteration of an inner loop? This assumption sounds strange to me. For example, if I have:

float T1[2];
// compute T1[2]
for i in range(2):
  something = T1[i];
  // no more read of T1 below this point.
  float T2[2];  // this will reuse T1??? Is it correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See an issue here: #1785, but I am still concerned. I don't think broadcasting resolution is the only case we should worry about. For example, I can have code:

float T1[2];
compute T1[2]
for i in range(2):
  something = T1[i];
  // no more read of T1 below this point.
  float T2[2];  // this will reuse T1
  read T2 from fusion input
  something_else = reduction(T2)
  fusion_output[index] = something + something_else

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case would be covered by isSerialBroadcastResolution, but not 100% sure. Can you create a repro and file an issue? I want to focus just refactoring in this PR. There's one bug I'm fixing in #2174.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you didn't change the algorithm in this PR, so this is not a blocking comment. I am just generally worried about the algorithm and will try to think if I can break it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think the above case I mentioned is covered in

  //! Careful heavy check on inner sharing candidates,
  //!  current enforced conditions are:
  //!
  //! 1. The two buffers have producer-consumer relationship

Copy link
Collaborator

@zasdfgbnm zasdfgbnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished reading. Thanks for refactoring! I only have some minor suggestions, and except for that, it looks good to me.

//! Realize alias of two buffers through inner alias analysis and
//! keep track of the re-use.
void useInnerAlias(AllocationInfoPtr from, AllocationInfoPtr to) {
void useInnerAlias(AllocationInfo* from, AllocationInfo* to) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I personally feel that from and to are confusing. I don't know who is aliasing who, and I need to look around to figure it out everytime. Would it make sense to reorder the parameter and name it as something like (x, alias_of_x)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments. Hope it's better now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding comments looks good to me

Comment on lines 1301 to 1302
// The alias-to Allocate may be replaced with a new
// Allocate. Use the new one in that case.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for supporting 2-hop aliasing? I believe we have a comment

      // Avoid 2-hop aliasing for simplicity. Can support if really need  in
      // extreme cases.

If we don't need to support 2-hop aliasing, then we can just remove the old2new_ map from this class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that shouldn't happen. The previous version of this code modifies kir::Allocate, but we are now replacing the pointers, so the link from AllocationInfo to kir::Allocate may be broken. If we never alias an aliased tensor, that should be fine. I'd rather keep the map and assert that condition. Will update the code.

@naoyam naoyam merged commit f588435 into devel Nov 11, 2022
@naoyam naoyam deleted the fix_mem_reuse branch November 11, 2022 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants