-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RELAY] Turn reshape into nop in graph executor backend. #7945
Conversation
Previously we are generating the function calls for reshape. This PR updates the optimization to turn reshape into nop: - Tag a fused function as reshape only if it only contains reshape. - Update memory planner to force input output to share the same piece of memory - Update the graph runtime codegen to emit nop when reshape only function is encountered.
cc @altanh @jroesch @zhiics @icemelon9 @Meteorix @ZihengJiang @junrushao1994 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also update the tvm/src/relay/transforms/memory_alloc.cc
to use the kReshapeOnly
flag?
Thanks @icemelon9 let me take a look and see what is going on there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an important PR to have, but also wanted to bring attention to other similar cases and propose that we generalize this so that the code is less ad hoc if possible.
@@ -144,6 +144,9 @@ constexpr const char* kComposite = "Composite"; | |||
constexpr const char* kInline = "Inline"; | |||
/*! \brief Indicate the function was created by the Pattern Partitioning Pass. */ | |||
constexpr const char* kPartitionedFromPattern = "PartitionedFromPattern"; | |||
|
|||
/*! \brief Mark the function as only composed of reshape operations. */ | |||
constexpr const char* kReshapeOnly = "relay.reshape_only"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there other ops that could benefit? for example squeeze
, expand_dims
, etc.? I feel like this should more generally be like relay.alias_only
(although alias might be too general since you could "alias" a subset of an input by slicing)
src/relay/transforms/fuse_ops.cc
Outdated
void VisitExpr_(const CallNode* op) final { has_call = true; } | ||
bool reshape_only = true; | ||
void VisitExpr_(const CallNode* op) final { | ||
static const Op& reshape_op_ = Op::Get("reshape"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to my previous comment, we'd at least want to include a check for reverse_reshape
op, but I think we should generalize and make it less ad hoc if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! It's great to see some thinking on logical operation handling, but I feel the choice to make reshape a nop as done in the PR is mixing two separate concepts that are worth addressing,
C1. Input-ouput pair support to achieve storage aliasing for inplace operations
C2. Eliding physical operations on logical-only transformations/ops
For C1, the reshape function attribute strikes me as a special case of what could be a generic input output aliasing for inplace operations. E.g. operator annotations that indicate a mapping from input indices to output indices. From which the memory planner can reuse the input token for an output according to this mapping.
With respect to C2, there can be other logical operators that a backend can support depending on the hardware target (e.g. implicit broadcast, transpose, reshape). Two concerns come to mind,
-
Logical operations on one platform might not be logical on another. For example, Reshape([A, B, C] -> [A, b1, b2, C]) is not guaranteed to be logical for a device api which utilizes the Nd storage allocation. For example, with texture2d storage: shape[A, B, C] would yield a texture of shape {A, B, C} whereas shape[A, b1, b2, C] could yield a texture of shape {A*b1, b2, C}. So the result of a reshape can change the physical memory layout depending on the storage type.
-
It's worth discussing the pros and cons of handling logical operations by eliding their graph/runtime codegen. I've done this in the past, and while it works, it is a bit of a hack in the sense that the graph-level representation of the compute no longer maps to the device runtime. Removing the invocation definitely has the advantage of not needing to touch the graph or update the type relations etc. But has the disadvantage of breaking the expectation of the relay graph being a honest representation of the physical compute. Another option can be to introduce a LogicalExpr at the graph-level and use passes to replace expr with certain properties (which can be hw dependent, see 1) with these logical exprs. These can then be handles specially by the type checker to be type and shape syncs (on the inputs) and sources (on the outputs), and which are understood to be nops in the graph runtime.
I'm sure there are other considerations I've missed here, but overall I feel this needs a bit more discussion, would you consider making an RFC?
Thanks @icemelon9 @altanh comments are addressed by looking for a reshapeop property. |
@csullivan to followup on your points. I agree that a decision about output aliasing can be a big undertaking to mark in the IR itself. The rationale that brings the current PR os structured as follows:
Note that majority part of the property marking in K0 more strict than C1, due to the reason that aliasing info breaches the semantics while K0 is considered as an annotation of the function property, the backend can still choose to implement reshape only function by actually running the code or eliding and nop. The annotation K0 does not imply the decision. K1 have to happen at some time point, and can be both target and operation dependent. Given that right now the memory planning and executions are coupled at the graph runtime codegen, the current change to the graph runtime is consistent with that choice. As we start to factor out more of memory alloca and execution into seperate passes, I agree that introducing physical specific annotations would then make sense since the decisions will then be passed across a few stages. The target dependence of K1 is indeed something worth thinking about. Given that the current graph planner already assumes flat memory, my assumption is that a separate planning and execution generator will be needed for the N-D case, where we can skip this choice for now(note that the reshape only is a property tag rather than a tag that enforces the execution). Back to the general topic of aliasing memory. I think that problem would be a bigger decision indeeds deserves an RFC and more in-depth thoughts(I don't think marking aliasing in the graph directly is the right approach as you said it changes the semantics of the graph in an early stage). This PR is mainly intended as a incremental improvement to the graph executor that is handle the case of reshape in a consistent way as VM(which already elides reshape) in the case of flat memory scenario. We can certainly bring followup refactors once we have a good proposal to handle aliasing memory. |
Added comments about the flat memory assumptions and TODOs about what can be done under non-flat settings once the support is introduced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be good to decouple the memory planning and codegen into separate passes.
I'm not clear on the distinction between K0 and C0. In both cases they are function level annotations. In C0, the proposal is to annotate the function as attr::alias(i0, o0), in K1 the proposal is to annotate the function as attr:kReshapeOnly. In both cases the backend can choose whether to respect them based on target or storage capability.
I'm fine to wait on implementing this as a post memory planning optimization pass once things are factored out. But want to be sure I understand why you argue that the aliasing info breaches the semantics.
See my comment below for K1. I personally would feel better about the graph executor codegen eliding logical reshapes if it checks to ensure that the operation is inplace (input storage_ids have been forwarded).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
reshape only was a property about the function itself, the property have well-defined meaning in functional land. While alias property itself does not have a clear definition in the function land and is closer to the execution property. So a reshape only tag can appear in early stage of compilation without a problem. While alias tag would likely need to only appear when we start to think about memory planning. For general ops, having alias information alone may not give a complete picture, as the operator semantics itself is also part of the picture. For some ops, alias needs to be enforced(e.g. case of scatter sparse add to dense), while others the impact is more minimum(e.g. reshape). Reshape only in that regard is a more narrow tag that specifies the op semantics as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, LGTM 👍
cc @icemelon9 @altanh please take another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* [RELAY] Turn reshape into nop in graph executor backend. Previously we are generating the function calls for reshape. This PR updates the optimization to turn reshape into nop: - Tag a fused function as reshape only if it only contains reshape. - Update memory planner to force input output to share the same piece of memory - Update the graph runtime codegen to emit nop when reshape only function is encountered. * Address review comments. * Additional comment and TODOs on the rationale
* [RELAY] Turn reshape into nop in graph executor backend. Previously we are generating the function calls for reshape. This PR updates the optimization to turn reshape into nop: - Tag a fused function as reshape only if it only contains reshape. - Update memory planner to force input output to share the same piece of memory - Update the graph runtime codegen to emit nop when reshape only function is encountered. * Address review comments. * Additional comment and TODOs on the rationale
* [RELAY] Turn reshape into nop in graph executor backend. Previously we are generating the function calls for reshape. This PR updates the optimization to turn reshape into nop: - Tag a fused function as reshape only if it only contains reshape. - Update memory planner to force input output to share the same piece of memory - Update the graph runtime codegen to emit nop when reshape only function is encountered. * Address review comments. * Additional comment and TODOs on the rationale
* [RELAY] Turn reshape into nop in graph executor backend. Previously we are generating the function calls for reshape. This PR updates the optimization to turn reshape into nop: - Tag a fused function as reshape only if it only contains reshape. - Update memory planner to force input output to share the same piece of memory - Update the graph runtime codegen to emit nop when reshape only function is encountered. * Address review comments. * Additional comment and TODOs on the rationale
* [RELAY] Turn reshape into nop in graph executor backend. Previously we are generating the function calls for reshape. This PR updates the optimization to turn reshape into nop: - Tag a fused function as reshape only if it only contains reshape. - Update memory planner to force input output to share the same piece of memory - Update the graph runtime codegen to emit nop when reshape only function is encountered. * Address review comments. * Additional comment and TODOs on the rationale
Previously we are generating the function calls for reshape.
This PR updates the optimization to turn reshape into nop: