Fix the FInplaceIdentity #2572

ptrendx · 2019-02-06T22:09:44Z

Currently NNVM operators that could be identity under certain conditions (e.g. Cast from type A to type A) cannot be marked with FInplaceIdentity option. It is because PR #1696 changed the requirements of inplace optimization (including identity) to not check for the exact type match, and the operator itself does not have sufficient information inside FInplaceIdentity function to be able to set the option only if it is actually identity.

This PR restricts identity in the memory plan pass to honor FInplaceIdentity option from operator only if shape sizes match and if types of input and output are bitwise compatible (by which I mean cast from dtype1 to dtype2 does not change anything in memory, like going from int32_t to uint32_t).

@eric-haibin-lin FYI
Adding author of PR #1696 @ZhennanQin for comment if this approach still satisfies their needs.

ZhennanQin · 2019-02-07T02:17:47Z

nnvm/src/pass/plan_memory.cc

@@ -39,6 +40,22 @@ static int GetDTypeSize(int type_flag) {
  }
 }

+static bool BitwiseCompatibleTypes(int type1, int type2) {
+  static std::set<std::pair<int, int>> compatible_pairs { {kUint8, kInt8},


It seems this set is designed for cast op. For others, it may not suitable. For example, saturate_cast from kUint8 to kInt8 will change trunk data. Relu from kInt8 to kUint8 will also break this.

It's hard to imagine relu and saturate_cast to advertise themselves as identites.
That said, I made this exception for types to accommodate your usecase as I understand it, if you don't need that then I can just replace the check with strict type check, that will also be fine (and slightly safer).

Relu on uint8 and saturate_cast with same dtype are exactly the same cases as cast, as they are redundant operations. According to your approach, they should add identity attributes. But this will break the cases relu from int8 to uint8 and saturate_cast from uint8 to int8.

If replacing the check with strict type check, then identity can't help on casting from uint8 to int8. Also, relu on fp32 will have trouble.

Overall, you can't find a unified relu here to cover all cases. A graph pass is a better solution as it can see the operator and make different relu for each operator.

ZhennanQin · 2019-02-07T02:22:38Z

I will evalute the effect on mkldnn when CNY holiday ends on next Monday. For this PR itself, I suggest to implement an graph pass alternatively for removing cast with same dtype.

ptrendx · 2019-02-07T16:45:38Z

It is not only about casts with the same dtype - there may be other operations that can be treated as identity, e.g. pass through some types and change other. Currently they can't advertise itself as potential identities, which is a bug.

ZhennanQin · 2019-02-08T05:24:27Z

From my point of view, cast shouldn't have identity attributes because for most of the usecase, cast will modify trunk data. It's abuse of using of identity for working around redundant operations. This should be implemented as a graph pass that detect and delete those redundant operations, including cast with same dtype, elewise_add with 0, relu on uint8, bn with kernel=[1,1] and stride=[1,1], etc.

Otherwise, we have to add identity to all those operators, it's not maintainable. Also, identity can only ensure memory inplace between input and output. Extra check have to be added in each opeators to avoid the computation, making common usecase even slower. Another problem is, the memory copy between previous GPU operator and next identity CPU operator can't be avoided, making identity useless for this case. Overall, this is imperfect solution.

ptrendx · 2019-02-08T17:26:55Z

I will address both of your comments here.

Most of the operations you listed are actually impossible to be handled by the graph pass because e.g. add with 0 depends on the data passed to the operator, which graph pass does not have access to. The decision to skip the computation has to be done inside those operators in that case.
Adding graph pass which needs to know about all those operations (and which operates on names of the operators which could change) is just as problematic to scale as handling the cases inside the operators. The "making common usecase even slower" assessment is wrong too because e.g. in Gluon, which I would argue is a pretty common usecase of NNVM, autograd needs to make the gradient graph every single iteration. Adding full graph pass to that is much more heavy weight than a simple if(dtypes are the same && req == kWriteInplace) in the operator.

For the relu case - for it to be able to claim that it is an identity, it would need to know the type of its input (to ensure it is some unsigned integral type), which it can't do in the current architecture of NNVM. It still can say that it wants to be in place and, if there is no other node using its inputs, it will get kWriteInplace, which, using the same type of if statement at runtime, enables it to skip the computation.

Making the check for both inplace and inplaceidentity more loose in your PR (instead of just inplace) made FInplaceIdentity usable only by reshape and its variants(flatten and expand_dims). This does not make sense.

ZhennanQin · 2019-02-09T02:23:52Z

optimizing add with 0 is also impossible within operator because checking if input is all 0 is too heavy. It can be done with graph pass when we have constant node in graph. For rest of case, graph pass can handle them for now. Graph pass can also handle multple operators case like A + B - B.

"making common usecase even slower" means we have to run if(dtypes are the same && req == kWriteInplace) in operator many times. For non-redundant use case, this is overhead and will increase latency. Maybe you think if(dtypes are the same && req == kWriteInplace) is negligible, but actually that check is not sufficient, to cover the case casting int8 to uint8, the full check should be like
if((dtypes are the same || BitwiseCompatibleTypes(in_dtype, out_dtype)) && req == kWriteInplace)
There's lots of comparison in BitwiseCompatibleTypes, making this check not negligible.

autograd needs to make the gradient graph every single iteration. Adding full graph pass to that is much more heavy weight. The pass itself is just a collection of operator identity requirements. On the worst situation that we have to run this pass every single iteration, the pass execution time should be roughly same as the check inside operator as they are doing the same check. On symbolic side, this pass will only run once on symbol bind, which is much efficient.

For the relu case, it can't do in the current architecture of NNVM. Actually, it's doable in current NNVM. But have to change BitwiseCompatibleTypes() to something like IsUnsignedIntegralType(). That's why I'm saying BitwiseCompatibleTypes is designed for cast op. Of course using IsUnsignedIntegralType in memory plan pass doesn't make sense as it's works for relu only. So the real problem is, When registering Identity attributes, the regsiter function doesn't know the input and output dtypes. So you want to put the dtype restriction inside memory planning. But different operators have different dtype restriction, it's impossible to use single restriction for all operators. Maybe adding dtypes as arguments of identity regsiter function is a way to solve it.

In my previous PR, I just loose the restriction of inplace and identity to make them can be used on dtype mismatch case. Overall their usage is enlarged rather than restricted. On my opinion, Simply adding restriction back for supporting some corner cases is not a smart choice.

Finally, I'm not the maintainer of this pass, and above is just a suggestion for technical disscusion. I'll try this PR on mkldnn next Monday. If this doesn't break the usage of mkldnn. I'm fine with this PR.

ZhennanQin · 2019-02-11T01:17:17Z

I did some experiments with this PR, and doesn't see any issue with mkldnn. Even removing

  if (compatible_pairs.find(std::make_pair(type1, type2)) != compatible_pairs.end() ||
      compatible_pairs.find(std::make_pair(type2, type1)) != compatible_pairs.end()) {
    return true;
  }

is still OK for MKLDNN.

I think it's because this PR only adds restriction with FInplaceIdentity while FInplaceOption remains unchanged. And MKLDNN only uses FInplaceOption. If we have to add this restriction to FInplaceIdentity, I suggest to remove compatible type check above because it's quite specific for cast op, and may cause misunderstanding for other developer.

yzhliu · 2019-02-14T17:01:27Z

@ZhennanQin Please
https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

tqchen · 2019-02-18T21:21:23Z

Thanks @ptrendx @ZhennanQin @eric-haibin-lin @yzhliu this is now merged

Fix the identity

1b58433

ZhennanQin reviewed Feb 7, 2019

View reviewed changes

Get rid of bitwise compatible types

56e88b6

eric-haibin-lin approved these changes Feb 13, 2019

View reviewed changes

ZhennanQin approved these changes Feb 15, 2019

View reviewed changes

ptrendx mentioned this pull request Feb 15, 2019

MXNet AMP (automatic mixed precision) apache/mxnet#14173

Merged

16 tasks

tqchen merged commit 8518c7d into apache:master Feb 18, 2019

tqchen added the status: accepted label Feb 18, 2019

wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019

Fix the FInplaceIdentity (apache#2572)

1853bcc

wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019

Fix the FInplaceIdentity (apache#2572)

3f51747

yzhliu mentioned this pull request Mar 2, 2019

[DEV] TVM v0.6 Roadmap #2623

Closed

28 tasks

yzhliu mentioned this pull request Nov 11, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix the FInplaceIdentity #2572

Fix the FInplaceIdentity #2572

ptrendx commented Feb 6, 2019

ZhennanQin Feb 7, 2019

ptrendx Feb 7, 2019

ZhennanQin Feb 8, 2019 •

edited

Loading

ZhennanQin commented Feb 7, 2019

ptrendx commented Feb 7, 2019

ZhennanQin commented Feb 8, 2019

ptrendx commented Feb 8, 2019

ZhennanQin commented Feb 9, 2019

ZhennanQin commented Feb 11, 2019

yzhliu commented Feb 14, 2019

tqchen commented Feb 18, 2019 •

edited

Loading

Fix the FInplaceIdentity #2572

Fix the FInplaceIdentity #2572

Conversation

ptrendx commented Feb 6, 2019

ZhennanQin Feb 7, 2019

Choose a reason for hiding this comment

ptrendx Feb 7, 2019

Choose a reason for hiding this comment

ZhennanQin Feb 8, 2019 • edited Loading

Choose a reason for hiding this comment

ZhennanQin commented Feb 7, 2019

ptrendx commented Feb 7, 2019

ZhennanQin commented Feb 8, 2019

ptrendx commented Feb 8, 2019

ZhennanQin commented Feb 9, 2019

ZhennanQin commented Feb 11, 2019

yzhliu commented Feb 14, 2019

tqchen commented Feb 18, 2019 • edited Loading

ZhennanQin Feb 8, 2019 •

edited

Loading

tqchen commented Feb 18, 2019 •

edited

Loading