
Fix the FInplaceIdentity #2572

Merged 2 commits into apache:master on Feb 18, 2019

Conversation

ptrendx (Member) commented Feb 6, 2019

Currently, NNVM operators that could be an identity under certain conditions (e.g. Cast from type A to type A) cannot be marked with the FInplaceIdentity option. This is because PR #1696 changed the requirements of the inplace optimization (including identity) to no longer check for an exact type match, and the operator itself does not have sufficient information inside the FInplaceIdentity function to set the option only when it actually is an identity.

This PR restricts identity in the memory planning pass: the FInplaceIdentity option from the operator is honored only if the shape sizes match and the input and output types are bitwise compatible (by which I mean a cast from dtype1 to dtype2 does not change anything in memory, like going from int32_t to uint32_t).
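For illustration, a minimal sketch of the kind of condition this adds; the helper function name AllowInplaceIdentity and its exact placement are assumptions, not the actual plan_memory.cc code:

#include <nnvm/tuple.h>  // nnvm::TShape

// Illustrative only: the real check is inlined in NNVM's memory planning pass
// and may be structured differently.
static bool BitwiseCompatibleTypes(int type1, int type2);  // see the diff below

static bool AllowInplaceIdentity(const nnvm::TShape& in_shape, int in_dtype,
                                 const nnvm::TShape& out_shape, int out_dtype) {
  return in_shape.Size() == out_shape.Size() &&          // same number of elements
         (in_dtype == out_dtype ||                       // exact dtype match, or
          BitwiseCompatibleTypes(in_dtype, out_dtype));  // bit-identical in memory
}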

@eric-haibin-lin FYI.
Adding the author of PR #1696, @ZhennanQin, for comment on whether this approach still satisfies their needs.

@@ -39,6 +40,22 @@ static int GetDTypeSize(int type_flag) {
}
}

static bool BitwiseCompatibleTypes(int type1, int type2) {
static std::set<std::pair<int, int>> compatible_pairs { {kUint8, kInt8},
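The diff is truncated here; combining it with the snippet quoted later in the thread, the helper plausibly looks roughly like the following (the same-type shortcut and the exact contents of the pair set are assumptions, not necessarily the merged code):

#include <set>
#include <utility>

// kUint8 / kInt8 are the dtype flags used by the surrounding file; more pairs
// may be listed in the actual code (the diff shows a trailing comma).
static bool BitwiseCompatibleTypes(int type1, int type2) {
  // Pairs whose in-memory representation is identical bit for bit, so a cast
  // between them does not modify the data.
  static std::set<std::pair<int, int>> compatible_pairs { {kUint8, kInt8} };
  if (type1 == type2) return true;  // assumed shortcut for equal dtypes
  if (compatible_pairs.find(std::make_pair(type1, type2)) != compatible_pairs.end() ||
      compatible_pairs.find(std::make_pair(type2, type1)) != compatible_pairs.end()) {
    return true;
  }
  return false;
}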
ZhennanQin (Contributor) commented on this diff:

It seems this set is designed for the cast op. For other operators, it may not be suitable. For example, saturate_cast from kUint8 to kInt8 will change the underlying data. Relu from kInt8 to kUint8 will also break this.

ptrendx (Member, Author) replied:

It's hard to imagine relu and saturate_cast advertising themselves as identities.
That said, I made this exception for types to accommodate your use case as I understand it. If you don't need it, I can just replace the check with a strict type check; that would also be fine (and slightly safer).

ZhennanQin (Contributor) replied on Feb 8, 2019:

Relu on uint8 and saturate_cast with the same dtype are exactly the same cases as cast, as they are redundant operations. According to your approach, they should add the identity attribute as well. But this would break the cases of relu from int8 to uint8 and saturate_cast from uint8 to int8.

If the check is replaced with a strict type check, then identity can't help when casting from uint8 to int8. Also, relu on fp32 will have trouble.

Overall, you can't find a unified rule here to cover all cases. A graph pass is a better solution, as it can see the operator and apply a different rule for each operator.

ZhennanQin (Contributor) commented:

I will evaluate the effect on mkldnn when the CNY holiday ends next Monday. For this PR itself, I suggest implementing a graph pass instead for removing casts with the same dtype.

ptrendx (Member, Author) commented Feb 7, 2019

It is not only about casts with the same dtype: there may be other operations that can be treated as identity, e.g. operations that pass some types through unchanged and modify others. Currently they can't advertise themselves as potential identities, which is a bug.

ZhennanQin (Contributor) commented:

From my point of view, cast shouldn't have the identity attribute, because for most use cases cast will modify the underlying data. Using identity to work around redundant operations is an abuse of it. This should be implemented as a graph pass that detects and deletes those redundant operations, including cast with the same dtype, elemwise_add with 0, relu on uint8, bn with kernel=[1,1] and stride=[1,1], etc.

Otherwise, we would have to add identity to all those operators, which is not maintainable. Also, identity can only ensure that memory is shared in place between input and output; an extra check has to be added in each operator to avoid the computation, making the common use case even slower. Another problem is that the memory copy between a preceding GPU operator and a following identity CPU operator can't be avoided, making identity useless in that case. Overall, this is an imperfect solution.

ptrendx (Member, Author) commented Feb 8, 2019

I will address both of your comments here.

Most of the operations you listed are actually impossible to handle in a graph pass, because e.g. add with 0 depends on the data passed to the operator, which a graph pass does not have access to. The decision to skip the computation has to be made inside those operators in that case.
Adding a graph pass which needs to know about all those operations (and which operates on the names of the operators, which could change) is just as problematic to scale as handling the cases inside the operators. The "making the common use case even slower" assessment is wrong too, because e.g. in Gluon, which I would argue is a pretty common use case of NNVM, autograd needs to make the gradient graph every single iteration. Adding a full graph pass to that is much more heavyweight than a simple if (dtypes are the same && req == kWriteInplace) in the operator.

For the relu case: for it to be able to claim that it is an identity, it would need to know the type of its input (to ensure it is some unsigned integral type), which it can't do in the current architecture of NNVM. It can still say that it wants to be in place and, if there is no other node using its inputs, it will get kWriteInplace, which, using the same type of if statement at runtime, enables it to skip the computation.
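For illustration, a minimal self-contained sketch of that kind of runtime check; the kernel name and signature are hypothetical, not actual MXNet operator code:

#include <cstddef>
#include <cstdint>

// Mirrors MXNet's OpReqType values; redefined here only to keep the sketch standalone.
enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

// Hypothetical relu kernel specialized for uint8: relu(x) == x for unsigned
// data, so when the planner granted in-place writing there is nothing to do.
void ReluUint8(const uint8_t* in, uint8_t* out, std::size_t n, OpReqType req) {
  if (req == kNullOp || req == kWriteInplace) {
    return;  // no work requested, or output aliases input: skip the computation
  }
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i];  // otherwise just a copy-through (kAddTo omitted for brevity)
  }
}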

Making the check for both inplace and inplace identity looser in your PR (instead of just inplace) made FInplaceIdentity usable only by reshape and its variants (flatten and expand_dims). This does not make sense.

ZhennanQin (Contributor) commented:

Optimizing add with 0 is also impossible within the operator, because checking whether the input is all 0 is too heavy. It can be done with a graph pass once we have constant nodes in the graph. For the rest of the cases, a graph pass can handle them for now. A graph pass can also handle multi-operator cases like A + B - B.

"making common usecase even slower" means we have to run if(dtypes are the same && req == kWriteInplace) in operator many times. For non-redundant use case, this is overhead and will increase latency. Maybe you think if(dtypes are the same && req == kWriteInplace) is negligible, but actually that check is not sufficient, to cover the case casting int8 to uint8, the full check should be like
if((dtypes are the same || BitwiseCompatibleTypes(in_dtype, out_dtype)) && req == kWriteInplace)
There's lots of comparison in BitwiseCompatibleTypes, making this check not negligible.

Regarding "autograd needs to make the gradient graph every single iteration; adding a full graph pass to that is much more heavyweight": the pass itself is just a collection of operator identity requirements. In the worst case, where we have to run this pass every single iteration, the pass execution time should be roughly the same as the check inside the operator, as they are doing the same check. On the symbolic side, this pass will only run once at symbol bind, which is much more efficient.

Regarding the relu case and it not being doable in the current architecture of NNVM: actually, it is doable in the current NNVM, but BitwiseCompatibleTypes() would have to change to something like IsUnsignedIntegralType(). That's why I'm saying BitwiseCompatibleTypes is designed for the cast op. Of course, using IsUnsignedIntegralType in the memory planning pass doesn't make sense either, as it works for relu only. So the real problem is: when registering the identity attribute, the register function doesn't know the input and output dtypes, which is why you want to put the dtype restriction inside memory planning. But different operators have different dtype restrictions, and it's impossible to use a single restriction for all operators. Maybe adding dtypes as arguments of the identity register function is a way to solve it.
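For illustration, one purely hypothetical shape such a dtype-aware registration hook could take; this is not the actual NNVM interface, and the name is invented:

#include <functional>
#include <vector>
#include <nnvm/node.h>  // nnvm::NodeAttrs

// Today FInplaceIdentity only receives the node attributes, so it cannot see
// dtypes. A dtype-aware variant (invented name and signature) could let an op
// such as Cast answer "identity" only when input and output dtypes match.
using FInplaceIdentityWithType = std::function<std::vector<bool>(
    const nnvm::NodeAttrs& attrs,
    const std::vector<int>& in_dtypes,
    const std::vector<int>& out_dtypes)>;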

In my previous PR, I just loosened the restriction on inplace and identity so that they can be used in the dtype-mismatch case, so overall their usage was enlarged rather than restricted. In my opinion, simply adding the restriction back to support some corner cases is not a smart choice.

Finally, I'm not the maintainer of this pass, and the above is just a suggestion for technical discussion. I'll try this PR on mkldnn next Monday. If it doesn't break the usage of mkldnn, I'm fine with this PR.

ZhennanQin (Contributor) commented:

I did some experiments with this PR, and didn't see any issue with mkldnn. Even removing

  if (compatible_pairs.find(std::make_pair(type1, type2)) != compatible_pairs.end() ||
      compatible_pairs.find(std::make_pair(type2, type1)) != compatible_pairs.end()) {
    return true;
  }

is still OK for MKLDNN.

I think that's because this PR only adds the restriction to FInplaceIdentity while FInplaceOption remains unchanged, and MKLDNN only uses FInplaceOption. If we have to add this restriction to FInplaceIdentity, I suggest removing the compatible-type check above, because it's quite specific to the cast op and may cause misunderstanding for other developers.

yzhliu (Member) commented Feb 14, 2019

tqchen merged commit 8518c7d into apache:master on Feb 18, 2019

tqchen (Member) commented Feb 18, 2019

Thanks @ptrendx @ZhennanQin @eric-haibin-lin @yzhliu, this is now merged.

wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019
yzhliu mentioned this pull request Mar 2, 2019