[TF][Relay] BatchNorm support with run-time mean and variance calculation #4990

Merged · 2 commits merged into apache:master on Mar 8, 2020

Conversation

@lfengad (Contributor) commented Mar 5, 2020

We observe that a large number of TensorFlow models used in our production environment invoke the FusedBatchNorm operator, and many of them use this operator in "is_training" mode for model inference. In "is_training" mode, the mean and variance are calculated dynamically from the run-time data rather than being pre-defined. However, the current BatchNorm in TVM requires that the mean and variance be given as non-empty tensors.

We add support for BatchNorm in "is_training" mode, so that it can dynamically calculate the mean and variance when they are not given. We first check the mean and variance nodes of fused_batch_norm in the TensorFlow frontend and annotate them if they are empty. Then, according to the annotation, we add the necessary nodes for the mean and variance calculation in the BatchNormToInferUnpack function, which arranges the BatchNorm inference.

In our current implementation, the annotations for the empty mean and variance are added to the name_hint of the corresponding variable nodes. This solution is simple and requires no changes to the attributes of the relay operator batch_norm. Alternatively, we could add a boolean attribute "is_training" to the relay operator batch_norm: if the mean and variance are empty, "is_training" is set to true, and based on this attribute we decide whether to add the nodes for calculating the mean and variance in the BatchNormToInferUnpack function. This solution requires modifying the relay operator batch_norm.
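For reference, here is a minimal NumPy sketch of what "is_training" mode computes; the shapes, epsilon, and parameter values are illustrative assumptions, not taken from the PR:

```python
import numpy as np

# Illustrative NHWC input and parameters.
x = np.random.randn(2, 4, 4, 3).astype("float32")
gamma = np.random.rand(3).astype("float32")  # scale
beta = np.random.rand(3).astype("float32")   # offset
eps = 1e-3

# In "is_training" mode there are no pre-defined statistics:
# mean and variance are reduced over every axis except the
# channel axis (N, H, W for an NHWC layout).
mean = x.mean(axis=(0, 1, 2))
var = x.var(axis=(0, 1, 2))

y = gamma * (x - mean) / np.sqrt(var + eps) + beta
```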

Any suggestions are welcome! @tqchen @FrozenGene

@FrozenGene (Member) commented:

I think we needn't add _empty_for_training_mode_inference. If we find that mean / variance is a VarNode, we should call Mean and Variance.

I don't think we should add an is_training flag to relay BatchNorm. It should be up to users to make sure that the TF model's BatchNorm is_training flag is false. However, since we still have user cases like the one you mention, we could support them as in the current implementation, without adding an attribute to BatchNorm.

@FrozenGene FrozenGene self-assigned this Mar 5, 2020
@FrozenGene FrozenGene changed the title [Relay][Topi] BatchNorm support with run-time mean and variance calculation [TF][Relay][Topi] BatchNorm support with run-time mean and variance calculation Mar 5, 2020
@lfengad lfengad changed the title [TF][Relay][Topi] BatchNorm support with run-time mean and variance calculation [TF][Relay] BatchNorm support with run-time mean and variance calculation Mar 5, 2020
@lfengad lfengad changed the title [TF][Relay] BatchNorm support with run-time mean and variance calculation [TF][Relay][Topi] BatchNorm support with run-time mean and variance calculation Mar 5, 2020
@lfengad lfengad changed the title [TF][Relay][Topi] BatchNorm support with run-time mean and variance calculation [TF][Relay] BatchNorm support with run-time mean and variance calculation Mar 5, 2020
@lfengad (Contributor, Author) commented Mar 5, 2020

> I think we needn't add _empty_for_training_mode_inference. If we find that mean / variance is a VarNode, we should call Mean and Variance.

> I don't think we should add an is_training flag to relay BatchNorm. It should be up to users to make sure that the TF model's BatchNorm is_training flag is false. However, since we still have user cases like the one you mention, we could support them as in the current implementation, without adding an attribute to BatchNorm.

Thank you so much for the quick reply!
Yeah, our current implementation just checks whether mean / variance is an empty VarNode (with zero dimension) and then calls Mean and Variance in BatchNormToInferUnpack. Also, as I understand it, if mean / variance is a VarNode with a non-zero dimension, it may still hold pre-defined constant values and thus cannot be replaced with Mean / Variance. A sketch of the emptiness check is given below.
Thank you for the discussion!
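For concreteness, a minimal sketch of such an emptiness check; the helper name and exact condition are illustrative assumptions, not the PR's code:

```python
from tvm import relay

def _is_empty_var(node):
    # A "missing" mean/variance arrives as a Var whose type
    # annotation contains a zero-sized dimension, e.g.
    # Tensor[(0,), float32]; a non-empty Var such as
    # Tensor[(channels,), float32] may carry real values.
    if not isinstance(node, relay.Var):
        return False
    return any(int(dim) == 0 for dim in node.type_annotation.shape)
```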

@FrozenGene (Member) commented:

> Yeah, our current implementation just checks whether mean / variance is an empty VarNode (with zero dimension) and then calls Mean and Variance in BatchNormToInferUnpack.

I think our PR could remove the name_hint too.

> if mean / variance is a VarNode with a non-zero dimension, it may still hold pre-defined constant values and thus cannot be replaced with Mean / Variance

Could you give us an example of this condition? I can only imagine models having either empty or fully pre-defined values, so we should either calculate them by calling Mean / Variance fed with the data, or use our current implementation of BatchNormToInferUnpack.

@lfengad (Contributor, Author) commented Mar 6, 2020

> Yeah, our current implementation just checks whether mean / variance is an empty VarNode (with zero dimension) and then calls Mean and Variance in BatchNormToInferUnpack.

> I think our PR could remove the name_hint too.

Yeah, I agree that the better way would be to remove the name_hint and just check whether the mean and variance are empty inside BatchNormToInferUnpack, with no need to modify the TensorFlow frontend. I tried this earlier but got some compilation errors related to data shape checking. If we go this way, we need to modify BatchNormRel for the data shape assignment, since the current batch_norm relay operator only accepts mean and variance with the same shape as the channel dimension (see the sketch below). We would need further modifications to make this relay operator accept mean and variance with an empty shape.
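To illustrate the constraint, a minimal sketch with illustrative (assumed) shapes: relay.nn.batch_norm type-checks only when mean and variance match the channel dimension, so an empty tensor is rejected unless BatchNormRel is relaxed.

```python
from tvm import relay

data = relay.var("data", shape=(1, 4, 4, 3))
gamma = relay.var("gamma", shape=(3,))
beta = relay.var("beta", shape=(3,))
mean = relay.var("mean", shape=(3,))  # (channels,) type-checks fine
var = relay.var("var", shape=(3,))    # an empty shape such as (0,) would not

# axis=3 selects the channel axis of the NHWC layout.
out = relay.nn.batch_norm(data, gamma, beta, mean, var, axis=3)[0]
```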

> if mean / variance is a VarNode with a non-zero dimension, it may still hold pre-defined constant values and thus cannot be replaced with Mean / Variance

> Could you give us an example of this condition? I can only imagine models having either empty or fully pre-defined values, so we should either calculate them by calling Mean / Variance fed with the data, or use our current implementation of BatchNormToInferUnpack.

What I mean is that in both cases the mean and variance are VarNodes: in one case the VarNode is empty, without pre-defined values, while in the other case the VarNode is non-empty, with pre-defined values.
Thank you for the discussion!

@lfengad (Contributor, Author) commented Mar 6, 2020

> Yeah, our current implementation just checks whether mean / variance is an empty VarNode (with zero dimension) and then calls Mean and Variance in BatchNormToInferUnpack.

> I think our PR could remove the name_hint too.

> if mean / variance is a VarNode with a non-zero dimension, it may still hold pre-defined constant values and thus cannot be replaced with Mean / Variance

> Could you give us an example of this condition? I can only imagine models having either empty or fully pre-defined values, so we should either calculate them by calling Mean / Variance fed with the data, or use our current implementation of BatchNormToInferUnpack.

Thanks for the discussion! Following it, I have rewritten the code as in the newest commit. This time the function BatchNormToInferUnpack is not modified; we only modify the TensorFlow frontend for _fused_batch_norm. If the mean and variance are empty, we directly add Mean and Variance relay operators before the batch_norm relay operator in the frontend graph, without modifying the batch_norm relay operator at all. A sketch of this rewrite is given below.
Thank you for the suggestions!
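A minimal sketch of that rewrite, assuming an NHWC layout with the channel on axis 3; the function name and surrounding plumbing are illustrative assumptions, not the exact frontend code:

```python
from tvm import relay

def _batch_norm_training_mode(data, gamma, beta, axis=3, epsilon=1e-3):
    # When the graph carries empty mean/variance tensors, compute
    # them from the run-time data over every axis except the channel
    # axis, then feed the ordinary batch_norm relay operator.
    reduce_axes = [i for i in range(4) if i != axis]
    mean = relay.mean(data, axis=reduce_axes)
    variance = relay.variance(data, axis=reduce_axes)
    return relay.nn.batch_norm(data, gamma, beta, mean, variance,
                               axis=axis, epsilon=epsilon)[0]
```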

@FrozenGene (Member) left a review:

Some final comments

Resolved review threads:
- python/tvm/relay/frontend/tensorflow.py
- tests/python/frontend/tensorflow/test_bn_dynamic.py (two threads)
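For context, a test for this path might look roughly like the following sketch (TF 1.x API; shapes, names, and the comparison step are illustrative assumptions, not the PR's actual test):

```python
import numpy as np
import tensorflow as tf
from tvm import relay

shape = (1, 8, 8, 4)  # illustrative NHWC shape
data = np.random.rand(*shape).astype("float32")

with tf.Graph().as_default() as g:
    x = tf.placeholder(tf.float32, shape, name="input")
    scale = tf.constant(np.random.rand(shape[-1]).astype("float32"))
    offset = tf.constant(np.random.rand(shape[-1]).astype("float32"))
    # is_training=True with no mean/variance supplied: the statistics
    # are computed from the batch at run time, the case this PR adds.
    y, _, _ = tf.nn.fused_batch_norm(x, scale, offset,
                                     epsilon=1e-3, is_training=True)
    out = tf.identity(y, name="output")
    with tf.Session(graph=g) as sess:
        tf_out = sess.run(out, feed_dict={x: data})
        graph_def = g.as_graph_def()

# Import into Relay; then (not shown) build and run the module and
# compare its output against tf_out within a small tolerance.
mod, params = relay.frontend.from_tensorflow(
    graph_def, shape={"input": shape}, outputs=["output"])
```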
@FrozenGene (Member) left a review:

LGTM.

@FrozenGene (Member) commented:

Let us wait for CI to turn green.

As GitHub has an issue (https://discuss.tvm.ai/t/github-issue-the-commit-author-is-wrong-since-today/5880/15), I will merge this after it is resolved.

@lfengad (Contributor, Author) commented Mar 6, 2020

> Let us wait for CI to turn green.

> As GitHub has an issue (https://discuss.tvm.ai/t/github-issue-the-commit-author-is-wrong-since-today/5880/15), I will merge this after it is resolved.

Okay, thank you so much for the efforts!

@FrozenGene FrozenGene merged commit ba47786 into apache:master Mar 8, 2020
@FrozenGene FrozenGene added the status: accepted label and removed the status: need update label Mar 8, 2020
@FrozenGene (Member) commented:

Thanks @lfengad! This is merged now.

@lfengad (Contributor, Author) commented Mar 8, 2020

> Thanks @lfengad! This is merged now.

Thank you so much for your help! 😄
