[TF][Relay] BatchNorm support with run-time mean and variance calculation #4990

Merged · 2 commits merged into apache:master on Mar 8, 2020

Conversation

@lfengad (Contributor) commented Mar 5, 2020

We observe that a large number of TensorFlow models used in our production environment invoke the FusedBatchNorm operator, and many of them use this operator in "is_training" mode for model inference. In "is_training" mode, the mean and variance are calculated dynamically from the run-time data rather than being pre-defined. However, the current BatchNorm in TVM requires that the mean and variance be given as non-empty tensors.

We add support for BatchNorm in "is_training" mode, so that it can dynamically calculate the mean and variance when they are not given. We first check the mean and variance nodes of fused_batch_norm in the TensorFlow frontend and annotate them if they are empty. Then, according to the annotation, we add the necessary nodes for the mean and variance calculation in the BatchNormToInferUnpack function, which arranges the BatchNorm inference.

In our current implementation, the annotations for the empty mean and variance are added to the name_hint of the corresponding variable nodes. This solution is simple and requires no changes to the attributes of the relay operator batch_norm. Alternatively, we could add a boolean attribute "is_training" to the relay operator batch_norm: if the mean and variance are empty, "is_training" is set to true, and based on this attribute we decide whether to add the nodes for calculating the mean and variance in the BatchNormToInferUnpack function. This solution requires modifying the relay operator batch_norm.
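For reference, here is a minimal NumPy sketch of what "is_training" mode computes; the shapes, epsilon, and parameter values are illustrative assumptions, not taken from the PR:

```python
import numpy as np

# Illustrative NHWC input and parameters.
x = np.random.randn(2, 4, 4, 3).astype("float32")
gamma = np.random.rand(3).astype("float32")  # scale
beta = np.random.rand(3).astype("float32")   # offset
eps = 1e-3

# In "is_training" mode there are no pre-defined statistics:
# mean and variance are reduced over every axis except the
# channel axis (N, H, W for an NHWC layout).
mean = x.mean(axis=(0, 1, 2))
var = x.var(axis=(0, 1, 2))

y = gamma * (x - mean) / np.sqrt(var + eps) + beta
```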

Any suggestions are welcome! @tqchen @FrozenGene

@FrozenGene (Member) commented:

I think we needn't add _empty_for_training_mode_inference. If we find that mean / variance is a VarNode, we should call Mean and Variance.

I don't think we should add an is_training flag to relay BatchNorm. It should be up to users to make sure that the TF model's BatchNorm is_training flag is false. However, since we still have user cases like the one you mention, we could support them as in the current implementation, without adding an attribute to BatchNorm.

@FrozenGene FrozenGene self-assigned this Mar 5, 2020
@FrozenGene FrozenGene changed the title [Relay][Topi] BatchNorm support with run-time mean and variance calculation [TF][Relay][Topi] BatchNorm support with run-time mean and variance calculation Mar 5, 2020
@lfengad lfengad changed the title [TF][Relay][Topi] BatchNorm support with run-time mean and variance calculation [TF][Relay] BatchNorm support with run-time mean and variance calculation Mar 5, 2020
@lfengad lfengad changed the title [TF][Relay] BatchNorm support with run-time mean and variance calculation [TF][Relay][Topi] BatchNorm support with run-time mean and variance calculation Mar 5, 2020
@lfengad lfengad changed the title [TF][Relay][Topi] BatchNorm support with run-time mean and variance calculation [TF][Relay] BatchNorm support with run-time mean and variance calculation Mar 5, 2020
@lfengad (Contributor, Author) commented Mar 5, 2020

> I think we needn't add _empty_for_training_mode_inference. If we find that mean / variance is a VarNode, we should call Mean and Variance.

> I don't think we should add an is_training flag to relay BatchNorm. It should be up to users to make sure that the TF model's BatchNorm is_training flag is false. However, since we still have user cases like the one you mention, we could support them as in the current implementation, without adding an attribute to BatchNorm.

Thank you so much for the quick reply!
Yeah, our current implementation just checks whether mean / variance is an empty VarNode (with zero dimension) and then calls Mean and Variance in BatchNormToInferUnpack. Also, as I understand it, if mean / variance is a VarNode with a non-zero dimension, it may still hold pre-defined constant values and thus cannot be replaced with Mean / Variance. A sketch of the emptiness check is given below.
Thank you for the discussion!
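For concreteness, a minimal sketch of such an emptiness check; the helper name and exact condition are illustrative assumptions, not the PR's code:

```python
from tvm import relay

def _is_empty_var(node):
    # A "missing" mean/variance arrives as a Var whose type
    # annotation contains a zero-sized dimension, e.g.
    # Tensor[(0,), float32]; a non-empty Var such as
    # Tensor[(channels,), float32] may carry real values.
    if not isinstance(node, relay.Var):
        return False
    return any(int(dim) == 0 for dim in node.type_annotation.shape)
```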

@FrozenGene (Member) commented:

> Yeah, our current implementation just checks whether mean / variance is an empty VarNode (with zero dimension) and then calls Mean and Variance in BatchNormToInferUnpack.

I think our PR could remove the name_hint too.

> if mean / variance is a VarNode with a non-zero dimension, it may still hold pre-defined constant values and thus cannot be replaced with Mean / Variance

Could you give us an example of this condition? I can only imagine models having either empty or fully pre-defined values, so we should either calculate them by calling Mean / Variance fed with the data, or use our current implementation of BatchNormToInferUnpack.

@lfengad (Contributor, Author) commented Mar 6, 2020

> Yeah, our current implementation just checks whether mean / variance is an empty VarNode (with zero dimension) and then calls Mean and Variance in BatchNormToInferUnpack.

> I think our PR could remove the name_hint too.

Yeah, I agree that the better way would be to remove the name_hint and just check whether the mean and variance are empty inside BatchNormToInferUnpack, with no need to modify the TensorFlow frontend. I tried this earlier but got some compilation errors related to data shape checking. If we go this way, we need to modify BatchNormRel for the data shape assignment, since the current batch_norm relay operator only accepts mean and variance with the same shape as the channel dimension (see the sketch below). We would need further modifications to make this relay operator accept mean and variance with an empty shape.
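To illustrate the constraint, a minimal sketch with illustrative (assumed) shapes: relay.nn.batch_norm type-checks only when mean and variance match the channel dimension, so an empty tensor is rejected unless BatchNormRel is relaxed.

```python
from tvm import relay

data = relay.var("data", shape=(1, 4, 4, 3))
gamma = relay.var("gamma", shape=(3,))
beta = relay.var("beta", shape=(3,))
mean = relay.var("mean", shape=(3,))  # (channels,) type-checks fine
var = relay.var("var", shape=(3,))    # an empty shape such as (0,) would not

# axis=3 selects the channel axis of the NHWC layout.
out = relay.nn.batch_norm(data, gamma, beta, mean, var, axis=3)[0]
```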

> if mean / variance is a VarNode with a non-zero dimension, it may still hold pre-defined constant values and thus cannot be replaced with Mean / Variance

> Could you give us an example of this condition? I can only imagine models having either empty or fully pre-defined values, so we should either calculate them by calling Mean / Variance fed with the data, or use our current implementation of BatchNormToInferUnpack.

What I mean is that in both cases the mean and variance are VarNodes: in one case the VarNode is empty, without pre-defined values, while in the other case the VarNode is non-empty, with pre-defined values.
Thank you for the discussion!

@lfengad (Contributor, Author) commented Mar 6, 2020

> Yeah, our current implementation just checks whether mean / variance is an empty VarNode (with zero dimension) and then calls Mean and Variance in BatchNormToInferUnpack.

> I think our PR could remove the name_hint too.

> if mean / variance is a VarNode with a non-zero dimension, it may still hold pre-defined constant values and thus cannot be replaced with Mean / Variance

> Could you give us an example of this condition? I can only imagine models having either empty or fully pre-defined values, so we should either calculate them by calling Mean / Variance fed with the data, or use our current implementation of BatchNormToInferUnpack.

Thanks for the discussion! Following it, I have rewritten the code as in the newest commit. This time the function BatchNormToInferUnpack is not modified; we only modify the TensorFlow frontend for _fused_batch_norm. If the mean and variance are empty, we directly add Mean and Variance relay operators before the batch_norm relay operator in the frontend graph, without modifying the batch_norm relay operator at all. A sketch of this rewrite is given below.
Thank you for the suggestions!
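A minimal sketch of that rewrite, assuming an NHWC layout with the channel on axis 3; the function name and surrounding plumbing are illustrative assumptions, not the exact frontend code:

```python
from tvm import relay

def _batch_norm_training_mode(data, gamma, beta, axis=3, epsilon=1e-3):
    # When the graph carries empty mean/variance tensors, compute
    # them from the run-time data over every axis except the channel
    # axis, then feed the ordinary batch_norm relay operator.
    reduce_axes = [i for i in range(4) if i != axis]
    mean = relay.mean(data, axis=reduce_axes)
    variance = relay.variance(data, axis=reduce_axes)
    return relay.nn.batch_norm(data, gamma, beta, mean, variance,
                               axis=axis, epsilon=epsilon)[0]
```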

@FrozenGene (Member) left a review:

Some final comments

Resolved review threads:
- python/tvm/relay/frontend/tensorflow.py
- tests/python/frontend/tensorflow/test_bn_dynamic.py (two threads)
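For context, a test for this path might look roughly like the following sketch (TF 1.x API; shapes, names, and the comparison step are illustrative assumptions, not the PR's actual test):

```python
import numpy as np
import tensorflow as tf
from tvm import relay

shape = (1, 8, 8, 4)  # illustrative NHWC shape
data = np.random.rand(*shape).astype("float32")

with tf.Graph().as_default() as g:
    x = tf.placeholder(tf.float32, shape, name="input")
    scale = tf.constant(np.random.rand(shape[-1]).astype("float32"))
    offset = tf.constant(np.random.rand(shape[-1]).astype("float32"))
    # is_training=True with no mean/variance supplied: the statistics
    # are computed from the batch at run time, the case this PR adds.
    y, _, _ = tf.nn.fused_batch_norm(x, scale, offset,
                                     epsilon=1e-3, is_training=True)
    out = tf.identity(y, name="output")
    with tf.Session(graph=g) as sess:
        tf_out = sess.run(out, feed_dict={x: data})
        graph_def = g.as_graph_def()

# Import into Relay; then (not shown) build and run the module and
# compare its output against tf_out within a small tolerance.
mod, params = relay.frontend.from_tensorflow(
    graph_def, shape={"input": shape}, outputs=["output"])
```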
@FrozenGene (Member) left a review:

LGTM.

@FrozenGene (Member) commented:

Let us wait for CI to turn green.

As GitHub has an issue (https://discuss.tvm.ai/t/github-issue-the-commit-author-is-wrong-since-today/5880/15), I will merge this after it is resolved.

@lfengad (Contributor, Author) commented Mar 6, 2020

> Let us wait for CI to turn green.

> As GitHub has an issue (https://discuss.tvm.ai/t/github-issue-the-commit-author-is-wrong-since-today/5880/15), I will merge this after it is resolved.

Okay, thank you so much for the efforts!

@FrozenGene FrozenGene merged commit ba47786 into apache:master Mar 8, 2020
@FrozenGene FrozenGene added the status: accepted label and removed the status: need update label Mar 8, 2020
@FrozenGene (Member) commented:

Thanks @lfengad! This is merged now.

@lfengad (Contributor, Author) commented Mar 8, 2020

> Thanks @lfengad! This is merged now.

Thank you so much for your help! 😄
