
[Model conversion tool] Support fusing Conv+Add #7799

Merged
merged 21 commits into tensorflow:master on Jul 12, 2023

Conversation

@Linchenn (Collaborator) commented Jul 2, 2023

(Conv means PointwiseConv, DepthwiseConv or generalConv here.)

Problem

The tool currently supports 'Conv + BiasAdd + Activation' fusing, but this fusing requires the add op to literally be 'BiasAdd'; 'Conv + AddV2 + Activation' cannot be fused even though the two patterns are mathematically equivalent.

Candidate solutions

Assuming we cannot fix grappler for this missed fusing pattern, there are a couple of candidate solutions:

  1. [Current PR] Add one pass that converts every AddV2 op that could be fused with Conv into a BiasAdd op. After the conversion, when the graph is passed to grappler, it is fused as expected.
  2. Add two passes to fuse 'Conv + AddV2 + Activation' (one for Conv and one for DepthwiseConv). Cons: this would duplicate logic, because we would need to rewrite a pass that fuses 'Conv + BiasAdd + Activation' in addition to grappler's implementation.

Either way, the tool has to identify the 'AddV2' nodes to fuse, using the following conditions:

  1. The ancestor node has to be Conv or DepthwiseConv op.
  2. The current node is the only successor of the ancestor.
  3. The current node 'AddV2' could be converted to 'BiasAdd'.
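The pass in candidate solution 1 can be sketched as follows. This is a minimal, hypothetical illustration using plain Python objects instead of the real GraphDef NodeDef protos; the names `Node`, `consumers`, and `rewrite_addv2_to_biasadd` are invented for the sketch, and condition 3 (validating the bias operand) is elided with a comment.

```python
# Simplified sketch (assumed, not the PR's actual code) of rewriting
# fusible AddV2 nodes to BiasAdd before the graph reaches grappler.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    op: str
    input: List[str] = field(default_factory=list)


def consumers(node_map, target_name):
    """Names of nodes that list `target_name` among their inputs."""
    return [n.name for n in node_map.values() if target_name in n.input]


def rewrite_addv2_to_biasadd(node_map):
    for node in node_map.values():
        if node.op != 'AddV2' or len(node.input) != 2:
            continue
        ancestor = node_map.get(node.input[0])
        # Condition 1: the ancestor must be a conv op.
        if ancestor is None or ancestor.op not in (
                'Conv2D', 'DepthwiseConv2dNative'):
            continue
        # Condition 2: the AddV2 must be the conv's only successor.
        if len(consumers(node_map, ancestor.name)) != 1:
            continue
        # Condition 3 (the AddV2 is convertible to BiasAdd, e.g. the bias
        # operand is a 1-D tensor) is elided in this sketch.
        node.op = 'BiasAdd'


nodes = {
    'conv': Node('conv', 'Conv2D', ['x', 'w']),
    'add': Node('add', 'AddV2', ['conv', 'bias']),
}
rewrite_addv2_to_biasadd(nodes)
print(nodes['add'].op)  # -> BiasAdd
```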

Misc.

From the TF API docs, the major difference between 'AddV2' and 'Add' is the supported input data types: the former additionally supports 'uint16, uint32, uint64', while the latter additionally supports 'string'.


```python
        or ancestor_node.op == 'DepthwiseConv2dNative') \
    and len(graph_rewrite_util.get_output_node_names(
        input_node_map, ancestor_node_name)):
  node.op = 'BiasAdd'
  node.attr['data_format'].s = bytes('NHWC', 'utf-8')
```
@pyu10055 (Collaborator) commented:

this data_format info should come from the ancestor node

@Linchenn (Collaborator, Author) replied:

@pyu10055 If the bias is a 1D tensor, the AddV2 op always broadcasts/adds it along the last dimension of the ancestor's result, so this behavior is always an NHWC BiasAdd.
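A small pure-Python illustration of this point: adding a length-C bias to an NHWC-shaped nested list broadcasts along the last (channel) axis, which matches the semantics of BiasAdd with data_format='NHWC'. The helper name and shapes here are hypothetical.

```python
# Adding a 1-D bias of length C to an NHWC tensor touches only the last
# (channel) axis -- the same result BiasAdd produces with
# data_format='NHWC'. Nested lists stand in for tensors.
def add_bias_last_dim(nhwc, bias):
    """Add bias (length C) to every innermost vector of a nested list."""
    if isinstance(nhwc[0], list):
        return [add_bias_last_dim(inner, bias) for inner in nhwc]
    return [x + b for x, b in zip(nhwc, bias)]


# Shape N=1, H=1, W=2, C=3.
x = [[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]]
bias = [10.0, 20.0, 30.0]
print(add_bias_last_dim(x, bias))
# -> [[[[11.0, 22.0, 33.0], [14.0, 25.0, 36.0]]]]
```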

@Linchenn Linchenn requested a review from mattsoulanille July 6, 2023 18:38
@Linchenn (Collaborator, Author) commented Jul 6, 2023

Also took a look at the models that do not fuse ops as expected, mainly for conv/depthwise+bias:

  • The op name is 'addV2' instead of 'biasAdd': MobileBert.
  • The op name is 'add' instead of 'biasAdd': Coco-SSD-MobileNetV2, Coco-SSD-MobileNetV1, DeepLabV3-pascal, DeepLabV3-ade20k, AutoML Object, AutoML Image.
  • The conversion of the following models might be too old to apply fusing ops: TextToxicity, posenet-MobileNetV1, posenet-ResNet50, USE.

cc @qjia7 @mattsoulanille

@Linchenn (Collaborator, Author) commented Jul 6, 2023

Update: added conversion support for fusing the 'Add' op, in addition to 'AddV2', because of the use cases in the previous comment.

@qjia7 (Contributor) commented Jul 7, 2023

> Also took a look at the models that do not fuse ops as expected, mainly for conv/depthwise+bias:
>
>   • The op name is 'addV2' instead of 'biasAdd': MobileBert.
>   • The op name is 'add' instead of 'biasAdd': Coco-SSD-MobileNetV2, Coco-SSD-MobileNetV1, DeepLabV3-pascal, DeepLabV3-ade20k, AutoML Object, AutoML Image.
>   • The conversion of the following models might be too old to apply fusing ops: TextToxicity, posenet-MobileNetV1, posenet-ResNet50, USE.

Great work! After this PR is merged, the benchmark models can be updated with this PR, right? Looking forward to great perf improvements with fused models.

@Linchenn (Collaborator, Author) commented Jul 7, 2023

> Great work! After this PR is merged, the benchmark models can be updated with this PR, right?

Yes

Comment on lines +133 to +139
```python
def get_output_node_names(node_map, target):
  output_node_names = []
  for name, node in node_map.items():
    for input_name in node.input:
      if target == input_name:
        output_node_names.append(name)
  return output_node_names
```
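For illustration, here is a hypothetical usage of this helper with stand-in node objects (the real tool passes NodeDef protos, which likewise expose `.input`); the function body is repeated so the snippet is self-contained.

```python
# Stand-in nodes: each only needs an `.input` list of producer names,
# mirroring the edge information NodeDef protos carry.
from types import SimpleNamespace

node_map = {
    'conv': SimpleNamespace(input=['x', 'w']),
    'add': SimpleNamespace(input=['conv', 'bias']),
    'relu': SimpleNamespace(input=['add']),
}


def get_output_node_names(node_map, target):
  """Return the names of all nodes that consume `target`'s output."""
  output_node_names = []
  for name, node in node_map.items():
    for input_name in node.input:
      if target == input_name:
        output_node_names.append(name)
  return output_node_names


print(get_output_node_names(node_map, 'conv'))  # -> ['add']
```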
@mattsoulanille (Member) commented:

I'm surprised we have node.input but not node.output.

@Linchenn (Collaborator, Author) replied:

Yes, node.input carries the edge information for the model topology; a node.output field would duplicate that information.

…dd.py

Co-authored-by: Matthew Soulanille <matthew@soulanille.net>
@Linchenn Linchenn requested a review from mattsoulanille July 11, 2023 21:46
@Linchenn Linchenn enabled auto-merge (squash) July 12, 2023 20:43
@Linchenn Linchenn merged commit 1137ef0 into tensorflow:master Jul 12, 2023