[Relay][Legalize][ARM_CPU] Handling NHWC layout for arm_cpu. #3754

anijain2305 · 2019-08-11T23:43:09Z

Relevant Issue - #2519

Currently, models parsed by the TFLite frontend fail on ARM devices. The reason is that TFLite models use the NHWC layout, and we don't support NHWC on ARM. However, TF models (which are also NHWC) are able to run on ARM devices. This is because the TF parser adds layout transforms while parsing the graph (which is not a good idea, as parsers should retain the framework layout).

As a first step, this PR adds a legalize function for conv2d on arm_cpu that adds transposes before and after the op so that it uses the NCHW layout. This gets TFLite models working on ARM CPUs.
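
The fallback described above comes down to a few axis permutations. The following is a minimal sketch, not the code in this PR: `transpose_axes` and `permute_shape` are hypothetical helpers, and the kernel layouts `HWIO` and `OIHW` are assumed here as the usual companions of `NHWC` and `NCHW` data.

```python
def transpose_axes(src_layout, dst_layout):
    """Return axes such that transposing with them converts src_layout to dst_layout.

    Output dimension i is taken from input dimension axes[i], which is the
    convention used by NumPy-style transpose operators.
    """
    if sorted(src_layout) != sorted(dst_layout):
        raise ValueError("layouts must name the same dimensions")
    return tuple(src_layout.index(dim) for dim in dst_layout)


def permute_shape(shape, axes):
    """Apply a transpose permutation to a shape tuple."""
    return tuple(shape[a] for a in axes)


# Data goes NHWC -> NCHW before the conv, and the result goes back afterwards.
data_to_nchw = transpose_axes("NHWC", "NCHW")
out_to_nhwc = transpose_axes("NCHW", "NHWC")
# Assumed kernel layouts: HWIO on the frontend side, OIHW for the NCHW conv.
kernel_to_oihw = transpose_axes("HWIO", "OIHW")

print(data_to_nchw)    # (0, 3, 1, 2)
print(out_to_nhwc)     # (0, 2, 3, 1)
print(kernel_to_oihw)  # (3, 2, 0, 1)
print(permute_shape((1, 224, 224, 3), data_to_nchw))  # (1, 3, 224, 224)
```

In Relay, tuples like these would be passed as the `axes` argument of a transpose op placed around the NCHW conv2d.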

Follow-up work requires cleaning up the TF parser. Parsers should keep the frontend layout, so we can add legalize functions for all targets to fall back to NCHW for conv and then clean up the TF parser. (Let me know if I should open a new issue to track these tasks.)

@yzhliu @FrozenGene @apivovarov @kevinthesun @yongwww

FrozenGene · 2019-08-12T02:48:49Z

Inserting transposes is not a good solution IMO. As demonstrated, it has a big performance impact when conv2d itself executes fast. Besides us, the community has reported this too: https://discuss.tvm.ai/t/quantization-autotvm-performance-degradation-due-to-transpose-ops-when-nchw-layout-is-used/3561. I think we should support the NHWC layout in the spatial pack schedule on ARM CPU instead of inserting transpose ops. Moreover, @jackwish has investigated the advantages and disadvantages of NHWC vs. NCHW layout on ARM CPU internally. If we don't adopt the NHWC layout, we cannot get the best performance and beat QNNPACK.

In summary, we should support the NHWC layout as is done here: https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py#L264. It is not difficult and needs only minor work from us. Unifying on NCHW sounds good but is not the best choice in practice if we want the best performance (which is one key factor in whether users adopt TVM).

anijain2305 · 2019-08-12T04:47:47Z

@FrozenGene I agree that writing a schedule for NHWC is a good long-term solution. However, for the time being, can this be merged? The current state is bad because compilation fails. As we get time, we can start working on the NHWC schedule. If you have already looked into it and it can be open-sourced, that would be really helpful as well.

FrozenGene · 2019-08-12T06:28:17Z

I am busy with our internal DSP product. I am not sure whether @jackwish has the interest and time to help with it; I think we could contribute it back.

anijain2305 · 2019-08-12T15:48:22Z

OK, in that case, I would suggest the current PR as an intermediate step until a better NHWC schedule is merged.

Performance is definitely one of the key factors for using TVM. But portability is equally important, both across frameworks and across HW devices. It is better to be in a temporary state where things are functional but not performant than in a state where we are neither functional nor performant. Currently, we fall into the latter bucket for TFLite models, which makes it harder to convince edge device vendors (who like TFLite) to try TVM.

wweic · 2019-08-12T21:44:28Z

topi/python/topi/nn/conv2d.py

+    ----------
+    attrs : nnvm.top.AttrDict or tvm.attrs.Attrs
+        Attributes of current convolution
+    inputs : nnvm.symbol or tvm.relay.Expr


Aren't we going to discontinue nnvm support?

anijain2305 · 2019-08-12

Thanks! Yes, I have made it work only for Relay. Changed the comments to reflect that.

zhenhuaw-me · 2019-08-13T13:00:09Z

> I am busy with our internal DSP product. I am not sure whether @jackwish has the interest and time to help with it; I think we could contribute it back.

We may need some internal discussions on this...

zhenhuaw-me

Thanks for the ping. Are we using this to make it possible to test models (I mean something like mobilenet)?

topi/python/topi/arm_cpu/conv2d.py

anijain2305 · 2019-08-13T16:23:29Z

> Thanks for the ping. Are we using this to make it possible to test models (I mean something like mobilenet)?

Yes

kevinthesun · 2019-08-13T17:26:01Z

@FrozenGene I think this PR doesn't introduce transpose by replacing conv2d NHWC directly with conv2d NCHW. This partially resolves our issue. In the long term, we might need #3670 to better manage graph-level layout conversion. For topi schedules, I suggest we only add schedules for a new layout when an obvious performance benefit is shown. Otherwise, we can just leverage the existing layout schedules and convert nodes into that layout.

FrozenGene · 2019-08-14T02:08:22Z

> @FrozenGene I think this PR doesn't introduce transpose by replacing conv2d NHWC directly with conv2d NCHW. This partially resolves our issue. In the long term, we might need #3670 to better manage graph-level layout conversion. For topi schedules, I suggest we only add schedules for a new layout when an obvious performance benefit is shown. Otherwise, we can just leverage the existing layout schedules and convert nodes into that layout.

Yeah, we have investigated the NHWC and NCHW layouts in depth for quantized models and found that NHWC gives us better performance; with NCHW, we would have to introduce a pack operation internally to get the same effect (better locality) as NHWC. However, when we execute a quantized model, we care about performance very much, and we cannot ignore the time the pack costs. Maybe @jackwish could share more of his investigation results.
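
One way to picture the locality argument is to compare element strides in the two layouts. This is only an illustrative sketch: it assumes the locality in question is along the channel axis, the tensor shape is made up, and `layout_strides` is a hypothetical helper rather than a TVM API.

```python
def layout_strides(layout, dims):
    """Row-major element strides, keyed by dimension name, for a dense tensor."""
    strides = {}
    step = 1
    # The last dimension in the layout string is the fastest-varying one.
    for name in reversed(layout):
        strides[name] = step
        step *= dims[name]
    return strides


# Hypothetical activation shape, chosen only for illustration.
dims = {"N": 1, "H": 56, "W": 56, "C": 64}

nhwc = layout_strides("NHWC", dims)
nchw = layout_strides("NCHW", dims)

print(nhwc["C"])  # 1
print(nchw["C"])  # 3136
```

Under these assumptions, the channels of one pixel sit next to each other in NHWC, while in NCHW neighbouring channels are H*W elements apart, which is the gap a pack step would have to close.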

kevinthesun · 2019-08-14T17:47:11Z

@FrozenGene I see. I think there is no harm in improving layout support and topi schedules at the same time. This PR can serve as a short-term solution for arm_cpu fp32.

kevinthesun · 2019-08-14T23:44:18Z

Thanks!

vinx13 · 2019-08-16T18:06:22Z

topi/python/topi/nn/conv2d.py

+    Note
+    ----
+    Unlike other TOPI functions, this function operates on both graph level and operator level,
+    so we have to pass 'F' to make it support our two versions of graph IR, NNVM and Relay.


@anijain2305 why are NNVM and the 'F' argument still mentioned here?

anijain2305 · 2019-08-16

Ah, thanks for pointing that out. I copied the description from other functions that have this comment. Will send a separate PR to clean up the comments.

zhenhuaw-me · 2019-08-30T03:36:45Z

I have drafted the ARM NHWC schedule PR #3859, in case you are interested.

wweic reviewed Aug 12, 2019

anijain2305 force-pushed the fix_nhwc_issues branch from e5f278e to 0ac9ba7 August 12, 2019 22:04

zhenhuaw-me reviewed Aug 13, 2019

topi/python/topi/arm_cpu/conv2d.py (resolved)

kevinthesun added the "status: need update" label (need update based on feedbacks) Aug 14, 2019

[Relay][Legalize][ARM_CPU] Handling NHWC layout for arm_cpu.

a23c552

anijain2305 force-pushed the fix_nhwc_issues branch from 0ac9ba7 to a23c552 August 14, 2019 21:41

kevinthesun approved these changes Aug 14, 2019

kevinthesun merged commit 5498e54 into apache:master Aug 14, 2019

wweic pushed a commit to neo-ai/tvm that referenced this pull request Aug 16, 2019

[Relay][Legalize][ARM_CPU] Handling NHWC layout for arm_cpu. (apache#…

121accb

…3754)

vinx13 reviewed Aug 16, 2019

anijain2305 added a commit to anijain2305/tvm that referenced this pull request Aug 22, 2019

[Relay][Legalize][ARM_CPU] Handling NHWC layout for arm_cpu. (apache#…

a7ccf8f

…3754)

zhenhuaw-me mentioned this pull request Aug 30, 2019

[TOPI][AutoTVM] NHWC conv2d templates for ARM #3859

Merged

wweic pushed a commit to neo-ai/tvm that referenced this pull request Sep 6, 2019

[Relay][Legalize][ARM_CPU] Handling NHWC layout for arm_cpu. (apache#…

19cfb1f

…3754)

tqchen mentioned this pull request Nov 8, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed
