-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Relay][Legalize][ARM_CPU] Handling NHWC layout for arm_cpu. #3754
Conversation
Inserting In summary, we should support NHWC layout like here: https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py#L264 and it is not difficult and only need our minor work. Unified one NCHW sounds good but is not best in practical if we want to get the best performance (which is one key factor to let users use TVM or not). |
@FrozenGene I agree writing schedule for NHWC is a good long-term solution. However, for the time being, can this be merged in? Current state is bad because we fail compilation. As we get time, we can start working on the NHWC schedule. If you have already looked into it and that can be open-sourced, that will really helpful as well. |
I am busy in our internal DSP product. I can not make sure whether @jackwish could have interest and have time to help it, I think we could contribute it back. |
Ok, in that case, I would suggest current PR as an intermediate step, until a better NHWC schedule is merged. Performance is definitely one of the key factors for using TVM. But, equally important is portability, both across frameworks and HW devices. It is fine to be in a temporary stage where things are functional and not performant as compared to a stage where we are neither functional nor performant. Currently, we fall into latter bucket for TFLite models, which makes it harder to convince edge device vendors (who like TFLite) to try TVM. |
topi/python/topi/nn/conv2d.py
Outdated
---------- | ||
attrs : nnvm.top.AttrDict or tvm.attrs.Attrs | ||
Attributes of current convolution | ||
inputs : nnvm.symbol or tvm.relay.Expr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arn't we going to discontinue nnvm support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Yes, I have made it work only for Relay. Changed the comments to reflect that.
e5f278e
to
0ac9ba7
Compare
We may need some internal discussions on this... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for ping. Are we using this to make it possible to test models (I mean something like mobilenet)?
Yes |
@FrozenGene I think this PR doesn't introduce transpose by replacing conv2d NHWC directly with conv2d NCHW. This partially resolves our issue. In the long term, we might need #3670 to better manage graph level layout conversion. For topi schedule, I suggest we only add schedules for new layout when obvious performance benefit is shown. Otherwise we can just leverage existing layout schedules and convert nodes into that layout. |
Yeah, we have investigated NHWC and NCHW layout in quantization model deeply and find NHWC could make us get better performance, otherwise we will introduce |
@FrozenGene I see. I think there is no harm that we improve layout support and topi schedule at the same time. This PR can serve as a short term solution for arm_cpu fp32. |
0ac9ba7
to
a23c552
Compare
Thanks! |
Note | ||
---- | ||
Unlike other TOPI functions, this function operates on both graph level and operator level, | ||
so we have to pass 'F' to make it support our two versions of graph IR, NNVM and Relay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anijain2305 why are NNVM and F
argument still mentioned here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for pointing out. I copied the description from other functions which have this comment. Will send a separate PR to clean up the comments.
Drafted the ARM NHWC schedule PR #3859, if you guys have interest. |
Relevant Issue - #2519
Currently, the TFLite parsed models fail for ARM devices. The reason is TFLite models are NHWC and we don't support NHWC layout in ARM. However, TF models (which are also NHWC), are able to run on ARM devices. This is because of TF parser adds layout transforms while parsing the graph (which is not a good idea as parsers should retain the framework layout).
As a first step, this PR adds a legalize function for conv2d for arm_cpu to add transposes before and after to use the NCHW layout. This gets the TFLite models working for ARM cpus.
A follow-up work requires TF parser cleanup. Parsers should keep the frontend layout. So, we can add legalize for all the targets to fall back to NCHW for conv and then cleanup TF parser. (Let me know if I should open up a new Issue for tracking the tasks).
@yzhliu @FrozenGene @apivovarov @kevinthesun @yongwww