Dim order in delegates - dim order tagging and partition dim order in/out #8333
Replies: 1 comment
We have discussed this internally, and in my opinion we currently have sufficient dim_order tools and APIs to address this issue both locally (within a single delegate graph) and globally (across multiple delegate subgraphs). For context, the Edge IR requires every node in the graph (including delegate nodes, portable/optimized kernels, etc.) to support the dim_orders corresponding to NCHW and NHWC for 4D tensors, and likewise for 3D and 5D. ET core already supports this, most of the portable ops support it, and almost all delegates are compatible with it (i.e., they will fail gracefully if they can't handle it). The dialect verifier today should flag nodes with unsupported dim_orders. That said, in terms of the optimal number of permutes, here's how I envision this playing out:
Note: (1) is not in scope for the dim_order workstream; Arm and XNNPACK both have plans to implement this in H1. (2) should be trivial. Once this is done, we can look at (3). For (3), we can start with some simple heuristics, i.e. for a graph with all-XNNPACK delegates and NCHW input/output, we just need to make the first XNNPACK delegate's output NHWC and the last XNNPACK delegate's output NCHW, and so on. The same can be done with portable ops as well.
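The heuristic above can be sketched in a few lines. This is a hypothetical helper, not an existing ET API: run each delegate partition in its preferred dim order and insert a permute only at boundaries where adjacent orders disagree.

```python
# Hypothetical sketch of the boundary heuristic (not an existing ET API).
NCHW = (0, 1, 2, 3)
NHWC = (0, 2, 3, 1)

def conversion_boundaries(partition_prefs, graph_in=NCHW, graph_out=NCHW):
    """Indices of boundaries (0 = the graph-input edge) that need a permute
    when each partition runs in its preferred dim order."""
    orders = [graph_in, *partition_prefs, graph_out]
    return [i for i in range(len(orders) - 1) if orders[i] != orders[i + 1]]

# Three consecutive XNNPACK partitions that all prefer NHWC: only the first
# and last boundaries need a conversion, instead of a pair per partition.
print(conversion_boundaries([NHWC, NHWC, NHWC]))  # -> [0, 3]
```

The same scan works for mixed chains (e.g. a portable-op island between two delegates): any boundary where both sides agree costs nothing.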
Problem Statement
I'm looking at two dim-order-related use cases for the XNNPACK delegate, and am wondering if/how the recent dim order work done in core fits into this. The first use case is passing graph inputs in as channels last. This is often the format that image data is naturally in, and it saves the user from having to manually convert before passing it into ET. It also allows skipping a dim order conversion before resize when doing bilinear resize in XNNPACK, which is key to keeping memory down when taking full-size images, as ideally the "raw" input tensor should go directly into the resize. This prevents any huge internal activation tensors from being allocated.
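To make the layout point concrete, here is a minimal sketch (the helper name is hypothetical) of what a dim order means physically: it is the permutation of logical dims from outermost to innermost in memory, so an interleaved RGB image buffer is already in the (0, 2, 3, 1) / NHWC order and would need no permute on the way in.

```python
def strides_for_dim_order(sizes, dim_order):
    """Contiguous strides for logical `sizes` stored physically in `dim_order`."""
    strides = [0] * len(sizes)
    acc = 1
    for d in reversed(dim_order):  # innermost physical dim gets stride 1
        strides[d] = acc
        acc *= sizes[d]
    return strides

sizes = [1, 3, 224, 224]  # logical NCHW sizes of an RGB image batch
print(strides_for_dim_order(sizes, (0, 1, 2, 3)))  # channels-first layout
print(strides_for_dim_order(sizes, (0, 2, 3, 1)))  # channels-last layout
```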
The second use case is supporting per-op mode partitioning (one partition per operator) in XNNPACK for channels-last ops. Per-op mode already exists for NCHW ops and allows all tensors to be owned and memory-planned by ET. The XNNPACK delegate requires many vision ops, including convolutions, to run in channels-last for efficient kernels. Currently, the XNNPACK delegate asserts that all partition inputs and outputs are standard dim order / channels-first, which means dim order conversions are inserted around every conv op in per-op mode.
In per_op=False mode, there might be a single partition that looks like this:
However, with per_op=True, it looks like this. Note the excess to_copy nodes surrounding each convolution.
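The cost difference between the two modes is easy to state. In this illustration (the node names loosely follow ET's dim-order copy op; the builder itself is hypothetical), per-op mode wraps every delegated conv in its own conversion pair, while a fused partition shares one pair across all convs:

```python
def lowered_nodes(num_convs, per_op):
    """Node sequence for a chain of NHWC convs (illustration only)."""
    if per_op:
        nodes = []
        for i in range(num_convs):
            # Each single-op partition converts on the way in and out.
            nodes += ["_to_dim_order_copy(NHWC)", f"conv_{i}",
                      "_to_dim_order_copy(NCHW)"]
        return nodes
    # One fused partition: convert once at each boundary.
    return (["_to_dim_order_copy(NHWC)"]
            + [f"conv_{i}" for i in range(num_convs)]
            + ["_to_dim_order_copy(NCHW)"])

def copies(nodes):
    return sum(n.startswith("_to_dim_order_copy") for n in nodes)

print(copies(lowered_nodes(3, per_op=True)),
      copies(lowered_nodes(3, per_op=False)))  # -> 6 2
```

So conversions grow linearly with the number of convs in per-op mode (2n vs. 2), which is the overhead the tagging proposal below aims to remove.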
Potential Solutions
Because these issues involve dim order across partition boundaries, they are difficult to solve solely in the delegate. That's where I hope the framework's dim order support can help, or be extended to help. As a backend author, I'd ideally like to be able to tag each partition with an input (and output) dim order, and then have a post-delegation pass insert the appropriate dim order conversions.
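A sketch of that proposed mechanism, with all names hypothetical (this is not an existing ET API): each partition carries the dim order it expects on its inputs and produces on its outputs, and a post-delegation pass walks a linear chain of partitions, inserting a conversion node only where the producer's order disagrees with the consumer's tag.

```python
from dataclasses import dataclass

NCHW, NHWC = (0, 1, 2, 3), (0, 2, 3, 1)

@dataclass
class Partition:
    name: str
    in_order: tuple = NCHW
    out_order: tuple = NCHW

def insert_dim_order_conversions(chain, graph_in=NCHW, graph_out=NCHW):
    """Lower a linear chain of tagged partitions to a flat node sequence."""
    nodes, cur = [], graph_in
    for p in chain:
        if cur != p.in_order:  # mismatch at this boundary: convert
            nodes.append(f"to_dim_order_copy{p.in_order}")
        nodes.append(p.name)
        cur = p.out_order
    if cur != graph_out:  # restore the graph's output order if needed
        nodes.append(f"to_dim_order_copy{graph_out}")
    return nodes

# Three NHWC-tagged XNNPACK partitions need only two conversions in total.
chain = [Partition(f"xnnpack_conv_{i}", NHWC, NHWC) for i in range(3)]
print(insert_dim_order_conversions(chain))
```

A real pass would operate on the FX graph with fan-out rather than a linear chain, but the per-edge rule is the same: compare the producer's tagged output order with the consumer's tagged input order.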
On a second note, for better dim order support, it would be nice to be able to get the dim order of a tensor from metadata, using the new dim order facilities, rather than just looking at the tensor's memory_format. Ideally it would be automatically populated during graph retracing so that it stays up to date after each pass. Perhaps some sort of dim order spec in the node meta.
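As an illustration of what such a spec could hold (this helper is not an existing ET API), a dim order can be recovered from a tensor's sizes and strides and recomputed after every retrace. One caveat: size-1 dims make the order ambiguous in general, and this sketch just sorts by stride.

```python
def dim_order_from_strides(sizes, strides):
    """Dims sorted outermost-first (largest stride first)."""
    return tuple(sorted(range(len(sizes)),
                        key=lambda d: strides[d], reverse=True))

# A channels-last 1x3x224x224 tensor reports the NHWC dim order.
print(dim_order_from_strides([1, 3, 224, 224], [150528, 1, 672, 3]))
```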
Are either or both of these currently possible with the ET dim order work? If not, is there any objection to adding the partition dim order tagging and post-delegation dim order conversion pass proposed above? I don't have a full technical proposal yet - I'm mostly interested in understanding whether there is any existing machinery to meet this need, and if not, whether my proposed approach is reasonable.
CC @Gasoonjia @mcr229 @digantdesai