Add support for tiling LinalgExt::UnPackOp. #10905

hanhanW · 2022-10-25T20:31:15Z

Idea

The main issue is handling incomplete tiles. Since all the dimensions are orthogonal, discussing the 1-d unpack case is enough. The core idea is to make the input slice consist of complete tiles. As a result, the tiled unpack produces a tile larger than requested, and we need an extract_slice op to shift and truncate the output.

Example

Let's take Nn_to_N as an example. Say that N=32, n=8, and tiling_size=15. The coordinates of the second tile (i.e., result[15..29]) are [(1, 7), (2, 0), (2, 1), ..., (3, 4), (3, 5)]. The first row and the last row are incomplete in terms of inputs. It's impossible to represent an unpack op using these coordinates, because the input has higher rank and the coordinate computation uses mod and ceilDiv. That's very tricky.

To represent the unpack op, we have to complete the rows, i.e., the input coordinates start at (1, 0) and end at (3, 7). In this context, the tiled unpack produces 3 * n elements because there are 3 rows in total. Following it with a tensor.extract_slice op, we get the actual result.
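The arithmetic above can be sketched in plain C++. This is only an illustration of the 1-d case, not the code in this PR: the struct `UnpackTile` and the function `computeUnpackTile` are hypothetical names, and the output tile is assumed to be the half-open range [offset, offset + size).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of LinalgExt.
struct UnpackTile {
  int64_t firstRow;     // first row of the packed input that is read
  int64_t numRows;      // number of complete rows that are read
  int64_t unpackedSize; // elements produced by the tiled unpack (numRows * n)
  int64_t sliceOffset;  // offset of the requested tile inside the unpacked data
  int64_t sliceSize;    // size of the requested tile
};

// For a 1-d Nn_to_N unpack with inner tile size n, computes the complete-row
// input slice that covers the output range [offset, offset + size), plus the
// offset and size a trailing extract_slice would need.
UnpackTile computeUnpackTile(int64_t offset, int64_t size, int64_t n) {
  assert(offset >= 0 && size > 0 && n > 0);
  int64_t firstRow = offset / n;             // row holding the first element
  int64_t lastRow = (offset + size - 1) / n; // row holding the last element
  int64_t numRows = lastRow - firstRow + 1;  // rows are always read in full
  return {firstRow, numRows, numRows * n, offset % n, size};
}
```

With N=32, n=8, and tiling_size=15, `computeUnpackTile(15, 15, 8)` reads rows 1 through 3, so the tiled unpack produces 24 elements, and the extract_slice takes 15 elements starting at offset 7.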

The PR relaxes the condition in the tiling algorithm because two operations are generated when tiling an unpack op. Since the tiling implementation uses a filter, the filter should be applied to all the generated ops. Otherwise, it runs into infinite loops (because the filter is not applied to the tiled unpack op).

hanhanW · 2022-10-25T20:32:06Z

@chelini this is the implementation of tiling unpack op, PTAL.

MaheshRavishankar

I'll review it in a bit, but FYI, the tiling implementation here is only used for testing. The actual tiling used in IREE is implemented upstream as scf::tileUsingSCFForOp.

hanhanW · 2022-10-25T22:23:11Z

I did not consider outer_dim_perms in this PR. I'll think about it and update it later. It should be a minor change like #10907.

hanhanW · 2022-10-25T23:20:27Z

I added support for outer_dim_perms as well. After writing the lit test, I'm quite sure it is correct. The dimensions map to each other correctly. :-)

chelini · 2022-10-26T11:32:04Z

...-external-projects/iree-dialects/include/iree-dialects/Dialect/LinalgExt/Passes/Transforms.h

 struct TiledOp {
-  /// Tiled op.
-  Operation *op;
+  /// Tiled operations that are created during tilng.


nit: tilng -> tiling

chelini · 2022-10-26T11:38:22Z

llvm-external-projects/iree-dialects/lib/Dialect/LinalgExt/IR/LinalgExtOps.cpp

+UnPackOp::getTiledImplementation(OpBuilder &builder,
+                                 ArrayRef<OpFoldResult> offsets,
+                                 ArrayRef<OpFoldResult> sizes) {
+  if (!hasTensorSemantics())


Why do we need to be at the tensor level? Is it because the op needs an output tensor?

The output of the tiled unpack is larger than the tiling size because we have to handle incomplete tiles. Restricting it to tensors addresses the issue. A couple of reasons led me to add the restriction.

The upstream version will be in the tensor dialect, so having it work at the tensor level SGTM.

We'd need a larger buffer to store the temporary result, and then copy the data from the temporary buffer to the destination. We cannot reuse the destination buffer because the produced output has more data. Vectorization could potentially address the extra buffer issue: because everything is of vector type, the values are going to be stored in registers.

In IREE's pipeline, we apply tiling and vectorization before bufferization. That makes the world easier. Having it work on tensors is good enough for IREE.

We're able to do tiling at the memref level, but that introduces a buffer allocation. It requires users to understand that vectorization could remove the allocation. That's too many details for users. Since there is no need on the IREE side, I restrict it to the tensor level for now. I'm happy to extend it if the need arises. I'll add a comment for it!

chelini · 2022-10-26T11:53:42Z

@chelini this is the implementation of tiling unpack op, PTAL.

Thanks a lot for the PR!

MaheshRavishankar

This is OK for now, but upstream TilingInterface might need some changes too?

llvm-external-projects/iree-dialects/lib/Dialect/LinalgExt/IR/LinalgExtOps.cpp

MaheshRavishankar

Actually clicked approve by mistake. Requesting changes for not materializing the arith.constant .. : index and affine.apply ops.

hanhanW · 2022-10-26T22:02:32Z

This is OK for now, but upstream TilingInterface might need some changes too?

I'm not sure whether the upstream version needs changes. My prototype works e2e in #10823. In that PR, I don't modify upstream code. This PR enables the tiling within the LinalgExt scope. I'll take a look at it when connecting everything together.

llvm-external-projects/iree-dialects/lib/Dialect/LinalgExt/IR/LinalgExtOps.cpp

Some operations need to generate multiple operations when implementing the tiling interface. Here is a sound example in IREE, see iree-org/iree#10905 for more details. Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D137300

hanhanW requested a review from MaheshRavishankar as a code owner October 25, 2022 20:31

hanhanW requested a review from nicolasvasilache October 25, 2022 20:31

clang-format

e1f26a6

MaheshRavishankar reviewed Oct 25, 2022

hanhanW added 2 commits October 25, 2022 16:18

Add support for outer_dim_perms case.

65072e1

Merge branch 'main' into tile-unpack-2

43450f6

chelini reviewed Oct 26, 2022

address comments and remove unused code

888a6d4

MaheshRavishankar approved these changes Oct 26, 2022

MaheshRavishankar requested changes Oct 26, 2022

hanhanW requested a review from MaheshRavishankar October 26, 2022 23:04

MaheshRavishankar approved these changes Oct 27, 2022

hanhanW merged commit 72446a0 into iree-org:main Oct 27, 2022

hanhanW deleted the tile-unpack-2 branch October 27, 2022 17:36

This was referenced Dec 9, 2022

[vmvx] Fix sitofp lowering to consider correct type #11489

Merged

Delete unused code in tensorflow/iree-dialects (WIP) #11513

Merged

[vmvx] Relax requirement on fptosi lowering #11553

Merged

jpienaar mentioned this pull request Dec 19, 2022

Set error message for where to report bugs #11602

Merged

This was referenced Dec 30, 2022

Add missing python package dependency #11684

Merged

[cmake] Move inclusion of tools directory up #11685

Merged
