-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect vector.transfer_read + vector.transfer_write hoisting #14994
Comments
I think @hanhanW would be the right person to help here but he is out. I'm not familiar with this code, unfortunately. @MaheshRavishankar? |
I've opened a PR in MLIR core with a reduced repro. Hopefully that will help with the discussion. |
I really dont have much context here. I am a bit swamped right now to dig deep here, and try to get context, but might be able to get to it by end of the week (sorry, just setting expectations w.r.t to ping above). |
At the moment, `hoistRedundantVectorTransfers` would hoist the `vector.transfer_read`/`vector.transfer_write` pair in this function: ```mlir func.func @no_hoisting_write_to_memref(%rhs: i32, %arg1: vector<1xi32>) { %c0_i32 = arith.constant 0 : i32 %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %c4 = arith.constant 4 : index %c20 = arith.constant 20 : index %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32> %cast = memref.cast %alloca : memref<1x1x2xi32> to memref<1x1x2xi32> %collapsed_1 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> scf.for %_ = %c0 to %c20 step %c4 { %collapsed_2 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %lhs = vector.transfer_read %collapsed_1[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %acc = vector.transfer_read %collapsed_2[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %op = vector.outerproduct %lhs, %rhs, %acc {kind = #vector.kind<add>} : vector<1xi32>, i32 vector.transfer_write %op, %collapsed_1[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32> } return } ``` as follows: ```mlir func.func @no_hoisting_write_to_memref(%arg0: i32, %arg1: vector<1xi32>) { %c0_i32 = arith.constant 0 : i32 %c0 = arith.constant 0 : index %c4 = arith.constant 4 : index %c20 = arith.constant 20 : index %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32> %collapse_shape = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %collapse_shape_0 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %0 = vector.transfer_read %collapse_shape[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %1 = vector.transfer_read %collapse_shape_0[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %2 = scf.for %arg2 = %c0 to %c20 step %c4 iter_args(%arg3 = %0) -> (vector<1xi32>) { %3 = vector.outerproduct %arg3, %arg0, %1 {kind = #vector.kind<add>} : vector<1xi32>, i32 scf.yield %3 : vector<1xi32> } vector.transfer_write %2, %collapse_shape[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32> return } ``` This is not safe. While one argument for `vector.outerproduct` (`%rhs` from the original loop) is correctly being forwarded via `iter_args`, the other one (`%acc` from the original loop) is not. This patch disables hoisting in cases where the source of "candidate" `vector.transfer_read` aliases with some other `memref`. A more generic approach would be to make sure that all values are correctly forwarded via `iter_args`, but that would require involving alias analysis. [1] Based on iree-org/iree#14994.
I have updated llvm/llvm-project#66930 - that feels like the right fix to me. There's also a small repro that explains what the issue is. As always, feedback is greatly appreciated! |
…hoisting [mlir][vector] Prevent incorrect vector.transfer_{read|write} hoisting Refines how opportunities for hoisting vector.transfer_{read|write} pairs are identified. More specifically, rather than looking for specific MemRef ops that could lead to aliasing, this patch updates the hoisting logic to check whether the underlying Op implements `ViewLikeOpInterface`. Additional condition is added to prevent hoisting when one of the source operands implements `ViewLikeOpInterface`. This was motivated by the following example [1]: ```mlir func.func @no_hoisting_write_to_memref(%rhs: i32, %arg1: vector<1xi32>) { %c0_i32 = arith.constant 0 : i32 %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %c4 = arith.constant 4 : index %c20 = arith.constant 20 : index %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32> %cast = memref.cast %alloca : memref<1x1x2xi32> to memref<1x1x2xi32> %collapsed_1 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> scf.for %_ = %c0 to %c20 step %c4 { %collapsed_2 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %lhs = vector.transfer_read %collapsed_1[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %acc = vector.transfer_read %collapsed_2[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %op = vector.outerproduct %lhs, %rhs, %acc {kind = #vector.kind<add>} : vector<1xi32>, i32 vector.transfer_write %op, %collapsed_1[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32> } return } ``` Originally, it would be rewritten as follows: ```mlir func.func @no_hoisting_write_to_memref(%arg0: i32, %arg1: vector<1xi32>) { %c0_i32 = arith.constant 0 : i32 %c0 = arith.constant 0 : index %c4 = arith.constant 4 : index %c20 = arith.constant 20 : index %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32> %collapse_shape = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %collapse_shape_0 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %0 = vector.transfer_read %collapse_shape[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %1 = vector.transfer_read %collapse_shape_0[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %2 = scf.for %arg2 = %c0 to %c20 step %c4 iter_args(%arg3 = %0) -> (vector<1xi32>) { %3 = vector.outerproduct %arg3, %arg0, %1 {kind = #vector.kind<add>} : vector<1xi32>, i32 scf.yield %3 : vector<1xi32> } vector.transfer_write %2, %collapse_shape[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32> return } ``` This was not safe. While one argument for `vector.outerproduct` was correctly being forwarded via `iter_args` (`%rhs` from the original loop), the other one wasn't (`%acc` from the original loop). [1] Based on iree-org/iree#14994.
#66930) At the moment, `hoistRedundantVectorTransfers` would hoist the `vector.transfer_read`/`vector.transfer_write` pair in this function: ```mlir func.func @no_hoisting_write_to_memref(%rhs: i32, %arg1: vector<1xi32>) { %c0_i32 = arith.constant 0 : i32 %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %c4 = arith.constant 4 : index %c20 = arith.constant 20 : index %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32> %cast = memref.cast %alloca : memref<1x1x2xi32> to memref<1x1x2xi32> %collapsed_1 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> scf.for %_ = %c0 to %c20 step %c4 { %collapsed_2 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %lhs = vector.transfer_read %collapsed_1[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %acc = vector.transfer_read %collapsed_2[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %op = vector.outerproduct %lhs, %rhs, %acc {kind = #vector.kind<add>} : vector<1xi32>, i32 vector.transfer_write %op, %collapsed_1[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32> } return } ``` as follows: ```mlir func.func @no_hoisting_write_to_memref(%arg0: i32, %arg1: vector<1xi32>) { %c0_i32 = arith.constant 0 : i32 %c0 = arith.constant 0 : index %c4 = arith.constant 4 : index %c20 = arith.constant 20 : index %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32> %collapse_shape = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %collapse_shape_0 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32> %0 = vector.transfer_read %collapse_shape[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %1 = vector.transfer_read %collapse_shape_0[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32> %2 = scf.for %arg2 = %c0 to %c20 step %c4 iter_args(%arg3 = %0) -> (vector<1xi32>) { %3 = vector.outerproduct %arg3, %arg0, %1 {kind = #vector.kind<add>} : vector<1xi32>, i32 scf.yield %3 : vector<1xi32> } vector.transfer_write %2, %collapse_shape[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32> return } ``` This is not safe. While one argument for `vector.outerproduct` (`%rhs` from the original loop) is correctly being forwarded via `iter_args`, the other one (`%acc` from the original loop) is not. This patch disables hoisting in cases where the source of "candidate" `vector.transfer_read` aliases with some other `memref`. A more generic approach would be to make sure that all values are correctly forwarded via `iter_args`, but that would require involving alias analysis. [1] Based on iree-org/iree#14994.
Resolved within MLIR |
Hi,
For the example below,
hoistRedundantVectorTransfers
incorrectly hoists the followingvector.transfer_read
/vector.transfer_write
pair out of the most inner loop:This particular instance of
hoistRedundantVectorTransfers
is invoked in OptimizeVectorTransferPass.cpp, i.e. after bufferization. Hoisting happens here, i.e. after the dominance analysis. I'm mentioning this as I'm under the impression that logic assumes that it's analyzing MLIR pre-bufferization. I'm not sure though, all of this is very new to me. Any hints how to fix it?Btw, I've fixed a few similar aliasing issues in llvm/llvm-project#65770. @matthias-springer suggested "hoisting on tensors instead of memrefs" and I'm wondering whether that invocation of
hoistRedundantVectorTransfers
should simply be removed OptimizeVectorTransferPass.cpp?REPRODUCER
To build:
Input:
The text was updated successfully, but these errors were encountered: