Incorrect vector.transfer_read + vector.transfer_write hoisting #14994

Closed
banach-space opened this issue Sep 19, 2023 · 5 comments
Comments

@banach-space (Collaborator) commented Sep 19, 2023

Hi,

For the example below, `hoistRedundantVectorTransfers` incorrectly hoists the following `vector.transfer_read`/`vector.transfer_write` pair out of the innermost loop:

%10 = vector.transfer_read %collapse_shape_10[%c0], %c0_i32 {in_bounds = [true]} : memref<1xi32, #hal.descriptor_type<storage_buffer>>, vector<1xi32>
(...)
vector.transfer_write %19, %collapse_shape_10[%c0] {in_bounds = [true]} : vector<1xi32>, memref<1xi32, #hal.descriptor_type<storage_buffer>>

This particular instance of `hoistRedundantVectorTransfers` is invoked in OptimizeVectorTransferPass.cpp, i.e. after bufferization, and the hoisting itself happens after the dominance analysis. I'm mentioning this because I'm under the impression that the logic assumes it's analyzing MLIR pre-bufferization. I'm not sure, though; all of this is very new to me. Any hints on how to fix it?

By the way, I've fixed a few similar aliasing issues in llvm/llvm-project#65770. @matthias-springer suggested "hoisting on tensors instead of memrefs", and I'm wondering whether that invocation of `hoistRedundantVectorTransfers` should simply be removed from OptimizeVectorTransferPass.cpp?
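To make the aliasing hazard concrete, here is a small Python model (illustrative only, not IREE or MLIR code; all names are made up): two "views" of one allocation stand in for the aliasing `memref.collapse_shape` results, and lifting the read/write pair out of the loop changes the computed value.

```python
# Python model of the aliasing hazard (illustrative, not MLIR semantics).
# `buf` plays the role of the alloca; `view_a`/`view_b` are two aliasing
# views of it, like the collapse_shape results in the IR above.
buf = [1]
view_a = buf  # analogue of the transfer_read/transfer_write target
view_b = buf  # a second, aliasing view read inside the loop

def loop_in_place(rhs, trips):
    """Reads and writes stay in the loop: each read sees the last write."""
    buf[0] = 1
    for _ in range(trips):
        lhs = view_a[0]              # transfer_read inside the loop
        acc = view_b[0]              # aliases view_a, so acc == lhs here
        view_a[0] = lhs * rhs + acc  # transfer_write inside the loop
    return buf[0]

def loop_hoisted(rhs, trips):
    """What the bad hoisting does: reads lifted out, write sunk below."""
    buf[0] = 1
    lhs = view_a[0]                  # hoisted read, forwarded via iter_args
    acc = view_b[0]                  # hoisted read: stale inside the loop
    for _ in range(trips):
        lhs = lhs * rhs + acc        # acc never sees the in-loop writes
    view_a[0] = lhs                  # write sunk below the loop
    return buf[0]

print(loop_in_place(2, 3), loop_hoisted(2, 3))  # prints "27 15"
```

The two variants diverge precisely because the second read goes through a view that aliases the write target, which is the situation in the reproducer below.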

REPRODUCER

To reproduce:

iree-opt --iree-codegen-optimize-vector-transfer file.mlir

Input:

func.func @pipeline_dispatch_0_depthwise_conv_2d_nhwc_hwc_1x10x20x1x1x9_i32() {
  %c0_i32 = arith.constant 0 : i32
  %c10 = arith.constant 10 : index
  %c20 = arith.constant 20 : index
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c2 = arith.constant 2 : index
  %c5 = arith.constant 5 : index
  %c3 = arith.constant 3 : index
  %c9 = arith.constant 9 : index
  %cst = arith.constant dense<0> : vector<1xi32>
  %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x1xi32, #hal.descriptor_type<storage_buffer>>
  %alloca_0 = memref.alloca() {alignment = 64 : i64} : memref<1x3x1xi32, #hal.descriptor_type<storage_buffer>>
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c0) flags(ReadOnly) : memref<1x10x28x1xi32, #hal.descriptor_type<storage_buffer>>
  memref.assume_alignment %0, 64 : memref<1x10x28x1xi32, #hal.descriptor_type<storage_buffer>>
  %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) alignment(64) offset(%c0) flags(ReadOnly) : memref<1x9x1xi32, #hal.descriptor_type<storage_buffer>>
  memref.assume_alignment %1, 64 : memref<1x9x1xi32, #hal.descriptor_type<storage_buffer>>
  %2 = hal.interface.binding.subspan set(0) binding(2) type(storage_buffer) alignment(64) offset(%c0) : memref<1x10x20x1xi32, #hal.descriptor_type<storage_buffer>>
  memref.assume_alignment %2, 64 : memref<1x10x20x1xi32, #hal.descriptor_type<storage_buffer>>
  %workgroup_id_x = hal.interface.workgroup.id[0] : index
  %workgroup_count_x = hal.interface.workgroup.count[0] : index
  %workgroup_id_y = hal.interface.workgroup.id[1] : index
  %workgroup_count_y = hal.interface.workgroup.count[1] : index
  %3 = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%workgroup_id_y]
  %4 = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%workgroup_count_y]
  %5 = affine.apply affine_map<()[s0] -> (s0 * 5)>()[%workgroup_id_x]
  %6 = affine.apply affine_map<()[s0] -> (s0 * 5)>()[%workgroup_count_x]
  scf.for %arg0 = %3 to %c10 step %4 {
    scf.for %arg1 = %5 to %c20 step %6 {
      %subview = memref.subview %2[0, %arg0, %arg1, 0] [1, 2, 5, 1] [1, 1, 1, 1] : memref<1x10x20x1xi32, #hal.descriptor_type<storage_buffer>> to memref<1x2x5x1xi32, strided<[200, 20, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
      %subview_1 = memref.subview %0[0, %arg0, %arg1, 0] [1, 2, 13, 1] [1, 1, 1, 1] : memref<1x10x28x1xi32, #hal.descriptor_type<storage_buffer>> to memref<1x2x13x1xi32, strided<[280, 28, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
      scf.for %arg2 = %c0 to %c2 step %c1 {
        scf.for %arg3 = %c0 to %c5 step %c1 {
          %subview_2 = memref.subview %subview_1[0, %arg2, %arg3, 0] [1, 1, 9, 1] [1, 1, 1, 1] : memref<1x2x13x1xi32, strided<[280, 28, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<1x1x9x1xi32, strided<[280, 28, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
          %subview_3 = memref.subview %subview[0, %arg2, %arg3, 0] [1, 1, 1, 1] [1, 1, 1, 1] : memref<1x2x5x1xi32, strided<[200, 20, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<1x1x1x1xi32, strided<[200, 20, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
          vector.transfer_write %cst, %subview_3[%c0, %c0, %c0, %c0] {in_bounds = [true]} : vector<1xi32>, memref<1x1x1x1xi32, strided<[200, 20, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
          %subview_4 = memref.subview %subview_3[0, 0, 0, 0] [1, 1, 1, 1] [1, 1, 1, 1] : memref<1x1x1x1xi32, strided<[200, 20, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<1x1x1xi32, strided<[200, 20, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
          %cast = memref.cast %subview_4 : memref<1x1x1xi32, strided<[200, 20, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<1x1x1xi32, strided<[?, ?, ?], offset: ?>, #hal.descriptor_type<storage_buffer>>
          %7 = scf.for %arg4 = %c0 to %c9 step %c3 iter_args(%arg5 = %cast) -> (memref<1x1x1xi32, strided<[?, ?, ?], offset: ?>, #hal.descriptor_type<storage_buffer>>) {
            %subview_5 = memref.subview %subview_2[0, 0, %arg4, 0] [1, 1, 3, 1] [1, 1, 1, 1] : memref<1x1x9x1xi32, strided<[280, 28, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<1x1x3x1xi32, strided<[280, 28, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
            %subview_6 = memref.subview %1[0, %arg4, 0] [1, 3, 1] [1, 1, 1] : memref<1x9x1xi32, #hal.descriptor_type<storage_buffer>> to memref<1x3x1xi32, strided<[9, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
            %subview_7 = memref.subview %subview_5[0, 0, 0, 0] [1, 1, 3, 1] [1, 1, 1, 1] : memref<1x1x3x1xi32, strided<[280, 28, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<1x3x1xi32, strided<[280, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
            %subview_8 = memref.subview %subview_6[0, 0, 0] [1, 3, 1] [1, 1, 1] : memref<1x3x1xi32, strided<[9, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<3x1xi32, strided<[1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
            linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%subview_7 : memref<1x3x1xi32, strided<[280, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>) outs(%alloca_0 : memref<1x3x1xi32, #hal.descriptor_type<storage_buffer>>) {
            ^bb0(%in: i32, %out: i32):
              linalg.yield %in : i32
            }
            %collapse_shape = memref.collapse_shape %alloca_0 [[0, 1, 2]] : memref<1x3x1xi32, #hal.descriptor_type<storage_buffer>> into memref<3xi32, #hal.descriptor_type<storage_buffer>>
            %collapse_shape_9 = memref.collapse_shape %subview_8 [[0, 1]] : memref<3x1xi32, strided<[1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>> into memref<3xi32, strided<[1], offset: ?>, #hal.descriptor_type<storage_buffer>>
            linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%arg5 : memref<1x1x1xi32, strided<[?, ?, ?], offset: ?>, #hal.descriptor_type<storage_buffer>>) outs(%alloca : memref<1x1x1xi32, #hal.descriptor_type<storage_buffer>>) {
            ^bb0(%in: i32, %out: i32):
              linalg.yield %in : i32
            }
            %collapse_shape_10 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x1xi32, #hal.descriptor_type<storage_buffer>> into memref<1xi32, #hal.descriptor_type<storage_buffer>>
            %8 = vector.transfer_read %collapse_shape[%c0], %c0_i32 {in_bounds = [true]} : memref<3xi32, #hal.descriptor_type<storage_buffer>>, vector<3xi32>
            %9 = vector.transfer_read %collapse_shape_9[%c0], %c0_i32 {in_bounds = [true]} : memref<3xi32, strided<[1], offset: ?>, #hal.descriptor_type<storage_buffer>>, vector<3xi32>
            %10 = vector.transfer_read %collapse_shape_10[%c0], %c0_i32 {in_bounds = [true]} : memref<1xi32, #hal.descriptor_type<storage_buffer>>, vector<1xi32>
            %11 = vector.extract_strided_slice %8 {offsets = [0], sizes = [1], strides = [1]} : vector<3xi32> to vector<1xi32>
            %12 = vector.extract_strided_slice %8 {offsets = [1], sizes = [1], strides = [1]} : vector<3xi32> to vector<1xi32>
            %13 = vector.extract_strided_slice %8 {offsets = [2], sizes = [1], strides = [1]} : vector<3xi32> to vector<1xi32>
            %14 = vector.extract %9[0] : vector<3xi32>
            %15 = vector.extract %9[1] : vector<3xi32>
            %16 = vector.extract %9[2] : vector<3xi32>
            %17 = vector.outerproduct %11, %14, %10 {kind = #vector.kind<add>} : vector<1xi32>, i32
            %18 = vector.outerproduct %12, %15, %17 {kind = #vector.kind<add>} : vector<1xi32>, i32
            %19 = vector.outerproduct %13, %16, %18 {kind = #vector.kind<add>} : vector<1xi32>, i32
            vector.transfer_write %19, %collapse_shape_10[%c0] {in_bounds = [true]} : vector<1xi32>, memref<1xi32, #hal.descriptor_type<storage_buffer>>
            %cast_11 = memref.cast %alloca : memref<1x1x1xi32, #hal.descriptor_type<storage_buffer>> to memref<1x1x1xi32, strided<[?, ?, ?], offset: ?>, #hal.descriptor_type<storage_buffer>>
            scf.yield %cast_11 : memref<1x1x1xi32, strided<[?, ?, ?], offset: ?>, #hal.descriptor_type<storage_buffer>>
          }
          linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%7 : memref<1x1x1xi32, strided<[?, ?, ?], offset: ?>, #hal.descriptor_type<storage_buffer>>) outs(%subview_4 : memref<1x1x1xi32, strided<[200, 20, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>) {
          ^bb0(%in: i32, %out: i32):
            linalg.yield %in : i32
          }
        }
      }
    }
  }
  return
}
@dcaballe (Contributor)

I think @hanhanW would be the right person to help here but he is out. I'm not familiar with this code, unfortunately. @MaheshRavishankar?

@banach-space (Collaborator, Author)

I've opened a PR in MLIR core with a reduced repro. Hopefully that will help with the discussion.

@MaheshRavishankar (Contributor)

I really don't have much context here. I'm a bit too swamped right now to dig deep and build up context, but I might be able to get to it by the end of the week (sorry, just setting expectations w.r.t. the ping above).

banach-space added a commit to banach-space/llvm-project that referenced this issue Sep 22, 2023
At the moment, `hoistRedundantVectorTransfers` would hoist the
`vector.transfer_read`/`vector.transfer_write` pair in this function:

```mlir
func.func @no_hoisting_write_to_memref(%rhs: i32, %arg1: vector<1xi32>) {
  %c0_i32 = arith.constant 0 : i32
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %c20 = arith.constant 20 : index
  %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32>
  %cast = memref.cast %alloca : memref<1x1x2xi32> to memref<1x1x2xi32>
  %collapsed_1 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32>
  scf.for %_ = %c0 to %c20 step %c4 {
    %collapsed_2 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32>
    %lhs = vector.transfer_read %collapsed_1[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32>
    %acc = vector.transfer_read %collapsed_2[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32>
    %op = vector.outerproduct %lhs, %rhs, %acc {kind = #vector.kind<add>} : vector<1xi32>, i32
    vector.transfer_write %op, %collapsed_1[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32>
  }
  return
}
```
as follows:
```mlir
  func.func @no_hoisting_write_to_memref(%arg0: i32, %arg1: vector<1xi32>) {
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %c4 = arith.constant 4 : index
    %c20 = arith.constant 20 : index
    %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32>
    %collapse_shape = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32>
    %collapse_shape_0 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32>
    %0 = vector.transfer_read %collapse_shape[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32>
    %1 = vector.transfer_read %collapse_shape_0[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32>
    %2 = scf.for %arg2 = %c0 to %c20 step %c4 iter_args(%arg3 = %0) -> (vector<1xi32>) {
      %3 = vector.outerproduct %arg3, %arg0, %1 {kind = #vector.kind<add>} : vector<1xi32>, i32
      scf.yield %3 : vector<1xi32>
    }
    vector.transfer_write %2, %collapse_shape[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32>
    return
  }
```

This is not safe. While one argument for `vector.outerproduct` (`%lhs` from the original loop) is correctly being forwarded via `iter_args`, the other one (`%acc` from the original loop, read through an aliasing memref) is not.

This patch disables hoisting in cases where the source of "candidate"
`vector.transfer_read` aliases with some other `memref`. A more generic
approach would be to make sure that all values are correctly forwarded
via `iter_args`, but that would require involving alias analysis.

[1] Based on iree-org/iree#14994.
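The conservative rule described in the commit message above can be modelled in a few lines of Python (a sketch of the idea only; `Alloc`, `View`, and `can_hoist` are illustrative names, not the MLIR API): walk each view chain back to its base allocation, and refuse to hoist when the candidate read's source shares a base allocation with any other memref used in the loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alloc:
    """Stands in for an allocation op such as memref.alloca."""
    name: str

@dataclass(frozen=True)
class View:
    """Stands in for view-like ops (subview, collapse_shape, cast, ...)."""
    source: object

def base_allocation(value):
    # Walk the view chain back to the underlying allocation.
    while isinstance(value, View):
        value = value.source
    return value

def may_alias(a, b):
    # Conservative: two memrefs may alias iff they share a base allocation.
    return base_allocation(a) is base_allocation(b)

def can_hoist(read_source, other_memrefs_in_loop):
    # Model of the patch's conservative rule: refuse to hoist when the
    # candidate transfer_read's source may alias any other memref in the loop.
    return not any(may_alias(read_source, m) for m in other_memrefs_in_loop)

alloca = Alloc("alloca")
print(can_hoist(View(alloca), [View(alloca)]))        # False: aliasing views
print(can_hoist(View(alloca), [View(Alloc("other"))]))  # True: disjoint bases
```

As the commit message notes, a more precise approach would forward every aliased value via `iter_args`, but that would require a real alias analysis rather than this base-allocation walk.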
@banach-space (Collaborator, Author)

I have updated llvm/llvm-project#66930; that feels like the right fix to me. There's also a small repro there that explains what the issue is.

As always, feedback is greatly appreciated!

banach-space added a commit to banach-space/llvm-project that referenced this issue Sep 29, 2023
[mlir][vector] Prevent incorrect vector.transfer_{read|write} hoisting

Refines how opportunities for hoisting vector.transfer_{read|write}
pairs are identified. More specifically, rather than looking for
specific MemRef ops that could lead to aliasing, this patch updates the
hoisting logic to check whether the underlying Op implements
`ViewLikeOpInterface`.

An additional condition is added to prevent hoisting when one of the source
operands implements `ViewLikeOpInterface`. This was motivated by the
following example [1]:

```mlir
func.func @no_hoisting_write_to_memref(%rhs: i32, %arg1: vector<1xi32>) {
  %c0_i32 = arith.constant 0 : i32
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %c20 = arith.constant 20 : index
  %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32>
  %cast = memref.cast %alloca : memref<1x1x2xi32> to memref<1x1x2xi32>
  %collapsed_1 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32>
  scf.for %_ = %c0 to %c20 step %c4 {
    %collapsed_2 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32>
    %lhs = vector.transfer_read %collapsed_1[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32>
    %acc = vector.transfer_read %collapsed_2[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32>
    %op = vector.outerproduct %lhs, %rhs, %acc {kind = #vector.kind<add>} : vector<1xi32>, i32
    vector.transfer_write %op, %collapsed_1[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32>
  }
  return
}
```

Originally, it would be rewritten as follows:

```mlir
  func.func @no_hoisting_write_to_memref(%arg0: i32, %arg1: vector<1xi32>) {
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %c4 = arith.constant 4 : index
    %c20 = arith.constant 20 : index
    %alloca = memref.alloca() {alignment = 64 : i64} : memref<1x1x2xi32>
    %collapse_shape = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32>
    %collapse_shape_0 = memref.collapse_shape %alloca [[0, 1, 2]] : memref<1x1x2xi32> into memref<2xi32>
    %0 = vector.transfer_read %collapse_shape[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32>
    %1 = vector.transfer_read %collapse_shape_0[%c0], %c0_i32 {in_bounds = [true]} : memref<2xi32>, vector<1xi32>
    %2 = scf.for %arg2 = %c0 to %c20 step %c4 iter_args(%arg3 = %0) -> (vector<1xi32>) {
      %3 = vector.outerproduct %arg3, %arg0, %1 {kind = #vector.kind<add>} : vector<1xi32>, i32
      scf.yield %3 : vector<1xi32>
    }
    vector.transfer_write %2, %collapse_shape[%c0] {in_bounds = [true]} : vector<1xi32>, memref<2xi32>
    return
  }
```

This was not safe. While one argument for `vector.outerproduct` was
correctly being forwarded via `iter_args` (`%lhs` from the original
loop), the other one wasn't (`%acc`, read through an aliasing memref).

[1] Based on iree-org/iree#14994.
banach-space added a commit to llvm/llvm-project that referenced this issue Sep 29, 2023
[mlir][vector] Prevent incorrect vector.transfer_{read|write} hoisting (#66930)

@banach-space (Collaborator, Author)

Resolved within MLIR
