dominance-related failure occurs when compiling an opt model #17759

Closed
zjgarvey opened this issue Jun 27, 2024 · 3 comments · Fixed by #17785
Labels
bug 🐞 (Something isn't working), integrations/pytorch (PyTorch integration work)

Comments

@zjgarvey
Contributor

What happened?

This error came from testing the /onnx/models/opt-125M-vaiq model in https://github.com/nod-ai/SHARK-TestSuite:

opt-125M-awq.default.onnx.torch.mlir:1218:13: error: operand #0 does not dominate this use
    %1166 = torch.aten.item %1165 : !torch.vtensor<[],si64> -> !torch.int
            ^
opt-125M-awq.default.onnx.torch.mlir:1218:13: note: see current operation: %412 = "tensor.extract"(%415#1) : (tensor<i32>) -> i32
opt-125M-awq.default.onnx.torch.mlir:1229:13: note: operand defined here (op in the same block)
    %1175 = torch.aten.where.self %1172, %1173, %1174 : !torch.vtensor<[],i1>, !torch.vtensor<[],si64>, !torch.vtensor<[],si64> -> !torch.vtensor<[],si64>
            ^

Steps to reproduce your issue

Download a smaller reproducer from this gist, then run

iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=llvm-cpu  --iree-input-type=torch dom_error_repro.torch.mlir -o dump.vmfb

This will result in the error message:

dom_error_repro.torch.mlir:62:11: error: operand #0 does not dominate this use
    %55 = torch.aten.item %54 : !torch.vtensor<[],si64> -> !torch.int
          ^
dom_error_repro.torch.mlir:62:11: note: see current operation: %17 = "tensor.extract"(%20#1) : (tensor<i32>) -> i32
dom_error_repro.torch.mlir:71:11: note: operand defined here (op in the same block)
    %64 = torch.aten.where.self %61, %62, %63 : !torch.vtensor<[],i1>, !torch.vtensor<[],si64>, !torch.vtensor<[],si64> -> !torch.vtensor<[],si64>

It seems to be failing in the iree-flow-form-scalar-dispatches pass. Here is the dump after the failure; note that %17 = "tensor.extract"(%20#1) uses %20 before %20 is defined by the second flow.dispatch.region, which is exactly the use the dominance error points at:

// -----// IR Dump After FormScalarDispatchesPass Failed (iree-flow-form-scalar-dispatches) //----- //
"util.func"() <{function_type = (!hal.buffer_view, !hal.buffer_view, !hal.buffer_view, !hal.fence, !hal.fence) -> !hal.buffer_view, inlining_policy = #util.inline.never, sym_name = "main_graph$async"}> ({
^bb0(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view, %arg2: !hal.buffer_view, %arg3: !hal.fence, %arg4: !hal.fence):
  %0 = "arith.constant"() <{value = 0 : index}> : () -> index
  %1 = "arith.constant"() <{value = 1 : index}> : () -> index
  %2 = "arith.constant"() <{value = 0 : i32}> : () -> i32
  %3 = "arith.constant"() <{value = dense<[-1, 0]> : tensor<2xi32>}> : () -> tensor<2xi32>
  %4 = "hal.buffer_view.dim"(%arg0) {index = 0 : index} : (!hal.buffer_view) -> index
  %5 = "hal.buffer_view.dim"(%arg0) {index = 1 : index} : (!hal.buffer_view) -> index
  %6 = "hal.tensor.import"(%arg0, %4, %5, %arg3) {operandSegmentSizes = array<i32: 1, 2, 1>, target_encoding = tensor<?x?xi64>} : (!hal.buffer_view, index, index, !hal.fence) -> tensor<?x?xi32>
  %7 = "arith.index_cast"(%4) : (index) -> i32
  %8 = "arith.index_cast"(%5) : (index) -> i32
  %9 = "tensor.empty"() : () -> tensor<i32>
  %10 = "flow.dispatch.region"() <{operandSegmentSizes = array<i32: 0, 0>}> ({
    %34 = "linalg.generic"(%9) <{indexing_maps = [affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 0, 1>}> ({
    ^bb0(%arg7: i32):
      "linalg.yield"(%8) : (i32) -> ()
    }) : (tensor<i32>) -> tensor<i32>
    "flow.return"(%34) : (tensor<i32>) -> ()
  }, {
    %33 = "arith.constant"() <{value = 1 : index}> : () -> index
    "flow.return"(%33, %33, %33) : (index, index, index) -> ()
  }) : () -> tensor<i32>
  %11 = "tensor.expand_shape"(%10) <{reassociation = [], static_output_shape = array<i64: 1>}> : (tensor<i32>) -> tensor<1xi32>
  %12 = "tensor.insert_slice"(%10, %3) <{operandSegmentSizes = array<i32: 1, 1, 0, 0, 0>, static_offsets = array<i64: 1>, static_sizes = array<i64: 1>, static_strides = array<i64: 1>}> : (tensor<i32>, tensor<2xi32>) -> tensor<2xi32>
  %13 = "tensor.extract_slice"(%12) <{operandSegmentSizes = array<i32: 1, 0, 0, 0>, static_offsets = array<i64: 0>, static_sizes = array<i64: 1>, static_strides = array<i64: 1>}> : (tensor<2xi32>) -> tensor<i32>
  %14 = "tensor.expand_shape"(%13) <{reassociation = [], static_output_shape = array<i64: 1>}> : (tensor<i32>) -> tensor<1xi32>
  %15 = "tensor.extract"(%14, %0) : (tensor<1xi32>, index) -> i32
  %16 = "arith.cmpi"(%15, %2) <{predicate = 0 : i64}> : (i32, i32) -> i1
  %17 = "tensor.extract"(%20#1) : (tensor<i32>) -> i32
  %18 = "tensor.extract"(%11, %0) : (tensor<1xi32>, index) -> i32
  %19 = "arith.cmpi"(%18, %2) <{predicate = 0 : i64}> : (i32, i32) -> i1
  %20:2 = "flow.dispatch.region"() <{operandSegmentSizes = array<i32: 0, 0>}> ({
    %29 = "linalg.generic"(%9) <{indexing_maps = [affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 0, 1>}> ({
    ^bb0(%arg6: i32):
      %32 = "arith.select"(%16, %7, %15) : (i1, i32, i32) -> i32
      "linalg.yield"(%32) : (i32) -> ()
    }) : (tensor<i32>) -> tensor<i32>
    %30 = "linalg.generic"(%9) <{indexing_maps = [affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 0, 1>}> ({
    ^bb0(%arg5: i32):
      %31 = "arith.select"(%19, %8, %18) : (i1, i32, i32) -> i32
      "linalg.yield"(%31) : (i32) -> ()
    }) : (tensor<i32>) -> tensor<i32>
    "flow.return"(%30, %29) : (tensor<i32>, tensor<i32>) -> ()
  }, {
    %28 = "arith.constant"() <{value = 1 : index}> : () -> index
    "flow.return"(%28, %28, %28) : (index, index, index) -> ()
  }) : () -> (tensor<i32>, tensor<i32>)
  %21 = "tensor.extract"(%20#0) : (tensor<i32>) -> i32
  %22 = "tensor.from_elements"(%17, %21) : (i32, i32) -> tensor<2xi32>
  %23 = "tensor.reshape"(%6, %22) : (tensor<?x?xi32>, tensor<2xi32>) -> tensor<?x?xi32>
  %24 = "hal.tensor.barrier"(%23, %arg4) : (tensor<?x?xi32>, !hal.fence) -> tensor<?x?xi32>
  %25 = "tensor.dim"(%24, %0) : (tensor<?x?xi32>, index) -> index
  %26 = "tensor.dim"(%24, %1) : (tensor<?x?xi32>, index) -> index
  %27 = "hal.tensor.export"(%24, %25, %26) {source_encoding = tensor<?x?xi64>} : (tensor<?x?xi32>, index, index) -> !hal.buffer_view
  "util.return"(%27) : (!hal.buffer_view) -> ()
}) {iree.abi.model = "coarse-fences", iree.abi.stub} : () -> ()

What component(s) does this issue relate to?

Compiler

Version information

Using a local build at commit 3b5d269.

Additional context

No response

@zjgarvey added the bug 🐞 (Something isn't working) label on Jun 27, 2024
@hanhanW
Contributor

hanhanW commented Jun 28, 2024

dom_error_repro.torch.mlir:62:11: error: operand #0 does not dominate this use
%55 = torch.aten.item %54 : !torch.vtensor<[],si64> -> !torch.int

It usually indicates that we don't set the insertion point before creating an operation. @IanWood1, could you help with this? I think you have some context about FormScalarDispatchesPass. I can jump in if you need some help.

cc @MaheshRavishankar
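
For readers unfamiliar with the pattern being described here: the usual way to keep dominance intact in an MLIR transform is to scope the builder with an insertion guard and explicitly set the insertion point before materializing new ops. The sketch below is only a minimal illustration of that pattern, not the actual FormScalarDispatchesPass code; buildReplacement is a hypothetical helper.

#include "mlir/IR/Builders.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Create a replacement for `op` without breaking dominance: restore the
// previous insertion point when the guard goes out of scope, and insert the
// new op immediately before `op`, so every existing use of the result is
// still dominated by the new definition.
static Value buildReplacement(OpBuilder &builder, Operation *op) {
  OpBuilder::InsertionGuard guard(builder); // restores the old point on return
  builder.setInsertionPoint(op);            // new ops are created right before `op`
  Operation *cloned = builder.clone(*op);   // stand-in for whatever op gets built here
  return cloned->getResult(0);
}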

@MaheshRavishankar
Contributor

> dom_error_repro.torch.mlir:62:11: error: operand #0 does not dominate this use
> %55 = torch.aten.item %54 : !torch.vtensor<[],si64> -> !torch.int
>
> It usually indicates that we don't set the insertion point before creating an operation. @IanWood1, could you help with this? I think you have some context about FormScalarDispatchesPass. I can jump in if you need some help.
>
> cc @MaheshRavishankar

Oh crap... I haven't touched that pass in ages.

@IanWood1
Contributor

IanWood1 commented Jun 29, 2024

It seems like horizontal fusion is moving ops into the region even though they have uses before the rootOp / the new region.

@ScottTodd added the integrations/pytorch (PyTorch integration work) label on Jul 3, 2024
@IanWood1 linked a pull request on Jul 26, 2024 that will close this issue
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this issue Jul 30, 2024
When performing horizontal fusion, ops that were clonable (but not used
by the fusion group) were ignored. If these ops depended on values
produced by 'root ops', then the root op would get moved into the
region.

Closes iree-org#17759

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Lubo Litchev <lubol@google.com>
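
Roughly, the fix described above amounts to checking, before pulling a producer into the newly formed region, that none of its results is still used by an op the region does not dominate. The snippet below is only an illustrative sketch of such a check using mlir::DominanceInfo, not the code from #17785; canMoveIntoRegion and regionOp are hypothetical names.

#include "mlir/IR/Dominance.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Returns true if `producer` can be moved into `regionOp` without creating a
// use-before-def: every user of its results must either already sit inside
// the region or be properly dominated by the region op itself.
static bool canMoveIntoRegion(Operation *producer, Operation *regionOp,
                              DominanceInfo &domInfo) {
  for (Operation *user : producer->getUsers()) {
    if (regionOp->isAncestor(user))
      continue; // the use lives inside the region being formed
    if (!domInfo.properlyDominates(regionOp, user))
      return false; // this use would come before the moved definition
  }
  return true;
}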