dominance-related failure occurs when compiling an opt model #17759

Closed
zjgarvey opened this issue Jun 27, 2024 · 3 comments · Fixed by #17785
Labels
bug 🐞 (Something isn't working), integrations/pytorch (PyTorch integration work)

Comments

@zjgarvey
Contributor

What happened?

This error came from testing the /onnx/models/opt-125M-vaiq model in https://github.com/nod-ai/SHARK-TestSuite:

opt-125M-awq.default.onnx.torch.mlir:1218:13: error: operand #0 does not dominate this use
    %1166 = torch.aten.item %1165 : !torch.vtensor<[],si64> -> !torch.int
            ^
opt-125M-awq.default.onnx.torch.mlir:1218:13: note: see current operation: %412 = "tensor.extract"(%415#1) : (tensor<i32>) -> i32
opt-125M-awq.default.onnx.torch.mlir:1229:13: note: operand defined here (op in the same block)
    %1175 = torch.aten.where.self %1172, %1173, %1174 : !torch.vtensor<[],i1>, !torch.vtensor<[],si64>, !torch.vtensor<[],si64> -> !torch.vtensor<[],si64>
            ^

Steps to reproduce your issue

Download a smaller reproducer from this gist, then run

iree-compile --iree-input-demote-i64-to-i32 --iree-hal-target-backends=llvm-cpu  --iree-input-type=torch dom_error_repro.torch.mlir -o dump.vmfb

This will result in the error message:

dom_error_repro.torch.mlir:62:11: error: operand #0 does not dominate this use
    %55 = torch.aten.item %54 : !torch.vtensor<[],si64> -> !torch.int
          ^
dom_error_repro.torch.mlir:62:11: note: see current operation: %17 = "tensor.extract"(%20#1) : (tensor<i32>) -> i32
dom_error_repro.torch.mlir:71:11: note: operand defined here (op in the same block)
    %64 = torch.aten.where.self %61, %62, %63 : !torch.vtensor<[],i1>, !torch.vtensor<[],si64>, !torch.vtensor<[],si64> -> !torch.vtensor<[],si64>

It seems to be failing in the iree-flow-form-scalar-dispatches pass. Here is the dump after the failure; note that %17 = "tensor.extract"(%20#1) uses %20 before %20 is defined by the second flow.dispatch.region, which is exactly the use the dominance error points at:

// -----// IR Dump After FormScalarDispatchesPass Failed (iree-flow-form-scalar-dispatches) //----- //
"util.func"() <{function_type = (!hal.buffer_view, !hal.buffer_view, !hal.buffer_view, !hal.fence, !hal.fence) -> !hal.buffer_view, inlining_policy = #util.inline.never, sym_name = "main_graph$async"}> ({
^bb0(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view, %arg2: !hal.buffer_view, %arg3: !hal.fence, %arg4: !hal.fence):
  %0 = "arith.constant"() <{value = 0 : index}> : () -> index
  %1 = "arith.constant"() <{value = 1 : index}> : () -> index
  %2 = "arith.constant"() <{value = 0 : i32}> : () -> i32
  %3 = "arith.constant"() <{value = dense<[-1, 0]> : tensor<2xi32>}> : () -> tensor<2xi32>
  %4 = "hal.buffer_view.dim"(%arg0) {index = 0 : index} : (!hal.buffer_view) -> index
  %5 = "hal.buffer_view.dim"(%arg0) {index = 1 : index} : (!hal.buffer_view) -> index
  %6 = "hal.tensor.import"(%arg0, %4, %5, %arg3) {operandSegmentSizes = array<i32: 1, 2, 1>, target_encoding = tensor<?x?xi64>} : (!hal.buffer_view, index, index, !hal.fence) -> tensor<?x?xi32>
  %7 = "arith.index_cast"(%4) : (index) -> i32
  %8 = "arith.index_cast"(%5) : (index) -> i32
  %9 = "tensor.empty"() : () -> tensor<i32>
  %10 = "flow.dispatch.region"() <{operandSegmentSizes = array<i32: 0, 0>}> ({
    %34 = "linalg.generic"(%9) <{indexing_maps = [affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 0, 1>}> ({
    ^bb0(%arg7: i32):
      "linalg.yield"(%8) : (i32) -> ()
    }) : (tensor<i32>) -> tensor<i32>
    "flow.return"(%34) : (tensor<i32>) -> ()
  }, {
    %33 = "arith.constant"() <{value = 1 : index}> : () -> index
    "flow.return"(%33, %33, %33) : (index, index, index) -> ()
  }) : () -> tensor<i32>
  %11 = "tensor.expand_shape"(%10) <{reassociation = [], static_output_shape = array<i64: 1>}> : (tensor<i32>) -> tensor<1xi32>
  %12 = "tensor.insert_slice"(%10, %3) <{operandSegmentSizes = array<i32: 1, 1, 0, 0, 0>, static_offsets = array<i64: 1>, static_sizes = array<i64: 1>, static_strides = array<i64: 1>}> : (tensor<i32>, tensor<2xi32>) -> tensor<2xi32>
  %13 = "tensor.extract_slice"(%12) <{operandSegmentSizes = array<i32: 1, 0, 0, 0>, static_offsets = array<i64: 0>, static_sizes = array<i64: 1>, static_strides = array<i64: 1>}> : (tensor<2xi32>) -> tensor<i32>
  %14 = "tensor.expand_shape"(%13) <{reassociation = [], static_output_shape = array<i64: 1>}> : (tensor<i32>) -> tensor<1xi32>
  %15 = "tensor.extract"(%14, %0) : (tensor<1xi32>, index) -> i32
  %16 = "arith.cmpi"(%15, %2) <{predicate = 0 : i64}> : (i32, i32) -> i1
  %17 = "tensor.extract"(%20#1) : (tensor<i32>) -> i32
  %18 = "tensor.extract"(%11, %0) : (tensor<1xi32>, index) -> i32
  %19 = "arith.cmpi"(%18, %2) <{predicate = 0 : i64}> : (i32, i32) -> i1
  %20:2 = "flow.dispatch.region"() <{operandSegmentSizes = array<i32: 0, 0>}> ({
    %29 = "linalg.generic"(%9) <{indexing_maps = [affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 0, 1>}> ({
    ^bb0(%arg6: i32):
      %32 = "arith.select"(%16, %7, %15) : (i1, i32, i32) -> i32
      "linalg.yield"(%32) : (i32) -> ()
    }) : (tensor<i32>) -> tensor<i32>
    %30 = "linalg.generic"(%9) <{indexing_maps = [affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 0, 1>}> ({
    ^bb0(%arg5: i32):
      %31 = "arith.select"(%19, %8, %18) : (i1, i32, i32) -> i32
      "linalg.yield"(%31) : (i32) -> ()
    }) : (tensor<i32>) -> tensor<i32>
    "flow.return"(%30, %29) : (tensor<i32>, tensor<i32>) -> ()
  }, {
    %28 = "arith.constant"() <{value = 1 : index}> : () -> index
    "flow.return"(%28, %28, %28) : (index, index, index) -> ()
  }) : () -> (tensor<i32>, tensor<i32>)
  %21 = "tensor.extract"(%20#0) : (tensor<i32>) -> i32
  %22 = "tensor.from_elements"(%17, %21) : (i32, i32) -> tensor<2xi32>
  %23 = "tensor.reshape"(%6, %22) : (tensor<?x?xi32>, tensor<2xi32>) -> tensor<?x?xi32>
  %24 = "hal.tensor.barrier"(%23, %arg4) : (tensor<?x?xi32>, !hal.fence) -> tensor<?x?xi32>
  %25 = "tensor.dim"(%24, %0) : (tensor<?x?xi32>, index) -> index
  %26 = "tensor.dim"(%24, %1) : (tensor<?x?xi32>, index) -> index
  %27 = "hal.tensor.export"(%24, %25, %26) {source_encoding = tensor<?x?xi64>} : (tensor<?x?xi32>, index, index) -> !hal.buffer_view
  "util.return"(%27) : (!hal.buffer_view) -> ()
}) {iree.abi.model = "coarse-fences", iree.abi.stub} : () -> ()

What component(s) does this issue relate to?

Compiler

Version information

Using a local build at commit 3b5d269.

Additional context

No response

@zjgarvey added the bug 🐞 (Something isn't working) label on Jun 27, 2024
@hanhanW
Contributor

hanhanW commented Jun 28, 2024

dom_error_repro.torch.mlir:62:11: error: operand #0 does not dominate this use
%55 = torch.aten.item %54 : !torch.vtensor<[],si64> -> !torch.int

It usually indicates that we don't set the insertion point before creating an operation. @IanWood1, could you help with this? I think you have some context about FormScalarDispatchesPass. I can jump in if you need some help.

cc @MaheshRavishankar
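
For readers unfamiliar with the pattern being described here: the usual way to keep dominance intact in an MLIR transform is to scope the builder with an insertion guard and explicitly set the insertion point before materializing new ops. The sketch below is only a minimal illustration of that pattern, not the actual FormScalarDispatchesPass code; buildReplacement is a hypothetical helper.

#include "mlir/IR/Builders.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Create a replacement for `op` without breaking dominance: restore the
// previous insertion point when the guard goes out of scope, and insert the
// new op immediately before `op`, so every existing use of the result is
// still dominated by the new definition.
static Value buildReplacement(OpBuilder &builder, Operation *op) {
  OpBuilder::InsertionGuard guard(builder); // restores the old point on return
  builder.setInsertionPoint(op);            // new ops are created right before `op`
  Operation *cloned = builder.clone(*op);   // stand-in for whatever op gets built here
  return cloned->getResult(0);
}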

@MaheshRavishankar
Contributor

> dom_error_repro.torch.mlir:62:11: error: operand #0 does not dominate this use
> %55 = torch.aten.item %54 : !torch.vtensor<[],si64> -> !torch.int
>
> It usually indicates that we don't set the insertion point before creating an operation. @IanWood1, could you help with this? I think you have some context about FormScalarDispatchesPass. I can jump in if you need some help.
>
> cc @MaheshRavishankar

Oh crap... I haven't touched that pass in ages.

@IanWood1
Contributor

IanWood1 commented Jun 29, 2024

It seems like horizontal fusion is moving ops into the region even though they have uses before the rootOp / the new region.

@ScottTodd added the integrations/pytorch (PyTorch integration work) label on Jul 3, 2024
@IanWood1 linked a pull request on Jul 26, 2024 that will close this issue
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this issue Jul 30, 2024
When performing horizontal fusion, ops that were clonable (but not used
by the fusion group) were ignored. If these ops depended on values
produced by 'root ops', then the root op would get moved into the
region.

Closes iree-org#17759

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Lubo Litchev <lubol@google.com>
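
Roughly, the fix described above amounts to checking, before pulling a producer into the newly formed region, that none of its results is still used by an op the region does not dominate. The snippet below is only an illustrative sketch of such a check using mlir::DominanceInfo, not the code from #17785; canMoveIntoRegion and regionOp are hypothetical names.

#include "mlir/IR/Dominance.h"
#include "mlir/IR/Operation.h"

using namespace mlir;

// Returns true if `producer` can be moved into `regionOp` without creating a
// use-before-def: every user of its results must either already sit inside
// the region or be properly dominated by the region op itself.
static bool canMoveIntoRegion(Operation *producer, Operation *regionOp,
                              DominanceInfo &domInfo) {
  for (Operation *user : producer->getUsers()) {
    if (regionOp->isAncestor(user))
      continue; // the use lives inside the region being formed
    if (!domInfo.properlyDominates(regionOp, user))
      return false; // this use would come before the moved definition
  }
  return true;
}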