Fold arith.exti operation into conv #17765

Conversation
compiler/src/iree/compiler/GlobalOptimization/test/raise_special_ops.mlir
compiler/src/iree/compiler/GlobalOptimization/RaiseSpecialOps.cpp
Abbreviated Benchmark Summary @ commit b43ea7037a2fb5ed008173160049c9b31ccc6114 (vs. base dcba7c56799a7303c347baf7ec21e9ca07a56fec)

Regressed Latencies 🚩 [Top 3 out of 5 results shown]
Improved Latencies 🎉 [Top 3 out of 11 results shown]
No improved or regressed compilation metrics 🏖️
%2 = tensor.empty() : tensor<10x40xi32>
%3 = arith.constant 0 : i32
%4 = linalg.fill ins(%3 : i32) outs(%2 : tensor<10x40xi32>) -> tensor<10x40xi32>
%5 = linalg.matmul ins(%arg0, %1 : tensor<10x20xi32>, tensor<20x40xi32>)
If this were a linalg.matmul_unsigned, this pattern would be incorrect because it would change the extension semantics. This is why we need to handle each op one by one for integer extends: different named ops have different extension semantics, unlike float ops, which all imply extf.
I'm not quite sure how this doesn't work; printing out the body of the matmul_unsigned shows that it is extended as a signed integer:
module {
util.func public @matmul_extsi(%arg0: tensor<10x20xi32>, %arg1: tensor<20x40xi16>) -> tensor<10x40xi32> {
%c0_i32 = arith.constant 0 : i32
%0 = tensor.empty() : tensor<10x40xi32>
%1 = linalg.fill ins(%c0_i32 : i32) outs(%0 : tensor<10x40xi32>) -> tensor<10x40xi32>
%2 = linalg.matmul_unsigned ins(%arg0, %arg1 : tensor<10x20xi32>, tensor<20x40xi16>) outs(%1 : tensor<10x40xi32>) -> tensor<10x40xi32>
util.return %2 : tensor<10x40xi32>
}
}
// Generic form
#map = affine_map<(d0, d1, d2) -> (d0, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d0, d1)>
"builtin.module"() ({
"util.func"() <{function_type = (tensor<10x20xi32>, tensor<20x40xi16>) -> tensor<10x40xi32>, sym_name = "matmul_extsi", sym_visibility = "public", tied_operands = [-1 : index]}> ({
^bb0(%arg0: tensor<10x20xi32>, %arg1: tensor<20x40xi16>):
%0 = "arith.constant"() <{value = 0 : i32}> : () -> i32
%1 = "tensor.empty"() : () -> tensor<10x40xi32>
%2 = "linalg.fill"(%0, %1) <{operandSegmentSizes = array<i32: 1, 1>}> ({
^bb0(%arg5: i32, %arg6: i32):
"linalg.yield"(%arg5) : (i32) -> ()
}) : (i32, tensor<10x40xi32>) -> tensor<10x40xi32>
%3 = "linalg.matmul_unsigned"(%arg0, %arg1, %2) <{operandSegmentSizes = array<i32: 2, 1>}> ({
^bb0(%arg2: i32, %arg3: i16, %arg4: i32):
%4 = "arith.extsi"(%arg3) : (i16) -> i32
%5 = "arith.muli"(%arg2, %4) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
%6 = "arith.addi"(%arg4, %5) <{overflowFlags = #arith.overflow<none>}> : (i32, i32) -> i32
"linalg.yield"(%6) : (i32) -> ()
}) {linalg.memoized_indexing_maps = [#map, #map1, #map2]} : (tensor<10x20xi32>, tensor<20x40xi16>, tensor<10x40xi32>) -> tensor<10x40xi32>
"util.return"(%3) : (tensor<10x40xi32>) -> ()
}) : () -> ()
}) : () -> ()
If that's the case then that's a bug. If I take the IR you just posted and run linalg-generalize-named-ops, I get an extui:
#map = affine_map<(d0, d1) -> ()>
#map1 = affine_map<(d0, d1) -> (d0, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d0, d2)>
#map3 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map4 = affine_map<(d0, d1, d2) -> (d0, d1)>
module {
util.func public @matmul_extsi(%arg0: tensor<10x20xi32>, %arg1: tensor<20x40xi16>) -> tensor<10x40xi32> {
%c0_i32 = arith.constant 0 : i32
%0 = tensor.empty() : tensor<10x40xi32>
%1 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel"]} ins(%c0_i32 : i32) outs(%0 : tensor<10x40xi32>) {
^bb0(%in: i32, %out: i32):
linalg.yield %in : i32
} -> tensor<10x40xi32>
%2 = linalg.generic {indexing_maps = [#map2, #map3, #map4], iterator_types = ["parallel", "parallel", "reduction"]} ins(%arg0, %arg1 : tensor<10x20xi32>, tensor<20x40xi16>) outs(%1 : tensor<10x40xi32>) {
^bb0(%in: i32, %in_0: i16, %out: i32):
%3 = arith.extui %in_0 : i16 to i32
%4 = arith.muli %in, %3 : i32
%5 = arith.addi %out, %4 : i32
linalg.yield %5 : i32
} -> tensor<10x40xi32>
util.return %2 : tensor<10x40xi32>
}
}
Yeah, seems like a bug. linalg-generalize-named-ops is producing different output than mlir-print-op-generic.
Ugh, yeah, I think the hidden region on linalg named ops is probably broken then. In all other contexts matmul_unsigned is assumed to mean extui. Looking at the list of linalg operations (https://mlir.llvm.org/docs/Dialects/Linalg/#operations), it looks like only matmul_unsigned has these unsigned extension semantics, so we can just add a special case in the pattern to fail if trying to fuse with a matmul_unsigned; a rough sketch of that check follows.
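Very roughly, something like this (a hypothetical sketch, not the actual code in RaiseSpecialOps.cpp; the pattern name is made up, and it assumes the generated linalg::MatmulUnsignedOp class):

#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

namespace {
// Hypothetical stand-in for the extsi-folding pattern in RaiseSpecialOps.cpp.
struct FoldExtsiIntoConsumer
    : public OpInterfaceRewritePattern<linalg::LinalgOp> {
  using OpInterfaceRewritePattern::OpInterfaceRewritePattern;

  LogicalResult matchAndRewrite(linalg::LinalgOp linalgOp,
                                PatternRewriter &rewriter) const override {
    // linalg.matmul_unsigned extends its inputs with arith.extui, so folding
    // a producer arith.extsi into it would change the extension semantics.
    if (isa<linalg::MatmulUnsignedOp>(linalgOp.getOperation()))
      return rewriter.notifyMatchFailure(
          linalgOp, "matmul_unsigned implies extui; refusing to fold extsi");

    // ... the existing fold-extsi-into-consumer rewrite would go here ...
    return failure();
  }
};
} // namespace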
Okay, looked at this further. linalg-generalize-named-ops does print extsi, but I wasn't running it against the MLIR I posted here. I started with a generic (signed cast) -> matmul_unsigned and used the modified pass from this PR. That manually places the extsi inside the matmul_unsigned's region, but it doesn't get printed out. I'm not sure if changing regions like this is allowed/good.
Maybe it makes sense to only fold extsi with signed variants (rough sketch of that idea below).
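For reference, the "signed variants only" direction could look something like this (hypothetical helper; the op list is illustrative, not exhaustive):

#include "mlir/Dialect/Linalg/IR/Linalg.h"

// Hypothetical helper: only allow the fold for named ops whose generalized
// bodies extend mixed-width integer operands with arith.extsi;
// linalg.matmul_unsigned (extui) is deliberately left out.
static bool hasSignedExtensionSemantics(mlir::linalg::LinalgOp op) {
  return mlir::isa<mlir::linalg::MatmulOp, mlir::linalg::BatchMatmulOp,
                   mlir::linalg::Conv2DNhwcHwcfOp>(op.getOperation());
}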
I'm not sure if changing regions like this is allowed/good.
Agree :) although I see it more as a function of named ops being over-engineered. It's probably because we're trying to use interfaces instead of just using the named op builders, but linalg is awkward...
Mahesh added similar changes to shared/sdxl_quantized here: 7eb8280. But as Quinn noted, we can't fuse with matmul_unsigned.
LGTM!
I'm still hooking tests up so they'll run on presubmit, but I think this fixed int8 punet compilation for …

Can't quite follow the paper trail here, but if this was what fixed it, thanks!
It's papering over backend issues, but is a reasonable fix.
nod-ai/SHARK-ModelDev#755

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: Lubo Litchev <lubol@google.com>