[Docs][Tuner] Add initial tuning spec docs #19462

Merged (5 commits, Dec 11, 2024)
1 change: 1 addition & 0 deletions docs/website/docs/reference/index.md
@@ -18,4 +18,5 @@ repository.

* [Glossary](./glossary.md)
* [Optimization options](./optimization-options.md)
* [Tuning](./tuning.md)
* [Extensions](./extensions.md)
136 changes: 136 additions & 0 deletions docs/website/docs/reference/tuning.md
@@ -0,0 +1,136 @@
---
icon: octicons/meter-16
---

# Tuning

This page documents support for IREE dispatch tuning. The compiler supports
both default and user-provided tuning specs (specifications) that override the
compiler heuristics guiding dispatch code generation. In our experience, tuning
specs can provide a meaningful speedup in model execution; for example, we
achieved a ~10% improvement on the Stable Diffusion XL (SDXL) model on the
MI300X GPU.

## Tuning specs

The default specs are shipped with the IREE compiler and are target-specific.
We aim to provide default tuning specs that cover the most in-demand hardware
and dispatches from the most popular ML models, although we do not guarantee
completeness.

User-provided tuning specs are a mechanism that allows users to get the best
performance on custom models and hardware targets without having to modify the
compiler source code or needlessly special-case compiler heuristics.

Currently, the dispatch tuner that generates tuning specs is still experimental
and hosted
[in an external repo](https://github.com/nod-ai/shark-ai/tree/main/tuner). This
document describes how to work with tuning specs generated by the SHARK Tuner
or produced manually, but it does not go into detail on how to generate these
specs.

## Flags

The use of tuning specs in `iree-compile` is controlled with the following
flags:

* `--iree-codegen-enable-default-tuning-specs` -- enables or disables the
default tuning specs shipped with the compiler.
* `--iree-codegen-tuning-spec-path` -- loads a user-specified tuning spec.
* `--iree-codegen-dump-tuning-specs-to` -- dumps final tuning specs to a
directory or standard output.

Note that both default and user-provided specs can be enabled at the same time.
The compiler will link them together and invoke the user-provided spec before
attempting the default one.
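
For example, assuming an input program `model.mlir` and a hand-written spec
file `my_spec.mlir` (both file names are placeholders, and target-selection
flags are omitted), a compilation that combines the default and user-provided
specs might look like this:

```shell
iree-compile model.mlir -o model.vmfb \
    --iree-codegen-enable-default-tuning-specs=true \
    --iree-codegen-tuning-spec-path=my_spec.mlir
```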

## Anatomy of a tuning spec

### Example

```mlir
module @my_spec attributes { transform.with_named_sequence } {
  transform.named_sequence @apply_op_config(%op: !transform.any_op {transform.readonly},
                                            %config: !transform.any_param {transform.readonly}) {
    transform.annotate %op "compilation_info" = %config : !transform.any_op, !transform.any_param
    transform.yield
  }

  transform.named_sequence
  @match_mmt_f16_f16_f32(%root: !transform.any_op {transform.readonly}) -> !transform.any_op {
    transform.match.operation_name %root ["linalg.generic"] : !transform.any_op
    %ins, %outs = transform.iree.match.cast_compatible_dag_from_root %root {
    ^bb0(%lhs: tensor<?x?xf16>, %rhs: tensor<?x?xf16>, %out: tensor<?x?xf32>):
      %7 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                                            affine_map<(d0, d1, d2) -> (d1, d2)>,
                                            affine_map<(d0, d1, d2) -> (d0, d1)>],
                           iterator_types = ["parallel", "parallel", "reduction"]}
        ins(%lhs, %rhs : tensor<?x?xf16>, tensor<?x?xf16>) outs(%out : tensor<?x?xf32>) {
      ^bb0(%in: f16, %in_0: f16, %acc: f32):
        %8 = arith.extf %in : f16 to f32
        %9 = arith.extf %in_0 : f16 to f32
        %10 = arith.mulf %8, %9 : f32
        %11 = arith.addf %acc, %10 : f32
        linalg.yield %11 : f32
      } -> tensor<?x?xf32>
    } : (!transform.any_op) -> (!transform.any_value, !transform.any_value)
    transform.yield %root : !transform.any_op
  }

  transform.named_sequence
  @match_mmt_2048x1280x5120_f16_f16_f32(%matmul: !transform.any_op {transform.readonly})
      -> (!transform.any_op, !transform.any_param) {
    %mmt = transform.include @match_mmt_f16_f16_f32 failures(propagate) (%matmul)
      : (!transform.any_op) -> !transform.any_op
    %lhs = transform.get_operand %matmul[0] : (!transform.any_op) -> !transform.any_value
    %rhs = transform.get_operand %matmul[1] : (!transform.any_op) -> !transform.any_value
    transform.iree.match.cast_compatible_type %lhs = tensor<2048x5120xf16> : !transform.any_value
    transform.iree.match.cast_compatible_type %rhs = tensor<1280x5120xf16> : !transform.any_value
    %config = transform.param.constant #iree_codegen.compilation_info<
      lowering_config = #iree_gpu.lowering_config<{promote_operands = [0, 1],
                                                   mma_kind = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>,
                                                   subgroup_m_count = 2, subgroup_n_count = 2,
                                                   reduction = [0, 0, 64],
                                                   workgroup = [64, 128, 0]}>,
      translation_info = #iree_codegen.translation_info<pipeline = LLVMGPUVectorDistribute
                          workgroup_size = [256, 1, 1] subgroup_size = 64,
                          {gpu_pipeline_options = #iree_gpu.pipeline_options<prefetch_shared_memory = true>}>
      > -> !transform.any_param
    transform.yield %matmul, %config : !transform.any_op, !transform.any_param
  }

  transform.named_sequence
  @__kernel_config(%variant_op: !transform.any_op {transform.consumed}) -> !transform.any_op
      attributes { iree_codegen.tuning_spec_entrypoint } {
    %res = transform.foreach_match in %variant_op
      @match_mmt_2048x1280x5120_f16_f16_f32 -> @apply_op_config
      : (!transform.any_op) -> !transform.any_op
    transform.yield %res : !transform.any_op
  }
}
```

### Explanation

Tuning specs are
[transform dialect](https://mlir.llvm.org/docs/Dialects/Transform/) libraries
that conform to the following format:

* All tuning spec entry points (named sequence ops) are marked with the
`iree_codegen.tuning_spec_entrypoint` attribute. They have a single argument
of type `!transform.any_op` and return a single value of type
`!transform.any_op`.
* All entry points in the final tuning specs must either read
(`transform.readonly`) or consume (`transform.consumed`) the argument.
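
As a minimal sketch, a spec that satisfies these requirements but matches
nothing and applies no configuration could look like the following (the module
and sequence names are illustrative):

```mlir
module @minimal_spec attributes { transform.with_named_sequence } {
  // The entry point takes the variant op and returns it unchanged.
  transform.named_sequence
  @__kernel_config(%variant_op: !transform.any_op {transform.readonly}) -> !transform.any_op
      attributes { iree_codegen.tuning_spec_entrypoint } {
    transform.yield %variant_op : !transform.any_op
  }
}
```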

The full example spec above attempts to match `linalg.generic` ops that
correspond to a matmul with the RHS operand transposed (a.k.a. mmt) of shape
`2048x1280x5120`, with `f16` operand element types and an `f32` result element
type.

If the match succeeds, the tuning spec applies the `compilation_info` attribute
that drives code generation. This attribute is considered a compiler
implementation detail; in general, each codegen pipeline has its own
requirements for what constitutes a valid compilation info and how to
interpret it.

Tuning specs are executed by the `Materialize User Configs` pass.
1 change: 1 addition & 0 deletions docs/website/mkdocs.yml
@@ -188,6 +188,7 @@ nav:
- "Other topics":
- Glossary: "reference/glossary.md"
- Optimization options: "reference/optimization-options.md"
- Tuning: "reference/tuning.md"
- Extensions: "reference/extensions.md"
- "Developers":
- "developers/index.md"