Include default tuning specs with the compiler #19214

kuhar · 2024-11-19T21:06:53Z

We want to be able to ship a library of default tuning specs with IREE, so that users can get good performance out of the box on known key operations. This is applied after dispatch formation and realized around the time we run the MaterializeUserConfigs pass. Currently, there's not default spec provided, but the users can supply their own transform dialect libraries.

I propose the following solutions, which bears similarity with how we plan to handle ukernels. There are a lot of things that still need to be worked out, but this is roughly the brake-down of of key requirements/properties:

We will provide default tuning specs for key architectures like gfx942. Each architecture will have its own tuning spec file. The initial target is performant SDXL on gfx942. @nithinsubbiah
During IREE build, these tuning specs will be given to iree-opt, verified, and saved as mlir text files. These text files will be embedded into the final iree compiler binary. At runtime, the compiler will be able to access them as memory buffers. @kuhar
- [Codegen][Tuner] Add support for default tuning specs #19394
Just before MaterializeUserConfigs, we will load the compatible tuning spec, if any, and the user-provided transform libraries. @kuhar
- [Codegen] Clean up MaterializeUserConfigs. NFC. #19207
- [Flow] Drop transform dialect based dispatch formation #19261
We will put both transform dialect libraries in the same module, so that there's only one library to handle. @kuhar
We will then materialize the transform library in an opaque resource similar to #hal.executable.object. We don't want to embed the library as a nested module to prevent accidental visitation with walk/pattern rewrite drivers, as the transform libraries can contain ops very similar to the kernel code (e.g., linalg.generic). @kuhar
- [Codegen] Add pass to materialize tuning specs #19337
- [Codegen][Tuner] Clarify tuning spec linking order. NFC. #19370
We will run the transform dialect interpreter, like we do today, in MaterializeUserConfigs. @kuhar
- [Codegen] Load transform library only once in MaterializeUserConfigs #19313
- [Codegen][Tuner] Allow tuning specs in the LLVMGPU pipeline #19359
Add a verifier for tuning specs. We can add a discardable attr verifier for entry points iree_codegen.tuning_spec_entrypoint. @bangtianliu
Add documentation for tuning specs. @kuhar
- [Docs][Tuner] Add initial tuning spec docs #19462

Spec maintanance policy:

New tests will make sure that the default tuning specs are up-to-date and in a working state.
In case of minor syntactic changes, it will be the responsibility of the patch author to update the specs.
In case of substantial upstream mlir dialect changes, the author of the mlir change will be responsible for the upgrade. If that person is not an IREE contributor, the responsibility would then fall on the author of the tuning spec.

The text was updated successfully, but these errors were encountered:

kuhar · 2024-11-19T21:07:24Z

cc: @MaheshRavishankar @benvanik @bjacob @stellaraccident

MaheshRavishankar · 2024-11-19T21:25:12Z

Thanks Jakub. This is a great summary and description. Few comments below.

During IREE build, these tuning specs will be given to iree-opt, verified, and saved as mlir bytecode files. These bytecode files will be embedded into the final iree compiler binary. At runtime, the compiler will be able to access them as memory buffers.

I dont think they should be included as part of the final iree compiler binary. I think we should be able to put in a location that iree-compile can access it from? We also need to be able to provide/append a way for user to override the tuning spec picked up

Just before MaterializeUserConfigs, we will load the compatible tuning spec, if any, and the user-provided transform libraries.

We will put both transform dialect libraries in the same module, so that there's only one library to handle.

Nice. Thanks for incorporating this. We should build this is a separate utility/tool that can be tested independently (and maybe invoked from within the compiler to be able to append to existing tuning).

Rest of the stuff looks good to me.

cc @erman-gurses @bangtianliu @nithinsubbiah

ScottTodd · 2024-11-19T21:35:40Z

During IREE build, these tuning specs will be given to iree-opt, verified, and saved as mlir bytecode files. These bytecode files will be embedded into the final iree compiler binary. At runtime, the compiler will be able to access them as memory buffers.

I dont think they should be included as part of the final iree compiler binary. I think we should be able to put in a location that iree-compile can access it from? We also need to be able to provide/append a way for user to override the tuning spec picked up

I think we can figure these details out partway through the implementation work, but starting with brainstorming and design work now is a good idea. Once we have the basic mechanism in place to use tuning specs, how those specs are provided will matter more.

We can survey other similar projects to see what they do and what users will expect. Bundling the files as part of the compiler distribution (either embedded directly in the libIREECompiler.so or in separate files that we package together) will certainly be nice for a self-contained / hermetic compiler. I could see users wanting to maintain their own spec libraries to be shared between multiple developers, in which case we'd want a way to load those libraries from a local file path or even a remote URL. That could get complex with priority loading / ordering if there are multiple matching specs, generic specs that can act as a fallback if no more specific spec matches, etc.

kuhar · 2024-11-20T00:02:45Z

During IREE build, these tuning specs will be given to iree-opt, verified, and saved as mlir bytecode files. These bytecode files will be embedded into the final iree compiler binary. At runtime, the compiler will be able to access them as memory buffers.

I dont think they should be included as part of the final iree compiler binary. I think we should be able to put in a location that iree-compile can access it from?

IIUC, ukernels are also stored in the compiler binary and this part mirrors that solution. This makes distribution easier because you don't have to carry an additional directory with tuning specs and worry about installing it somewhere relative to the compiler.

We also need to be able to provide/append a way for user to override the tuning spec picked up

My plan is to append all specs (both the default arch-specific one and user-provided one) into one transform library module, and have that embedded in an opaque attribute. I attempted to describe it the bullet points towards the middle.

This pass is meant for combining multiple tuning specs (e.g., a user-provided one and a default one). We expect the input module to have nested sub-modules with named sequences marked with the `iree_codegen.tuning_spec_entrypoint` unit attributes. The pass collects all such tuning specs and introduce a new named sequence that includes all the other tuning spec entry points. The order of inclusion is the same as the in which these nested tuning specs appear in the IR. Issue: #19214

…19313) Hoist the library loading logic out of the loop that configures functions. This is in preparation for adding tuning spec loading from a new module attr. Issue: #19214

This pass is meant for combining multiple tuning specs (e.g., a user-provided one and a default one). We expect the input module to have nested sub-modules with named sequences marked with the `iree_codegen.tuning_spec_entrypoint` unit attributes. The pass collects all such tuning specs and introduce a new named sequence that includes all the other tuning spec entry points. The order of inclusion is the same as the in which these nested tuning specs appear in the IR. Issue: iree-org#19214

…ree-org#19313) Hoist the library loading logic out of the loop that configures functions. This is in preparation for adding tuning spec loading from a new module attr. Issue: iree-org#19214

... and update 'Materialize User Configs' to pick up those tuning specs. The overall flow is as follows: * We pick up any user-specified tuning specs in `materialize tuning specs` and link them into a single transform dialect library module. * We serialize that linked tuning spec as MLIR bytecode. * We embed this MLIR bytecode as a module attribute. This is so that none of the subsequent passes will accidentally `walk` or otherwise modify it. * In `materialize user configs`, we first check if there are any transform libraries provided. If not, then we check if the tuning spec is present. * We deserialize the tuning spec attribute into a transform dialect library module and execute it. * We remove the serialized tuning spec from the module, as it's no longer needed. I also modified `getOrLoadTransformLibraryModule` so that it doesn't use the `transform::detail::assembleTransformLibraryFromPaths` function, because it has some logic to perform library merging that would overwrite module symbol names. There's no need to call it anyway, since we are loading a single library at a time. This is not added to any codegen pipeline yet -- I will do that in a future PR. Issue: #19214

This pass is meant for combining multiple tuning specs (e.g., a user-provided one and a default one). We expect the input module to have nested sub-modules with named sequences marked with the `iree_codegen.tuning_spec_entrypoint` unit attributes. The pass collects all such tuning specs and introduce a new named sequence that includes all the other tuning spec entry points. The order of inclusion is the same as the in which these nested tuning specs appear in the IR. Issue: iree-org#19214 Signed-off-by: Giacomo Serafini <179146510+giacs-epic@users.noreply.github.com>

…ree-org#19313) Hoist the library loading logic out of the loop that configures functions. This is in preparation for adding tuning spec loading from a new module attr. Issue: iree-org#19214 Signed-off-by: Giacomo Serafini <179146510+giacs-epic@users.noreply.github.com>

... and update 'Materialize User Configs' to pick up those tuning specs. The overall flow is as follows: * We pick up any user-specified tuning specs in `materialize tuning specs` and link them into a single transform dialect library module. * We serialize that linked tuning spec as MLIR bytecode. * We embed this MLIR bytecode as a module attribute. This is so that none of the subsequent passes will accidentally `walk` or otherwise modify it. * In `materialize user configs`, we first check if there are any transform libraries provided. If not, then we check if the tuning spec is present. * We deserialize the tuning spec attribute into a transform dialect library module and execute it. * We remove the serialized tuning spec from the module, as it's no longer needed. I also modified `getOrLoadTransformLibraryModule` so that it doesn't use the `transform::detail::assembleTransformLibraryFromPaths` function, because it has some logic to perform library merging that would overwrite module symbol names. There's no need to call it anyway, since we are loading a single library at a time. This is not added to any codegen pipeline yet -- I will do that in a future PR. Issue: iree-org#19214 Signed-off-by: Giacomo Serafini <179146510+giacs-epic@users.noreply.github.com>

These default specs are target architecture-specific and will be shipped with the compiler. * Default specs belong to target plugins and get embedded in `libIREECompiler.so`, just like ukernels. * Plugins then register their default tuning specs with the default embedded directory. * We store them as mlir text. We can't easily assemble them as mlir bytecode without taking a circular dependency on iree-opt. We can revisit this in the future and add a new tool `iree-as` that will only link with dialects. * After the initial loading, we cache the default specs in the IREE codegen dialect transform library manager. * Add a placeholder spec for gfx942. * Document and test the inclusion order. User specs come before default specs. Issue: #19214

This is a fixup to the tuning spec materialization that makes default and user-provided specs work e2e. As an example, a working spec for `linalg.matmul_transpose_b` is provided for gfx942. * Allow for tuning spec entry points that consume their argument op. This is so that tuning specs can use `transform.foreach_match`. * Require all tuning spec entry points to return `any_op`, so that we can chain includes. This works for both consumed and readonly args. * Add a test to show that user-provided tuning specs take precedence over default ones. * Work around a transform interpreter bug when multiple named sequenced across different modules share the same symbol name. Issue: iree-org#19214 Signed-off-by: Jakub Kuderski <jakub@nod-labs.com>

This is a fixup to the tuning spec materialization that makes default and user-provided specs work e2e. As an example, a working spec for `linalg.matmul_transpose_b` is provided for gfx942. * Allow for tuning spec entry points that consume their argument op. This is so that tuning specs can use `transform.foreach_match`. * Require all tuning spec entry points to return `any_op`, so that we can chain includes. This works for both consumed and readonly args. * Add a test to show that user-provided tuning specs take precedence over default ones. * Work around a transform interpreter bug when multiple named sequenced across different modules share the same symbol name. Issue: #19214 --------- Signed-off-by: Jakub Kuderski <jakub@nod-labs.com>

General intro, list the main flags, show an example. Issue: #19214

This PR is relevant to task in #19214: add [a discardable attr verifier](https://mlir.llvm.org/docs/DefiningDialects/#discardable-attribute-verification) for entry points iree_codegen.tuning_spec_entrypoint --------- Signed-off-by: Bangtian Liu <liubangtian@gmail.com>

This PR adds the unit attribute` iree_codegen.tuning_spec_with_default_entrypoint` to indicate the default tuning spec (typically or user-provided tuning spec but can work in the same manner) must contain one named sequence operation marked with `__kernel_config`, also add the corresponding verification in `verifyOperationAttribute` function. This PR is relevant to task in #19214: add [a discardable attr verifier](https://mlir.llvm.org/docs/DefiningDialects/#discardable-attribute-verification) for entry points iree_codegen.tuning_spec_entrypoint Context: Jakub proposed two approaches for verifying the default tuning specification: 1. Implement a dedicated pass for verification. 2. Add a new attribute and update the verifyOperationAttribute function accordingly. After careful consideration, we agreed on the second approach to avoid introducing an additional pass, ensuring a simple implementation. --------- Signed-off-by: Bangtian Liu <liubangtian@gmail.com>

kuhar · 2025-01-03T17:38:40Z

I'm going to close this as completed because all of the pieces are in place now and the remaining work is to grow the set of default tuning specs included with the compiler.

kuhar added codegen Shared code generation infrastructure and dialects performance ⚡ Performance/optimization related work across the compiler and runtime labels Nov 19, 2024

kuhar self-assigned this Nov 19, 2024

kuhar mentioned this issue Nov 19, 2024

[tuner] Reduce maintenance burden and prepare for more codegen pipelines nod-ai/shark-ai#453

Closed

12 tasks

kuhar added the enhancement ➕ New feature or request label Nov 19, 2024

This was referenced Nov 20, 2024

Add Tuning Support (Umbrella Issue) #16952

Open

[Codegen][Tuner] Add pass to link tuning specs #19281

Merged

kuhar mentioned this issue Nov 27, 2024

[Codegen] Load transform library only once in MaterializeUserConfigs #19313

Merged

kuhar mentioned this issue Nov 28, 2024

[Codegen] Add pass to materialize tuning specs #19337

Merged

kuhar mentioned this issue Dec 4, 2024

[Codegen][Tuner] Allow tuning specs in the LLVMGPU pipeline #19359

Merged

kuhar added the tuner label Dec 4, 2024

kuhar mentioned this issue Dec 6, 2024

[Codegen][Tuner] Add support for default tuning specs #19394

Merged

kuhar mentioned this issue Dec 11, 2024

[Codegen][Tuner] Make default and user-provided specs work #19449

Merged

kuhar mentioned this issue Dec 11, 2024

[Docs][Tuner] Add initial tuning spec docs #19462

Merged

kuhar added a commit that referenced this issue Dec 11, 2024

[Docs][Tuner] Add initial tuning spec docs (#19462)

52ce3d3

General intro, list the main flags, show an example. Issue: #19214

bangtianliu mentioned this issue Dec 16, 2024

[Codegen][Tuner] attr verifier for tuning specs #19486

Merged

ScottTodd mentioned this issue Dec 16, 2024

Release tracker - 3.1.0 #19192

Closed

6 tasks

bangtianliu mentioned this issue Dec 19, 2024

[Codegen][Tuner] verifier for the default tuning spec #19525

Merged

kuhar closed this as completed Jan 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include default tuning specs with the compiler #19214

Include default tuning specs with the compiler #19214

kuhar commented Nov 19, 2024 •

edited

Loading

kuhar commented Nov 19, 2024

MaheshRavishankar commented Nov 19, 2024

ScottTodd commented Nov 19, 2024

kuhar commented Nov 20, 2024

kuhar commented Jan 3, 2025

Include default tuning specs with the compiler #19214

Include default tuning specs with the compiler #19214

Comments

kuhar commented Nov 19, 2024 • edited Loading

kuhar commented Nov 19, 2024

MaheshRavishankar commented Nov 19, 2024

ScottTodd commented Nov 19, 2024

kuhar commented Nov 20, 2024

kuhar commented Jan 3, 2025

kuhar commented Nov 19, 2024 •

edited

Loading