Add Tuning Support (Umbrella Issue) #16952

Open
kuhar opened this issue Apr 2, 2024 · 4 comments
Labels: performance ⚡ (Performance/optimization related work across the compiler and runtime), tuner

Comments

kuhar (Member) commented Apr 2, 2024

This is an umbrella issue for implementing a tuning infrastructure. By tuning we mean a kind of profile-guided optimization (PGO) flow where we compile a program/model with extra instrumentation and use the measured runtime performance to tweak the compilation parameters for better performance. Concretely, this translates to benchmarking dispatches and using the results to apply different #iree_codegen.compilation_info attributes to root ops. A compilation info consists of a lowering config (with tile sizes) and a translation info (with the codegen pipeline, workgroup/subgroup sizes, and MMA schedule).
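
To make the search space concrete, here is a minimal sketch of how a tuner might enumerate candidate configurations for a statically shaped matmul root op. `CompilationInfo`, `matmul_candidates`, and all option values are hypothetical; the dataclass models only a few of the knobs listed above (pipeline, tile sizes, workgroup/subgroup sizes) and does not reproduce the real `#iree_codegen.compilation_info` syntax.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class CompilationInfo:
    """Hypothetical, simplified model of the knobs a compilation info carries."""
    pipeline: str                          # codegen pipeline name
    tile_sizes: Tuple[int, int, int]       # workgroup tile sizes for (M, N, K)
    workgroup_size: Tuple[int, int, int]   # threads per workgroup (x, y, z)
    subgroup_size: int                     # threads per subgroup


def matmul_candidates(
    m: int, n: int, k: int,
    pipeline: str = "LLVMGPUVectorDistribute",
    mn_tiles: Sequence[int] = (32, 64, 128),
    k_tiles: Sequence[int] = (16, 32, 64),
    subgroup_size: int = 64,
    subgroups_per_workgroup: Sequence[int] = (1, 2, 4),
) -> List[CompilationInfo]:
    """Enumerate configs whose tiles evenly divide a static M x N x K matmul."""
    candidates = []
    for tm, tn, tk, sg in product(mn_tiles, mn_tiles, k_tiles, subgroups_per_workgroup):
        if m % tm or n % tn or k % tk:
            continue  # skip tiles that would leave partial tiles at the edges
        candidates.append(CompilationInfo(
            pipeline=pipeline,
            tile_sizes=(tm, tn, tk),
            workgroup_size=(subgroup_size * sg, 1, 1),
            subgroup_size=subgroup_size,
        ))
    return candidates
```

Requiring exact divisibility is only reasonable because v0 targets static shapes; a real tuner would also need per-op constraints (e.g. on MMA intrinsic shapes) that this sketch omits.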

The main tuning loop will be driven by a Python script, with the bulk of the implementation split across a few existing tools. We plan to implement it as follows:

  1. iree-compile allows dumping instrumented benchmarks to a directory. This is similar to the existing --iree-hal-dump-executable-benchmarks-to= flag, with each benchmark dumped to a separate file, possibly alongside a shared top-level manifest file if necessary.
    1. This will require adding a compiler pass that inserts the instrumentation.
    2. The compiler annotates root ops that can be tuned.
  2. iree-run-module dumps profile data using the collected trace. This includes a precise dispatch mapping and information about (dynamic) shapes, workgroup counts, etc.
  3. The tuning script parses the dumped trace and the instrumented benchmarks and locates the root ops. It then detects whether each root op is supported. The tuning script knows how to generate tuning configurations for a number of supported root ops (e.g., matmul, convolution, contraction). The tuning configs are materialized as transform dialect/PDL specs.
    1. The evaluation order should be customizable/pluggable.
  4. The tuning script launches iree-compile as a separate process. First, the existing configuration is stripped and replaced with the one from the tuning spec; then compilation resumes from the level of executable sources. Either the compilation succeeds or the verifier rejects the compilation info. It is the responsibility of the compiler to reconcile the compilation info across all ops in the module.
  5. The tuning script benchmarks a number of dispatch candidates and selects the best one using the collected instrumentation (execution time). The selected tuning spec is added to the output file.
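
Steps 3 to 5 can be sketched as a small driver loop. This is a minimal illustration, not the actual tuner: `tune`, `compile_fn`, `benchmark_fn`, and `order` are hypothetical names. In a real driver, `compile_fn` would invoke iree-compile in a separate process (returning `None` when the verifier rejects the config) and `benchmark_fn` would run iree-benchmark-module and parse its timing; both are injected here so the control flow can be shown without assuming specific command-line flags. Taking the median of a few repetitions is an assumed noise-reduction choice.

```python
import statistics
from typing import Callable, Iterable, List, Optional, Sequence, Tuple, TypeVar

Config = TypeVar("Config")


def tune(
    candidates: Sequence[Config],
    compile_fn: Callable[[Config], Optional[str]],
    benchmark_fn: Callable[[str], float],
    order: Callable[[Sequence[Config]], Iterable[Config]] = lambda cs: cs,
    repetitions: int = 3,
) -> Tuple[Optional[Config], List[Tuple[Config, float]]]:
    """Compile and benchmark candidates; return the fastest and all timings."""
    results: List[Tuple[Config, float]] = []
    for cfg in order(candidates):          # pluggable evaluation order
        artifact = compile_fn(cfg)         # None models a verifier rejection
        if artifact is None:
            continue
        # The median of several runs is less sensitive to one-off noise.
        times = [benchmark_fn(artifact) for _ in range(repetitions)]
        results.append((cfg, statistics.median(times)))
    if not results:
        return None, results
    best_cfg, _ = min(results, key=lambda r: r[1])
    return best_cfg, results
```

Keeping `order` pluggable lets a heuristic ranking or a random shuffle be swapped in without touching the compile/benchmark logic.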
flowchart TD;
A[Input program] --> B(iree-compile)
B --> C[Instrumented vmfb]
C --> D(iree-run-module)
D --> E[Profile data]
B --> F[Instrumented benchmarks]

subgraph TuningLoop
  G(Tuner) --> H[Tuning spec]
  H --> I(iree-compile)
  I --> J[Instrumented dispatch vmfb]
  J --> K(iree-benchmark-module)
  K --> L[Benchmark result]
  L --> G
end
E --> G
F --> G

G --> O[Final tuning spec]

For v0, which targets the SD family of models, we do not have to support dynamic shapes. Initially, the dispatches to tune will be selected by the user; later, we can extend the tuning script to identify them automatically based on the generated trace.
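
As a sketch of how the tuning script could later pick dispatches from the generated trace, the function below ranks dispatches by accumulated time and keeps the smallest prefix covering a given fraction of the total. `select_hot_dispatches`, the `(dispatch_name, duration)` sample format, and the `coverage` threshold are assumptions; the profile data actually produced by iree-run-module may be structured differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def select_hot_dispatches(
    samples: Iterable[Tuple[str, float]], coverage: float = 0.8
) -> List[str]:
    """Return the hottest dispatches that together cover `coverage` of total time."""
    totals: Dict[str, float] = defaultdict(float)
    for name, duration in samples:
        totals[name] += duration  # a dispatch may run many times per invocation
    grand_total = sum(totals.values())
    selected: List[str] = []
    covered = 0.0
    for name, t in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        if grand_total and covered >= coverage * grand_total:
            break  # the already-selected dispatches cover enough of the runtime
        selected.append(name)
        covered += t
    return selected
```

A coverage threshold keeps the tuning budget focused on the few dispatches that dominate runtime, which is the manual selection this step would replace.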

kuhar added the performance ⚡ label Apr 2, 2024
kuhar (Member Author) commented Apr 2, 2024

stellaraccident (Collaborator)

Nice / thank you! Various people have been doing this in a pretty ad-hoc way for years, and it is definitely profitable to do. Would be really nice to have it be a good and supported flow!

kuhar added a commit to kuhar/iree that referenced this issue May 10, 2024
Support disabling workgroup reordering and shared memory optimization
passes based on translation info config entries. Because these are just
named unit attributes, they do not require custom attributes defined in
tablegen.

These are intended for tuning.

Issue: iree-org#16952
kuhar added a commit to kuhar/iree that referenced this issue May 12, 2024
kuhar added a commit to kuhar/iree that referenced this issue May 13, 2024
kuhar added a commit that referenced this issue May 13, 2024
bangtianliu pushed a commit to bangtianliu/iree that referenced this issue Jun 5, 2024
…rg#17340)

LLITCHEV pushed a commit to LLITCHEV/iree that referenced this issue Jul 30, 2024
…rg#17340)

Signed-off-by: Lubo Litchev <lubol@google.com>
kuhar (Member Author) commented Sep 6, 2024

The scripts that drive the tuning loop landed in the sharktank repo: nod-ai/shark-ai#141 and nod-ai/shark-ai#158.
This is a temporary location, as the tuner in its current form only supports the LLVMGPUVectorDistribute pipeline and has only been tested on the SDXL model. Next, we should expand to more targets and models, and then 'graduate' the code to a location under iree-org.
