Migrating regression suite to iree-test-suite #20124

Closed
wants to merge 41 commits

Conversation

@geomin12 commented Feb 27, 2025

Moving the regression suite over to iree-test-suite and calling those regression tests via a composite action.

Comment on lines -130 to -139
--goldentime-rocm-e2e-ms 1100.0 \
--goldentime-rocm-unet-ms 255.0 \
--goldentime-rocm-clip-ms 14.5 \
--goldentime-rocm-vae-ms 310.0 \
--goldendispatch-rocm-unet 1236 \
--goldendispatch-rocm-clip 967 \
--goldendispatch-rocm-vae 208 \
--goldensize-rocm-unet-bytes 2280000 \
--goldensize-rocm-clip-bytes 860000 \
--goldensize-rocm-vae-bytes 840000 \
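For context, these `--golden*` values are consumed as pytest options by the regression suite. A minimal sketch of how options like these are typically wired up in a `conftest.py`; the flag names come from the lines above, everything else (defaults, fixture name) is hypothetical and not the actual suite code:

```python
# conftest.py -- hypothetical sketch, not the actual regression suite code.
import pytest


def pytest_addoption(parser):
    # Golden values passed in from the CI workflow, as in the flags above.
    parser.addoption("--goldentime-rocm-unet-ms", type=float, default=None,
                     help="Expected UNet latency in milliseconds.")
    parser.addoption("--goldendispatch-rocm-unet", type=int, default=None,
                     help="Expected UNet dispatch count.")


@pytest.fixture
def goldentime_rocm_unet_ms(request):
    # Tests compare the measured latency against this value (with some tolerance).
    return request.config.getoption("--goldentime-rocm-unet-ms")
```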
Member

Let's make sure these values can still be tracked in the IREE repo before migrating, or switch to some other mechanism for watching for benchmark metrics regressions, like comparing the observed values against a baseline stored in a remote database.

See also the discussion from this morning on Discord about updating benchmark limits.

The workflow we want is:

  1. Developer changes the compiler code in a pull request
  2. The CI runs tests and checks the metrics
  3. The metrics are compared against a baseline (either sourced from the repository itself or some external source)
  4. If the metrics change significantly, the developer can choose to accept those changes or adjust their code. Accepting the changes can mean updating a line of source code like here, or it can mean merging the change and letting the database pick up the new results automatically, or something else.

What we want to avoid is having unactionable errors. If the expected results are stored in another repository and test results not matching those expectations result in failed workflow runs, the developer is stuck.

Here's an example of what we had last year running on GCP infra: #17765 (comment). Benchmarks would run and report stats. If some stats were outside of some ranges, the top N out of all stats would be summarized. As long as all models compiled and ran, the workflow runs would succeed regardless of those benchmark results. After a commit from a merged pull request landed on main, the database would be updated to track the new benchmark results, and they would be visible on a website.
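A minimal sketch of that style of non-fatal reporting, assuming hypothetical `observed` and `baseline` dicts of metric values; this is not the actual GCP-era implementation:

```python
# Hypothetical sketch of non-fatal benchmark regression reporting.
def summarize_regressions(observed, baseline, tolerance=0.05, top_n=10):
    """Return the top_n metrics whose relative change exceeds the tolerance."""
    deltas = []
    for name, value in observed.items():
        expected = baseline.get(name)
        if not expected:
            continue  # No baseline yet (or zero); nothing to compare against.
        change = (value - expected) / expected
        if abs(change) > tolerance:
            deltas.append((name, expected, value, change))
    deltas.sort(key=lambda d: abs(d[3]), reverse=True)
    return deltas[:top_n]


# The workflow prints this summary but still exits 0 as long as the models
# compiled and ran; the baseline store is updated after the change lands on main.
print(summarize_regressions({"unet_ms": 270.0}, {"unet_ms": 255.0}))
```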

Today we post job summaries with the benchmark results, but they are buried, for example: https://github.com/iree-org/iree/actions/runs/13593525275#summary-38005698303. We could elevate those to pull request comments, especially if the results are outside of expected ranges. Only some parts of the project can affect the current benchmarks though, so I'd be wary of generating too much noise with automated PR comments. See, for example, the MiGraphX repo, which runs benchmarks and posts comments on every PR, even those that only update documentation, like ROCm/AMDMIGraphX#3832 (comment).

Author


Yeah, that makes sense! The stats are still tracked in iree-test-suite just as they were before this line, but it would be helpful to display that information in the PR comments.

In terms of storing the data in a database, I was going to hold off on that until the dashboard work is productionized, so we can get a UI along with the data.
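For illustration, surfacing those stats in a more visible place could be as simple as appending a Markdown table to the GitHub Actions job summary (or reusing the same text as the body of a PR comment). A minimal sketch, assuming a list of (name, observed, golden) tuples; the real iree-test-suite reporting may differ:

```python
# Hypothetical sketch: append a benchmark table to the GitHub Actions job summary.
import os


def post_job_summary(results):
    """results: iterable of (metric_name, observed, golden) tuples."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return  # Not running under GitHub Actions.
    lines = ["| Metric | Observed | Golden |", "| --- | --- | --- |"]
    lines += [f"| {name} | {observed} | {golden} |" for name, observed, golden in results]
    with open(summary_path, "a") as f:
        f.write("\n".join(lines) + "\n")


post_job_summary([("unet_ms", 270.0, 255.0), ("unet_dispatch_count", 1236, 1236)])
```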

rocm-chip: gfx942
backend: rocm
sku: mi300
runs-on: nodai-amdgpu-mi300-x86-64
Member


FYI we may want to migrate workflows from these `nodai-amdgpu-mi300-x86-64` runners to a new cluster of runners with the `linux-mi300-gpu-1` label: nod-ai/shark-ai#793

That should be separate from this PR though.

yzhang93 and others added 17 commits March 5, 2025 13:02
There are some attribute changes upstream:

- TOSA: llvm/llvm-project#128115
- NVPTX: llvm/llvm-project#127736

---------

Signed-off-by: yzhang93 <zhyuhang88@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
The convention is [compiler target, runtime HAL driver/device].

This was reported [here on
Discord](https://discord.com/channels/689900678990135345/1166024193599615006/1341505000303759401).

ci-exactly: build_packages, test_sharktank
Signed-off-by: geomin12 <geomin12@amd.com>
Renames follow these upstream changes:

- llvm/llvm-project#128751
- llvm/llvm-project#128869

---------

Signed-off-by: yzhang93 <zhyuhang88@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
…ed. (iree-org#20099)

The revision teaches the SpecializeEncoding pass about the difference
between dropping encodings and failing to resolve encodings. It
introduces an `IdentityEncodingAttr` resolver that always produces
identity encodings. The implementation uses
`iree_encoding.pad_encoding_layout` with a zero-element padding config.

If an encoding resolver (e.g., UnsupportedEncodingAttr) cannot get the
resolved layout, the pass signals a failure.

With the change, the HAL default encoding resolver is switched from
UnsupportedEncodingAttr to IdentityEncodingAttr.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
Add a verifier for tile sizes.

---------

Signed-off-by: erman-gurses <erman@nod-labs.com>
Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20114)

With this change, I am able to now follow our ROCm instructions at
https://iree.dev/guides/deployment-configurations/gpu-rocm/ on Windows,
running MobileNet and getting the expected outputs 🎉

This does two things:

1. Only try to enable peering between devices if more than 1 device is
going to be used
2. Check `hipDeviceCanAccessPeer` before calling
`hipDeviceEnablePeerAccess` so there is a chance to log a better error
message

### More information

On Windows 11 with 2x w7900 and the 6.2 HIP SDK, the previous code
returned `hipErrorInvalidDevice` when calling
`hipDeviceEnablePeerAccess`. On the same system under Linux (unknown
HIP/ROCm version but pretty similar), the code works without errors. In
both cases, I only requested a single device, so this peering code did
not need to run at all. See also [this discussion on
Discord](https://discord.com/channels/689900678990135345/1282818085153407038/1344105197353570377).

See documentation for these APIs at
https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/group___peer_to_peer.html.

### Testing

> [!WARNING]
> I've only tested on the machines I have easy access to. I think we
still need CI coverage for multi-device (e.g.
iree-org#19995)

Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20126)

The revision simplifies the encoding content to `encoded` because
encodings can have complicated parameters. E.g., they can have arbitrary
dictionary attributes, and naming the variable with such information
makes readability worse.

Fixes iree-org#20090

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
Make sure `rdna4@vulkan` is recognized as a Vulkan target. Handle wmma
shapes not supported by the cooperative matrix extension.

Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20120)

This PR flips the switch for the `allow-zero-slice` option, which means
a zero slice will be tolerated for IGEMM that requires padding. The
dynamic if-condition check is unnecessary because we compute the K loop
bound in a reasonable fashion, so the zero-slice situation shouldn't happen.
The motivation for the PR is the performance benefit of removing an
unnecessary conditional from padded GEMM global loading.

Signed-off-by: jerryyin <zhuoryin@amd.com>
Signed-off-by: geomin12 <geomin12@amd.com>
Signed-off-by: Jakub Kuderski <jakub@nod-labs.com>
Signed-off-by: geomin12 <geomin12@amd.com>
When running the ElideTimepoints pass on a large program (llama 8b for
example) we exhaust our state, and fail to elide timepoints.

This significantly reduces the number of timepoints that end up in the
waiting set in those cases.

The number of timepoint waits in the stream output is as follows:
```
23765 << State Size 256
8801 << State Size 8192
```

Signed-off-by: Andrew Woloszyn <awoloszy@amd.com>
Signed-off-by: geomin12 <geomin12@amd.com>
No local patch is carried.

---------

Signed-off-by: yzhang93 <zhyuhang88@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20125)

Small constants can be inlined into dispatches, and they can become
the source of `set_encoding` ops. The revision adds support
for hoisting both operations out if the source is a constant.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
…ups (iree-org#20123)

Adds cleanup patterns to run during
`tileConsumerAndFuseProducersUsingSCF` in
TileAndDistributeToWorkgroupsUsingForall. The PR adds the
`SwapExtractWithExpandPattern` and some extra cleanup patterns for
slices.

This enables codegen for set_encoding ops on GPU, since the
materialization of set_encoding contains an expand_shape that will block
distribution tiling of the full dispatch.

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
…iree-org#20133)

The previous test did not catch the case because it did not have an
affinity. The revision adds the affinity to the test, which ensures the
case is actually exercised.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20131)

Based on the announcement

https://www.amd.com/en/newsroom/press-releases/2025-2-28-amd-unveils-next-generation-amd-rdna-4-architectu.html.

Also sort the rocm and vulkan lists alphabetically, since any other
order is hard to maintain by hand.

Signed-off-by: Jakub Kuderski <jakub@nod-labs.com>
Signed-off-by: geomin12 <geomin12@amd.com>
sogartar and others added 21 commits March 5, 2025 13:02
…ree-org#20142)

When running the build action in the same thread we must still do a
start-finish to get an initialized completed future.

Signed-off-by: Boian Petkantchin <boian.petkantchin@amd.com>
Signed-off-by: geomin12 <geomin12@amd.com>
This reflects improvements in the compiler for these models.

Signed-off-by: MaheshRavishankar <mravisha@amd.com>
Signed-off-by: geomin12 <geomin12@amd.com>
The type that IREE's code referred to as f8E4M3, which is an 8-bit float
that has no infinities, 4 bits of exponent, and 3 bits of mantissa, is
called f8E4M3FN in MLIR and LLVM's APFloat. MLIR has a separate f8E4M3
type that has different semantics. The fact that IREE would sometimes be
using a different MLIR type from what we meant was causing crashes on
RDNA4.

Therefore, rename IREE's type to match.

Signed-off-by: geomin12 <geomin12@amd.com>
Bumps the github-actions group with 3 updates:
[actions/download-artifact](https://github.com/actions/download-artifact),
[actions/cache](https://github.com/actions/cache) and
[dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact).

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: geomin12 <geomin12@amd.com>
…op. (iree-org#20148)

The encoding content could be vague because the attribute could be
descriptive. However, we realized that some encodings with different
content can result in the same encoding layout. E.g., a matmul
encoding can encode indexing maps for the B/M/N/K dimensions, while an
encoding attribute may only care about the `K` dimension. One example
is `IREE::GPU::GPUPadLayoutAttr`.

The revision introduces encoding compatibility in
SerializableEncodingAttrInterface and relaxes the verifier of the
`stream.tensor.clone` op. If either encoding is not serialized, they are
compatible. If both encodings are serialized, it checks whether they are
identical.

Note that it defers the real check until after encoding specialization. If
they end up with different layouts, a failure is detected.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
…#20117)

I removed the duplicated code in the implementations of
`CPUDeviceEncodingLayoutResolverAttrInterface`,
`VMVXDeviceEncodingLayoutResolverAttrInterface`, and
`GPUDeviceEncodingLayoutResolverAttrInterface`. I added a base class,
`WrappedExternalModel`, that provides `getConfiguration` and
`getEncodingInfo`, so the derived classes only need to implement their
own `getEncodingInfoImpl` method.

Fixes iree-org#20050

---------

Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
Signed-off-by: geomin12 <geomin12@amd.com>
Drops a test in LLVMGPU/test/rocdl_pipeline_test.mlir according to
changes in iree-org#20152

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
…iree-org#20152)

With RDNA4, we now have cards that allow the OCP FP8 formats (f8E5M2 and
f8E4M3FN). This means that the previous check that caused any module
that used those formats to be rejected needs to be relaxed.

However, the backend uses the same intrinsics for the gfx942-only FNUZ
types and the OCP types. Therefore, the data type validity check needs
to be aware of the chipset being targeted in order to validate that the
dispatch can be compiled correctly. This patch implements this improved
set of checks.

In addition, it adds tests for these checks (sadly, we can't use
expected-error here because the pipeline strips debug info) and improves
these checks to also look inside vector<> and memref<>.

(This also means that there is now an IREE-side mechanism to reject
compiling FP8 code pre-gfx942.)

Signed-off-by: geomin12 <geomin12@amd.com>
…rg#20161)

Fixes the llama fp8 perf regression introduced by
iree-org#20106. That PR stopped the
linalg.generic from getting hoisted, which caused a broadcast to get
fused and a `tensor<1x1x131072x131072xi1>` to be recomputed on each
prefill call.

---------

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
Signed-off-by: geomin12 <geomin12@amd.com>
…g#20143)

This PR restricts the fusion of set_encoding ops to only dispatches with
element-wise linalg ops. We don't have good codegen for fusions with
reduction ops on GPU yet, so this will allow using the late
materialization data tiling path on GPU while the codegen improves.

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Signed-off-by: geomin12 <geomin12@amd.com>
@geomin12 (Author) commented Mar 5, 2025

Apologies everyone, this PR has been open for quite some time. I will get those fixes in, but I'm closing it for now to prevent notifications and avoid bothering everyone!

@geomin12 closed this Mar 5, 2025
@ScottTodd (Member)

Aw, the review history would have been nice to keep here. Looks like some git rebase issues got the branch into a bad state? Feel free to ask for help with git and pull request workflows.

@geomin12 (Author) commented Mar 5, 2025

> Aw, the review history would have been nice to keep here. Looks like some git rebase issues got the branch into a bad state? Feel free to ask for help with git and pull request workflows.

I am fixing it! I just feel bad about the notifications to folks.
