Migrating regression suite to iree-test-suite #20124
Conversation
--goldentime-rocm-e2e-ms 1100.0 \
--goldentime-rocm-unet-ms 255.0 \
--goldentime-rocm-clip-ms 14.5 \
--goldentime-rocm-vae-ms 310.0 \
--goldendispatch-rocm-unet 1236 \
--goldendispatch-rocm-clip 967 \
--goldendispatch-rocm-vae 208 \
--goldensize-rocm-unet-bytes 2280000 \
--goldensize-rocm-clip-bytes 860000 \
--goldensize-rocm-vae-bytes 840000 \
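The flags above encode "golden" values (times, dispatch counts, binary sizes) that observed results are checked against. A minimal sketch of how such a check can work, assuming a simple relative tolerance; the metric names mirror the flags, but the function, tolerance, and structure are illustrative, not the actual test-suite implementation:

```python
# Golden values mirroring the --goldentime-rocm-* flags above (milliseconds).
GOLDEN = {
    "e2e_ms": 1100.0,
    "unet_ms": 255.0,
    "clip_ms": 14.5,
    "vae_ms": 310.0,
}

def check_against_golden(observed: dict, golden: dict, tolerance: float = 0.1):
    """Return (metric, observed, golden) tuples that regressed beyond
    `tolerance`, expressed as a fraction of the golden value."""
    regressions = []
    for name, gold in golden.items():
        obs = observed.get(name)
        if obs is not None and obs > gold * (1.0 + tolerance):
            regressions.append((name, obs, gold))
    return regressions
```

For example, `check_against_golden({"unet_ms": 300.0}, GOLDEN)` flags `unet_ms` because 300 ms exceeds the 255 ms golden value by more than 10%.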
Let's make sure these values can still be tracked in the IREE repo before migrating, or switch to some other mechanism for watching for benchmark metrics regressions, like comparing the observed values against a baseline stored in a remote database.
See also the discussion about updating benchmark limits from this morning here on Discord.
The workflow we want is:
- Developer changes the compiler code in a pull request
- The CI runs tests and checks the metrics
- The metrics are compared against a baseline (either sourced from the repository itself or some external source)
- If the metrics change significantly, the developer can choose to accept those changes or adjust their code. Accepting the changes can mean updating a line of source code like here, or it can mean merging the change and letting the database pick up the new results automatically, or something else.
What we want to avoid is having unactionable errors. If the expected results are stored in another repository and test results not matching those expectations result in failed workflow runs, the developer is stuck.
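The workflow above can be sketched as a "report, don't fail" comparison: observed metrics are compared against a baseline and the largest deviations are summarized, without raising an error, so the developer always gets actionable output. All names here are illustrative, not actual IREE tooling:

```python
def summarize_deviations(observed: dict, baseline: dict, top_n: int = 3):
    """Return the top_n metrics ranked by relative deviation from baseline.

    The caller can render this as a job summary or PR comment; it never
    raises, so the workflow run succeeds regardless of the results.
    """
    deviations = []
    for name, base in baseline.items():
        if name in observed and base:
            rel = (observed[name] - base) / base
            deviations.append((name, rel))
    deviations.sort(key=lambda item: abs(item[1]), reverse=True)
    return deviations[:top_n]
```

A run with `observed={"a": 110.0, "b": 95.0}` against a baseline of 100.0 for each would surface `a` first (+10%) and `b` second (-5%).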
Here's an example of what we had last year running on GCP infra: #17765 (comment). Benchmarks would run and report stats. If some stats were outside of expected ranges, the top N of all stats would be summarized. As long as all models compiled and ran, the workflow runs would succeed regardless of those benchmark results. After a commit from a merged pull request landed on main, the database would be updated to track the new benchmark results, and they would be visible on a website.
Today we post job summaries with the benchmark results but they are buried, for example: https://github.com/iree-org/iree/actions/runs/13593525275#summary-38005698303 . We could elevate those to pull request comments, especially if the results are outside of expected ranges. Only some parts of the project can affect the current benchmarks though, so I'd be wary of generating too much noise for automated PR comments. See for example the MiGraphX repo that runs benchmarks and posts comments on every PR, even those that only update documentation, like ROCm/AMDMIGraphX#3832 (comment)
Yeah, that makes sense! The stats are still tracked as before this line in iree-test-suite, but it would be helpful to display that information in PR comments.
In terms of storing the data in a database, I was going to hold off until the dashboard work is productionized, so we can get a UI along with the data.
rocm-chip: gfx942
backend: rocm
sku: mi300
runs-on: nodai-amdgpu-mi300-x86-64
FYI we may want to migrate workflows from these nodai-amdgpu-mi300-x86-64 runners to a new cluster of runners with the linux-mi300-gpu-1 label: nod-ai/shark-ai#793
That should be separate from this PR though.
There are some attribute changes upstream: - TOSA: llvm/llvm-project#128115 - NVPTX: llvm/llvm-project#127736 --------- Signed-off-by: yzhang93 <zhyuhang88@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
The convention is [compiler target, runtime HAL driver/device]. This was reported [here on Discord](https://discord.com/channels/689900678990135345/1166024193599615006/1341505000303759401). ci-exactly: build_packages, test_sharktank Signed-off-by: geomin12 <geomin12@amd.com>
…#20115) Follow-up to iree-org#20112. Signed-off-by: geomin12 <geomin12@amd.com>
Rename changes followed: - llvm/llvm-project#128751 - llvm/llvm-project#128869 --------- Signed-off-by: yzhang93 <zhyuhang88@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
…ed. (iree-org#20099) The revision teaches the SpecializeEncoding pass about the difference between dropping encodings and failing to resolve encodings. It introduces an `IdentityEncodingAttr` resolver that always produces identity encodings. The implementation detail is using iree_encoding.pad_encoding_layout with a zero-element padding config. If an encoding resolver (e.g., UnsupportedEncodingAttr) cannot get the resolved layout, the pass signals a failure. With the change, the HAL default encoding resolver is switched from UnsupportedEncodingAttr to IdentityEncodingAttr. --------- Signed-off-by: hanhanW <hanhan0912@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
Add a verifier for tile sizes. --------- Signed-off-by: erman-gurses <erman@nod-labs.com> Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20114) With this change, I am able to now follow our ROCm instructions at https://iree.dev/guides/deployment-configurations/gpu-rocm/ on Windows, running MobileNet and getting the expected outputs 🎉 This does two things: 1. Only try to enable peering between devices if more than 1 device is going to be used 2. Check `hipDeviceCanAccessPeer` before calling `hipDeviceEnablePeerAccess` so there is a chance to log a better error message ### More information On Windows 11 with 2x w7900 and the 6.2 HIP SDK, the previous code returned `hipErrorInvalidDevice` when calling `hipDeviceEnablePeerAccess`. On the same system under Linux (unknown HIP/ROCm version but pretty similar), the code works without errors. In both cases, I only requested a single device, so this peering code did not need to run at all. See also [this discussion on Discord](https://discord.com/channels/689900678990135345/1282818085153407038/1344105197353570377). See documentation for these APIs at https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/group___peer_to_peer.html. ### Testing > [!WARNING] > I've only tested on the machines I have easy access to. I think we still need CI coverage for multi-device (e.g. iree-org#19995) Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20126) The revision simplifies the encoding content with `encoded` because encoding can have complicated parameters. E.g., they can have arbitrary dictionary attribute, and naming the variable with such information is making readability worse. Fixes iree-org#20090 Signed-off-by: hanhanW <hanhan0912@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
Make sure `rdna4@vulkan` is recognized as a Vulkan target. Handle wmma shapes not supported by the cooperative matrix extension. Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20120) This PR flipped the switch for the `allow-zero-slice` option, which means zero slices will be tolerated for igemm that requires padding. The dynamic if-condition check is unnecessary because we compute the K loop bound in a reasonable fashion, and the zero-slice situation shouldn't happen. The motivation for the PR is the performance benefit of removing an unnecessary conditional from padded gemm global loading. Signed-off-by: jerryyin <zhuoryin@amd.com> Signed-off-by: geomin12 <geomin12@amd.com>
Signed-off-by: Jakub Kuderski <jakub@nod-labs.com> Signed-off-by: geomin12 <geomin12@amd.com>
When running the ElideTimepoints pass on a large program (llama 8b for example) we exhaust our state, and fail to elide timepoints. This significantly reduces the number of timepoints that end up in the waiting set in those cases. The number of timepoint waits in the stream output are as follows. ``` 23765 << State Size 256 8801 << State Size 8192 ``` Signed-off-by: Andrew Woloszyn <awoloszy@amd.com> Signed-off-by: geomin12 <geomin12@amd.com>
No local patch is carried. --------- Signed-off-by: yzhang93 <zhyuhang88@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20125) The small constants can be inlined into dispatches, and they could become the source of `set_encoding` ops. The revision adds the support for hoisting both operations out if the source is a constant. --------- Signed-off-by: hanhanW <hanhan0912@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
…ups (iree-org#20123) Adds cleanup patterns to run during `tileConsumerAndFuseProducersUsingSCF` in TileandDistributeToWorkgroupsUsingForall. The PR adds the `SwapExtractWithExpandPattern`, and some extra cleanup patterns for slices. This enables codegen for set_encoding ops on GPU, since the materialization of set_encoding contains an expand_shape that will block distribution tiling of the full dispatch. --------- Signed-off-by: Max Dawkins <max.dawkins@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
…iree-org#20133) The previous test did not catch the case because it did not have an affinity. The revision adds the affinity to the test, which ensures it is tested. Signed-off-by: hanhanW <hanhan0912@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20131) Based on the announcement https://www.amd.com/en/newsroom/press-releases/2025-2-28-amd-unveils-next-generation-amd-rdna-4-architectu.html. Also sort the list of rocm and vulkan alphabetically, since any other order is hard to maintain by hand. Signed-off-by: Jakub Kuderski <jakub@nod-labs.com> Signed-off-by: geomin12 <geomin12@amd.com>
…ree-org#20142) When running the build action in the same thread we must still do a start-finish to get an initialized completed future. Signed-off-by: Boian Petkantchin <boian.petkantchin@amd.com> Signed-off-by: geomin12 <geomin12@amd.com>
This reflects improvements in the compiler for these models. Signed-off-by: MaheshRavishankar <mravisha@amd.com> Signed-off-by: geomin12 <geomin12@amd.com>
The type that IREE's code referred to as f8E4M3, an 8-bit float with no infinities, 4 bits of exponent, and 3 bits of mantissa, is called f8E4M3FN in MLIR and LLVM's APFloat. MLIR has a separate f8E4M3 type with different semantics. The fact that IREE would sometimes be using a different MLIR type from what we meant was causing crashes on RDNA4. Therefore, rename IREE's type to match. Signed-off-by: geomin12 <geomin12@amd.com>
Bumps the github-actions group with 3 updates: [actions/download-artifact](https://github.com/actions/download-artifact), [actions/cache](https://github.com/actions/cache) and [dawidd6/action-download-artifact](https://github.com/dawidd6/action-download-artifact). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: geomin12 <geomin12@amd.com>
…op. (iree-org#20148) The encoding content could be vague because the attribute can be descriptive. However, we realized that some encodings with different content could result in the same encoding layout. E.g., a matmul encoding can encode indexing maps for the B/M/N/K dimensions, while an encoding attribute only cares about the `K` dimension. One example is `IREE::GPU::GPUPadLayoutAttr`. The revision introduces encoding compatibility checks in SerializableEncodingAttrInterface and relaxes the verifier of the `stream.tensor.clone` op. If any encoding is not serialized, they are compatible. If both encodings are serialized, it checks whether they are identical. Note that it defers the real check until after encoding specialization. If they end up with different layouts, a failure is detected. Signed-off-by: hanhanW <hanhan0912@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
…#20117) I removed the duplicated code in the implementation of `CPUDeviceEncodingLayoutResolverAttrInterface`, `VMVXDeviceEncodingLayoutResolverAttrInterface`, and `GPUDeviceEncodingLayoutResolverAttrInterface`. I provide a base class `WrappedExternalModel` to provide `getConfiguration` and `getEncodingInfo`, so the derived class only needs to implement their own `getEncodingInfoImpl` method. Fixing iree-org#20050 --------- Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu> Signed-off-by: geomin12 <geomin12@amd.com>
Drops a test in LLVMGPU/test/rocdl_pipeline_test.mlir according to changes in iree-org#20152 --------- Signed-off-by: Max Dawkins <max.dawkins@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
…iree-org#20152) With RDNA4, we now have cards that allow the OCP FP8 formats (f8E5M2 and f8E4M3FN). This means that the previous check that caused any module using those formats to be rejected needs to be relaxed. However, the backend uses the same intrinsics for the gfx942-only FNUZ types and the OCP types. Therefore, the data type validity check needs to be aware of the chipset being targeted in order to validate that the dispatch can be compiled correctly. This patch implements this improved set of checks. In addition, it adds tests for these checks (sadly, we can't use expected-error here because the pipeline strips debug info) and improves these checks to also look inside vector<> and memref<>. (This also means that there is now an IREE-side mechanism to reject compiling FP8 code pre-gfx942.) Signed-off-by: geomin12 <geomin12@amd.com>
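The chipset-aware validity check the commit describes can be sketched as a lookup from FP8 type to the chipsets that support it. This is an illustrative sketch only, not the actual IREE code; the exact chipset gating table (FNUZ on gfx942, OCP on RDNA4 / gfx12xx) is an assumption inferred from the description above:

```python
# Hypothetical chipset gating for FP8 element types (assumption, not IREE's
# real table): FNUZ formats are gfx942-only, OCP formats need RDNA4 (gfx12xx).
OCP_FP8 = {"f8E5M2", "f8E4M3FN"}
FNUZ_FP8 = {"f8E5M2FNUZ", "f8E4M3FNUZ"}

def fp8_type_supported(dtype: str, chipset: str) -> bool:
    """Return True if `dtype` can be compiled for `chipset`.

    Non-FP8 types pass through unchecked; FP8 types are gated on the
    chipset, which also rejects any FP8 code pre-gfx942.
    """
    if dtype in FNUZ_FP8:
        return chipset == "gfx942"
    if dtype in OCP_FP8:
        return chipset.startswith("gfx12")
    return True
```

A real implementation would also recurse into `vector<>` and `memref<>` element types, as the commit notes.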
…rg#20161) Fixes llama fp8 perf regression introduced by iree-org#20106. The PR stopped the linalg.generic from getting hoisted. This was causing a broadcast to get fused and `tensor<1x1x131072x131072xi1>` to be recomputed on each prefill call. --------- Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu> Signed-off-by: geomin12 <geomin12@amd.com>
…g#20143) This PR restricts the fusion of set_encoding ops to only dispatches with element-wise linalg ops. We don't have good codegen for fusions with reduction ops on GPU yet, so this will allow using the late materialization data tiling path on GPU while the codegen improves. --------- Signed-off-by: Max Dawkins <max.dawkins@gmail.com> Signed-off-by: geomin12 <geomin12@amd.com>
Force-pushed from 742dcf8 to 16fb20c
Apologies everyone, this PR has been open for quite some time. I will get those fixes in, but will close it for now to prevent notifications and not bother everyone!
Aw, the review history would have been nice to keep here. Looks like some git rebase issues got the branch into a bad state? Feel free to ask for help with git and pull request workflows.
I am fixing it! I just feel bad about the notifications to folks.
Moving the regression suite over to iree-test-suite and calling those regression tests via a composite action.