
Conversation

@Paran0idy (Contributor) commented Nov 12, 2025

Summary by CodeRabbit

  • New Features

    • Added support for float8 data type in GEMM operations.
  • Tests

    • Expanded ROCm test coverage with broader test discovery.
    • Added comprehensive test cases for float8 operations.
  • Chores

    • Updated Docker build process for CUDA and ROCm installations.

@github-actions
👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot commented Nov 12, 2025

Walkthrough

This PR consolidates Docker installation methods across multiple CUDA versions and ROCm by replacing shell script invocations with Python package installs using environment variables, broadens ROCm test discovery scope, adds float8_e4m3fnuz dtype support to AMD gemm tests, and refactors MFMA intrinsic emitter constructors to centralize base class initialization.

Changes

  • CI/Workflow Configuration (.github/workflows/ci.yml): Replaces the single ROCm test file invocation with a directory path, enabling broader test discovery across the AMD test suite.
  • Docker Containerization, CUDA (docker/Dockerfile.cu118, docker/Dockerfile.cu120, docker/Dockerfile.cu121, docker/Dockerfile.cu123, docker/Dockerfile.cu124, docker/Dockerfile.cu125, docker/Dockerfile.cu126, docker/Dockerfile.cu128): Uniformly replaces the ./install_cuda.sh shell script with an editable Python package install, USE_CUDA=1 pip install -e . -v, switching from script-based to pip-based CUDA setup across all CUDA version variants.
  • Docker Containerization, ROCm (docker/Dockerfile.rocm): Replaces ./install_rocm.sh with USE_ROCM=1 pip install -e . -v, aligning the ROCm installation with the CUDA variants via an environment-variable-driven pip install.
  • AMD Test Infrastructure (testing/python/amd/test_tilelang_gemm_mfma_intrinsic.py, testing/python/amd/test_tilelang_gemm_mfma_preshuffle.py): Removes runtime dtype assertions, adds float8_e4m3fnuz handling via torch casting from float16, and updates the preshuffle tests with K-dimension changes and new test cases for float8 dtype combinations.
  • MFMA Intrinsic Emitter Refactoring (tilelang/intrinsics/mfma_macro_generator.py): Refactors the MatrixCoreIntrinEmitter and MatrixCorePreshuffleIntrinEmitter constructors by introducing a thread_var parameter, centralizing base-class initialization in a single super().__init__ call, eliminating duplicate attribute initialization, and removing a debugging print statement.
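The constructor consolidation described for the MFMA emitters can be sketched in miniature. The class and parameter names below are simplified stand-ins for illustration, not the project's actual code:

```python
# Simplified stand-ins for the refactor: a base emitter owns every shared
# attribute, and the subclass forwards through one super().__init__ call
# instead of duplicating assignments. Names are illustrative only.
class BaseEmitter:
    def __init__(self, a_dtype, b_dtype, chunk=32, k_pack=1, thread_var=None):
        self.a_dtype = a_dtype
        self.b_dtype = b_dtype
        self.chunk = chunk
        self.k_pack = k_pack
        self.thread_var = thread_var  # optional explicit thread binding


class PreshuffleEmitter(BaseEmitter):
    def __init__(self, a_dtype, b_dtype, chunk=32, k_pack=1, thread_var=None):
        # A single call replaces the duplicated per-attribute initialization.
        super().__init__(
            a_dtype=a_dtype,
            b_dtype=b_dtype,
            chunk=chunk,
            k_pack=k_pack,
            thread_var=thread_var,
        )


emitter = PreshuffleEmitter("float8_e4m3fnuz", "float16", thread_var="tx")
print(emitter.thread_var)  # → tx
```

The benefit is that adding a new shared parameter (such as thread_var here) touches only the base class and the forwarding call, rather than every subclass's attribute list.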

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Docker file changes: Repetitive pattern across 9 Dockerfiles (low individual complexity, but broad surface area requiring consistency verification)
  • MFMA constructor refactoring: Requires careful review of inheritance chain, initialization order, and parameter propagation through subclasses
  • Test dtype extensions: Straightforward additions with clear patterns but need verification that float8 casting and test parameterization are correct

Areas requiring extra attention:

  • Verify thread_var parameter flow through the inheritance hierarchy in mfma_macro_generator.py and confirm that base class initialization correctly handles all previously explicit attributes
  • Confirm that the removal of dtype assertions in test files does not bypass necessary validation elsewhere
  • Validate that float8_e4m3fnuz test cases provide adequate coverage and that torch dtype casting approach is consistent with project patterns

Possibly related PRs

  • tile-ai/tilelang#1136: Introduces or propagates thread_var parameter through MFMA emitter constructors and related methods
  • tile-ai/tilelang#966: Adds HIP FP8 type support and float8 test cases, complementing the float8_e4m3fnuz dtype additions in this PR
  • tile-ai/tilelang#1208: Modifies Docker installation steps, likely coordinating container build strategy changes

Suggested reviewers

  • LeiWang1999
  • oraluben

Poem

🐰 With Dockerfiles aligned and tests now bright,
Float8 futures casting rays of light,
MFMA threads through constructors flow,
Inheritance neat from base below,
ROCm's scope expands—what a sight! 🎯

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Title check (❓ Inconclusive): The title covers multiple distinct changes (enabling AMD CI tests, fixing bugs, and fixing Dockerfiles) without clearly prioritizing the primary one. Consider a more specific title such as 'Enable AMD ROCm CI tests and update TileLang Docker builds', or focus on the primary objective.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 468b1b7 and a239de7.

📒 Files selected for processing (13)
  • .github/workflows/ci.yml (1 hunks)
  • docker/Dockerfile.cu118 (1 hunks)
  • docker/Dockerfile.cu120 (1 hunks)
  • docker/Dockerfile.cu121 (1 hunks)
  • docker/Dockerfile.cu123 (1 hunks)
  • docker/Dockerfile.cu124 (1 hunks)
  • docker/Dockerfile.cu125 (1 hunks)
  • docker/Dockerfile.cu126 (1 hunks)
  • docker/Dockerfile.cu128 (1 hunks)
  • docker/Dockerfile.rocm (1 hunks)
  • testing/python/amd/test_tilelang_gemm_mfma_intrinsic.py (1 hunks)
  • testing/python/amd/test_tilelang_gemm_mfma_preshuffle.py (3 hunks)
  • tilelang/intrinsics/mfma_macro_generator.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
testing/python/amd/test_tilelang_gemm_mfma_preshuffle.py (1)
testing/python/amd/test_tilelang_gemm_mfma_intrinsic.py (1)
  • assert_tl_matmul_correctness (163-220)
tilelang/intrinsics/mfma_macro_generator.py (3)
tilelang/tileop/gemm/gemm_base.py (3)
  • accum_dtype (59-60)
  • chunk (63-64)
  • k_pack (111-112)
tilelang/primitives/gemm/gemm_mma.py (1)
  • accum_dtype (252-259)
tilelang/tileop/gemm/__init__.py (1)
  • k_pack (129-130)
testing/python/amd/test_tilelang_gemm_mfma_intrinsic.py (1)
tilelang/primitives/gemm/gemm_mma.py (1)
  • in_dtype (235-249)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (2)
.github/workflows/ci.yml (1)

372-382: All test functions in ./testing/python/amd/ have the @tilelang.testing.requires_rocm decorator, which skips tests when ROCm is unavailable. The hardcoded device="cuda" calls are correct: PyTorch built with ROCm support automatically routes CUDA device operations to the HIP backend. The change from a single test file to the entire ./python/amd directory is intentional and safe, since all tests within it are properly decorated.
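The broadened invocation can be reproduced locally. The working directory below is an assumption based on the paths mentioned in this PR, not confirmed against the actual CI script:

```shell
# Run the whole AMD test directory instead of a single test file
# (directory layout assumed from the PR summary).
cd testing
python -m pytest ./python/amd -v
```

pytest collects every test_*.py file under the directory, so new AMD test files are picked up without further CI edits.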

docker/Dockerfile.cu126 (1)

25-26: No changes exist for docker/Dockerfile.cu126 in this PR.

The git diff shows that docker/Dockerfile.cu126 is unchanged between HEAD and the current branch. The file already contains USE_CUDA=1 pip install -e . -v in both versions. The review comment references a change from ./install_cuda.sh, but this file does not appear to have been modified in this PR, nor was install_cuda.sh the previous state of this Dockerfile (the file was already using USE_CUDA=1 pip install -e . -v).

While the underlying concern about environment variable support is technically valid—TileLang's CMakeLists.txt does properly read the $ENV{USE_CUDA} flag at lines 167–174, and scikit-build-core correctly passes environment variables to CMake—there is no actual code change in this file to verify.

Likely an incorrect or invalid review comment.
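The env-driven install the Dockerfiles converge on can also be exercised outside Docker. A minimal sketch, assuming you are at the root of the TileLang source tree:

```shell
# Editable install with the backend selected via environment variable,
# as used in the updated Dockerfiles (pick one per build).
USE_CUDA=1 pip install -e . -v   # CUDA variants
USE_ROCM=1 pip install -e . -v   # ROCm variant
```

Because the build backend forwards environment variables to CMake, the same pip command covers both backends, which is what makes the nine Dockerfiles uniform.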

Comment on lines +681 to +697
super().__init__(
a_dtype=a_dtype,
b_dtype=b_dtype,
accum_dtype=accum_dtype,
a_transposed=a_transposed,
b_transposed=b_transposed,
block_row_warps=block_row_warps,
block_col_warps=block_col_warps,
warp_row_tiles=warp_row_tiles,
warp_col_tiles=warp_col_tiles,
chunk=chunk,
reduce_k=reduce_k,
num_elems_per_byte=num_elems_per_byte,
k_pack=k_pack,
is_m_first=is_m_first,
thread_var=thread_var,
)

⚠️ Potential issue | 🟠 Major

Ensure preshuffle emitter honors custom thread binding

The new thread_var argument is exposed here, but ldmatrix_a/ldmatrix_b still fetch the binding from T.KernelLaunchFrame.Current(). If a caller now supplies thread_var, those methods ignore it and still assert on an active kernel frame, so the preshuffle path breaks for the newly supported use case. Please route both loaders through self.get_thread_binding() like the base emitter.

Apply this diff:

@@ def ldmatrix_a(...):
-        current_frame = T.KernelLaunchFrame.Current()
-        thread_binding = current_frame.get_thread_binding()
+        thread_binding = self.get_thread_binding()
@@ def ldmatrix_b(...):
-        current_frame = T.KernelLaunchFrame.Current()
-        thread_binding = current_frame.get_thread_binding()
+        thread_binding = self.get_thread_binding()

Committable suggestion skipped: line range outside the PR's diff.
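The routing fix the comment asks for amounts to a "use the supplied binding if present, otherwise fall back to the ambient frame" pattern. The sketch below uses illustrative stand-in names, not TileLang's API:

```python
# Illustrative stand-ins, not TileLang's API: prefer an explicitly supplied
# thread binding, otherwise fall back to the ambient kernel frame.
def current_frame_binding():
    # Stand-in for T.KernelLaunchFrame.Current().get_thread_binding()
    return "frame_tx"


class Emitter:
    def __init__(self, thread_var=None):
        self.thread_var = thread_var

    def get_thread_binding(self):
        if self.thread_var is not None:
            return self.thread_var       # caller-supplied binding wins
        return current_frame_binding()   # fall back to the active frame


print(Emitter(thread_var="tx").get_thread_binding())  # → tx
print(Emitter().get_thread_binding())                 # → frame_tx
```

Routing ldmatrix_a/ldmatrix_b through such a method means callers that construct the emitter outside a kernel frame still get a valid binding, which is the use case the new thread_var parameter enables.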

🤖 Prompt for AI Agents
In tilelang/intrinsics/mfma_macro_generator.py around lines 681 to 697, the
constructor now accepts thread_var but the preshuffle loaders
ldmatrix_a/ldmatrix_b still read thread binding from
T.KernelLaunchFrame.Current(); update those loader calls so they use
self.get_thread_binding() (which respects the supplied thread_var) instead of
directly querying KernelLaunchFrame.Current(), and remove or replace the
assertion that an active kernel frame is required so the preshuffle path works
with an externally provided thread_var.

