[AMD] fix bf16x2 dtype codegen #847
Conversation
Walkthrough

Updates adjust HIP bfloat16 vector handling and modify a GEMM test's shared-memory tiling. Specifically, the HIP codegen switches a load cast to bfloat16x2, the bfloat16x2 struct changes from an array to named fields, and the test expands C_shared to a 4D layout with corresponding index mapping.
Summary of Changes
Hello @Paran0idy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses an issue with bfloat16x2 data type codegen for AMD's HIP backend. It corrects the type casting used in the codegen and refines the bfloat16x2 struct definition for better compatibility. Additionally, a related test case has been adjusted to ensure accurate shape calculations.
Highlights
- HIP Codegen Correction: The codegen for bfloat16x2 data types on HIP targets has been fixed by replacing the NVIDIA-specific `nv_bfloat162` type with the more generic `bfloat16x2`.
- bfloat16x2 Struct Definition: The `bfloat16x2` struct in `common.h` has been updated to define its components as `x` and `y` members instead of a `data[2]` array, aligning with common vector type conventions.
- Test Shape Adjustment: A test for TileLang GEMM with MFMA intrinsics for AMD has been updated to correctly calculate the `C_shared_shape` by including `block_M // micro_size_x`.
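The reshaped shared buffer described above can be sketched in Python. The concrete tile sizes below are hypothetical stand-ins for illustration; the PR only changes the shape expression, not these values:

```python
# Hypothetical tile sizes chosen for illustration only.
block_M, block_N = 128, 128
micro_size_x, micro_size_y = 16, 16

# 4D C_shared layout: one (micro_size_x, micro_size_y) micro-tile per cell
# of a (block_M // micro_size_x, block_N // micro_size_y) grid.
C_shared_shape = (
    block_M // micro_size_x,
    block_N // micro_size_y,
    micro_size_x,
    micro_size_y,
)

# The reshape does not change the total element count.
num_elements = 1
for dim in C_shared_shape:
    num_elements *= dim
```

The micro-tile grid dimensions come first so that a whole micro-tile is contiguous in the innermost two axes.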
Code Review
This pull request fixes bf16x2 dtype codegen for AMD targets. The changes involve updating the bfloat16x2 struct definition to use named members x and y and modifying the HIP codegen to use this new struct, replacing the NVIDIA-specific nv_bfloat162. A related fix is included for a test file.
The changes are correct and address the issue. I've added a high-severity comment in codegen_hip.cc to fix a latent bug in vector element loading for 2-lane vectors, which would cause compilation errors. I've also added a medium-severity comment in common.h suggesting to make other bfloat16xN structs consistent with the new bfloat16x2 definition for better maintainability.
os << "((bfloat16x2*)(&(" << vec << "." << access[i / 2] << ")))->"
   << access[i % 2];
While this change from nv_bfloat162 to bfloat16x2 is correct for HIP, there's an underlying issue with vector element loading for 2-element vectors.
For a bfloat16 vector with 2 lanes, PrintType generates a scalar uint type for the vector variable vec. The current code then attempts to access a member (e.g., .x) of this scalar uint (vec << "." << access[i / 2]), which is incorrect and will cause compilation errors in the generated code.
The logic should handle 2-lane vectors (represented as a scalar uint) differently from wider vectors (e.g., uint2, uint4). For the 2-lane case, the address of the scalar vec should be cast directly, without member access.
A similar issue exists for float16 vectors on lines 480-481.
Suggested change:

- os << "((bfloat16x2*)(&(" << vec << "." << access[i / 2] << ")))->"
-    << access[i % 2];
+ os << "((bfloat16x2*)(&(" << (t.lanes() == 2 ? vec : (vec + "." + access[i / 2])) << ")))->"
+    << access[i % 2];
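The branching this suggestion introduces can be modeled as a small Python sketch of the emitted C expression. The helper name `lane_expr` and the access table are illustrative stand-ins, not the actual codegen API:

```python
# Lane-name table the codegen indexes into, as described in the review.
ACCESS = ("x", "y", "z", "w")

def lane_expr(vec: str, lanes: int, i: int) -> str:
    """Illustrative model of the emitted C expression for bf16 lane i.

    A 2-lane vector is stored as a bare scalar (uint), so its address is
    cast directly; wider vectors (uint2/uint4) first select a 32-bit
    member such as .x or .y before the cast.
    """
    base = vec if lanes == 2 else f"{vec}.{ACCESS[i // 2]}"
    return f"((bfloat16x2*)(&({base})))->{ACCESS[i % 2]}"
```

For a 2-lane vector this yields `((bfloat16x2*)(&(v)))->y` for lane 1, with no member access on the scalar, which is the compilation error the review points out.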
struct bfloat16x2 {
-  bfloat16_t data[2];
+  bfloat16_t x, y;
This change to bfloat16x2 is correct to support member access like .x and .y. For consistency, you should consider updating bfloat16x4, bfloat16x8, and bfloat16x16 to also use named members (x, y, z, w, etc.) or nested structs instead of a data array. This would make the API for these vector-like structs more uniform and easier to use.
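A quick way to see that the named-member layout is byte-compatible with the old `data[2]` array is to model the 32-bit packing in Python. The bit patterns below are arbitrary stand-ins for bfloat16 values, and little-endian layout is assumed:

```python
import struct

def pack_lanes(x_bits: int, y_bits: int) -> int:
    """Pack two 16-bit lane patterns into one little-endian 32-bit word,
    mirroring how both {bfloat16_t x, y;} and {bfloat16_t data[2];} lay
    out memory: x / data[0] in the low half, y / data[1] in the high half."""
    return struct.unpack("<I", struct.pack("<HH", x_bits, y_bits))[0]

def unpack_lanes(word: int):
    """Inverse mapping: recover the (x, y) lane patterns from the word."""
    return struct.unpack("<HH", struct.pack("<I", word))

word = pack_lanes(0x3F80, 0x4000)  # arbitrary bf16 bit patterns
```

Because only field names change and not the layout, existing packed loads and stores keep working after the struct update.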
Actionable comments posted: 0
🧹 Nitpick comments (1)
testing/python/amd/test_tilelang_gemm_mfma_intrinsic.py (1)

214-214: Fix typo in matrix transpose operation.

There's a typo in the matrix transpose operation: `A.Tto` should be `A.T.to`. Apply this diff to fix it:

- ref_c = torch.matmul(A.Tto(torch.float32),
+ ref_c = torch.matmul(A.T.to(torch.float32),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/target/codegen_hip.cc (1 hunk)
- src/tl_templates/hip/common.h (1 hunk)
- testing/python/amd/test_tilelang_gemm_mfma_intrinsic.py (1 hunk)
⏰ Context from checks skipped due to timeout of 90000ms. (1)
- GitHub Check: bot-task
🔇 Additional comments (3)
src/tl_templates/hip/common.h (1)
69-71: LGTM! Improved field access pattern for bfloat16x2.

The change from an array `data[2]` to named fields `x, y` aligns with HIP's standard bfloat16x2 conventions. Based on the AMD HIP documentation, this pattern is consistent with the `__hip_bfloat162` structure, which uses named `x, y` fields. This improves code readability and maintains compatibility with HIP's vectorized type access patterns.

testing/python/amd/test_tilelang_gemm_mfma_intrinsic.py (1)
58-63: LGTM! C_shared shape expanded to align with micro-tile access pattern.

The change from a 3D to a 4D shared-memory layout creates a more structured tiling pattern that aligns with the micro-tile sizes. The new shape `(block_M // micro_size_x, block_N // micro_size_y, micro_size_x, micro_size_y)` enables efficient access patterns for the updated bfloat16 vectorization, where elements can be accessed using the pattern shown in lines 155-160.

This change is consistent with the bfloat16x2 improvements in the HIP codegen and header files, ensuring proper memory layout for vectorized operations.
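The index mapping implied by that 4D shape can be sketched as a pair of small helpers. The function names are illustrative, not identifiers from the test file:

```python
def to_micro_tiled(i, j, micro_size_x, micro_size_y):
    """Map a flat (i, j) element of the block_M x block_N tile onto the
    4D C_shared layout: the first two coordinates select the micro-tile,
    the last two select the element within it."""
    return (i // micro_size_x, j // micro_size_y,
            i % micro_size_x, j % micro_size_y)

def from_micro_tiled(ti, tj, ii, jj, micro_size_x, micro_size_y):
    """Inverse mapping back to flat (i, j) coordinates in the block."""
    return (ti * micro_size_x + ii, tj * micro_size_y + jj)
```

The round trip is exact for any in-range element, so the reshape only regroups elements into contiguous micro-tiles without reordering the data they hold.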
src/target/codegen_hip.cc (1)
483-484: Approve — bfloat16x2 access matches updated struct.

src/tl_templates/hip/common.h defines `struct bfloat16x2 { bfloat16_t x, y; }` and src/target/codegen_hip.cc (lines 483-484) uses `((bfloat16x2*)(&(...)))->access[i%2]`, which correctly maps to `x`/`y`. Search found no remaining `bfloat16x2.data[...]` usages.