
Conversation

Contributor
@Paran0idy commented Sep 28, 2025

  • Fix plot layout issue.
  • Support more dtypes.

Summary by CodeRabbit

  • New Features
    • Added per-matrix (A/B) handling and transposed support for MMA load layouts.
    • Improved example to explicitly plot the final block layout alongside intermediate layouts.
  • Bug Fixes
    • Completed 32-bit dtype layout mapping, removing a placeholder and ensuring correct SR transform selection.
    • Aligned forward/inverse indexing with updated layout logic.
  • Refactor
    • Split transformation logic by matrix and operator; reordered fragment dimensions to match SR/RS orientation.
    • Updated API to accept dtype, matrix (A/B), and transposed flags for base layout creation.
  • Documentation
    • Refreshed docstrings to reflect new parameters and behavior.

Contributor

coderabbitai bot commented Sep 28, 2025

Walkthrough

The example script now builds MMA load fragments using a matrix selector (A/B) and transposition flag, with per-matrix transform functions and reordered fragment dimensions. The macro generator completes dtype_bits==32 handling by mapping SR transforms for A and B. The example also switches to explicit plot_layout for the final block layout.

Changes

Cohort / File(s) and summary:

  • MMA load layout example script (examples/plot_layout/fragment_mma_load_a.py):
    Renamed the API to accept a matrix Literal["A", "B"] and a transposed flag; split transform selection into A/B and SR paths, including swapped logic for the transposed case; computed micro sizes per matrix; reordered base fragment dims based on SR/RS orientation and matrix; updated forward/inverse indexing; replaced a no-op with an explicit plot_layout call; updated docstrings.
  • MMA macro generator (tilelang/intrinsics/mma_macro_generator.py):
    Filled the dtype_bits == 32 branch by setting transform_func_sr_a/b to shared_16x8_to_mma_32x4_layout_sr_a/b; no other changes.
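The newly filled branch is essentially a dispatch on element width. Below is a minimal runnable sketch of that dispatch; only the transform-function names are taken from the summary, and the stub bodies are placeholders for the real implementations in tilelang.intrinsics.mma_layout:

```python
# Placeholder stubs: the real transforms live in tilelang.intrinsics.mma_layout
# and map shared-memory coordinates to MMA thread/local coordinates.
def shared_16x8_to_mma_32x4_layout_sr_a(i, j):
    return i, j

def shared_16x8_to_mma_32x4_layout_sr_b(i, j):
    return i, j

def select_sr_transforms(dtype_bits):
    """Pick the (A, B) shared->MMA SR transforms by element width in bits."""
    if dtype_bits == 32:  # the branch this PR fills in
        return (shared_16x8_to_mma_32x4_layout_sr_a,
                shared_16x8_to_mma_32x4_layout_sr_b)
    raise NotImplementedError(f"dtype_bits={dtype_bits} not covered in this sketch")

sr_a, sr_b = select_sr_transforms(32)
```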

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Script as fragment_mma_load_a.py
  participant Selector as Transform Selector
  participant Gen as mma_macro_generator.py
  participant Plot as plot_layout

  User->>Script: make_mma_load_base_layout(dtype, matrix=A|B, transposed)
  Script->>Selector: Determine micro sizes (x,y,k) per matrix
  Selector->>Gen: Request SR transform funcs for dtype_bits
  alt dtype_bits == 32 (newly completed)
    Gen-->>Selector: shared_16x8_to_mma_32x4_layout_sr_a/b
  else other bit-widths
    Gen-->>Selector: Existing SR transform funcs
  end
  alt transposed
    Selector-->>Script: Use swapped A/B or index-swapped lambda
  else non-transposed
    Selector-->>Script: Use matrix-specific transform_func_sr_a or sr_b
  end
  Script->>Script: Build base fragment with SR/RS dim order per matrix
  Script->>Script: Forward/Inverse index via selected transform
  Script->>Plot: plot_layout(final_block_layout)
  Plot-->>User: Rendered layouts
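The transposed branch in the diagram boils down to wrapping the SR transform in an index-swapping lambda. A standalone sketch under that reading (the helper name and the toy transform are illustrative, not from the codebase):

```python
def pick_transform(sr_transform, is_sr_axis_order):
    # In SR axis order the transform applies directly; otherwise swap the
    # incoming (i, j) indices before delegating, as the walkthrough notes.
    if is_sr_axis_order:
        return sr_transform
    return lambda i, j: sr_transform(j, i)

# Toy transform standing in for a real shared->MMA layout function.
toy = lambda i, j: (i * 8 + j, 0)
direct = pick_transform(toy, True)
swapped = pick_transform(toy, False)
print(direct(2, 3))   # (19, 0)
print(swapped(2, 3))  # (26, 0)
```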

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A hop, a skip, through tensors I go,
Swapping A with B in a tidy flow.
Bits turn thirty-two—no gaps to fear,
SR maps align, the path is clear.
I plot my carrots in neat array—
Grid by grid, I’m ready to play! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Title Check: ❓ Inconclusive
    Explanation: The title “[Layout] Fix plot layout” is partially related, since it references the layout fix in the example plotting code, but it is overly generic and does not convey the specific scope of the change or mention the new dtype support introduced in the generator.
    Resolution: Consider revising the title to clearly state the affected component and key updates, for example “Fix MMA load fragment plot layout and add 32-bit dtype support”, to summarize both the layout fix and the dtype enhancement.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. No functions found in the changes, so the docstring coverage check was skipped.


@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

@Paran0idy changed the title from “[Layout] fix plot layout” to “[Layout] Fix plot layout” on Sep 28, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
examples/plot_layout/fragment_mma_load_a.py (4)

59-62: Simplify axis-order predicate

Minor simplification for readability.

-is_sr_conditions = [False]
-is_sr_conditions.append(matrix == "A" and not transposed)
-is_sr_conditions.append(matrix == "B" and transposed)
-is_sr_axis_order = any(is_sr_conditions)
+is_sr_axis_order = ((matrix == "A" and not transposed) or
+                    (matrix == "B" and transposed))
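The simplified predicate can be checked standalone; here is a minimal sketch (the function wrapper is ours, only the condition comes from the suggestion above):

```python
def is_sr_axis_order(matrix: str, transposed: bool) -> bool:
    # SR (shared->register) axis order applies to non-transposed A
    # and to transposed B; every other combination is RS order.
    return (matrix == "A" and not transposed) or (matrix == "B" and transposed)

print(is_sr_axis_order("A", False))  # True
print(is_sr_axis_order("B", False))  # False
```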

39-39: Prefer explicit ValueError over assert; remove unreachable else

Use exceptions for input validation (asserts can be stripped with -O) and avoid duplicate checks.

-assert matrix in ["A", "B"], "matrix should be either A or B"
+if matrix not in ("A", "B"):
+    raise ValueError("matrix must be 'A' or 'B'")
@@
-else:
-    raise ValueError(f"Unsupported matrix {matrix}")

Note: also addresses Ruff TRY003 noise by keeping short, fixed messages.

Also applies to: 77-78


45-46: Tighten type hints for transform callables (optional)

Make None explicit to help tooling.

-from typing import Literal, Callable
+from typing import Literal, Callable, Optional
@@
-transform_func_sr_a: Callable = None
-transform_func_sr_b: Callable = None
+transform_func_sr_a: Optional[Callable[[int, int], tuple[int, int]]] = None
+transform_func_sr_b: Optional[Callable[[int, int], tuple[int, int]]] = None
@@
-transform_func: Callable = None
+transform_func: Optional[Callable[[int, int], tuple[int, int]]] = None

Also applies to: 68-69


12-15: Docstring: clarify “load” vs “store” and params

Reads like a store path; this builds a load layout into fragments.

-    Create a layout function for storing MMA results into a fragment buffer.
-    This layout is used in conjunction with `inverse_mma_store_layout` to
-    map fragment indices to threads and local indices.
+    Create the layout for loading MMA operands from shared memory into a fragment buffer.
+    Pairs with the corresponding store layout to map fragment indices to threads and local indices.
@@
-    matrix : Literal["A", "B"]
-        The mma operand to be loaded.
+    matrix : Literal["A", "B"]
+        Which MMA operand layout to build.
@@
-        Describes how threads and indices in fragment are laid out.
+        Describes how fragment indices map to threads and local IDs.

Also applies to: 20-23, 28-28

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 599264c and d84b070.

📒 Files selected for processing (2)
  • examples/plot_layout/fragment_mma_load_a.py (3 hunks)
  • tilelang/intrinsics/mma_macro_generator.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tilelang/intrinsics/mma_macro_generator.py
🧰 Additional context used
🧬 Code graph analysis (1)
examples/plot_layout/fragment_mma_load_a.py (2)
tilelang/intrinsics/utils.py (1)
  • get_mma_micro_size (89-109)
tilelang/tools/plot_layout.py (1)
  • plot_layout (4-207)
🪛 Ruff (0.13.1)
examples/plot_layout/fragment_mma_load_a.py

57-57: Avoid specifying long messages outside the exception class

(TRY003)


78-78: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: format-check
  • GitHub Check: bot-task
  • GitHub Check: format-check
🔇 Additional comments (4)
examples/plot_layout/fragment_mma_load_a.py (4)

97-100: LGTM: Dimension order matches axis-order selection

Choosing [s, r] vs [r, s] in sync with transform index swap is correct.


125-125: LGTM: Final block layout is now plotted

Adds the missing plot for the block layout; helpful for debugging.


40-57: Remove FP8 dtype normalization shim suggestion
TVM’s Python DataType accepts "float8_e4m3" and "float8_e5m2", so the existing dtype_bits = DataType(dtype).bits handles these correctly.


47-56: LGTM: dtype→transform mapping for 32/16/8 verified
The sr_a/sr_b assignments align with the alias definitions in tilelang/intrinsics/mma_layout.py.

Comment on lines +73 to +76
    elif matrix == "B":
        transform_func = transform_func_sr_b if is_sr_axis_order else lambda i, j: transform_func_sr_b(
            j, i)
        micro_size_s, micro_size_r = micro_size_k, micro_size_y

⚠️ Potential issue | 🔴 Critical

Fix B micro-size mapping (s vs r) — currently reversed

For matrix "B", spatial should be y and reduction should be k. Using s=k and r=y breaks 8‑bit shapes (expects 16x32 sr), and can misalign with shared_16x32_to_mma_32x16_layout_sr_b. Apply:

 elif matrix == "B":
   transform_func = transform_func_sr_b if is_sr_axis_order else lambda i, j: transform_func_sr_b(
       j, i)
-  micro_size_s, micro_size_r = micro_size_k, micro_size_y
+  micro_size_s, micro_size_r = micro_size_y, micro_size_k

Based on relevant code snippet (get_mma_micro_size defines (x, y, k) and B’s s=y, r=k).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    elif matrix == "B":
-        transform_func = transform_func_sr_b if is_sr_axis_order else lambda i, j: transform_func_sr_b(
-            j, i)
-        micro_size_s, micro_size_r = micro_size_k, micro_size_y
+    elif matrix == "B":
+        transform_func = transform_func_sr_b if is_sr_axis_order else lambda i, j: transform_func_sr_b(
+            j, i)
+        micro_size_s, micro_size_r = micro_size_y, micro_size_k
🤖 Prompt for AI Agents
In examples/plot_layout/fragment_mma_load_a.py around lines 73 to 76, the
micro-size mapping for matrix "B" is reversed: it currently assigns micro_size_s
= micro_size_k and micro_size_r = micro_size_y, but B’s spatial should be y and
reduction should be k. Change the assignment to micro_size_s, micro_size_r =
micro_size_y, micro_size_k so s maps to y and r maps to k (keep the
transform_func logic as-is).
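The fix can be expressed as a small selection helper. This is a hedged sketch: the helper name and the concrete sizes below are illustrative, while the (x, y, k) convention and the per-matrix mapping follow the review's reading of get_mma_micro_size:

```python
def micro_sizes_sr(matrix, micro_x, micro_y, micro_k):
    # Micro sizes arrive as (x, y, k). For A the spatial dim is x,
    # for B it is y; the reduction dim is k in both cases.
    if matrix == "A":
        return micro_x, micro_k
    if matrix == "B":
        return micro_y, micro_k  # the corrected mapping (was k, y)
    raise ValueError("matrix must be 'A' or 'B'")

# Illustrative values only, not taken from get_mma_micro_size.
print(micro_sizes_sr("B", micro_x=16, micro_y=8, micro_k=32))  # (8, 32)
```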

