Skip to content

Conversation

@Rachmanino
Copy link
Collaborator

@Rachmanino Rachmanino commented Sep 28, 2025

This pull request makes a targeted change to the CopyNode::Lower function in src/op/copy.cc to enhance the logic for selecting the copy instruction. The update ensures that the disable_tma flag is now considered in addition to the configuration option when determining whether to disable TMA lowering.

Key change:

  • Improved the call to GetCopyInst by combining the disable_tma_lower configuration and the disable_tma flag, ensuring TMA lowering is properly disabled when either is set.

Summary by CodeRabbit

  • Bug Fixes

    • Per-operation and global disable flags now consistently prevent bulk-transfer fast-paths, fixing incorrect path selection and improving correctness.
  • Performance

    • More predictable behavior when bulk transfers are disabled; slight performance variations where fast-paths are intentionally bypassed.
  • Examples

    • Updated examples to use default lowering and enable per-copy SIMT fallbacks with clarifying comments.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 28, 2025

Walkthrough

Updates propagate a per-copy disable flag into copy-instruction selection by changing CopyNode::Lower to use disable_tma_lower || disable_tma. Separately, an example (examples/flash_decoding/...) removes an explicit TL_DISABLE_TMA_LOWER setting and adds disable_tma=True on several T.copy calls with explanatory comments.

Changes

Cohort / File(s) Summary
Copy lowering logic
src/op/copy.cc
CopyNode::Lower now passes `disable_tma_lower
Flash decoding example edits
examples/flash_decoding/example_mha_inference.py
Removed explicit TL_DISABLE_TMA_LOWER decorator option; added disable_tma=True on multiple T.copy calls and inserted comments explaining SIMT/TMA rationale and barrier considerations. Control flow of algorithm unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant CopyNode
  participant GetCopyInst

  Caller->>CopyNode: Lower(copy_op, ...)
  Note over CopyNode: evaluate flags\n`disable_tma_lower`, `disable_tma`
  alt Either flag true\n(disable_tma_lower || disable_tma)
    CopyNode->>GetCopyInst: request copy inst (disable=true)
    Note right of GetCopyInst #D6EAF8: selects non‑TMA / SIMT copy
  else Both flags false
    CopyNode->>GetCopyInst: request copy inst (disable=false)
    Note right of GetCopyInst #FDEBD0: selects TMA bulk load/store
  end
  GetCopyInst-->>CopyNode: chosen copy instruction
  CopyNode-->>Caller: lowered copy operation
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my whiskers at an OR,
Two tiny flags now roam the floor.
If either whispers "pause the stream,"
I hop a lane — a gentler scheme.
Small change, new path, I skip and cheer. 🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly identifies the main change—modifying the CopyNode::Lower method to include the disable_tma flag in the GetCopyInst call—and appropriately labels it as a bugfix, making it clear and specific enough to understand the primary update at a glance.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

@Rachmanino
Copy link
Collaborator Author

cc @LeiWang1999 @xwhzz

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2850eb9 and d4eadf0.

📒 Files selected for processing (1)
  • src/op/copy.cc (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-amd
🔇 Additional comments (1)
src/op/copy.cc (1)

695-696: Good: Lower now respects the per-copy disable flag.

Propagating disable_tma into GetCopyInst aligns Lower with InferLayout. ✔️

Comment on lines +695 to +696
auto copy_inst = GetCopyInst(target, disable_tma_lower || disable_tma,
T.layout_map, analyzer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Also forward buffer_oob (and make bool explicit) to avoid 1D TMA on OOB paths.

InferLayout passes T.buffer_oob to GetCopyInst, but Lower does not. This can select 1D TMA in lowering where layout inference avoided it, risking OOB behavior. If LowerArgs exposes buffer_oob, pass it through and make the bool cast explicit.

Apply if available:

-  auto copy_inst = GetCopyInst(target, disable_tma_lower || disable_tma,
-                               T.layout_map, analyzer);
+  const bool tma_disabled = disable_tma_lower || static_cast<bool>(disable_tma);
+  auto copy_inst = GetCopyInst(target, tma_disabled,
+                               T.layout_map, analyzer, T.buffer_oob);

To verify availability and other call sites:


🏁 Script executed:

#!/bin/bash
# 1) Check if LowerArgs has buffer_oob
rg -nP 'struct\s+LowerArgs|class\s+LowerArgs|LowerArgs\s*\{' -C3

# 2) List all GetCopyInst call sites and their arg counts
rg -nP 'GetCopyInst\s*\(' -C2

# 3) Show where buffer_oob is used today
rg -nP '\bbuffer_oob\b' -C3

Length of output: 5793


🏁 Script executed:

#!/bin/bash
# List files under src/op to locate copy.cc
ls src/op

Length of output: 276


🏁 Script executed:

#!/bin/bash
# Show LowerArgs definition in src/op/operator.h
rg -nP 'struct\s+LowerArgs\s*\{' -C5 src/op/operator.h

Length of output: 217


🏁 Script executed:

#!/bin/bash
# Show full LowerArgs definition in operator.h
rg -nP 'struct\s+LowerArgs' -C20 src/op/operator.h

# Show GetCopyInst declaration in copy.h
sed -n '180,220p' src/op/copy.h

# Show context around the 4-arg GetCopyInst call in copy.cc
sed -n '670,720p' src/op/copy.cc

Length of output: 4269


🏁 Script executed:

sed -n '350,450p' src/op/copy.cc

Length of output: 4653


🏁 Script executed:

#!/bin/bash
# Locate occurrences of 'disable_tma' in layout inference to confirm variable origin
rg -nP 'disable_tma' -C3 src/transform/layout_inference.cc

Length of output: 0


Extend LowerArgs with buffer_oob and forward it to GetCopyInst
Add a bool buffer_oob = false field to struct LowerArgs in src/op/operator.h and update every LowerArgs{…} constructor (e.g. in src/transform/lower_tile_op.cc) to pass the new flag. In CopyNode::Lower (src/op/copy.cc:695–696), replace the 4-arg call with:

- auto copy_inst = GetCopyInst(target, disable_tma_lower || disable_tma,
-                              T.layout_map, analyzer);
+ bool tma_disabled = disable_tma_lower || disable_tma;
+ auto copy_inst = GetCopyInst(target, tma_disabled,
+                              T.layout_map, analyzer, T.buffer_oob);

This aligns lowering with InferLayout and prevents 1D TMA on out-of-bounds paths.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto copy_inst = GetCopyInst(target, disable_tma_lower || disable_tma,
T.layout_map, analyzer);
// Disable TMA if either global or per-node flag is set.
bool tma_disabled = disable_tma_lower || disable_tma;
// Now forward the new buffer_oob flag as well.
auto copy_inst = GetCopyInst(target,
tma_disabled,
T.layout_map,
analyzer,
T.buffer_oob);
🤖 Prompt for AI Agents
In src/op/copy.cc around lines 695-696, the call to GetCopyInst must receive the
new LowerArgs.buffer_oob flag; add a bool buffer_oob = false field to struct
LowerArgs in src/op/operator.h, update every LowerArgs aggregate/constructor
use-sites (e.g. src/transform/lower_tile_op.cc and any LowerArgs{...}
initializers) to supply the new boolean (default false where appropriate), and
forward that flag into the GetCopyInst call from CopyNode::Lower so the call
uses LowerArgs.buffer_oob; ensure all compilation units including operator.h are
rebuilt/updated to match the new struct layout.

@LeiWang1999 LeiWang1999 merged commit 599264c into tile-ai:main Sep 28, 2025
5 of 6 checks passed
RubiaCx pushed a commit to RubiaCx/tilelang that referenced this pull request Nov 24, 2025
…CopyInst (tile-ai#888)

* Fix CopyNode Lower method to include disable_tma flag in GetCopyInst call

* Refactor flash attention implementation to disable TMA for specific copy and allow TMA for other operations

* attempt to fix lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants