-
Notifications
You must be signed in to change notification settings - Fork 333
[Bugfix] Fix CopyNode Lower method to include disable_tma flag in GetCopyInst #888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughUpdates propagate a per-copy disable flag into copy-instruction selection by changing Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Caller
participant CopyNode
participant GetCopyInst
Caller->>CopyNode: Lower(copy_op, ...)
Note over CopyNode: evaluate flags\n`disable_tma_lower`, `disable_tma`
alt Either flag true\n(disable_tma_lower || disable_tma)
CopyNode->>GetCopyInst: request copy inst (disable=true)
Note right of GetCopyInst #D6EAF8: selects non‑TMA / SIMT copy
else Both flags false
CopyNode->>GetCopyInst: request copy inst (disable=false)
Note right of GetCopyInst #FDEBD0: selects TMA bulk load/store
end
GetCopyInst-->>CopyNode: chosen copy instruction
CopyNode-->>Caller: lowered copy operation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
…opy and allow TMA for other operations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/op/copy.cc(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test-amd
🔇 Additional comments (1)
src/op/copy.cc (1)
695-696: Good: Lower now respects the per-copy disable flag.Propagating
disable_tmaintoGetCopyInstalignsLowerwithInferLayout. ✔️
| auto copy_inst = GetCopyInst(target, disable_tma_lower || disable_tma, | ||
| T.layout_map, analyzer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Also forward buffer_oob (and make bool explicit) to avoid 1D TMA on OOB paths.
InferLayout passes T.buffer_oob to GetCopyInst, but Lower does not. This can select 1D TMA in lowering where layout inference avoided it, risking OOB behavior. If LowerArgs exposes buffer_oob, pass it through and make the bool cast explicit.
Apply if available:
- auto copy_inst = GetCopyInst(target, disable_tma_lower || disable_tma,
- T.layout_map, analyzer);
+ const bool tma_disabled = disable_tma_lower || static_cast<bool>(disable_tma);
+ auto copy_inst = GetCopyInst(target, tma_disabled,
+ T.layout_map, analyzer, T.buffer_oob);To verify availability and other call sites:
🏁 Script executed:
#!/bin/bash
# 1) Check if LowerArgs has buffer_oob
rg -nP 'struct\s+LowerArgs|class\s+LowerArgs|LowerArgs\s*\{' -C3
# 2) List all GetCopyInst call sites and their arg counts
rg -nP 'GetCopyInst\s*\(' -C2
# 3) Show where buffer_oob is used today
rg -nP '\bbuffer_oob\b' -C3Length of output: 5793
🏁 Script executed:
#!/bin/bash
# List files under src/op to locate copy.cc
ls src/opLength of output: 276
🏁 Script executed:
#!/bin/bash
# Show LowerArgs definition in src/op/operator.h
rg -nP 'struct\s+LowerArgs\s*\{' -C5 src/op/operator.hLength of output: 217
🏁 Script executed:
#!/bin/bash
# Show full LowerArgs definition in operator.h
rg -nP 'struct\s+LowerArgs' -C20 src/op/operator.h
# Show GetCopyInst declaration in copy.h
sed -n '180,220p' src/op/copy.h
# Show context around the 4-arg GetCopyInst call in copy.cc
sed -n '670,720p' src/op/copy.ccLength of output: 4269
🏁 Script executed:
sed -n '350,450p' src/op/copy.ccLength of output: 4653
🏁 Script executed:
#!/bin/bash
# Locate occurrences of 'disable_tma' in layout inference to confirm variable origin
rg -nP 'disable_tma' -C3 src/transform/layout_inference.ccLength of output: 0
Extend LowerArgs with buffer_oob and forward it to GetCopyInst
Add a bool buffer_oob = false field to struct LowerArgs in src/op/operator.h and update every LowerArgs{…} constructor (e.g. in src/transform/lower_tile_op.cc) to pass the new flag. In CopyNode::Lower (src/op/copy.cc:695–696), replace the 4-arg call with:
- auto copy_inst = GetCopyInst(target, disable_tma_lower || disable_tma,
- T.layout_map, analyzer);
+ bool tma_disabled = disable_tma_lower || disable_tma;
+ auto copy_inst = GetCopyInst(target, tma_disabled,
+ T.layout_map, analyzer, T.buffer_oob);This aligns lowering with InferLayout and prevents 1D TMA on out-of-bounds paths.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auto copy_inst = GetCopyInst(target, disable_tma_lower || disable_tma, | |
| T.layout_map, analyzer); | |
| // Disable TMA if either global or per-node flag is set. | |
| bool tma_disabled = disable_tma_lower || disable_tma; | |
| // Now forward the new buffer_oob flag as well. | |
| auto copy_inst = GetCopyInst(target, | |
| tma_disabled, | |
| T.layout_map, | |
| analyzer, | |
| T.buffer_oob); |
🤖 Prompt for AI Agents
In src/op/copy.cc around lines 695-696, the call to GetCopyInst must receive the
new LowerArgs.buffer_oob flag; add a bool buffer_oob = false field to struct
LowerArgs in src/op/operator.h, update every LowerArgs aggregate/constructor
use-sites (e.g. src/transform/lower_tile_op.cc and any LowerArgs{...}
initializers) to supply the new boolean (default false where appropriate), and
forward that flag into the GetCopyInst call from CopyNode::Lower so the call
uses LowerArgs.buffer_oob; ensure all compilation units including operator.h are
rebuilt/updated to match the new struct layout.
…CopyInst (tile-ai#888) * Fix CopyNode Lower method to include disable_tma flag in GetCopyInst call * Refactor flash attention implementation to disable TMA for specific copy and allow TMA for other operations * attempt to fix lint
This pull request makes a targeted change to the
CopyNode::Lowerfunction insrc/op/copy.ccto enhance the logic for selecting the copy instruction. The update ensures that thedisable_tmaflag is now considered in addition to the configuration option when determining whether to disable TMA lowering.Key change:
GetCopyInstby combining thedisable_tma_lowerconfiguration and thedisable_tmaflag, ensuring TMA lowering is properly disabled when either is set.Summary by CodeRabbit
Bug Fixes
Performance
Examples