-
Notifications
You must be signed in to change notification settings - Fork 332
[Language] Refactor reduce and support shared memory as its in/out #1219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ion and remove unused shared memory reduction logic * Changed Max computation for AbsMax type to use absolute values of lhs and rhs. * Removed unused shared memory reduction logic and related checks for buffer dimensions and thread extents, simplifying the Lower method. * Added a fatal log for unsupported buffer scope reductions.
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
Caution Review failedThe pull request is closed. WalkthroughAbsMax reduction now computes Max(abs(lhs), abs(rhs)). The shared-memory reduction lowering block in Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant reduce_macro_py as reduce_macro (py)
participant FragmentBuf as fragment_buf
participant Intrinsic as intrinsic_reduce
participant MakeReduceC as MakeReduce (C++)
Caller->>reduce_macro_py: reduce(input_buf, output_buf, op)
alt supported scope combos (fragment / shared→fragment / fragment→shared)
reduce_macro_py->>FragmentBuf: alloc/rename/copy as needed
reduce_macro_py->>Intrinsic: call reduce on fragment
Intrinsic->>MakeReduceC: invoke backend lowering
MakeReduceC-->>Intrinsic: fragment-local lowering path
reduce_macro_py->>output_buf: copy result back if required
reduce_macro_py-->>Caller: return result
else unsupported (shared/shared.dyn)
reduce_macro_py-->>Caller: raise ValueError or MakeReduceC will fatal
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tilelang/language/reduce.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/language/reduce.py (4)
tilelang/language/copy.py (1)
copy(11-87)tilelang/language/tir/entry.py (1)
macro(66-117)tilelang/language/allocate.py (1)
alloc_fragment(59-70)tilelang/utils/language.py (2)
is_shared(25-39)is_fragment(81-91)
🪛 Ruff (0.14.3)
tilelang/language/reduce.py
100-100: Avoid specifying long messages outside the exception class
(TRY003)
| copy(buffer, red_frag_in) | ||
| tir.call_intrin( | ||
| "handle", | ||
| tir.op.Op.get("tl.reduce"), | ||
| red_frag_in.access_ptr("r"), | ||
| red_frag_out.access_ptr("w"), | ||
| reduce_type, | ||
| dim, | ||
| clear, | ||
| ) | ||
| copy(red_frag_out, out) | ||
| elif is_shared(buffer) and is_fragment(out): | ||
| red_frag_in = alloc_fragment(buffer.shape, buffer.dtype) | ||
| IRBuilder.name(buffer.name + "_frag", red_frag_in) | ||
|
|
||
| copy(buffer, red_frag_in) | ||
| tir.call_intrin( | ||
| "handle", | ||
| tir.op.Op.get("tl.reduce"), | ||
| red_frag_in.access_ptr("r"), | ||
| out.access_ptr("w"), | ||
| reduce_type, | ||
| dim, | ||
| clear, | ||
| ) | ||
| elif is_fragment(buffer) and is_shared(out): | ||
| red_frag_out = alloc_fragment(out.shape, out.dtype) | ||
| IRBuilder.name(out.name + "_frag", red_frag_out) | ||
|
|
||
| tir.call_intrin( | ||
| "handle", | ||
| tir.op.Op.get("tl.reduce"), | ||
| buffer.access_ptr("r"), | ||
| red_frag_out.access_ptr("w"), | ||
| reduce_type, | ||
| dim, | ||
| clear, | ||
| ) | ||
| copy(red_frag_out, out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve destination contents when clear=False.
When out lives in shared memory, we stage it through red_frag_out, but we never preload its existing values. For reduce_* calls with clear=False, that staging buffer must start from the prior accumulator; otherwise we zero out (or leave undefined) red_frag_out, call tl.reduce, and lose the intended accumulation semantics. This regresses any caller relying on clear=False.
Please seed red_frag_out from out before the intrinsic whenever clear is false, in both shared→shared and fragment→shared branches, e.g.:
red_frag_out = alloc_fragment(out.shape, out.dtype)
IRBuilder.name(out.name + "_frag", red_frag_out)
copy(buffer, red_frag_in)
+ if not clear:
+ copy(out, red_frag_out)
tir.call_intrin(
@@
red_frag_out = alloc_fragment(out.shape, out.dtype)
IRBuilder.name(out.name + "_frag", red_frag_out)
+ if not clear:
+ copy(out, red_frag_out)
tir.call_intrin(🤖 Prompt for AI Agents
In tilelang/language/reduce.py around lines 50 to 88, when staging a
shared-memory destination through red_frag_out we never preload its existing
values, so reduce calls with clear=False lose prior accumulator state; fix by,
after allocating and naming red_frag_out (both in the shared->shared branch and
the fragment->shared branch), conditionally copy(out, red_frag_out) when clear
is False before invoking tir.call_intrin so the fragment starts seeded from the
current out contents.
* Changed the type check for eval values to raise a TypeError for unsupported types, specifically excluding instances of tvm.tir.Buffer. This improves error handling and clarity in the Builder class.
…ile-ai#1219) * [Refactor] Update ReduceOpNode to use absolute values in Max computation and remove unused shared memory reduction logic * Changed Max computation for AbsMax type to use absolute values of lhs and rhs. * Removed unused shared memory reduction logic and related checks for buffer dimensions and thread extents, simplifying the Lower method. * Added a fatal log for unsupported buffer scope reductions. * reduce fix * [Fix] Update type check for eval value in Builder class * Changed the type check for eval values to raise a TypeError for unsupported types, specifically excluding instances of tvm.tir.Buffer. This improves error handling and clarity in the Builder class.
Summary by CodeRabbit