-
Notifications
You must be signed in to change notification settings - Fork 332
[Refactor] Update TVM subproject and streamline buffer store handling #816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Updated the TVM subproject to the latest commit for improved functionality. - Refactored `warp_specialized_rewriter.cc` to replace placeholder implementations for `BlockNode` and `BlockRealizeNode` with proper role filtering, enhancing code clarity and maintainability. - Ensured consistent handling of the `cp_async_barrier_noinc` function in `builtin.py` by adding a newline at the end of the file.
WalkthroughUpdated TVM submodule pointer. Adjusted WSCodeEmitter in warp_specialized_rewriter.cc to handle BlockNode and BlockRealizeNode via FilterByRole instead of hard ICHECK aborts. Minor EOF newline normalization in tilelang/language/builtin.py. No public API changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant WSCodeEmitter
participant FilterByRole
Caller->>WSCodeEmitter: Visit(BlockNode | BlockRealizeNode)
note right of WSCodeEmitter: Previously: ICHECK(0) abort
WSCodeEmitter->>FilterByRole: FilterByRole(op)
alt role matches
FilterByRole-->>WSCodeEmitter: Emit/Traverse block
WSCodeEmitter-->>Caller: Continue traversal
else role does not match
FilterByRole-->>WSCodeEmitter: Skip block
WSCodeEmitter-->>Caller: Continue traversal
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @chengyupku, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on enhancing the project's foundation and code quality. It updates the embedded TVM subproject to its most recent version, ensuring access to the latest features and bug fixes. Additionally, it refactors a core component responsible for warp specialization, replacing temporary code with more robust and maintainable logic for handling block nodes. Finally, a minor but important fix addresses file consistency in a Python utility by adding a missing newline.
Highlights
- TVM Subproject Update: The TVM subproject has been updated to its latest commit (87b845fa0e14c2029bbf5799b9d490db4f20), bringing improved functionality and aligning with the most recent upstream changes.
- Warp Specialized Rewriter Refactoring: The
warp_specialized_rewriter.ccfile has been refactored to replace placeholderICHECK(0)implementations forBlockNodeandBlockRealizeNodestatement visitors with properFilterByRole(op)calls, enhancing code clarity and maintainability in buffer store handling. - Builtin Python File Consistency: A newline character was added to the end of
builtin.pyto ensure consistent file formatting, specifically affecting thecp_async_barrier_noincfunction definition.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request delivers a series of beneficial updates. The TVM subproject is updated to a newer version. In warp_specialized_rewriter.cc, placeholder implementations for BlockNode and BlockRealizeNode that would cause a crash are correctly replaced with proper role filtering logic, which is a great improvement. Finally, a missing newline is added to tilelang/language/builtin.py, which is a good stylistic fix. All changes are sound and improve the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/language/builtin.py (1)
265-278: Fix default parity=None in barrier_wait (currently passes None to intrinsic).This is a real runtime bug if callers omit parity.
-def barrier_wait(barrier_id: Union[int, PrimExpr, tir.Call], parity: Union[int, Var, None] = None): +def barrier_wait(barrier_id: Union[int, PrimExpr, tir.Call], parity: Union[int, Var] = 0): """Wait for a memory barrier to complete. Args: barrier_id: Optional[int, PrimExpr] The memory barrier to wait on - parity: Optional[int, Var] - The parity value to wait for + parity: int | Var + The parity value to wait for (default: 0) @@ - Current implementation is a sugar syntax for mbarrier_wait_parity, as we only support parity 0 and 1. + Sugar over mbarrier_wait_parity; only parity 0 and 1 are supported. """ return mbarrier_wait_parity(barrier_id, parity)
🧹 Nitpick comments (5)
tilelang/language/builtin.py (5)
355-359: Normalize barrier_id to a handle in cp_async_barrier_noinc (align with mbarrier_arrive).Prevents passing raw ints/PrimExpr when the intrinsic expects a barrier handle; keeps APIs consistent.
def cp_async_barrier_noinc(barrier_id: Union[int, PrimExpr, tir.Call]): """Perform a ptx async copy barrier using cp.async.mbarrier.arrive.noinc. """ - return tir.call_intrin("handle", tir.op.Op.get("tl.ptx_cp_async_barrier_noinc"), barrier_id) + if isinstance(barrier_id, (tir.Call, tir.BufferLoad)): + mbar = barrier_id + elif isinstance(barrier_id, (tir.PrimExpr, int)): + mbar = get_mbarrier(barrier_id) + elif isinstance(barrier_id, tir.Buffer): + mbar = tir.BufferLoad(barrier_id, [0]) + else: + raise TypeError(f"barrier_id must be int/PrimExpr/Call/Buffer, got {type(barrier_id)}") + return tir.call_intrin("handle", tir.op.Op.get("tl.ptx_cp_async_barrier_noinc"), mbar)
290-305: value can be int but .dtype is used — tighten types or coerce.Either drop int from value’s type or coerce ints to TIR consts before use.
Option A (tighten annotations):
-def shfl_xor(value: Union[int, PrimExpr, tir.Call], offset: Union[int, PrimExpr, tir.Call]): +def shfl_xor(value: Union[PrimExpr, tir.Call], offset: Union[int, PrimExpr, tir.Call]): @@ -def shfl_down(value: Union[int, PrimExpr, tir.Call], offset: Union[int, PrimExpr, tir.Call]): +def shfl_down(value: Union[PrimExpr, tir.Call], offset: Union[int, PrimExpr, tir.Call]): @@ -def shfl_up(value: Union[int, PrimExpr, tir.Call], offset: Union[int, PrimExpr, tir.Call]): +def shfl_up(value: Union[PrimExpr, tir.Call], offset: Union[int, PrimExpr, tir.Call]):Option B (coerce when int):
- if _IS_HIP_AVAILABLE: + if isinstance(value, int): + value = tir.const(value, "int32") + if _IS_HIP_AVAILABLE: return tir.call_extern(value.dtype, "__shfl_xor", value, offset)(Apply similarly in shfl_down/shfl_up.)
Also applies to: 307-318, 320-331
333-337: Docstring nit: this is a block-wide barrier, not warp.tvm_storage_sync("shared") maps to block sync.
-def sync_threads(): - """Synchronize all threads in a warp. +def sync_threads(): + """Synchronize all threads in a block (shared memory barrier). """ return tir.op.tvm_storage_sync("shared")
340-347: Remove debug print from sync_global; avoid noisy stdout in kernels/tools.Also keeps helper pure. Consider adding a verbose flag if needed.
def sync_global(): """Synchronize all threads in a block. """ tx, ty, tz = get_thread_bindings() ex, ey, ez = get_block_extents() - print(tx, ty, tz, ex, ey, ez) args = ["global", tx == 0 and ty == 0 and tz == 0, ex * ey * ez] return evaluate(tir.Call("handle", "tir.tvm_storage_sync", args))
211-220: Error text mentions only int/Call, but Buffer/PrimExpr are accepted.Align messages with actual accepted types.
- raise TypeError(f"mbarrier must be an integer or a tir.Call, but got {type(mbarrier)}") + raise TypeError(f"mbarrier must be int/PrimExpr/Call/Buffer, got {type(mbarrier)}") @@ - raise TypeError(f"mbarrier must be an integer or a tir.Call, but got {type(mbarrier)}") + raise TypeError(f"mbarrier must be int/PrimExpr/Call/Buffer, got {type(mbarrier)}")Also applies to: 229-238
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
3rdparty/tvm(1 hunks)src/transform/warp_specialized_rewriter.cc(1 hunks)tilelang/language/builtin.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
PR: tile-ai/tilelang#794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.
Applied to files:
src/transform/warp_specialized_rewriter.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: format-check
- GitHub Check: bot-task
🔇 Additional comments (2)
3rdparty/tvm (1)
1-1: TVM submodule bumped to 87b845f — verify CI/compatibility and upstream
- Submodule pointer: eddefbd65acb7b1ea51dd18068b4049754c4fa7a → 87b845fa0e14c2029bbf5799fbbbb9d490db4f20.
- .gitmodules points to https://github.com/TileLang/tvm — confirm this fork is intentional (canonical upstream is usually https://github.com/apache/tvm) and that any local/vendor patches were not dropped.
- Run full CI (CUDA/LLVM/Python bindings) and validate ABI/API compatibility.
- PR also modifies src/transform/warp_specialized_rewriter.cc and tilelang/language/builtin.py — verify these changes are intended with the submodule bump.
src/transform/warp_specialized_rewriter.cc (1)
892-894: Good replacement: Block/BlockRealize now honor role filtering instead of aborting.Delegating both nodes to
FilterByRolealigns them with the rest of the emitter, enabling proper traversal in mbarrier-only mode and correct inclusion/exclusion based onWarpSpecializedRoleMarker. This should unblock nested blocks without changing behavior for producer/consumer split.Please run an example that contains nested Block/BlockRealize inside a SeqStmt (with mixed producer/consumer statements) to confirm:
- producer emission excludes pure-consumer sub-blocks,
- consumer emission preserves grouping assumptions (GroupOpRewriter still sees BlockNodes),
- barriers are still inserted as expected.
…tile-ai#816) - Updated the TVM subproject to the latest commit for improved functionality. - Refactored `warp_specialized_rewriter.cc` to replace placeholder implementations for `BlockNode` and `BlockRealizeNode` with proper role filtering, enhancing code clarity and maintainability. - Ensured consistent handling of the `cp_async_barrier_noinc` function in `builtin.py` by adding a newline at the end of the file.
warp_specialized_rewriter.ccto replace placeholder implementations forBlockNodeandBlockRealizeNodewith proper role filtering, enhancing code clarity and maintainability.cp_async_barrier_noincfunction inbuiltin.pyby adding a newline at the end of the file.Summary by CodeRabbit