[Enhancement] Support Auto Layout Inference and Parallelism with variable constraint #417
…on handling
* Added `SetThreadRange` method to `FragmentNode` for managing thread ranges.
* Updated `LayoutNode::Inverse` to provide more informative error messages.
* Refactored layout inference and operation lowering to utilize `thread_bounds` instead of `block_size`, enhancing flexibility for thread management.
* Introduced new tests for tilelang operations to validate thread range functionality and ensure correctness in parallel execution scenarios.

…peration lowering
* Removed workaround for undefined `thread_var` in layout inference, ensuring proper handling of thread bounds.
* Updated logic to define thread bounds based on the presence of `thread_var`, enhancing robustness in thread management.
* Refactored `thread_var` initialization in `lower_tile_op` to maintain consistency across the codebase.

…eration lowering
* Refactored thread variable checks to ensure bounds are only accessed when defined, improving safety and clarity.
* Initialized `thread_var` with a default range to prevent undefined behavior.
* Updated logic in `lower_tile_op` to align with new thread variable handling, enhancing consistency across the codebase.
Pull Request Overview
This PR enhances layout inference and parallelism by replacing the block size parameter with a thread_bounds parameter throughout the codebase, improving thread range handling and overall maintainability. Key changes include updating LowerArgs and LayoutInferArgs to use thread_bounds, adding new methods in FragmentNode to set and retrieve thread ranges, and refactoring loop partitioning and layout inference logic.
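To make the difference between a block size and a thread range concrete, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: `ThreadBounds`, `LowerArgsBefore`, `LowerArgsAfter`, `Contains`, and `FromBlockSize` are illustration names, not the definitions in `src/op/op.h`, which build on TVM's `Range` type. The only convention borrowed is that a range is described by a `min` and an `extent`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a half-open thread range [min, min + extent),
// following the min/extent convention of TVM's Range.
struct ThreadBounds {
  int64_t min;
  int64_t extent;
};

// Before (sketch): an operator only knew how many threads the block had.
struct LowerArgsBefore {
  int64_t block_size;
};

// After (sketch): an operator knows which slice of the block it runs on.
struct LowerArgsAfter {
  ThreadBounds thread_bounds;
};

// True if thread index tid falls inside the half-open range.
inline bool Contains(const ThreadBounds& b, int64_t tid) {
  return tid >= b.min && tid < b.min + b.extent;
}

// A plain block size corresponds to the range [0, block_size).
inline LowerArgsAfter FromBlockSize(const LowerArgsBefore& old_args) {
  assert(old_args.block_size > 0);
  return LowerArgsAfter{ThreadBounds{0, old_args.block_size}};
}
```

A range such as `ThreadBounds{128, 128}` describes threads 128 through 255, for example one group of a block split into two groups. A single `block_size` value cannot express that offset, which is presumably the flexibility the overview refers to.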
Reviewed Changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tilelang/transform/__init__.py | Added a helper function get_pass_context for fetching the current pass context. |
| tilelang/engine/phase.py | Introduced allow_tma_and_warp_specialized and updated target-specific lowering in phase logic. |
| testing/python/language/test_tilelang_language_mask_op.py | Added new tests for different tilelang copy mask functions using parallel and copy ranges. |
| src/transform/lower_tile_op.cc | Updated thread bounds computation and replaced Var with IterVar for thread handling. |
| src/transform/loop_partition.h & .cc | Added an overload of PlanLoopPartition using a thread range and adjusted substitution logic. |
| src/transform/layout_inference.cc | Replaced StmtExprVisitor with IRVisitorWithAnalyzer and refined parallel loop transformation. |
| src/op/reduce.cc | Updated workspace computation to derive values from thread_bounds instead of block_size. |
| src/op/parallel.cc | Modified InferLayout to compute block_size using thread_bounds and adjusted predicate checks. |
| src/op/op.h | Replaced block_size with thread_bounds in LowerArgs and LayoutInferArgs structures. |
| src/op/gemm.cc | Updated warp partition computation and layout inference logic based on the new thread_bounds. |
| src/op/elem.cc | Changed InferLayout calls to use thread_bounds while preserving previous behavior. |
| src/layout/layout.h & layout.cc | Introduced SetThreadRange and ThreadRange methods in FragmentNode and enhanced layout errors. |
Files not reviewed (1)
- 3rdparty/tvm: Language not supported
Comments suppressed due to low confidence (3)
src/transform/loop_partition.cc:169
- Ensure that thread_range is always defined with a min value less than or equal to its extent to avoid an underflow in the num_thread calculation.
size_t num_thread = *as_const_int(thread_range->extent) - *as_const_int(thread_range->min);
src/transform/lower_tile_op.cc:305
- Verify that the fallback thread_bounds (Range from 0 to 1) is appropriate for all target backends; if not, adjust the default to match expected thread range semantics.
thread_bounds = Range::FromMinExtent(0, 1);
src/transform/layout_inference.cc:408
- [nitpick] Confirm that switching from StmtExprVisitor to IRVisitorWithAnalyzer in BufferUseDefCollector covers all necessary node types handled by the previous implementation.
IRVisitorWithAnalyzer::VisitExpr_(op);
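The underflow concern in the first suppressed comment can be reproduced without TVM. The sketch below guards the same subtraction as the quoted `num_thread` line, using plain `int64_t` values in place of `*as_const_int(...)`. The helper names `UncheckedNumThread` and `CheckedNumThread` are hypothetical and do not appear in the PR. The sketch only guards the expression as quoted and does not take a position on whether `extent - min` is the intended thread count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unchecked form, mirroring the quoted expression: a negative difference
// converts to size_t by wrapping around to a very large value.
inline size_t UncheckedNumThread(int64_t extent, int64_t min) {
  return static_cast<size_t>(extent - min);
}

// Checked form: writes the difference to *out and returns true only when
// 0 <= min <= extent; otherwise returns false and leaves *out untouched.
inline bool CheckedNumThread(int64_t extent, int64_t min, size_t* out) {
  assert(out != nullptr);
  if (min < 0 || min > extent) {
    return false;
  }
  *out = static_cast<size_t>(extent - min);
  return true;
}
```

With `extent = 64` and `min = 128`, the unchecked form yields a huge `size_t` rather than a negative number, while the checked form reports failure and lets the caller decide how to handle the bad range.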
…able constraint (tile-ai#417)
* [Enhancement] Introduce thread range management in layout and operation handling
  * Added `SetThreadRange` method to `FragmentNode` for managing thread ranges.
  * Updated `LayoutNode::Inverse` to provide more informative error messages.
  * Refactored layout inference and operation lowering to utilize `thread_bounds` instead of `block_size`, enhancing flexibility for thread management.
  * Introduced new tests for tilelang operations to validate thread range functionality and ensure correctness in parallel execution scenarios.
* lint fix
* [Refactor] Improve thread variable handling in layout inference and operation lowering
  * Removed workaround for undefined `thread_var` in layout inference, ensuring proper handling of thread bounds.
  * Updated logic to define thread bounds based on the presence of `thread_var`, enhancing robustness in thread management.
  * Refactored `thread_var` initialization in `lower_tile_op` to maintain consistency across the codebase.
* [Refactor] Update thread variable handling in layout inference and operation lowering
  * Refactored thread variable checks to ensure bounds are only accessed when defined, improving safety and clarity.
  * Initialized `thread_var` with a default range to prevent undefined behavior.
  * Updated logic in `lower_tile_op` to align with new thread variable handling, enhancing consistency across the codebase.
This pull request introduces significant updates to the handling of thread bounds and layout inference across multiple files. Key changes include replacing `block_size` with `thread_bounds`, adding new methods to manage thread ranges in the `FragmentNode` class, and refactoring layout inference logic to improve accuracy and maintainability. Additionally, it introduces a new utility for collecting thread bounds and updates several classes to use `IRVisitorWithAnalyzer` instead of `StmtExprVisitor`.

Thread bounds and layout inference updates:
* Replaced `block_size` with `thread_bounds` in the `LowerArgs` and `LayoutInferArgs` structures, and updated all related method calls to reflect this change. (`src/op/op.h`; `src/op/elem.cc`; `src/op/gemm.cc`; `src/op/parallel.cc`; `src/op/reduce.cc`)
* Added a utility in `BufferUseDefCollector` to compute and store thread bounds for each thread variable, enabling more precise layout inference. (`src/transform/layout_inference.cc`)

FragmentNode enhancements:
* Added `SetThreadRange` and `ThreadRange` methods in the `FragmentNode` class to manage thread ranges, and added a new `thread_range_` member variable. (`src/layout/layout.h`; `src/layout/layout.cc`)

Refactoring and code quality improvements:
* Replaced `StmtExprVisitor` with `IRVisitorWithAnalyzer` in `BufferUseDefCollector` and updated related method calls for consistency and better analysis capabilities. (`src/transform/layout_inference.cc`)
* Simplified and clarified logic in `ParallelLoopTransformer` by restructuring the handling of parallel loops and improving code readability. (`src/transform/layout_inference.cc`)

Miscellaneous:
* Improved error messages in `LayoutNode::Inverse` to include layout details for better debugging. (`src/layout/layout.cc`, R207-R218)
* Added missing include for `arith/ir_visitor_with_analyzer.h` in `layout_inference.cc`. (`src/transform/layout_inference.cc`, R20)
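As a rough illustration of two ideas in the description (a fragment that remembers its thread range, and a per-thread-variable table of bounds), here is a minimal sketch. The method names `SetThreadRange` and `ThreadRange` and the member name `thread_range_` come from the description; everything else is an assumption for illustration. That includes the types `RangeSketch`, `FragmentSketch`, and `ThreadBoundsTable`, the method signatures, the chaining return value, and the string keys. The `[0, 1)` default mirrors the fallback `Range::FromMinExtent(0, 1)` quoted from `lower_tile_op.cc`; applying it inside the fragment is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical half-open range [min, min + extent), as in TVM's Range.
struct RangeSketch {
  int64_t min;
  int64_t extent;
};

// Hypothetical stand-in for FragmentNode: remembers the thread range
// that the fragment layout is defined over.
class FragmentSketch {
 public:
  // Stores the range; returning *this for chaining is an assumed convenience.
  FragmentSketch& SetThreadRange(RangeSketch r) {
    assert(r.extent > 0);
    thread_range_ = r;
    return *this;
  }
  RangeSketch ThreadRange() const { return thread_range_; }

 private:
  RangeSketch thread_range_{0, 1};  // assumed default: a single thread
};

// Hypothetical table mapping a thread variable name to its bounds.
class ThreadBoundsTable {
 public:
  void Record(const std::string& thread_var, RangeSketch bounds) {
    table_[thread_var] = bounds;
  }
  // Falls back to [0, 1) when no bounds were recorded for thread_var.
  RangeSketch Lookup(const std::string& thread_var) const {
    auto it = table_.find(thread_var);
    return it == table_.end() ? RangeSketch{0, 1} : it->second;
  }

 private:
  std::map<std::string, RangeSketch> table_;
};
```

A caller could record `{0, 256}` for `"threadIdx.x"` and then look up any thread variable without first checking whether bounds exist. This follows the stated goal of accessing bounds only when they are defined.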