
@LeiWang1999 (Member)

This pull request introduces significant updates to the handling of thread bounds and layout inference across multiple files. Key changes include replacing block_size with thread_bounds, adding new methods to manage thread ranges in the FragmentNode class, and refactoring layout inference logic to improve accuracy and maintainability. Additionally, it introduces a new utility for collecting thread bounds and updates several classes to use IRVisitorWithAnalyzer instead of StmtExprVisitor.

Thread bounds and layout inference updates:

  • Replaced block_size with thread_bounds in the LowerArgs and LayoutInferArgs structures, and updated all related method calls accordingly (src/op/op.h, src/op/elem.cc, src/op/gemm.cc, src/op/parallel.cc, src/op/reduce.cc).

  • Added a utility in BufferUseDefCollector that computes and stores the thread bounds for each thread variable, enabling more precise layout inference (src/transform/layout_inference.cc).
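One plausible reading of why bounds beat a bare count is that an operator nested under a thread guard runs on only part of the block. The minimal C++ sketch below assumes a simplified stand-in `Range` struct that follows TVM's min/extent convention; `LayoutInferArgsSketch`, `BlockSize`, and `Restrict` are hypothetical names for illustration, not code from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for tvm::Range: covers [min, min + extent).
struct Range {
  int64_t min;
  int64_t extent;
};

// Hypothetical, simplified argument struct: the operator receives the
// bounds of the thread variable it runs under instead of a bare block_size.
struct LayoutInferArgsSketch {
  Range thread_bounds;
};

// The old block_size is still recoverable as the extent of the bounds.
inline int64_t BlockSize(const LayoutInferArgsSketch& args) {
  return args.thread_bounds.extent;
}

// Narrow launch bounds by a guard of the form `lo <= tx < hi`, i.e. the
// threads an operator nested under such a condition would actually see.
inline Range Restrict(const Range& r, int64_t lo, int64_t hi) {
  int64_t new_min = std::max(r.min, lo);
  int64_t new_end = std::min(r.min + r.extent, hi);
  // An empty intersection is reported as extent 0.
  return Range{new_min, std::max<int64_t>(0, new_end - new_min)};
}
```

In this sketch, under a guard like `tx >= 128` in a 256-thread block, a bare `block_size` of 256 would overstate the participating threads, while `Restrict(Range{0, 256}, 128, 256)` reports 128 threads starting at thread 128.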

FragmentNode enhancements:

  • Introduced SetThreadRange and ThreadRange methods in the FragmentNode class to manage thread ranges, backed by a new thread_range_ member variable (src/layout/layout.h, src/layout/layout.cc).
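A heavily simplified C++ sketch of the setter/getter pair, assuming a stand-in `Range` struct and a hypothetical `FragmentSketch` class in place of the real `FragmentNode`. Only the thread-range bookkeeping is modeled; the PR summary does not say whether the real `SetThreadRange` mutates in place or returns a new fragment, so this sketch picks in-place mutation with chaining purely for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for tvm::Range: covers [min, min + extent).
struct Range {
  int64_t min;
  int64_t extent;
};

// Hypothetical stand-in for FragmentNode: models only the thread range,
// none of the layout itself.
class FragmentSketch {
 public:
  // Attach the thread range; returning *this lets callers chain calls.
  FragmentSketch& SetThreadRange(const Range& range) {
    thread_range_ = range;
    return *this;
  }

  bool HasThreadRange() const { return thread_range_.has_value(); }

  // Read the range back, refusing to guess when none has been set.
  Range ThreadRange() const {
    if (!thread_range_) {
      throw std::logic_error("thread range was never set");
    }
    return *thread_range_;
  }

 private:
  std::optional<Range> thread_range_;
};
```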

Refactoring and code quality improvements:

  • Replaced StmtExprVisitor with IRVisitorWithAnalyzer in BufferUseDefCollector and updated the related method calls for consistency and better analysis capabilities (src/transform/layout_inference.cc).

  • Simplified ParallelLoopTransformer by restructuring how parallel loops are handled, improving readability (src/transform/layout_inference.cc).
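The benefit of `IRVisitorWithAnalyzer` over a plain `StmtExprVisitor` is, as we understand TVM's version, that it binds the ranges of enclosing loop and thread variables into an `arith::Analyzer` while it traverses, so a visitor can ask for a variable's bounds at any point. The toy C++ sketch below models only that scoped-binding idea with a hypothetical `ScopedBounds` class and string variable names; it is not TVM's API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for tvm::Range: covers [min, min + extent).
struct Range {
  int64_t min;
  int64_t extent;
};

// Toy analyzer: tracks the range bound to each variable name while a
// traversal is inside the scope that binds it.
class ScopedBounds {
 public:
  // RAII guard: binds `var` on construction and restores any outer
  // binding (or removes the entry) when the scope is left.
  class Guard {
   public:
    Guard(ScopedBounds& owner, std::string var, Range range)
        : owner_(owner), var_(std::move(var)) {
      auto it = owner_.bounds_.find(var_);
      if (it != owner_.bounds_.end()) saved_ = it->second;
      owner_.bounds_[var_] = range;
    }
    ~Guard() {
      if (saved_) {
        owner_.bounds_[var_] = *saved_;
      } else {
        owner_.bounds_.erase(var_);
      }
    }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;

   private:
    ScopedBounds& owner_;
    std::string var_;
    std::optional<Range> saved_;
  };

  // Bounds of `var` in the current scope, if an enclosing scope bound it.
  std::optional<Range> Lookup(const std::string& var) const {
    auto it = bounds_.find(var);
    if (it == bounds_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, Range> bounds_;
};
```

Usage: bind `tx` to `Range{0, 256}` for the kernel body, then open an inner guard with `Range{128, 128}` to stand in, crudely, for a branch such as `tx >= 128`; leaving the inner scope restores the launch bounds. TVM records branch conditions as constraints rather than rebinding the variable, so the inner guard is only an approximation.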

Miscellaneous:

…on handling

* Added `SetThreadRange` method to `FragmentNode` for managing thread ranges.
* Updated `LayoutNode::Inverse` to provide more informative error messages.
* Refactored layout inference and operation lowering to utilize `thread_bounds` instead of `block_size`, enhancing flexibility for thread management.
* Introduced new tests for tilelang operations to validate thread range functionality and ensure correctness in parallel execution scenarios.
…peration lowering

* Removed workaround for undefined thread_var in layout inference, ensuring proper handling of thread bounds.
* Updated logic to define thread bounds based on the presence of thread_var, enhancing robustness in thread management.
* Refactored thread_var initialization in lower_tile_op to maintain consistency across the codebase.
…eration lowering

* Refactored thread variable checks to ensure bounds are only accessed when defined, improving safety and clarity.
* Initialized thread_var with a default range to prevent undefined behavior.
* Updated logic in lower_tile_op to align with new thread variable handling, enhancing consistency across the codebase.
LeiWang1999 requested a review from Copilot on April 22, 2025 02:55
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR enhances layout inference and parallelism by replacing the block size parameter with a thread_bounds parameter throughout the codebase, improving thread range handling and overall maintainability. Key changes include updating LowerArgs and LayoutInferArgs to use thread_bounds, adding new methods in FragmentNode to set and retrieve thread ranges, and refactoring loop partitioning and layout inference logic.

Reviewed Changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `tilelang/transform/__init__.py` | Added a helper function `get_pass_context` for fetching the current pass context. |
| `tilelang/engine/phase.py` | Introduced `allow_tma_and_warp_specialized` and updated target-specific lowering in the phase logic. |
| `testing/python/language/test_tilelang_language_mask_op.py` | Added new tests for different tilelang copy mask functions using parallel and copy ranges. |
| `src/transform/lower_tile_op.cc` | Updated thread bounds computation and replaced `Var` with `IterVar` for thread handling. |
| `src/transform/loop_partition.h` & `.cc` | Added an overload of `PlanLoopPartition` that takes a thread range and adjusted the substitution logic. |
| `src/transform/layout_inference.cc` | Replaced `StmtExprVisitor` with `IRVisitorWithAnalyzer` and refined the parallel loop transformation. |
| `src/op/reduce.cc` | Updated the workspace computation to derive values from `thread_bounds` instead of `block_size`. |
| `src/op/parallel.cc` | Modified `InferLayout` to compute `block_size` from `thread_bounds` and adjusted predicate checks. |
| `src/op/op.h` | Replaced `block_size` with `thread_bounds` in the `LowerArgs` and `LayoutInferArgs` structures. |
| `src/op/gemm.cc` | Updated the warp partition computation and layout inference logic based on the new `thread_bounds`. |
| `src/op/elem.cc` | Changed `InferLayout` calls to use `thread_bounds` while preserving previous behavior. |
| `src/layout/layout.h` & `layout.cc` | Introduced `SetThreadRange` and `ThreadRange` methods in `FragmentNode` and improved layout error messages. |
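The gemm.cc entry mentions a warp partition derived from `thread_bounds`, but the summary does not show the formula. The C++ sketch below only illustrates why a nonzero `min` matters once an operator may own a thread subrange (for instance, the upper half of a block): the covered warps no longer start at warp 0. `WarpSpan`, `WarpsFromBounds`, the warp-alignment requirement, and the default warp size of 32 are all assumptions, not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for tvm::Range: covers [min, min + extent).
struct Range {
  int64_t min;
  int64_t extent;
};

// Which warps a thread range covers.
struct WarpSpan {
  int64_t first_warp;
  int64_t num_warps;
};

// Map a thread range onto warps, assuming warp-aligned bounds.
// warp_size defaults to 32 (NVIDIA); other targets may differ
// (e.g. 64-wide wavefronts on many AMD GPUs).
inline WarpSpan WarpsFromBounds(const Range& r, int64_t warp_size = 32) {
  if (r.min % warp_size != 0 || r.extent % warp_size != 0) {
    throw std::invalid_argument("thread range is not warp-aligned");
  }
  return WarpSpan{r.min / warp_size, r.extent / warp_size};
}
```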
Files not reviewed (1)
  • 3rdparty/tvm: Language not supported
Comments suppressed due to low confidence (3)

src/transform/loop_partition.cc:169

  • Ensure that thread_range is always defined with a min value less than or equal to its extent to avoid an underflow in the num_thread calculation.
size_t num_thread = *as_const_int(thread_range->extent) - *as_const_int(thread_range->min);
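A minimal sketch of the guard this comment asks for, written against a hypothetical stand-in `Range` struct rather than `tvm::Range`; `CheckedNumThread` is an illustrative name. It keeps the quoted `extent - min` expression but rejects inputs whose signed difference would be negative and would therefore wrap around when converted to `size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for tvm::Range: covers [min, min + extent).
struct Range {
  int64_t min;
  int64_t extent;
};

// Mirrors the quoted `extent - min`, but refuses inputs that would
// wrap around when the negative difference is converted to size_t.
inline std::size_t CheckedNumThread(const Range& r) {
  if (r.min > r.extent) {
    throw std::invalid_argument("thread_range: min exceeds extent");
  }
  return static_cast<std::size_t>(r.extent - r.min);
}
```

One caveat worth checking in the real code: TVM's `Range::FromMinExtent(min, extent)` describes `[min, min + extent)`, so `extent` is already the number of threads. If `thread_range` follows that convention, `extent - min` equals the thread count only when `min` is 0; bounds with min 128 and extent 128 would yield 0.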

src/transform/lower_tile_op.cc:305

  • Verify that the fallback thread_bounds (a Range with min 0 and extent 1, i.e. a single thread) is appropriate for all target backends; if not, adjust the default to match the expected thread range semantics.
thread_bounds = Range::FromMinExtent(0, 1);
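The fallback under discussion amounts to the following, sketched with a hypothetical `Range` stand-in and an illustrative `ThreadBoundsOrDefault` helper: when no thread variable is bound, the op is treated as running on a single thread.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for tvm::Range: covers [min, min + extent).
struct Range {
  int64_t min;
  int64_t extent;
};

// Use the bound thread variable's domain if there is one; otherwise fall
// back to a single-thread range [0, 1), as in the quoted line.
inline Range ThreadBoundsOrDefault(const std::optional<Range>& thread_dom) {
  return thread_dom.value_or(Range{0, 1});
}
```

A single-thread default keeps downstream arithmetic well-defined (extent 1 rather than 0), which is presumably why it was chosen; whether it is right for every backend is the open question the comment raises.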

src/transform/layout_inference.cc:408

  • [nitpick] Confirm that switching from StmtExprVisitor to IRVisitorWithAnalyzer in BufferUseDefCollector covers all necessary node types handled by the previous implementation.
IRVisitorWithAnalyzer::VisitExpr_(op);

LeiWang1999 merged commit e3ee8b2 into tile-ai:main on Apr 22, 2025
4 of 5 checks passed
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 18, 2025
…able constraint (tile-ai#417)

* [Enhancement] Introduce thread range management in layout and operation handling

* Added `SetThreadRange` method to `FragmentNode` for managing thread ranges.
* Updated `LayoutNode::Inverse` to provide more informative error messages.
* Refactored layout inference and operation lowering to utilize `thread_bounds` instead of `block_size`, enhancing flexibility for thread management.
* Introduced new tests for tilelang operations to validate thread range functionality and ensure correctness in parallel execution scenarios.

* lint fix

* [Refactor] Improve thread variable handling in layout inference and operation lowering

* Removed workaround for undefined thread_var in layout inference, ensuring proper handling of thread bounds.
* Updated logic to define thread bounds based on the presence of thread_var, enhancing robustness in thread management.
* Refactored thread_var initialization in lower_tile_op to maintain consistency across the codebase.

* [Refactor] Update thread variable handling in layout inference and operation lowering

* Refactored thread variable checks to ensure bounds are only accessed when defined, improving safety and clarity.
* Initialized thread_var with a default range to prevent undefined behavior.
* Updated logic in lower_tile_op to align with new thread variable handling, enhancing consistency across the codebase.
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 20, 2025
…able constraint (tile-ai#417)