@LeiWang1999 (Member)

This pull request refactors the KernelLaunch function in src/ir.cc to improve clarity and consistency in handling CPU and GPU kernel launches. The most significant changes include reorganizing comments, ensuring proper annotations for blocks, and replacing empty block names with a more descriptive default.

Improvements to code clarity:

  • Reorganized the comment for launching the CPU kernel to align with the relevant code block, improving readability.

Consistency in block handling:

  • Replaced empty block names (Block("")) with a more descriptive default (Block("root")) to enhance code clarity and maintain consistency. This change applies both when attributes are defined and in the fallback case.
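To make the naming change concrete, here is a minimal C++ sketch. The `Block` struct and the `MakeLaunchBlock` helper are hypothetical stand-ins, not the actual code in src/ir.cc or TVM's block type; they only show that both paths (attributes defined, and the fallback) now produce a block named "root".

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a TIR block; only the fields this sketch needs.
struct Block {
  std::string name_hint;
  std::map<std::string, std::string> annotations;
};

// Builds the block that wraps a kernel-launch body. Both paths use the
// name "root" instead of an empty string.
Block MakeLaunchBlock(const std::map<std::string, std::string>& attrs) {
  Block block;
  block.name_hint = "root";
  if (!attrs.empty()) {
    // Attributes are defined: attach them as block annotations.
    block.annotations = attrs;
  }
  // Fallback path: no annotations, but the block is still named "root".
  return block;
}
```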

…ogic

* Added comments to distinguish between CPU and GPU kernel launch sections for better code readability.
* Changed the creation of empty blocks to use a consistent "root" identifier, enhancing clarity in frame management.
LeiWang1999 merged commit 6fc627e into tile-ai:main on Apr 27, 2025
2 of 3 checks passed
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Apr 28, 2025
…i#441)

* Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
* Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
LeiWang1999 added a commit that referenced this pull request Apr 28, 2025
…442)

* Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
* Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
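The source-buffer selection described in this commit can be pictured with the sketch below. It assumes a hypothetical `FragmentInfo` summary recording whether a buffer already has a layout and whether that layout is replicated across threads; `ChooseSourceBuffer` is illustrative only and is not the ParallelOp implementation. It prefers the first buffer with a non-replicated layout and keeps a replicated one only as a fallback.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of a fragment buffer accessed inside a parallel loop.
struct FragmentInfo {
  std::string name;
  bool has_layout;     // a layout has already been inferred for this buffer
  bool is_replicated;  // that layout replicates the data across threads
};

// Returns the index of the buffer whose layout should seed inference for the
// loop, or -1 if no buffer has a layout yet. Non-replicated layouts win; the
// first replicated one is kept only as a fallback.
int ChooseSourceBuffer(const std::vector<FragmentInfo>& bufs) {
  int fallback = -1;
  for (int i = 0; i < static_cast<int>(bufs.size()); ++i) {
    if (!bufs[i].has_layout) continue;
    if (!bufs[i].is_replicated) return i;
    if (fallback < 0) fallback = i;
  }
  return fallback;
}
```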
LeiWang1999 added a commit that referenced this pull request Apr 29, 2025
* [Enhancement] Improve layout inference accuracy in ParallelOp (#441)

* Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
* Enhanced comments to clarify the rationale behind buffer selection in layout inference process.

* [Enhancement] Add error handling macros and refactor loop partitioning logic

* Introduced TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
* Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
* Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
* Updated pass configuration management to streamline vectorization control in the optimization process.

* lint fix

* remove debug print
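One common shape for a check macro like the one this commit describes: evaluate a runtime call and, on failure, record the error text together with file and line before returning an error code. The sketch below illustrates that pattern only; `Status`, `StatusString`, `g_error_buf`, `FakeLaunch` and `CHECK_LAUNCH` are hypothetical names chosen so the code builds without CUDA or HIP, and this is not the actual definition of TILELANG_CHECK. In real CUDA code the comparison would be against `cudaSuccess` and the message would come from `cudaGetErrorString`.

```cpp
#include <cstdio>

// Hypothetical stand-in for cudaError_t / hipError_t so the sketch compiles
// without a GPU toolkit.
enum class Status { kSuccess = 0, kLaunchFailure = 1 };

const char* StatusString(Status s) {
  return s == Status::kSuccess ? "no error" : "launch failure";
}

// Buffer that receives the most recent error message (assumption of this sketch).
static char g_error_buf[256];

// Evaluates `stmt`; on failure, writes the error text with file and line
// into g_error_buf, then returns -1 from the enclosing function.
#define CHECK_LAUNCH(stmt)                                              \
  do {                                                                  \
    Status err_ = (stmt);                                               \
    if (err_ != Status::kSuccess) {                                     \
      std::snprintf(g_error_buf, sizeof(g_error_buf), "%s:%d: %s (%d)", \
                    __FILE__, __LINE__, StatusString(err_),             \
                    static_cast<int>(err_));                            \
      return -1;                                                        \
    }                                                                   \
  } while (0)

// Hypothetical launcher that succeeds or fails on demand.
Status FakeLaunch(bool ok) {
  return ok ? Status::kSuccess : Status::kLaunchFailure;
}

int RunKernel(bool ok) {
  CHECK_LAUNCH(FakeLaunch(ok));
  return 0;
}
```

The `do { ... } while (0)` wrapper makes the macro behave as a single statement, so it can be used safely inside an unbraced `if`/`else`.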
LeiWang1999 added a commit that referenced this pull request May 1, 2025
* [Enhancement] Improve layout inference accuracy in ParallelOp (#441)

* Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
* Enhanced comments to clarify the rationale behind buffer selection in layout inference process.

* [Enhancement] Add error handling macros and refactor loop partitioning logic

* Introduced TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
* Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
* Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
* Updated pass configuration management to streamline vectorization control in the optimization process.

* lint fix

* remove debug print

* [Refactor] Update legalize_safe_memory_access.cc to improve memory access handling

* Replaced Apache License header with MIT License.
* Added logic to handle local buffer conditions in memory access.
* Introduced IsLocalBuffer function to check buffer scope.
* Enhanced comments for clarity on memory access operations.
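A scope check such as IsLocalBuffer can be sketched as follows. The `BufferInfo` struct, the prefix rule (treating every scope that begins with "local", including "local.fragment", as local) and the `NeedsBoundsGuard` policy are assumptions made for illustration; the commit does not say which scopes the real function accepts or exactly how the memory-access pass uses the result.

```cpp
#include <string>

// Hypothetical stand-in for a TVM buffer: just a name and a storage scope.
struct BufferInfo {
  std::string name;
  std::string scope;  // e.g. "global", "shared", "local", "local.fragment"
};

// Assumption: every scope that starts with "local" is thread-private storage.
bool IsLocalBuffer(const BufferInfo& buf) {
  return buf.scope.rfind("local", 0) == 0;
}

// Hypothetical policy for this sketch: only non-local accesses get a
// bounds guard inserted by the memory-access pass.
bool NeedsBoundsGuard(const BufferInfo& buf) {
  return !IsLocalBuffer(buf);
}
```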
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 18, 2025
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 18, 2025
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 18, 2025
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 18, 2025
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 20, 2025
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 20, 2025
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 20, 2025
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request Jul 20, 2025