[Refactor] Update KernelLaunch to clarify block name #441
Merged
…ogic
* Added comments to distinguish between CPU and GPU kernel launch sections for better code readability.
* Changed the creation of empty blocks to use a consistent "root" identifier, enhancing clarity in frame management.
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request on Apr 28, 2025
…i#441)
* Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
* Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
LeiWang1999 added a commit that referenced this pull request on Apr 28, 2025
LeiWang1999 added a commit that referenced this pull request on Apr 29, 2025
* [Enhancement] Improve layout inference accuracy in ParallelOp (#441)
  * Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
  * Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
* [Enhancement] Add error handling macros and refactor loop partitioning logic
  * Introduced TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
  * Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
  * Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
  * Updated pass configuration management to streamline vectorization control in the optimization process.
* lint fix
* remove debug print
LeiWang1999 added a commit that referenced this pull request on May 1, 2025
* [Enhancement] Improve layout inference accuracy in ParallelOp (#441)
  * Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
  * Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
* [Enhancement] Add error handling macros and refactor loop partitioning logic
  * Introduced TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
  * Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
  * Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
  * Updated pass configuration management to streamline vectorization control in the optimization process.
* lint fix
* remove debug print
* [Refactor] Update legalize_safe_memory_access.cc to improve memory access handling
  * Replaced Apache License header with MIT License.
  * Added logic to handle local buffer conditions in memory access.
  * Introduced IsLocalBuffer function to check buffer scope.
  * Enhanced comments for clarity on memory access operations.
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request on Jul 18, 2025
…ogic (tile-ai#441)
* Added comments to distinguish between CPU and GPU kernel launch sections for better code readability.
* Changed the creation of empty blocks to use a consistent "root" identifier, enhancing clarity in frame management.
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request on Jul 18, 2025
…i#441) (tile-ai#442)
* Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
* Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request on Jul 18, 2025
* [Enhancement] Improve layout inference accuracy in ParallelOp (tile-ai#441)
  * Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
  * Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
* [Enhancement] Add error handling macros and refactor loop partitioning logic
  * Introduced TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
  * Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
  * Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
  * Updated pass configuration management to streamline vectorization control in the optimization process.
* lint fix
* remove debug print
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request on Jul 18, 2025
* [Enhancement] Improve layout inference accuracy in ParallelOp (tile-ai#441)
  * Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
  * Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
* [Enhancement] Add error handling macros and refactor loop partitioning logic
  * Introduced TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
  * Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
  * Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
  * Updated pass configuration management to streamline vectorization control in the optimization process.
* lint fix
* remove debug print
* [Refactor] Update legalize_safe_memory_access.cc to improve memory access handling
  * Replaced Apache License header with MIT License.
  * Added logic to handle local buffer conditions in memory access.
  * Introduced IsLocalBuffer function to check buffer scope.
  * Enhanced comments for clarity on memory access operations.
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request on Jul 20, 2025
…ogic (tile-ai#441)
* Added comments to distinguish between CPU and GPU kernel launch sections for better code readability.
* Changed the creation of empty blocks to use a consistent "root" identifier, enhancing clarity in frame management.
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request on Jul 20, 2025
…i#441) (tile-ai#442)
* Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
* Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request on Jul 20, 2025
* [Enhancement] Improve layout inference accuracy in ParallelOp (tile-ai#441)
  * Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
  * Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
* [Enhancement] Add error handling macros and refactor loop partitioning logic
  * Introduced TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
  * Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
  * Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
  * Updated pass configuration management to streamline vectorization control in the optimization process.
* lint fix
* remove debug print
LeiWang1999 added a commit to LeiWang1999/tilelang that referenced this pull request on Jul 20, 2025
* [Enhancement] Improve layout inference accuracy in ParallelOp (tile-ai#441)
  * Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
  * Enhanced comments to clarify the rationale behind buffer selection in layout inference process.
* [Enhancement] Add error handling macros and refactor loop partitioning logic
  * Introduced TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
  * Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
  * Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
  * Updated pass configuration management to streamline vectorization control in the optimization process.
* lint fix
* remove debug print
* [Refactor] Update legalize_safe_memory_access.cc to improve memory access handling
  * Replaced Apache License header with MIT License.
  * Added logic to handle local buffer conditions in memory access.
  * Introduced IsLocalBuffer function to check buffer scope.
  * Enhanced comments for clarity on memory access operations.
This pull request refactors the KernelLaunch function in src/ir.cc to improve clarity and consistency in handling CPU and GPU kernel launches. The most significant changes include reorganizing comments, ensuring proper annotations for blocks, and replacing empty block names with a more descriptive default.

Improvements to code clarity:
* Added comments to distinguish between the CPU and GPU kernel launch sections.

Consistency in block handling:
* Replaced empty block names (Block("")) with a more descriptive default (Block("root")) to enhance code clarity and maintain consistency. This change applies both when attributes are defined and in the fallback case.
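
The block-naming change can be illustrated with a minimal, self-contained sketch. This is not the code from src/ir.cc: BlockFrame, kDefaultBlockName, and MakeLaunchBlock are hypothetical stand-ins, and annotations are modeled as a plain string map rather than the real IR types. It only shows the idea that both paths (attributes defined, and the fallback) end up with the same "root" name instead of an empty string.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a block frame; NOT the real TVM/TileLang type.
struct BlockFrame {
  std::string name;
  std::map<std::string, std::string> annotations;
};

// Assumed default name, mirroring the "root" identifier described in the PR.
constexpr const char* kDefaultBlockName = "root";

// Hypothetical helper: builds the block that wraps a kernel launch body.
// The name is the same whether or not attributes are supplied; attributes
// only affect the annotations attached to the block.
BlockFrame MakeLaunchBlock(const std::map<std::string, std::string>& attrs) {
  BlockFrame block{kDefaultBlockName, {}};
  if (!attrs.empty()) {
    // Attributes defined: annotate the block, but keep the "root" name.
    block.annotations = attrs;
  }
  // Fallback case: no annotations, same "root" name.
  return block;
}
```

Under these assumptions, calling MakeLaunchBlock with an empty map and with a non-empty map yields blocks that differ only in their annotations, which is the consistency the description refers to.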