[Bugfix] Fix layout inference for free fragment buffer #443
This pull request introduces several enhancements and bug fixes across multiple components of the TileLang compiler and runtime. The changes focus on improving error handling, adding configuration flexibility, optimizing performance, and enhancing code maintainability. Below is a categorized summary of the most important changes:
Error Handling Improvements:
- Added `TILELANG_CHECK` macros in both `cuda/common.h` and `hip/common.h` to standardize error checking for CUDA and HIP API calls. These macros capture and log errors with detailed information, improving debugging capabilities. [1] [2]
- Enhanced kernel launch error handling in `tilelang/jit/adapter/wrapper.py` by adding checks for CUDA errors after kernel execution. Errors are logged with function-specific details, and execution halts on failure.

Layout and Loop Optimization:
- Updated the `LoopPartitioner` class in `loop_partition.cc` to handle fragment buffers more effectively. Introduced logic to avoid replicating loop layouts for fragment buffers, improving performance for certain workloads. [1] [2]
- Modified the `InferLayout` function in `parallel.cc` to prioritize non-replicated buffers for layout inference, enhancing accuracy.

Configuration and Flexibility Enhancements:
- Introduced a new `PassConfigKey` class in `tilelang/transform/pass_config.py` to centralize and document configuration options for TileLang compiler passes. This includes options for enabling/disabling specific optimizations.
- Updated `tilelang/engine/phase.py` to allow passing a `PassContext` object to functions like `allow_tma_and_warp_specialized` and `allow_vectorize`, enabling more flexible configuration management. [1] [2] [3]

Codebase Simplification and Maintenance:
- Replaced direct imports of `tvm.transform.PassContext` with a unified import in `tilelang/transform/__init__.py`, ensuring consistency and reducing redundancy.
- Refactored `_load_tile_lang_lib` in `tilelang/__init__.py` to include `PassConfigKey`, aligning it with new configuration management practices.

Minor Fixes:
- Updated the `PREDEF_HOST_FUNC` template in `tilelang/jit/adapter/wrapper.py` to align with coding standards.
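
As a rough illustration of the configuration changes listed under "Configuration and Flexibility Enhancements", the sketch below shows how a central key enum and context-aware helper functions might fit together. It is not the code from this PR: the key strings, the `PassContext` stand-in (used here in place of `tvm.transform.PassContext` so the example runs without TVM), and the helper signatures are all assumptions made for illustration.

```python
from enum import Enum
from typing import Any, Dict, Optional


class PassConfigKey(str, Enum):
    """Hypothetical configuration keys; the real names in the PR may differ."""

    DISABLE_TMA_LOWER = "tl.disable_tma_lower"  # assumed key string
    DISABLE_WARP_SPECIALIZED = "tl.disable_warp_specialized"  # assumed key string
    DISABLE_VECTORIZE = "tl.disable_vectorize"  # assumed key string


class PassContext:
    """Minimal stand-in for tvm.transform.PassContext: it only holds a config dict."""

    def __init__(self, config: Optional[Dict[Any, Any]] = None) -> None:
        # Normalize enum keys to their plain string values.
        self.config: Dict[str, Any] = {
            (key.value if isinstance(key, Enum) else key): value
            for key, value in (config or {}).items()
        }


def _is_disabled(pass_ctx: Optional[PassContext], key: PassConfigKey) -> bool:
    # With no context supplied, fall back to an empty configuration.
    ctx = pass_ctx if pass_ctx is not None else PassContext()
    return bool(ctx.config.get(key.value, False))


def allow_vectorize(pass_ctx: Optional[PassContext] = None) -> bool:
    """Vectorization is allowed unless the context disables it."""
    return not _is_disabled(pass_ctx, PassConfigKey.DISABLE_VECTORIZE)


def allow_tma_and_warp_specialized(pass_ctx: Optional[PassContext] = None) -> bool:
    """Allowed only if neither TMA lowering nor warp specialization is disabled.

    Any target or hardware check is deliberately left out of this sketch.
    """
    return not (
        _is_disabled(pass_ctx, PassConfigKey.DISABLE_TMA_LOWER)
        or _is_disabled(pass_ctx, PassConfigKey.DISABLE_WARP_SPECIALIZED)
    )


# Usage: build a context that turns vectorization off and query the helpers.
ctx = PassContext({PassConfigKey.DISABLE_VECTORIZE: True})
vectorize_ok = allow_vectorize(ctx)
tma_ws_ok = allow_tma_and_warp_specialized(ctx)
```

Passing the context explicitly, rather than reading global state inside each helper, keeps the helpers easy to test and makes it obvious which configuration a given compilation phase sees.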