[Bugfix] Fix layout inference for free fragment buffer (tile-ai#443)
* [Enhancement] Improve layout inference accuracy in ParallelOp (tile-ai#441)
* Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
* Enhanced comments to clarify the rationale behind buffer selection in the layout inference process.
* [Enhancement] Add error handling macros and refactor loop partitioning logic
* Introduced the TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
* Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
* Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
* Updated pass configuration management to streamline vectorization control in the optimization process.
* Lint fix
* Remove debug print
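The TILELANG_CHECK entry above follows a common pattern for GPU runtime calls. The status is evaluated once, and on failure the message reports the expression text, the source location, and the runtime's error string. The sketch below illustrates only that general pattern. It is not the actual TILELANG_CHECK definition. To keep it compilable without a CUDA or HIP toolkit, `Status`, `StatusString`, `ReportFailure`, `CHECK_STATUS`, and `FakeLaunch` are hypothetical stand-ins for `cudaError_t`/`hipError_t`, `cudaGetErrorString`/`hipGetErrorString`, and a post-launch `cudaGetLastError()` query.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a runtime status type such as cudaError_t / hipError_t.
enum class Status { kSuccess = 0, kInvalidConfiguration, kLaunchFailure };

// Hypothetical stand-in for cudaGetErrorString / hipGetErrorString.
inline const char* StatusString(Status s) {
  switch (s) {
    case Status::kSuccess: return "no error";
    case Status::kInvalidConfiguration: return "invalid configuration argument";
    case Status::kLaunchFailure: return "unspecified launch failure";
  }
  return "unknown error";
}

// Builds a detailed message (file, line, failing expression, error code and text)
// and throws it, so the caller sees exactly which launch failed and why.
[[noreturn]] inline void ReportFailure(const char* expr, const char* file, int line,
                                       Status s) {
  std::ostringstream msg;
  msg << file << ":" << line << ": '" << expr << "' failed with error "
      << static_cast<int>(s) << " (" << StatusString(s) << ")";
  throw std::runtime_error(msg.str());
}

// Hypothetical checking macro in the spirit of TILELANG_CHECK (not its real
// definition): evaluates `expr` exactly once and reports any non-success status.
#define CHECK_STATUS(expr)                                \
  do {                                                    \
    Status status_ = (expr);                              \
    if (status_ != Status::kSuccess) {                    \
      ReportFailure(#expr, __FILE__, __LINE__, status_);  \
    }                                                     \
  } while (0)

// Hypothetical simulated launch: returns the status a real build would obtain
// from cudaGetLastError() right after a kernel launch.
inline Status FakeLaunch(bool bad_config) {
  return bad_config ? Status::kInvalidConfiguration : Status::kSuccess;
}
```

In a real CUDA build the same shape applies with `cudaError_t`, `cudaGetLastError()`, and `cudaGetErrorString()` checked immediately after the `<<<...>>>` launch. HIP uses `hipError_t`, `hipGetLastError()`, and `hipGetErrorString()`. The `do { ... } while (0)` wrapper lets the macro behave as a single statement inside unbraced `if`/`else` branches.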