[Enhancement] Fallback transposed_ldmatrix into `SM75_U16x4_LDSM_N` when warp_n is 8 #498

LeiWang1999 · 2025-05-17T11:09:52Z

Update the Copy type in OperandTraits for the GEMM templates to use conditional selection based on num_warp_n, so that the transposed ldmatrix copy falls back to `SM75_U16x4_LDSM_N` when warp_n is 8. This adjusts the memory access pattern to the warp configuration used by the CUDA kernel.

Remove debug print statement from block_sparse_attn_triton.py and implement a timeout handler in autotuner for function execution. This makes the autotuner more robust by letting it handle timeouts gracefully.

Enhance the autotuner module by adding a timeout handler for function execution, improving robustness in handling long-running tasks. This change introduces a custom TimeoutException and updates the run_with_timeout function for better signal management.

Add merge shared memory allocations pass and related configurations

- Introduced a new pass for merging shared memory allocations in GPU kernels, allowing for more efficient memory usage.
- Registered configuration options for debugging and controlling the merging behavior.
- Updated relevant files to integrate the new pass into the TileLang engine and transform modules.
- Adjusted import paths and added documentation for the new functionality.
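The idea behind merging shared memory allocations can be sketched as follows. This is an illustrative model only, not the pass added in this PR: the `Buffer` type, the `plan_offsets` function, the lifetime intervals, and the greedy first-fit strategy are all assumptions. Buffers whose lifetimes do not overlap are allowed to share the same offset inside one merged allocation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    # Hypothetical description of one shared memory buffer.
    name: str
    size: int        # bytes
    first_use: int   # index of the first statement touching the buffer
    last_use: int    # index of the last statement touching it (inclusive)


def _overlaps_in_time(a, b):
    # Two buffers conflict if their live ranges intersect.
    return a.first_use <= b.last_use and b.first_use <= a.last_use


def plan_offsets(buffers, align=16):
    """Greedy first-fit: give each buffer the lowest aligned offset that does
    not collide with an already placed buffer that is live at the same time.
    Returns (offsets by name, total bytes of the merged allocation)."""
    placed = []   # list of (buffer, offset)
    offsets = {}
    # Place large buffers first so small ones can fill the gaps.
    for buf in sorted(buffers, key=lambda b: -b.size):
        conflicts = sorted(
            (off, off + other.size)
            for other, off in placed
            if _overlaps_in_time(buf, other)
        )
        offset = 0
        for start, end in conflicts:
            if offset + buf.size <= start:
                break  # fits in the gap before this conflicting region
            # Otherwise skip past the region, rounding up to the alignment.
            offset = max(offset, -(-end // align) * align)
        placed.append((buf, offset))
        offsets[buf.name] = offset
    total = max((off + b.size for b, off in placed), default=0)
    return offsets, total


# A and B are never live at the same time, so they can share offset 0.
example = [
    Buffer("A", 1024, 0, 1),
    Buffer("B", 1024, 2, 3),
    Buffer("C", 512, 1, 2),
]
offsets, total = plan_offsets(example)
```

In this example the merged allocation needs 1536 bytes, compared with 2560 bytes if each buffer had its own allocation.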

Reduce num_stages parameter in GEMM functions from 3 to 1 for improved performance in test_tilelang_kernel_gemm.py

github-actions · 2025-05-17T11:10:02Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run bash format.sh in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work!

🚀

…hen warp_n is 8 (tile-ai#498)

* Remove debug print statement from block_sparse_attn_triton.py and implement a timeout handler in autotuner for function execution. This enhances the robustness of the autotuner by allowing it to handle timeouts gracefully.
* Enhance the autotuner module by adding a timeout handler for function execution, improving robustness in handling long-running tasks. This change includes the introduction of a custom TimeoutException and updates to the run_with_timeout function for better signal management.
* Add merge shared memory allocations pass and related configurations
  - Introduced a new pass for merging shared memory allocations in GPU kernels, allowing for more efficient memory usage.
  - Registered configuration options for debugging and controlling the merging behavior.
  - Updated relevant files to integrate the new pass into the TileLang engine and transform modules.
  - Adjusted import paths and added documentation for the new functionality.
* Reduce num_stages parameter in GEMM functions from 3 to 1 for improved performance in test_tilelang_kernel_gemm.py
* Update Copy type in OperandTraits for GEMM templates to use conditional selection based on num_warp_n. This change enhances memory access patterns for different configurations in CUDA kernels.
* lint fix

LeiWang1999 added 8 commits May 14, 2025 19:43

- e82712f: Remove debug print statement from block_sparse_attn_triton.py and implement a timeout handler in autotuner for function execution. This enhances the robustness of the autotuner by allowing it to handle timeouts gracefully.
- 2417094: Enhance the autotuner module by adding a timeout handler for function execution, improving robustness in handling long-running tasks. This change includes the introduction of a custom TimeoutException and updates to the run_with_timeout function for better signal management.
- e105d12: Merge branch 'main' of https://github.com/tile-ai/tilelang into HEAD
- aa8ab88: Reduce num_stages parameter in GEMM functions from 3 to 1 for improved performance in test_tilelang_kernel_gemm.py
- 73fb9a0: Update Copy type in OperandTraits for GEMM templates to use conditional selection based on num_warp_n. This change enhances memory access patterns for different configurations in CUDA kernels.
- cd3fe03: Merge branch 'main' of https://github.com/tile-ai/tilelang into 0516_merge_smem
- 08f2418: lint fix

LeiWang1999 merged commit 3544a64 into tile-ai:main May 17, 2025
3 of 4 checks passed
