[Benchmark] Add benchmark scripts for block sparse attention #114

LeiWang1999 · 2025-02-24T09:23:46Z

This pull request adds benchmarking scripts for block-sparse attention across several libraries and configurations, to support performance analysis of sparse attention implementations. The main changes are new benchmarking scripts for the dense, tilelang, torch, and triton implementations, plus a new dependency in the requirements file.

New Benchmarking Scripts:

benchmark/blocksparse_attention/benchmark_library_dense_fmha.py: Added a new benchmarking script for dense FlashAttention using the flash_attn library.
benchmark/blocksparse_attention/benchmark_tilelang_block_sparse_fmha.py: Added a new benchmarking script for block-sparse attention using the tilelang library.
benchmark/blocksparse_attention/benchmark_torch_block_sparse_fmha.py: Added a new benchmarking script for block-sparse attention using PyTorch.
benchmark/blocksparse_attention/benchmark_triton_block_sparse_fmha.py: Added a new benchmarking script for block-sparse attention using the triton library.
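
For readers unfamiliar with what these scripts measure, the following is a minimal NumPy sketch of single-head block-sparse attention. It is not taken from the PR: the function name `block_sparse_attention`, its argument layout, and the use of NumPy instead of PyTorch are assumptions made for illustration. The idea is that a boolean mask at block granularity decides which key blocks each query block may attend to.

```python
import numpy as np


def block_sparse_attention(q, k, v, block_mask, block_size):
    """Hypothetical reference: q, k, v are (seq_len, head_dim) arrays and
    block_mask is a (n_blocks, n_blocks) boolean array."""
    seq_len, head_dim = q.shape
    # Expand every block-level entry into a block_size x block_size tile.
    tile = np.ones((block_size, block_size), dtype=np.int8)
    dense_mask = np.kron(block_mask.astype(np.int8), tile).astype(bool)
    dense_mask = dense_mask[:seq_len, :seq_len]

    scores = (q @ k.T) / np.sqrt(head_dim)
    # Masked-out positions get -inf so they receive zero softmax weight.
    scores = np.where(dense_mask, scores, -np.inf)

    # Numerically stable softmax. Assumes each query row keeps at least
    # one key block; otherwise the row would be all -inf.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With an all-true block mask this reduces to ordinary dense attention, which is the role the dense FlashAttention script plays as a baseline.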

Configuration and Dependencies:

benchmark/blocksparse_attention/benchmark_configs.py: Added a configuration file with predefined benchmark settings.
benchmark/blocksparse_attention/requirements.txt: Added flash-attn as a new dependency.

This commit introduces two new example scripts demonstrating advanced GEMM (matrix multiplication) techniques:

- `example_tilelang_gemm_splitk.py`: Implements a Split-K GEMM kernel using TileLang
- `example_tilelang_gemm_streamk.py`: Implements a Stream-K GEMM kernel using TileLang

Both examples showcase different parallel computation strategies for matrix multiplication, with comprehensive testing using PyTorch reference implementations.
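
As a rough illustration of the Split-K strategy mentioned above, the sketch below partitions the reduction (K) dimension into slices and accumulates the partial products into one output. It is plain NumPy, not the TileLang kernel from the example script; the name `splitk_gemm` and the sequential loop are assumptions. On a GPU each slice would typically be handled by a separate group of thread blocks, with the accumulation done through atomic adds.

```python
import numpy as np


def splitk_gemm(a, b, split_k):
    """Hypothetical Split-K sketch: compute a @ b by splitting the K axis
    into split_k contiguous slices and summing the partial results."""
    m, k = a.shape
    k_b, n = b.shape
    assert k == k_b, "inner dimensions must match"

    c = np.zeros((m, n), dtype=np.result_type(a, b))
    # Slice boundaries along K; slices may differ in size by one element.
    bounds = np.linspace(0, k, split_k + 1, dtype=int)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # Each partial product stands in for one K-slice worker; the
        # in-place add plays the role of the atomic accumulation.
        c += a[:, start:stop] @ b[start:stop, :]
    return c
```

Splitting K is mainly useful when M and N are too small to occupy the device on their own, since it exposes extra parallelism along the reduction axis.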

Clean up and improve code formatting for the SplitK and StreamK GEMM example scripts:

- Remove unused import (Profiler) in splitk example
- Simplify line breaks and improve code readability
- Standardize indentation and remove unnecessary whitespace
- Optimize atomic add and copy operations for better clarity

This commit introduces comprehensive block sparse attention benchmarks for different libraries:

- TileLang block sparse FMHA implementation
- Triton block sparse FMHA implementation
- PyTorch reference block sparse FMHA implementation
- FlashAttention dense FMHA reference implementation

The benchmarks include:

- Configurable benchmark parameters (batch size, heads, sequence length, etc.)
- Sparse mask generation using top-k and threshold methods
- Performance measurement for different sparse attention configurations
- Utility functions for mask generation and benchmarking
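
The two mask-generation methods named above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the utility code added by the PR: the helper names `topk_block_mask` and `threshold_block_mask` are hypothetical, and `scores` is assumed to be a per-block importance matrix whose last axis indexes key blocks.

```python
import numpy as np


def topk_block_mask(scores, topk):
    """Keep the `topk` highest-scoring key blocks for every query block."""
    # Indices of the largest entries along the key-block axis.
    idx = np.argsort(-scores, axis=-1)[..., :topk]
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return mask


def threshold_block_mask(scores, threshold):
    """Keep every key block whose score is strictly above `threshold`."""
    return scores > threshold
```

Top-k gives every query block the same number of active key blocks, so the work per row is uniform; thresholding lets the number of active blocks vary with the data.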

- Add Ruff linter ignore comments to benchmark files
- Improve code formatting and line breaks
- Remove unused imports
- Standardize print statement formatting
- Enhance code readability across multiple library benchmarks

LeiWang1999 added 7 commits February 23, 2025 17:39

Add DeepSeek MLA decode example with Flash Attention implementation

7d35ac5

Merge branch 'main' of https://github.com/tile-ai/tilelang into dev

c86430d

lint fix

166ef78

LeiWang1999 merged commit 3661298 into tile-ai:main Feb 24, 2025
3 checks passed
