[Refactor] Introduce quantize components of TileLang and add testing for dequant gemm example #494

LeiWang1999 · 2025-05-13T14:12:09Z

This pull request adds new examples for dequantization and flash attention, refactors existing example scripts to improve modularity and usability, and updates the tilelang library to support these operations.

New Examples and Features:

Dequantization Enhancements:

Added a new example in examples/dequantize_gemm/example_dequant_gemv_fp16xint4.py for dequantizing GEMV operations with support for various configurations, including fast decoding and scaling. This includes a comprehensive implementation of the dequantize_gemv function and a testable main function.
Refactored examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py to encapsulate logic within a main function, improving modularity. The script now accepts parameters via the main function or command-line arguments.
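As a rough illustration of what a dequantizing GEMV computes, here is a CPU reference sketch in plain Python. It is not the TileLang kernel from this PR. The packing layout (two unsigned 4-bit values per byte, low nibble first), the zero point of 8, and the per-group scales are assumptions chosen for the example; the actual example may use a different layout, particularly on its fast-decoding path.

```python
# Reference sketch of a dequantizing GEMV: y[n] = sum_k x[k] * dequant(W)[n][k].
# Assumptions (not taken from the PR): unsigned 4-bit weights packed two per
# byte with the low nibble first, a fixed zero point, and one scale shared by
# each group of `group_size` weights along K.

def unpack_uint4(packed_row):
    """Expand a list of bytes into twice as many 4-bit values (low nibble first)."""
    values = []
    for byte in packed_row:
        values.append(byte & 0x0F)         # low nibble
        values.append((byte >> 4) & 0x0F)  # high nibble
    return values


def dequant_gemv(x, packed_w, scales, group_size, zero_point=8):
    """Return y where y[n] = sum_k x[k] * (q[n][k] - zero_point) * scales[n][k // group_size]."""
    y = []
    for row, row_scales in zip(packed_w, scales):
        q = unpack_uint4(row)
        assert len(q) == len(x), "K dimension mismatch"
        acc = 0.0
        for k, (xk, qk) in enumerate(zip(x, q)):
            acc += xk * (qk - zero_point) * row_scales[k // group_size]
        y.append(acc)
    return y


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    # Row 0 unpacks to the nibbles [9, 8, 10, 7]; row 1 unpacks to [8, 8, 8, 8].
    packed_w = [[0x89, 0x7A], [0x88, 0x88]]
    scales = [[0.5, 2.0], [1.0, 1.0]]
    print(dequant_gemv(x, packed_w, scales, group_size=2))  # [4.5, 0.0]
```

A GPU kernel would fuse the unpacking and scaling into the inner product instead of materializing the dequantized weights, but it should agree numerically with a reference of this shape.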

Flash Attention Enhancements:

Introduced a new example in examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py for flash attention with pipelined execution. This implementation includes kernel macros for matrix multiplication, softmax, and rescaling, along with support for autotuning configurations.
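The softmax and rescaling steps mentioned above follow the general online-softmax idea behind flash attention: scores are processed block by block, and earlier partial results are rescaled whenever a larger maximum appears. The sketch below shows that idea in plain Python with scalar values. The function names are hypothetical and this is not the kernel code from the example.

```python
import math

# Minimal sketch of online softmax with rescaling, using scalar "values" for
# brevity. Function names are illustrative, not taken from the PR.

def online_softmax_weighted_sum(score_blocks, value_blocks):
    """Compute sum_i softmax(scores)_i * values_i one block at a time."""
    running_max = float("-inf")
    running_sum = 0.0
    acc = 0.0  # unnormalized output accumulator
    for scores, values in zip(score_blocks, value_blocks):
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results so they are relative to the new max.
        # On the first block this is exp(-inf) == 0.0, which clears the empty state.
        correction = math.exp(running_max - new_max)
        probs = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(probs)
        acc = acc * correction + sum(p * v for p, v in zip(probs, values))
        running_max = new_max
    return acc / running_sum


def reference_softmax_weighted_sum(scores, values):
    """Same quantity computed in one pass over all scores, for comparison."""
    m = max(scores)
    probs = [math.exp(s - m) for s in scores]
    return sum(p * v for p, v in zip(probs, values)) / sum(probs)


if __name__ == "__main__":
    score_blocks = [[0.1, 2.0], [3.5, -1.0], [0.7, 0.2]]
    value_blocks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    blocked = online_softmax_weighted_sum(score_blocks, value_blocks)
    flat = reference_softmax_weighted_sum(
        [s for block in score_blocks for s in block],
        [v for block in value_blocks for v in block],
    )
    print(math.isclose(blocked, flat, rel_tol=1e-12))
```

Because only the running maximum, running sum, and accumulator are kept between blocks, the full score matrix never has to be stored, which is what makes a tiled, pipelined kernel possible.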

Testing and Licensing:

Added a new test suite in examples/dequantize_gemm/test_example_dequantize_gemm.py to validate the functionality of the dequantization examples for different configurations.
Standardized licensing headers across files, adding or updating the MIT license notice where applicable.
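Wrapping an example in a main function is what lets a test suite import it and run it with small shapes, while the script still works from the command line. The sketch below shows one way to structure this. The parameter names, defaults, and the placeholder body of main are hypothetical; they are not the signature of the examples in this PR.

```python
import argparse

# Sketch of the "main function plus CLI" pattern. All names and defaults here
# are illustrative assumptions, not the actual example's interface.

def main(m=256, n=256, k=256, tune=False):
    """Stand-in for an example entry point; a real one would build and check a kernel."""
    total_flops = 2 * m * n * k  # multiply-add count for an m x n x k GEMM
    return {"shape": (m, n, k), "tune": tune, "total_flops": total_flops}


def cli(argv=None):
    """Parse command-line arguments and forward them to main()."""
    parser = argparse.ArgumentParser(description="Hypothetical example runner")
    parser.add_argument("--m", type=int, default=256)
    parser.add_argument("--n", type=int, default=256)
    parser.add_argument("--k", type=int, default=256)
    parser.add_argument("--tune", action="store_true")
    args = parser.parse_args(argv)
    return main(args.m, args.n, args.k, args.tune)


def test_example_small_shape():
    """A test can call main() directly with explicit, small parameters."""
    result = main(m=16, n=8, k=4)
    assert result["shape"] == (16, 8, 4)
    assert result["total_flops"] == 1024


# In a script, the file would end with:
#     if __name__ == "__main__":
#         cli()
```

Passing an explicit list such as `cli(["--m", "16", "--tune"])` exercises the argument parsing without touching `sys.argv`, which keeps tests independent of how the test runner was invoked.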

Library Updates:

Updated tilelang/quantize/__init__.py to expose additional utility functions and intrinsics for quantization, such as interleave_weight and get_lop3_intrin_group.
Added a missing import for DataType in tilelang/__init__.py to resolve potential issues with type handling.

…for dequant gemm example (tile-ai#494)
* Remove deprecated example_dequant_gemm.py and add DataType import in __init__.py
* lint fix
* lint fix
* Refactor dequantization examples to use tilelang imports and update data type handling in quantization utilities
* lint fix

LeiWang1999 added 5 commits May 13, 2025 21:23

Remove deprecated example_dequant_gemm.py and add DataType import in __init__.py

b8910f0

lint fix

09b2ccc

lint fix

4b3974f

Refactor dequantization examples to use tilelang imports and update data type handling in quantization utilities

13daf25

lint fix

4fcaa7c

LeiWang1999 merged commit 9cfa724 into tile-ai:main May 14, 2025
2 of 3 checks passed
