[Carver] Remove legacy todo items in carver's readme (#74)

LeiWang1999 · web-flow · commit b427ec45e3f3 · 2025-02-11T21:52:09.000+08:00
* [Enhancement] Add VectorizeLoop function and update imports for compatibility

* [CI][Test] Improve test cases for vectorization and fix typos in parser comments

* Lint fix

* Fix incorrect module reference for VectorizeLoop transformation

* Refactor vectorize_loop transformation by removing unused extent mutation logic

* [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen

* Fix formatting in CUDA FP8 header file for consistency

* Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity

* Update submodule 'tvm' to latest commit for improved functionality

* Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.

* Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.

* Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency

* Add CUDA requirements to FP8 test cases and update references for clarity

* Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py

* Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py

* Add CUDA requirements and FP8 test cases for matmul and gemv simulations

* Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py

* Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py

* Add BF16 support to matrix multiplication and introduce corresponding test cases

* Add a blank line for improved readability in BF16 GEMM test

* Update acknowledgements in README to include supervision by Zhi Yang at Peking University

* Enhance acknowledgement

* Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives

* Update subproject commit for TVM dependency

* Update subproject commit for TVM dependency

* Add int4_t type and functions for packing char values in CUDA common header

* Add plot_layout example and implement GetForwardVars method in layout classes

* Refactor code for improved readability by adjusting line breaks and formatting in layout and test files

* Fix formatting by removing unnecessary line break in layout.h

* Refactor make_int4 function for improved readability by adjusting parameter formatting

* Add legend to plot_layout for improved clarity of thread and local IDs

* Remove unnecessary dependencies from requirements files for cleaner setup

* Remove flash_mha.py and add .gitkeep to deepseek_mla directory

* Add build requirements and update installation scripts for improved setup

* Introduce carver

* Refactor imports and improve code formatting for consistency

* Add unit tests for carver recommendation hints

* Lint fix

* Enhance ElementwiseTemplate and BaseTemplate with detailed docstrings for improved code documentation and clarity

* Refactor import statements and clean up whitespace in template files for improved readability

* Add README.md for Carver framework with usage examples and architecture support

* Refactor import statement in matmul_analysis.py for consistency

* Refactor TileDict and TensorCorePolicy methods for improved clarity and functionality

* Add tests for general matrix multiplication emit configurations

* Refactor formatting in test_tilelang_carver_generate_hints.py for improved readability

* Add FlashAttentionTemplate and related functionality for hint recommendations

* Refactor whitespace in FlashAttentionTemplate and test_tilelang_carver_recommend_hints for improved readability

* Update README.md to include FlashAttentionTemplate in the carver section
diff --git a/tilelang/carver/README.md b/tilelang/carver/README.md
@@ -196,6 +196,7 @@ This helps quickly test multiple configurations without manually guessing.
 
 Carver abstracts common loop patterns through templates:
 - **`GeneralReductionTemplate`**: For general `Spatial-Spatial-Reduce` (SSR) structures or similar.
+- **`FlashAttentionTemplate`**: For FlashAttention-style attention operations.
 - **`MatmulTemplate`**: For standard matrix multiplication `C = A * B`.
 - **`GEMVTemplate`**: For `y = Ax` or `y = xA` style operations.
 - **`ElementwiseTemplate`**: For elementwise transformations or pointwise ops.
@@ -205,6 +206,5 @@ You can also create your own specialized templates if you have unique loop struc
 
 ## TODO Items
 
-- [ ] **Flash Attention** and its variants: Support search-space generation for specialized attention kernels.
 - [ ] **Adapt to tile language**: Provide ready-made scheduling calls or wrappers for [tilelang](https://github.com/LeiYanggh/tilelang) to streamline end-to-end integration.