Commit b427ec4

[Carver] Remove legacy todo items in carver's readme (#74)
* [Enhancement] Add VectorizeLoop function and update imports for compatibility
* [CI][Test] Improve test cases for vectorization and fix typos in parser comments
* lint fix
* Fix incorrect module reference for VectorizeLoop transformation
* Refactor vectorize_loop transformation by removing unused extent mutation logic
* [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen
* Fix formatting in CUDA FP8 header file for consistency
* Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity
* Update submodule 'tvm' to latest commit for improved functionality
* Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.
* Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.
* Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency
* Add CUDA requirements to FP8 test cases and update references for clarity
* Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py
* Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py
* Add CUDA requirements and FP8 test cases for matmul and gemv simulations
* Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py
* Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py
* Add BF16 support to matrix multiplication and introduce corresponding test cases
* Add a blank line for improved readability in BF16 GEMM test
* Update acknowledgements in README to include supervision by Zhi Yang at Peking University
* enhance acknowledgement
* Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives
* Update subproject commit for TVM dependency
* Update subproject commit for TVM dependency
* Add int4_t type and functions for packing char values in CUDA common header
* Add plot_layout example and implement GetForwardVars method in layout classes
* Refactor code for improved readability by adjusting line breaks and formatting in layout and test files
* Fix formatting by removing unnecessary line break in layout.h
* Refactor make_int4 function for improved readability by adjusting parameter formatting
* Add legend to plot_layout for improved clarity of thread and local IDs
* Remove unnecessary dependencies from requirements files for cleaner setup
* Remove flash_mha.py and add .gitkeep to deepseek_mla directory
* Add build requirements and update installation scripts for improved setup
* Introduce carver
* Refactor imports and improve code formatting for consistency
* Add unit tests for carver recommendation hints
* lint fix
* Enhance ElementwiseTemplate and BaseTemplate with detailed docstrings for improved code documentation and clarity
* Refactor import statements and clean up whitespace in template files for improved readability
* Add README.md for Carver framework with usage examples and architecture support
* Refactor import statement in matmul_analysis.py for consistency
* Refactor TileDict and TensorCorePolicy methods for improved clarity and functionality
* Add tests for general matrix multiplication emit configurations
* Refactor formatting in test_tilelang_carver_generate_hints.py for improved readability
* Add FlashAttentionTemplate and related functionality for hint recommendations
* Refactor whitespace in FlashAttentionTemplate and test_tilelang_carver_recommend_hints for improved readability
* Update README.md to include FlashAttentionTemplate in the carver section
1 parent 03dbbe4 commit b427ec4

1 file changed: +1 -1 lines changed

tilelang/carver/README.md

Lines changed: 1 addition & 1 deletion
@@ -196,6 +196,7 @@ This helps quickly test multiple configurations without manually guessing.
 
 Carver abstracts common loop patterns through templates:
 - **`GeneralReductionTemplate`**: For general `Spatial-Spatial-Reduce` (SSR) structures or similar.
+- **`FlashAttentionTemplate`**: For attention-like operations with flash memory.
 - **`MatmulTemplate`**: For standard matrix multiplication `C = A * B`.
 - **`GEMVTemplate`**: For `y = Ax` or `y = xA` style operations.
 - **`ElementwiseTemplate`**: For elementwise transformations or pointwise ops.
@@ -205,6 +206,5 @@ You can also create your own specialized templates if you have unique loop struc
 
 ## TODO Items
 
-- [ ] **Flash Attention** and its variants: Support search-space generation for specialized attention kernels.
 - [ ] **Adapt to tile language**: Provide ready-made scheduling calls or wrappers for [tilelang](https://github.com/LeiYanggh/tilelang) to streamline end-to-end integration.
 