[Bugfix] bug fix for bitblas dependency (#71)

LeiWang1999 · web-flow · commit f707855f3006 · 2025-02-10T21:45:25.000+08:00
* [Enhancement] Add VectorizeLoop function and update imports for compatibility

* [CI][Test] Improve test cases for vectorization and fix typos in parser comments

* lint fix

* Fix incorrect module reference for VectorizeLoop transformation

* Refactor vectorize_loop transformation by removing unused extent mutation logic

* [Enhancement] Add support for FP8 data types and global barriers in CUDA codegen

* Fix formatting in CUDA FP8 header file for consistency

* Refactor CI workflow to use 'tilelang_ci' virtual environment and update CUDA type printing for better clarity

* Update submodule 'tvm' to latest commit for improved functionality

* Refactor execution backend references from 'dl_pack' to 'dlpack' for consistency and clarity; add apply_simplify function to simplify PrimFunc or IRModule.

* Refactor CUDA code for improved readability; clean up formatting and remove unnecessary whitespace in multiple files.

* Refactor import statement in test_tilelang_kernel_dequantize_gemm.py to use 'tilelang.language' for consistency

* Add CUDA requirements to FP8 test cases and update references for clarity

* Add a blank line for improved readability in test_tilelang_kernel_fp8_gemm_mma.py

* Fix data type in reference result calculation for consistency in test_tilelang_kernel_gemm_mma_intrinsic.py

* Add CUDA requirements and FP8 test cases for matmul and gemv simulations

* Remove debug print statements and use tilelang's testing assertion for result validation in test_tilelang_kernel_gemm_mma_intrinsic.py

* Remove outdated comment regarding FP8 tests in test_tilelang_kernel_gemv_simt.py

* Add BF16 support to matrix multiplication and introduce corresponding test cases

* Add a blank line for improved readability in BF16 GEMM test

* Update acknowledgements in README to include supervision by Zhi Yang at Peking University

* enhance acknowledgement

* Replace tutorial on memory layout optimization with new tutorial on writing high-performance kernels with thread primitives

* Update subproject commit for TVM dependency

* Update subproject commit for TVM dependency

* Add int4_t type and functions for packing char values in CUDA common header

* Add plot_layout example and implement GetForwardVars method in layout classes

* Refactor code for improved readability by adjusting line breaks and formatting in layout and test files

* Fix formatting by removing unnecessary line break in layout.h

* Refactor make_int4 function for improved readability by adjusting parameter formatting

* Add legend to plot_layout for improved clarity of thread and local IDs

* Remove unnecessary dependencies from requirements files for cleaner setup

* Remove flash_mha.py and add .gitkeep to deepseek_mla directory

* Add build requirements and update installation scripts for improved setup

* Introduce carver

* Refactor imports and improve code formatting for consistency

* Add unit tests for carver recommendation hints

* lint fix

* Enhance ElementwiseTemplate and BaseTemplate with detailed docstrings for improved code documentation and clarity

* Refactor import statements and clean up whitespace in template files for improved readability

* Add README.md for Carver framework with usage examples and architecture support

* Refactor import statement in matmul_analysis.py for consistency
diff --git a/tilelang/carver/matmul_analysis.py b/tilelang/carver/matmul_analysis.py
@@ -18,7 +18,7 @@
 )
 from tvm.target.target import Target
 from tvm.tir.stmt_functor import pre_order_visit
-from bitblas.base.arch import get_arch, is_tensorcore_supported_precision
+from .arch import get_arch, is_tensorcore_supported_precision
 import logging
 
 logger = logging.getLogger(__name__)

Original file line number	Diff line number	Diff line change
`@@ -18,7 +18,7 @@`
`18`	`18`	`)`
`19`	`19`	`from tvm.target.target import Target`
`20`	`20`	`from tvm.tir.stmt_functor import pre_order_visit`
`21`		`-from bitblas.base.arch import get_arch, is_tensorcore_supported_precision`
	`21`	`+from .arch import get_arch, is_tensorcore_supported_precision`
`22`	`22`	`import logging`
`23`	`23`
`24`	`24`	`logger = logging.getLogger(__name__)`