[Carver] Introduce a tile-structure based cost model for auto tuning #70
Merged
Carver: A Tile-Structure Based Hint Recommendation Framework for Machine Learning Compilers
Carver is a lightweight framework for generating and ranking tile configurations (also known as tiling strategies, blocking schemes, or scheduling hints) for common GPU, CPU, and accelerator backends. It helps you explore efficient mappings of loops for operations such as matrix multiplication, elementwise transforms, and other reduction-oriented kernels.
Carver combines hardware architecture information, user-defined tile structures, and built-in heuristics to recommend tiling strategies (or "hints"). The recommended hints can be adapted to multiple backends, including TVM, Triton, and tilelang, as well as other domain-specific compilers.
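To make "generating and ranking tile configurations" concrete, here is a minimal, self-contained sketch of the idea for a matmul-like (SSR) loop nest. It is not Carver's API or cost model: the function names, the power-of-two candidate grid, the 48 KiB shared-memory budget (CUDA's default static per-block limit), and the arithmetic-intensity score are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical sketch, not Carver's API or cost model: enumerate tiles for an
# SSR (matmul-like) loop nest and rank them with a toy heuristic.
TILE_SIZES = [16, 32, 64, 128, 256]  # assumed power-of-two candidate grid


def candidate_hints(m, n, k, dtype_bytes=2, smem_cap=48 * 1024):
    """Yield hints whose A and B tiles fit in an assumed shared-memory budget."""
    for bm, bn, bk in product(TILE_SIZES, repeat=3):
        if bm > m or bn > n or bk > k:
            continue  # skip tiles larger than the problem itself
        # Bytes staged per reduction step: an A tile (bm x bk) plus a B tile (bk x bn).
        smem_bytes = (bm * bk + bk * bn) * dtype_bytes
        if smem_bytes <= smem_cap:
            yield {"block": [bm, bn], "rstep": [bk], "smem_bytes": smem_bytes}


def intensity(hint):
    """Multiply-adds per byte staged in shared memory for one reduction step."""
    bm, bn = hint["block"]
    (bk,) = hint["rstep"]
    return bm * bn * bk / hint["smem_bytes"]


def recommend(m, n, k, topk=3):
    """Rank by intensity; on ties prefer a deeper rstep (fewer reduction steps)."""
    ranked = sorted(
        candidate_hints(m, n, k),
        key=lambda h: (intensity(h), h["rstep"][0]),
        reverse=True,
    )
    return ranked[:topk]


for hint in recommend(1024, 1024, 1024):
    print(hint)
```

A real cost model would also weigh occupancy, register pressure, and memory coalescing; the point here is only the enumerate, filter by hardware limits, then rank pattern.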
Key Features
- … smem_cap, warp size, CPU cache structure, etc.) when generating hints.
- … MatmulTemplate, GeneralReductionTemplate, ElementwiseTemplate) let you concisely specify kernel structures.
Usage Examples
Basic Usage: General Reduction Template
Once tilelang is installed, you can import Carver and start creating templates:
Example Output (truncated):
{ 'block': [1, 128], 'thread': [1, 128], 'rstep': [64], ... },
{ 'block': [2, 64], 'thread': [2, 64], 'rstep': [64], ... },
...
{ 'block': [1, 16], 'thread': [1, 16], 'rstep': [512], 'reduce_thread': [8], ... }
A tile structure composed of S (spatial) and R (reduce) axes can model a wide range of kernels. For example, the structure SS represents a 2D elementwise operation, while SSR can represent a general matrix multiplication. We can also build more specialized templates that provide finer-grained information, such as MatmulTemplate.
Matmul Template
Carver also provides a specialized MatmulTemplate for matrix multiplication (e.g., C = A * B), automatically inferring common tiling strategies (thread blocks, warps, use of tensor cores, etc.).
Example Output:
{ 'block': [32, 64], 'warp': [16, 32], 'rstep': [128], 'use_tc': True, ... },
{ 'block': [64, 32], 'warp': [32, 16], 'rstep': [128], 'use_tc': True, ... },
...
{ 'block': [256, 32], 'warp': [128, 16], 'rstep': [32], 'use_tc': True, ... }
Supported Architectures
Carver currently provides out-of-the-box support for:
arch = CUDA("nvidia/geforce-rtx-4090")
Adding a new architecture is as simple as implementing a new subclass of TileDevice or providing a custom target that describes:
Below is an illustrative snippet of the CUDA backend:
Adapting Hints to Other Compilers
One of Carver’s main benefits is its adaptability. Here is an example for Triton:
Given a Carver hint like:
{ 'block': [32, 64], 'warp': [16, 32], 'rstep': [128], 'use_tc': True, 'vectorize': {'A_reindex': 8, 'B_reindex': 8} }
You might interpret this in Triton as:
- block_m = 32, block_n = 64, block_k = 128
- warp_m = 16, warp_n = 32
- vectorize: load A and B with a vector width of 8
- use_tc is true: consider using Tensor Cores if supported (in Triton, tl.dot targets them when the data types and tile shapes allow).
This helps you quickly test multiple configurations without manually guessing.
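The interpretation above can be sketched as a small translation function. This is a hypothetical helper, not part of Carver or Triton: BLOCK_M, BLOCK_N, and BLOCK_K stand for constexpr meta-parameters of your own Triton kernel, and deriving num_warps as "one warp per warp tile" is an assumed mapping.

```python
def hint_to_triton_meta(hint):
    """Translate a Carver-style matmul hint into Triton-style launch parameters.

    Hypothetical helper: BLOCK_M/BLOCK_N/BLOCK_K stand for constexpr
    meta-parameters of your own Triton kernel, and deriving num_warps as
    "one warp per warp tile" is an assumption, not a Carver or Triton rule.
    """
    block_m, block_n = hint["block"]
    warp_m, warp_n = hint["warp"]
    (block_k,) = hint["rstep"]
    if block_m % warp_m or block_n % warp_n:
        raise ValueError("warp tile must evenly divide the block tile")
    # A 32x64 block split into 16x32 warp tiles gives 2 * 2 = 4 warps.
    num_warps = (block_m // warp_m) * (block_n // warp_n)
    meta = {"BLOCK_M": block_m, "BLOCK_N": block_n, "BLOCK_K": block_k}
    # Per-buffer vector widths; how to honor them is up to the kernel's loads.
    vector_widths = dict(hint.get("vectorize", {}))
    return meta, num_warps, vector_widths


hint = {
    "block": [32, 64], "warp": [16, 32], "rstep": [128], "use_tc": True,
    "vectorize": {"A_reindex": 8, "B_reindex": 8},
}
meta, num_warps, vector_widths = hint_to_triton_meta(hint)
print(meta, num_warps)  # {'BLOCK_M': 32, 'BLOCK_N': 64, 'BLOCK_K': 128} 4
```

With Triton installed, each result could then be wrapped as triton.Config(meta, num_warps=num_warps) in an @triton.autotune config list, so Triton benchmarks several Carver hints instead of hand-picked tile sizes.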
Supported Templates
Carver abstracts common loop patterns through templates:
- GeneralReductionTemplate: For general Spatial-Spatial-Reduce (SSR) structures or similar.
- MatmulTemplate: For standard matrix multiplication C = A * B.
- GEMVTemplate: For y = Ax or y = xA style operations.
- ElementwiseTemplate: For elementwise transformations or pointwise ops.
You can also create your own specialized templates if you have unique loop structures or constraints. For instance, you might define specialized templates for convolution, flash attention, etc.
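The S/R vocabulary behind these templates fits in a few lines, which is also a natural starting point when sketching a custom template. The class below is a hypothetical illustration, not one of Carver's template classes: TileStructure, its methods, and the example shapes are assumptions. It records which loop axes are spatial (S) and which are reduce (R), which is enough to tell an elementwise op (SS) from a GEMV (SR) or a matmul (SSR).

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class TileStructure:
    """Hypothetical stand-in for a Carver-style tile structure (not the real class).

    Each letter of `structure` labels one loop axis: 'S' for a spatial
    (independent, parallelizable) axis and 'R' for a reduce axis.
    """

    structure: str
    shape: tuple

    def __post_init__(self):
        if len(self.structure) != len(self.shape):
            raise ValueError("structure and shape must have the same length")
        if set(self.structure) - {"S", "R"}:
            raise ValueError("every axis must be 'S' or 'R'")

    @property
    def spatial_extents(self):
        return [n for axis, n in zip(self.structure, self.shape) if axis == "S"]

    @property
    def reduce_extents(self):
        return [n for axis, n in zip(self.structure, self.shape) if axis == "R"]

    def output_elements(self):
        # One output element per point of the spatial iteration space.
        return prod(self.spatial_extents)

    def reduction_length(self):
        # Terms accumulated into each output element (1 when there is no R axis).
        return prod(self.reduce_extents)


# Assumed example shapes for the templates listed above.
examples = {
    "elementwise (SS)": TileStructure("SS", (1024, 1024)),
    "GEMV y = Ax (SR)": TileStructure("SR", (4096, 1024)),
    "matmul (SSR)": TileStructure("SSR", (1024, 1024, 512)),
}
for name, ts in examples.items():
    print(f"{name}: {ts.output_elements()} outputs, {ts.reduction_length()} terms each")
```

Spatial axes are the ones a tiler can split across blocks and threads freely, while reduce axes are the ones stepped through in chunks (the rstep in the hints above).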
TODO Items