[BYOC] Add GEMM kernel from FasterTransformer as submodule #15046
I extracted the fp16 activation × int8/int4 weight GEMM kernel from FasterTransformer (see NVIDIA/cutlass#911) to make it easier to build and integrate into TVM. The extracted code has been cleaned up in a repo under the tlc-pack organization, and it is being added here as a submodule. A follow-up PR will update the CUTLASS BYOC to support offloading to this kernel, which will be useful for weight-quantized LLM inference.
Please review the licensing etc. @tqchen @junrushao @vinx13 @sunggg