Commit 4b44479

robertgshaw2-redhat, afeldman-nm, alexm-redhat authored
Rs/marlin downstream v0.3.2 (vllm-project#43)
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+rib-2@users.noreply.github.com>
Co-authored-by: alexm <alexm@neuralmagic.com>
1 parent acb8615 commit 4b44479

15 files changed (+1563, -17 lines)

csrc/ops.h

Lines changed: 9 additions & 0 deletions
@@ -80,6 +80,15 @@ torch::Tensor awq_dequantize(
     int split_k_iters,
     int thx,
     int thy);
+
+torch::Tensor marlin_gemm(
+    torch::Tensor &a,
+    torch::Tensor &b_q_weight,
+    torch::Tensor &b_scales,
+    torch::Tensor &workspace,
+    int64_t size_m,
+    int64_t size_n,
+    int64_t size_k);
 #endif
 
 void squeezellm_gemm(

csrc/pybind.cpp

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
 #ifndef USE_ROCM
   ops.def("awq_gemm", &awq_gemm, "Quantized GEMM for AWQ");
   ops.def("awq_dequantize", &awq_dequantize, "Dequantization for AWQ");
+  ops.def("marlin_gemm", &marlin_gemm, "Marlin Optimized Quantized GEMM for GPTQ");
 #endif
   ops.def("gptq_gemm", &gptq_gemm, "Quantized GEMM for GPTQ");
   ops.def("gptq_shuffle", &gptq_shuffle, "Post processing for GPTQ");
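
Note on the `csrc/pybind.cpp` hunk: the new `marlin_gemm` binding sits inside the existing `#ifndef USE_ROCM` guard, so it is compiled out of ROCm builds together with the AWQ ops, while the `gptq_*` ops are registered unconditionally. The sketch below illustrates only that guard pattern. `OpRegistry` and `register_ops` are hypothetical stand-ins invented for this note (they are not pybind11 or vLLM types), and the real bindings pass function pointers rather than just names and docstrings.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the pybind11 `ops` submodule; it records only
// the op name and docstring so the guard logic can be exercised without torch.
struct OpRegistry {
    std::map<std::string, std::string> docs;

    void def(const std::string& name, const std::string& doc) {
        docs[name] = doc;
    }

    bool has(const std::string& name) const {
        return docs.count(name) != 0;
    }
};

// Mirrors the layout of the PYBIND11_MODULE body shown in the diff.
inline OpRegistry register_ops() {
    OpRegistry ops;
#ifndef USE_ROCM
    // Excluded when the build defines USE_ROCM.
    ops.def("awq_gemm", "Quantized GEMM for AWQ");
    ops.def("awq_dequantize", "Dequantization for AWQ");
    ops.def("marlin_gemm", "Marlin Optimized Quantized GEMM for GPTQ");
#endif
    // Registered regardless of USE_ROCM.
    ops.def("gptq_gemm", "Quantized GEMM for GPTQ");
    ops.def("gptq_shuffle", "Post processing for GPTQ");
    return ops;
}
```

Compiling this with `-DUSE_ROCM` and calling `register_ops()` yields a registry without `marlin_gemm`; compiling without the flag includes it. Under that reading, Python callers on a ROCm build should not expect the `marlin_gemm` op to exist.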
