I am trying to implement a mixed-precision GEMM with FP16 and INT4 operands. I would like to call the fpA_intB_gemm_fp16_int4 kernel located in FasterTransformer/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm directly, but all of the examples I can find use it as part of full model inference. If I only want to run the GEMM kernel on its own, outside of a model, what should I do?