There are already INT4 GEMM, INT8 GEMM, and even FP8 GEMM kernels, but activations in LLMs are hard to compress, so we sometimes use W4A8 GEMM instead. Today that means first unpacking the two INT4 values stored in each INT8 byte and then running a plain INT8 GEMM. This is awkward and very slow because of the extra memory reads and writes needed to decompress INT4 to INT8.
[FEA]: Please add support for INT4 * INT8 GEMM with INT32/INT8 output.
Figure: W-A-KV bit-width configurations (figure cited from the LLM-QAT paper).
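To make the overhead concrete, here is a minimal CUDA sketch of the unpack pass described above, assuming two signed INT4 weights are packed per int8 byte with the low nibble first (the packing order and kernel name are assumptions, not an existing API). Expanding INT4 weights to INT8 in global memory before calling a standard INT8 GEMM costs a full extra read and write of the weight matrix, which is exactly what a fused INT4 * INT8 kernel would avoid.

```cuda
// Sketch only: separate INT4 -> INT8 unpack pass that precedes an INT8 GEMM.
// Assumption: two signed INT4 values per byte, low nibble stored first.
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

__device__ __forceinline__ int8_t sign_extend_int4(uint8_t nibble) {
    // Map a 4-bit two's-complement value (0..15) to a signed int8 in [-8, 7].
    return static_cast<int8_t>(static_cast<int8_t>(nibble << 4) >> 4);
}

__global__ void unpack_int4_to_int8(const uint8_t* __restrict__ packed,
                                    int8_t* __restrict__ unpacked,
                                    int n_packed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_packed) {
        uint8_t byte = packed[i];
        unpacked[2 * i]     = sign_extend_int4(byte & 0x0F);        // low nibble
        unpacked[2 * i + 1] = sign_extend_int4((byte >> 4) & 0x0F); // high nibble
    }
}

int main() {
    const int n_packed = 4;  // 4 bytes hold 8 INT4 weights
    uint8_t h_packed[n_packed] = {0x21, 0xF8, 0x7E, 0x0A};
    int8_t h_unpacked[2 * n_packed];

    uint8_t* d_packed;
    int8_t* d_unpacked;
    cudaMalloc(&d_packed, n_packed);
    cudaMalloc(&d_unpacked, 2 * n_packed);
    cudaMemcpy(d_packed, h_packed, n_packed, cudaMemcpyHostToDevice);

    // Extra kernel launch plus a full read/write of the weight matrix,
    // only to feed an ordinary INT8 GEMM afterwards.
    unpack_int4_to_int8<<<1, 32>>>(d_packed, d_unpacked, n_packed);
    cudaMemcpy(h_unpacked, d_unpacked, 2 * n_packed, cudaMemcpyDeviceToHost);

    for (int i = 0; i < 2 * n_packed; ++i) printf("%d ", h_unpacked[i]);
    printf("\n");

    cudaFree(d_packed);
    cudaFree(d_unpacked);
    return 0;
}
```

A fused INT4 * INT8 GEMM would instead decode the nibbles in registers inside the main loop, so the INT4 weights are only ever read once from global memory.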