[FEA] W4A8 GEMM support. #1316

Closed
Ageliss opened this issue Jan 23, 2024 · 3 comments

@Ageliss

Ageliss commented Jan 23, 2024

It seems we already have INT4 GEMM, INT8 GEMM, and even FP8 GEMM, but activations in LLMs are hard to compress. We therefore sometimes use W4A8 (4-bit weights, 8-bit activations) in the GEMM. Today that means first unpacking the two INT4 values stored in each INT8 byte and then running an INT8 GEMM. This is ugly and very slow, because decompressing INT4 to INT8 adds an extra round of memory reads and writes.
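To make the workaround concrete, here is a minimal CPU reference sketch of the two-pass approach described above. It is an illustration under stated assumptions, not CUTLASS code: the packing convention (even element in the low nibble, odd element in the high nibble), the N x K weight layout, and the names `sign_extend_int4`, `unpack_int4`, and `gemm_w4a8_two_pass` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sign-extend a 4-bit two's-complement value (0..15) to the range -8..7.
inline int8_t sign_extend_int4(uint8_t nibble) {
    return static_cast<int8_t>(((nibble & 0x0F) ^ 0x08) - 8);
}

// Pass 1: expand every packed byte into two int8 values.
// Assumed packing: even index in the low nibble, odd index in the high nibble.
std::vector<int8_t> unpack_int4(const std::vector<uint8_t>& packed) {
    std::vector<int8_t> out;
    out.reserve(packed.size() * 2);
    for (uint8_t byte : packed) {
        out.push_back(sign_extend_int4(byte & 0x0F));
        out.push_back(sign_extend_int4(byte >> 4));
    }
    return out;
}

// Pass 2: plain INT8 GEMM with INT32 accumulation.
// a:        M x K activations, row-major int8.
// w_packed: N x (K/2) bytes, one row of K packed int4 weights per output column.
// returns:  M x N int32 results, row-major.
std::vector<int32_t> gemm_w4a8_two_pass(const std::vector<int8_t>& a,
                                        const std::vector<uint8_t>& w_packed,
                                        int m, int n, int k) {
    assert(k % 2 == 0);
    // This temporary is the extra N*K-byte write and re-read the issue complains about.
    const std::vector<int8_t> w = unpack_int4(w_packed);
    std::vector<int32_t> c(static_cast<std::size_t>(m) * n, 0);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            for (int p = 0; p < k; ++p)
                c[i * n + j] += static_cast<int32_t>(a[i * k + p]) *
                                static_cast<int32_t>(w[j * k + p]);
    return c;
}
```

The temporary buffer `w` is the cost in question: on a GPU it corresponds to writing the unpacked weights to memory and reading them back before the INT8 GEMM starts.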

[FEA]: I am asking for support for INT4 * INT8 GEMM with INT32 or INT8 output.
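As a rough illustration of the requested semantics, the sketch below unpacks each INT4 weight pair inside the inner loop, so no unpacked weight buffer is ever materialized, and accumulates in INT32. The packing convention (even element in the low nibble), the N x K weight layout, the single per-tensor `scale` used for the INT8 output, and all function names are assumptions made for this example; it is a CPU reference, not a CUTLASS kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode the low / high nibble of a byte as a signed 4-bit value (-8..7).
inline int32_t int4_lo(uint8_t b) { return static_cast<int32_t>((b & 0x0F) ^ 0x08) - 8; }
inline int32_t int4_hi(uint8_t b) { return static_cast<int32_t>((b >> 4) ^ 0x08) - 8; }

// Fused W4A8 GEMM: INT8 activations times packed INT4 weights, INT32 output.
// a:        M x K activations, row-major int8.
// w_packed: N x (K/2) bytes, one row of K packed int4 weights per output column.
std::vector<int32_t> gemm_w4a8_fused(const std::vector<int8_t>& a,
                                     const std::vector<uint8_t>& w_packed,
                                     int m, int n, int k) {
    assert(k % 2 == 0);
    const int kb = k / 2;  // packed bytes per weight row
    std::vector<int32_t> c(static_cast<std::size_t>(m) * n, 0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            int32_t acc = 0;
            for (int p = 0; p < kb; ++p) {
                const uint8_t b = w_packed[j * kb + p];
                // Each byte feeds two multiply-accumulates; weights stay packed in memory.
                acc += static_cast<int32_t>(a[i * k + 2 * p]) * int4_lo(b);
                acc += static_cast<int32_t>(a[i * k + 2 * p + 1]) * int4_hi(b);
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}

// Optional INT8 output: scale the INT32 accumulator, round, and saturate to [-128, 127].
inline int8_t requantize_to_int8(int32_t acc, float scale) {
    const long r = std::lround(static_cast<float>(acc) * scale);
    return static_cast<int8_t>(std::max(-128L, std::min(127L, r)));
}
```

Compared with unpacking first, the weights are read from memory only once, at 4 bits per element. A GPU kernel would do the nibble decode in registers after loading a tile.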

W-A-KV bits:
[image]
The figure is taken from the LLM-QAT paper.

Ageliss added the "? - Needs Triage" and "feature request" labels Jan 23, 2024
@hwu36
Collaborator

hwu36 commented Jan 23, 2024

Several sources:

W4A8 is already supported on Hopper. Check the 3.3 and 3.4 changelogs.

On Ampere, #1190 adds INT4 support (@alexsamardzic). FasterTransformer also has it, but it requires a special layout (#911, @rhenry-nv). There is also a paper (https://arxiv.org/abs/2311.09550) about a W4A8 product solution.

@Ageliss
Author

Ageliss commented Jan 24, 2024

Thanks very much for your reply.

@alexsamardzic
Contributor

alexsamardzic commented Mar 19, 2024

I'm now working on this feature; #1413 has been created.
