
AWQ is not working #1240

Open
endomorphosis opened this issue Aug 11, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@endomorphosis

System Info

Transformers fails with the following error when trying to use AWQ with TGI, Intel Neural Compressor, or Optimum Habana:
ValueError: AWQ is only available on GPU

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

.

Expected behavior

.

@endomorphosis endomorphosis added the bug Something isn't working label Aug 11, 2024
@regisss
Collaborator

regisss commented Aug 19, 2024

Is it supposed to work on Gaudi?

@endomorphosis
Author

The primary goal is to run Llama 405B on a single Gaudi node.

I had originally read that Hugging Face TGI was supposed to support AWQ, but I was unable to use any quantization method provided by Hugging Face Quants at all, including GPTQ, uint4, etc. The failures are just spread across different issues.

@regisss
Collaborator

regisss commented Aug 20, 2024

I think GPTQ should work on Gaudi, no?

@endomorphosis
Author

No. Neither generating quantized models with Intel Neural Compressor nor using https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 works on tgi_gaudi, and FP8 with INC does not work on a single node either.
