GPTQ Collaboration? #75
Hey Dan, nice to hear from you. I had a couple of questions for you as well regarding kernels.
@Wingie llama.cpp has supported 4-bit GPTQ inference for 4 days now. There is a script in that repo called
GPTQ is indeed better than RtN even in pure CPU implementations.
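For readers unfamiliar with the baseline being compared against: round-to-nearest (RtN) simply snaps each weight to the closest point on a uniform grid, independently of the other weights. A minimal sketch (the `rtn_quantize` helper and its per-row min/max scaling scheme are illustrative assumptions, not code from either repo):

```python
import numpy as np

def rtn_quantize(w, bits=4):
    """Naive round-to-nearest (RtN) quantization of a weight vector.

    Values are snapped to a uniform grid of 2**bits levels spanning the
    vector's [min, max] range, then dequantized. GPTQ improves on this by
    updating the not-yet-quantized weights to compensate for each rounding
    error, which is why it loses less accuracy at low bit-widths.
    """
    levels = 2 ** bits - 1
    w_min = w.min()
    scale = (w.max() - w_min) / levels
    codes = np.round((w - w_min) / scale)  # integer codes in [0, levels]
    return codes * scale + w_min           # dequantized approximation

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
w_hat = rtn_quantize(w, bits=4)
```

Each element's rounding error is bounded by half the grid step, but RtN makes no attempt to keep the layer's overall output close to the original, which is where GPTQ's error-compensation wins.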
With the latest optimizations to GPTQ, 13B at 3-bit outperforms 7B at 4-bit, 30B at 3-bit outperforms 13B at 4-bit, and so on. So you will likely want to maximize the number of parameters you can fit in the RAM/VRAM you have. If you have memory to spare, then more bits may produce marginally better results at the same parameter count.
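The "fit the most parameters in your budget" rule above can be sanity-checked with back-of-the-envelope arithmetic: quantized weights take roughly params × bits / 8 bytes. A rough sketch (the 15% overhead factor for scales, zero-points, and runtime buffers is an assumption, not a measured figure from GPTQ or llama.cpp):

```python
def quantized_weight_gib(n_params, bits, overhead=1.15):
    """Back-of-the-envelope weight memory: n_params * bits / 8 bytes,
    inflated by an assumed ~15% overhead for quantization metadata
    and runtime buffers, converted to GiB."""
    return n_params * bits / 8 * overhead / 2**30

budget_gib = 8  # hypothetical memory budget
configs = {
    "7B @ 4-bit": quantized_weight_gib(7e9, 4),
    "13B @ 3-bit": quantized_weight_gib(13e9, 3),
    "13B @ 4-bit": quantized_weight_gib(13e9, 4),
    "30B @ 3-bit": quantized_weight_gib(30e9, 3),
}
for name, gib in configs.items():
    print(f"{name}: {gib:.1f} GiB (fits in {budget_gib} GiB: {gib <= budget_gib})")
```

Under these assumptions, a 13B model at 3-bit fits comfortably in 8 GiB while 30B at 3-bit does not, so the memory budget, not the bit-width alone, picks the best configuration.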
https://github.com/IST-DASLab/gptq is the repository mentioned in the OP, for anyone who comes across this thread.
@dalistarh this is just a gardening thing, but I submitted a PR to this repo to make it pip-installable. I briefly browsed your repo and think it should more or less just work there as well, if you want to borrow it (or maybe @qwopqwop200 will upstream their repo to yours). Just FYI!
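For reference, making a research repo pip-installable usually requires only a minimal packaging file at the repo root. A hypothetical sketch, not the contents of the actual PR (package name, version, and dependency are placeholders):

```toml
# pyproject.toml -- hypothetical minimal packaging config (placeholder values)
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "gptq"             # placeholder; the real PR may use a different name
version = "0.1.0"
dependencies = ["torch"]  # assumed runtime dependency
```

With a file like this in place, `pip install .` from the repo root installs the package into the current environment.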
Dear Qwopqwop200,
I'm writing on behalf of the authors of the GPTQ paper. We have been following your excellent work, and wanted to mention that we added a few updates to our repository yesterday, which may be interesting to you:
In case you would be interested in collaborating more closely with us, please feel free to write us at dan.alistarh@ist.ac.at / elias.frantar@ist.ac.at
Best regards,
Dan