support to disable exllama for gptq #604
Conversation
The author of the linked issue mentions an error after setting it. Is that due to a CUDA OOM?
Executing the example YAML file in this branch throws an error related to modifying the LlamaConfig object:
Also:
I doubt it's a CUDA memory issue; I was running on an RTX 3090 24GB. It wasn't a fluke either, as the error persisted on two different machines.
@Napuh I updated the fix, lmk if that works.
Now it's throwing the same error as in #599. Also, the script keeps loading for about five minutes after the model finishes downloading, and no memory is ever allocated on the GPU (monitored manually).
@Napuh hopefully this most recent commit resolves it.
#609 should fix the device-check issue when logging GPU utilization.
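For context, a minimal sketch of the kind of device check #609 presumably adds; the function name and the exact logging format here are assumptions, not taken from that PR:

```python
import torch

# Hypothetical sketch: guard CUDA calls behind a device check so that
# logging GPU utilization does not crash on machines without a CUDA device.
def log_gpu_utilization() -> None:
    if not torch.cuda.is_available():
        return  # nothing to log on CPU-only runs
    for device in range(torch.cuda.device_count()):
        allocated_gib = torch.cuda.memory_allocated(device) / 1024**3
        print(f"cuda:{device} memory allocated: {allocated_gib:.2f} GiB")
```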
* support to disable exllama for gptq
* update property instead of item
* fix config key
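A rough sketch of what the "update property instead of item" commit likely addresses, going by the LlamaConfig error reported above; the exact code in the PR may differ:

```python
from transformers import LlamaConfig

# Hypothetical illustration: a PretrainedConfig such as LlamaConfig is an
# object, not a dict, so subscript/item assignment on it raises a TypeError.
config = LlamaConfig()

# Raises "TypeError: 'LlamaConfig' object is not subscriptable":
# config["quantization_config"]["disable_exllama"] = True

# Setting the attribute (property) on the config object works instead.
config.quantization_config = {"disable_exllama": True}
```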
fixes #599
Adding

```yaml
gptq_disable_exllama: true
```

to the YAML config should fix the issue with GPTQ.
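For reference, a minimal sketch of how such a flag plausibly maps onto the transformers GPTQ loading path. The model id is just an example, not from this PR, and `disable_exllama` was the `GPTQConfig` argument in transformers versions current at the time (later deprecated in favor of `use_exllama`):

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Hypothetical sketch: load a GPTQ-quantized model with the exllama
# kernels disabled, which is what the YAML flag above should toggle.
quant_config = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # example model id, not from this PR
    quantization_config=quant_config,
    device_map="auto",
)
```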