[BUG] After updating to exllamav2-0.1.9 (from 0.1.8) cannot load Mistral Large 2 123B with a draft model #177
Comments
I experience this issue as well and have been able to reproduce it.
Yep, I can also confirm the problem. After the upgrade, I can't load big models with draft models anymore. This holds both for Mistral Large + Mistral 7B and for Llama 70B + Llama 8B, and fine-tuned variants of the large models are affected too; the bug hits many large models, both Llama and Mistral. Loading Llama 70B with the 8B draft model fails with the same kind of error.
After hitting the error, I cannot load Llama 70B at all; removing the draft model does not help, although I can still load the 8B Llama afterwards. Thanks for sharing the workaround: reverting to ecaddec and running ./update_scripts/update_deps.sh helped. Hopefully this gets fixed soon, since I wanted to try the tensor-parallel features in the new version (perhaps those new features unintentionally caused the issue, but that is just a guess; I am not familiar enough with the code to debug it).
I believe this is a synchronization issue between CUDA streams and the custom safetensors loader. Do you have fasttensors enabled?
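(For context: the failure class described here, a consumer reading a tensor before the loader's stream has finished writing it, looks roughly like the minimal PyTorch sketch below. This is illustrative only and is not exllamav2's actual loader code.)

```python
# Minimal PyTorch sketch of the *class* of bug described above -- not
# exllamav2's actual loader code. A tensor is filled on a side stream;
# if the consumer on the default stream does not wait for that stream,
# it may read stale or partial data.
import torch

assert torch.cuda.is_available()

side_stream = torch.cuda.Stream()
weight = torch.empty(4096, 4096, device="cuda")

with torch.cuda.stream(side_stream):
    # Simulates the async copy a custom safetensors loader might issue.
    weight.copy_(torch.randn(4096, 4096), non_blocking=True)

# BUG: using `weight` here without synchronizing races the copy above.
# FIX: make the default stream wait on the loader's stream first.
torch.cuda.current_stream().wait_stream(side_stream)

out = weight.sum()  # safe only after the wait_stream() above
print(out.item())
```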
Switched to the dev branch, pulled the update, and built from source. After building, loading with fasttensors: true still produced an error, but with fasttensors: false I am now able to load both the model and the draft model successfully.
Did you include the latest commit, 7e15947?
Okay, I pushed another commit which might help.
Just built off the most recent commit and everything seems to work as expected now.
I cloned and built last night. I can load Mistral Large with the draft model, but it outputs gibberish. Not an issue if I disable the draft model, which I've done, since 22 t/s is plenty.
I updated TabbyAPI, and in the exllamav2 folder (on the dev branch) I ran the build and install commands inside TabbyAPI's venv.
Now I was able to successfully load Mistral-Large-Instruct-2407-123B-5.0bpw-exl2 with Mistral-7B-Instruct-v0.3-exl2-3.5bpw as a draft model. So far it seems to work correctly (even with fast tensors enabled). I also tested the 4bpw version of Mistral Large 2. In addition, I tested Llama-3.1-70B-Instruct-6.0bpw-h6-exl2 with Llama-3.1-8B-Instruct-3.0bpw-exl2 as a draft model, and that worked too. @turboderp, thank you very much for fixing this bug.
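For reference, the main-plus-draft setup tested above corresponds roughly to this library-level exllamav2 sketch (a minimal approximation assuming the 0.1.x dynamic-generator API; the model paths are placeholders, and this is not TabbyAPI's actual loading code):

```python
# Minimal sketch: load a main model plus a draft model for speculative
# decoding with exllamav2. Paths are placeholders; not TabbyAPI's code.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

def load_model(model_dir: str, fasttensors: bool = True):
    config = ExLlamaV2Config(model_dir)
    config.fasttensors = fasttensors  # the loader setting discussed above (0.1.x attribute)
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)
    model.load_autosplit(cache)       # split layers across available GPUs
    return config, model, cache

_, draft_model, draft_cache = load_model("/models/Mistral-7B-Instruct-v0.3-exl2-3.5bpw")
config, model, cache = load_model("/models/Mistral-Large-Instruct-2407-123B-5.0bpw-exl2")
tokenizer = ExLlamaV2Tokenizer(config)

# The dynamic generator accepts the draft model directly; note it may
# require flash-attn for paged attention (pass paged = False otherwise).
generator = ExLlamaV2DynamicGenerator(
    model = model, cache = cache, tokenizer = tokenizer,
    draft_model = draft_model, draft_cache = draft_cache,
)
print(generator.generate(prompt = "Hello,", max_new_tokens = 32))
```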
OS
Linux
GPU Library
CUDA 12.x
Python version
3.12
Describe the bug
I updated TabbyAPI recently, and each time I try to load a large model I get an error. Loading Mistral Large 2 with Mistral v0.3 7B 3.5bpw as the draft model fails with RuntimeError: q_proj is wrong shape.
Then I tried without the draft model, and the error is different: ZeroDivisionError: integer division or modulo by zero.
Strangely enough, if I load Llama 8B, unload it, and then try to load Mistral Large 2 on its own, it loads successfully.
Reproduction steps
1. Try to load Mistral Large 2 4bpw ( https://huggingface.co/LoneStriker/Mistral-Large-Instruct-2407-4.0bpw-h6-exl2/tree/main ) with https://huggingface.co/bartowski/Mistral-7B-Instruct-v0.3-exl2/tree/3_5 as the draft model.
2. Get an error (RuntimeError: q_proj is wrong shape).
3. Try to load Mistral Large 2 on its own after the first failure, this time without the draft model.
4. Get another error (ZeroDivisionError: integer division or modulo by zero).
The error does not go away even if I retry loading Mistral Large 2 on its own; only loading and unloading a small model (like Mistral v0.3 7B or Llama 3.1 8B) and then loading the large model again helps. Unloading and reloading the large model on its own does not trigger the bug; only an attempt to load it with a draft model triggers it.
In case it matters, I am loading the model via the SillyTavern extension ( https://github.com/theroyallab/ST-tabbyAPI-loader ).
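A scripted version of these reproduction steps against TabbyAPI's admin API might look like the sketch below. Treat the details as assumptions: the port, the admin-key header, and the request field names are taken from memory of TabbyAPI's load endpoint, not verified against its schema.

```python
# Hypothetical repro script against a local TabbyAPI instance.
# Assumptions: TabbyAPI on port 5000, its admin key in ADMIN_KEY, and
# request field names matching TabbyAPI's model-load schema.
import requests

ADMIN_KEY = "replace-with-your-admin-key"
BASE = "http://127.0.0.1:5000"
HEADERS = {"x-admin-key": ADMIN_KEY}

# Steps 1-2: loading main + draft model should fail with
# "RuntimeError: q_proj is wrong shape" on the affected version.
r = requests.post(f"{BASE}/v1/model/load", headers=HEADERS, json={
    "name": "Mistral-Large-Instruct-2407-4.0bpw-h6-exl2",
    "draft": {"draft_model_name": "Mistral-7B-Instruct-v0.3-exl2-3_5"},
})
print(r.status_code, r.text)

# Steps 3-4: retrying without the draft model then fails with
# "ZeroDivisionError: integer division or modulo by zero".
r = requests.post(f"{BASE}/v1/model/load", headers=HEADERS, json={
    "name": "Mistral-Large-Instruct-2407-4.0bpw-h6-exl2",
})
print(r.status_code, r.text)
```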
Expected behavior
Both the main and draft models should load without errors; instead, I get errors when loading the main model.
Logs
No response
Additional context
Reverting to ecaddec and running ./update_scripts/update_deps.sh (which downgrades to exllamav2-0.1.8) allows both the main and the draft model to load without issues.