kv model size is twice as large as llama.cpp/main #66
Probably related to #49. The same slowness is also observed.
@hengjiUSTC those outputs look slightly different; do you mind cloning the repo and comparing against the version of llama.cpp that's pinned here? It should be fairly recent (just updated earlier today).
I updated both the llama.cpp repo and vendor/llama.cpp to the same (newest) version and rebuilt the C++ code. The kv self size is still different. Link to the model: https://huggingface.co/Pi3141/gpt4-x-alpaca-native-13B-ggml/tree/main. It also happens with another model, https://huggingface.co/Pi3141/alpaca-native-13B-ggml/tree/main. So I don't think it is caused by a version mismatch in the llama.cpp code.
And on the llama.cpp side:
I changed to another computer and used the model https://huggingface.co/Pi3141/alpaca-native-13B-ggml/tree/main; the result is still the same.
llama.cpp:
The result is the same even when I directly use the build under vendor/llama.cpp.
The reason for the larger kv size is that the KV cache here defaults to f32, whereas llama.cpp/main now stores it as f16, so the reported size is doubled.
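For a rough sanity check of the 2x factor (my own sketch, not from the thread): the KV cache holds one key and one value tensor per layer, each of shape n_ctx x n_embd, so its size is 2 * n_layer * n_ctx * n_embd * bytes per element. With the usual 13B hyperparameters (n_layer = 40, n_embd = 5120) and the default 512-token context, that comes out to the 400 MB figure llama.cpp typically reports for f16, and exactly twice that for f32:

```python
def kv_cache_bytes(n_layer: int, n_ctx: int, n_embd: int, bytes_per_elem: int) -> int:
    # One K tensor and one V tensor per layer, each of shape (n_ctx, n_embd).
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem

# Typical 13B hyperparameters: 40 layers, 5120-dim embeddings, 512-token context.
print(kv_cache_bytes(40, 512, 5120, 2) / 2**20)  # f16 -> 400.0 MiB
print(kv_cache_bytes(40, 512, 5120, 4) / 2**20)  # f32 -> 800.0 MiB, exactly twice
```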
Thank you. I think this default changed since I originally implemented the parameters for the Llama class; I'll fix this.
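Until the default is updated, a workaround should be to set the flag explicitly when constructing the model. A minimal sketch, assuming the `f16_kv` keyword argument on the `Llama` class is the relevant knob and using a hypothetical local model path:

```python
from llama_cpp import Llama

MODEL_PATH = "./models/ggml-alpaca-13b-q4_0.bin"  # hypothetical local path

# f16_kv=True stores the KV cache in f16, matching llama.cpp/main's current
# default, which should halve the "kv self size" reported at load time.
llm = Llama(model_path=MODEL_PATH, f16_kv=True)
```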
I use the following code to load the model:
model, tokenizer = LlamaCppModel.from_pretrained(MODEL_PATH)
and got this printout. But when loading with llama.cpp/main, the kv self size is different.
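One way to confirm the diagnosis while leaving the `LlamaCppModel` wrapper aside (it appears to be a third-party wrapper around llama-cpp-python) is to load the same ggml file twice through llama-cpp-python directly and compare the "kv self size" lines the llama.cpp loader prints. A sketch under the same `f16_kv` assumption as above:

```python
from llama_cpp import Llama

MODEL_PATH = "./models/ggml-alpaca-13b-q4_0.bin"  # hypothetical local path

# The llama.cpp loader prints its startup info (including "kv self size")
# on each load; the f32 cache should report twice the f16 figure.
llm_f32 = Llama(model_path=MODEL_PATH, f16_kv=False)
llm_f16 = Llama(model_path=MODEL_PATH, f16_kv=True)
```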