GGML model showing noticeable quality issues when compared to HF model #2354
@JohannesGaessler 7B output (completely correct!)
7B output (completely correct again!)
I guess I will close this issue now, but thank you very much for your feedback!
I tested a specific Llama 2 7B model using llama.cpp and observed noticeable quality issues when comparing it to the Llama 2 7B HF model with the original LoRA applied, as well as to an HF model merge created by the alpaca-lora export_hf_checkpoint script.
The issues I encountered were primarily double lines getting merged into one and the model getting confused about the LoRA's prompt format, which resulted in low overall output quality.
Initially, I was unsure whether the problem was an error on my part, but after coming across this discussion I realized that others were facing the same problem with llama.cpp. This leads me to believe that the issue likely lies in ggml/llama.cpp itself, so I have decided to open this issue.
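For reference, the HF-side comparison was set up roughly like this (a minimal Python sketch, assuming transformers and peft; the base model ID and adapter path are placeholders, not the exact ones used):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-2-7b-hf"  # placeholder base model ID
ADAPTER = "path/to/limarp-lora"    # placeholder path to the original LoRA

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)

# Apply the original LoRA on top of the base weights...
model = PeftModel.from_pretrained(base, ADAPTER)

# ...and optionally fold it into the base weights, comparable in effect
# to the merge done by alpaca-lora's export_hf_checkpoint script.
model = model.merge_and_unload()

prompt = "<<SYSTEM>>\nJack's Persona: A vampire hunter"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))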
As a comparison:
Output expected from the 7B model
Output from llama.cpp (try 1)
Command line:
main_cublas.exe -m limarp-llama2-7b.ggmlv3.f16.bin -e -p "<<SYSTEM>>\nJack's Persona: A vampire hunter" -c 4096 -t 5
Output from llama.cpp (try 2, recommended preset from model card)
Command line:
main_cublas.exe -m limarp-llama2-7b.ggmlv3.f16.bin -e -p "<<SYSTEM>>\nJack's Persona: A vampire hunter" -c 4096 -t 5 --temp 0.70 --tfs 0.85 --repeat-penalty 1.10 --top-p 1 --top-k 0 --typical 1
The output can get even worse when you don't prime it with the "X's Persona" line.
Output from llama.cpp (recommended preset from model card)
Command line:
main_cublas.exe -m limarp-llama2-7b.ggmlv3.f16.bin -e -p "<<SYSTEM>>\n" -c 4096 -t 5 --temp 0.70 --tfs 0.85 --repeat-penalty 1.10 --top-p 1 --top-k 0 --typical 1
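For anyone reproducing this, a more quantitative check than eyeballing samples would be comparing perplexity between the GGML and HF copies of the model. A sketch, assuming the perplexity tool that ships with the same llama.cpp build and a local wikitext-2 test file:
Command line:
perplexity.exe -m limarp-llama2-7b.ggmlv3.f16.bin -f wiki.test.raw -t 5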