
Output of quantized Vicuna is so inappropriate that I can't use it #1656

Closed
JerryYao80 opened this issue May 31, 2023 · 5 comments

@JerryYao80

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [√] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [√] I carefully followed the README.md.
  • [√] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [√] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Current Behavior

In my llama.cpp environment, I produced the following files by following the README:

vicuna-7b-hf => ggml-model-f16.bin => ggml-model-q4_0.bin
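
Roughly, the steps from the README that I followed (a sketch with illustrative paths; the exact script and tool names may differ depending on the llama.cpp version):

python convert.py ./vicuna-7b-hf
./quantize ./vicuna-7b-hf/ggml-model-f16.bin ./vicuna-7b-hf/ggml-model-q4_0.bin q4_0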

I executed this command:

./main -m /media/ggml-model-q4_0.bin -p "You are a linguistics professor, translate this sentences from Englisth to Chinese: Across the Great Wall, we can reach every corner in the world. Do not repeate the instruction." -n 512

This was the output:

You are a linguistics professor, translate this sentences from Englisth to Chinese: Across the Great Wall, we can reach every corner in the world. Do not repeate the instruction.
Translate these sentences from English to Chinese:

  1. Crossing the Great Wall, you can reach all the places on earth. Don't repeat the instructions.
  2. We can learn about different cultures at home and abroad by exchanging ideas with other people who are interested in international affairs. Don't repeat the instruction.
  3. Our school has a unique advantage: we have both Chinese students and overseas students, which enables us to interact with each other. Don't repeat the instructions. [end of text]

This is one of my experiments, and the outputs of the quantized Vicuna are so poor that I can't do anything with them.
I'm not sure what's wrong with my quantized model. Has anyone else run into this?

Environment and Context

Docker Toolbox 1.13.1
docker client: 1.13.1, os/arch: windows 7/amd64
docker server: 19.03.12, os/arch: ubuntu 22.04/amd64
CPU type: Intel Core i7 6700, supported instruction sets: MMX, SSE, SSE2, ..., AVX, AVX2, FMA3, TSX

Contributor

cmp-nct commented May 31, 2023

I lack experience with that particular model, but I notice that you are attempting a complex instruction-following translation with a 7B model.
Even if it is very well instruction-tuned, I have yet to see a 7B model that can do that type of translation well and follow such a relatively complex instruction.
For your report (which I don't think is well suited as an error report for the project in general), you should of course have shown a full-precision example of the expected behavior, not just your expectations, given that you are complaining about quantization.
Secondly, you used q4_0, which is the worst available variant in terms of precision.
After confirming that 16-bit precision works for your purpose, you might want to try q4_1, q5_x and q8_0 to see how those perform.
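
For example, something like this (a sketch assuming the quantize and main tools from this repo; type names and paths may need adjusting for your setup):

./quantize ./vicuna-7b-hf/ggml-model-f16.bin ./vicuna-7b-hf/ggml-model-q5_1.bin q5_1
./quantize ./vicuna-7b-hf/ggml-model-f16.bin ./vicuna-7b-hf/ggml-model-q8_0.bin q8_0
./main -m ./vicuna-7b-hf/ggml-model-q8_0.bin -p "<your prompt>" -n 512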

Collaborator

KerfuffleV2 commented May 31, 2023

@JerryYao80 You didn't use the correct prompt format for Vicuna models. You also asked it to translate from "Englisth" to Chinese.

I don't mean to criticize your English, and I hope my words don't make you uncomfortable. Your English is obviously much better than my Chinese!

Because LLMs just complete text, the input makes a huge difference. Typos and grammar mistakes in the prompt will, unfortunately, generally lead to low-quality output, and so will not using the prompt format the model expects.

I'd also note that while Vicuna can speak a little Mandarin, that only made up a small part of its training. Even with the best possible prompting, I wouldn't expect the results for translations or generating text to be very good (especially if you're using a 7B model).
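
As a rough sketch of a Vicuna v1.1-style prompt (the exact system line and separators depend on the Vicuna version, so check the model card rather than taking this verbatim):

./main -m /media/ggml-model-q4_0.bin -n 512 -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Translate this sentence from English to Chinese: Across the Great Wall, we can reach every corner in the world. ASSISTANT:"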

@LostRuins
Collaborator

Also, this really isn't a llama.cpp issue unless it's a tokenizer problem. You can confirm whether the input tokens match the vocab.
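
For example (if I recall correctly, the main example has a --verbose-prompt flag that prints the prompt's token IDs and pieces before generation; treat this as a sketch):

./main -m /media/ggml-model-q4_0.bin --verbose-prompt -p "Across the Great Wall, we can reach every corner in the world." -n 1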


ungil commented Jun 2, 2023

Does the following work better?

./main -m /media/ggml-model-q4_0.bin -p "### Human: You are a linguistics professor, translate this sentence from English to Chinese: Across the Great Wall, we can reach every corner in the world.
### Assistant:" -n 512

The github-actions bot added the stale label on Mar 25, 2024.

This issue was closed because it has been inactive for 14 days since being marked as stale.
