
Command R+ outputs gibberish when used with text-generation-webui #6596

Closed
christiandaley opened this issue Apr 10, 2024 · 10 comments

@christiandaley

Not sure if this is the appropriate place to file this issue, but I don't have much idea what's going wrong and the exllamav2 quants of Command R+ work fine with text-generation-webui after updating to the latest exllamav2.

Manually installing and building llama-cpp-python with the latest llama.cpp allows text-generation-webui to load gguf quants of Command R+, but the output is always gibberish regardless of what sampler settings are used. From my understanding llama-cpp-python is just bindings for llama.cpp so I'm not sure if the issue could be there. The log output from text-generation-webui looks totally normal when loading and running inference.

I've tried the q5_k_s quant from here: https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF/tree/main

As well as the q5_k_m quant from here: https://huggingface.co/mradermacher/c4ai-command-r-plus-GGUF/tree/main.

Both behave in the same way. The output seems somewhat related to the prompt and sometimes contains coherent sentences, but there is clearly something very wrong. Even when turning the temperature way down and raising min_p the output is for the most part nonsense. I can use a wide range of sampler settings with exllamav2 and get good results.

@henk717

henk717 commented Apr 10, 2024

KoboldCpp has been running it fine so far, but in our case the default settings have shown the model is very sensitive to repetition penalty, so make sure that is low enough.

Considering our Command-R implementation is inherited from llama.cpp, I'd assume this is not an issue related to llama.cpp.

@christiandaley
Author

> KoboldCpp has been running it fine so far, but in our case the default settings have shown the model is very sensitive to repetition penalty, so make sure that is low enough.
>
> Considering our Command-R implementation is inherited from llama.cpp, I'd assume this is not an issue related to llama.cpp.

I've turned repetition penalty off, so that's not the issue. I can make an issue on text-generation-webui instead if that would be a better place.
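(Editor's note: in llama.cpp-based stacks the repetition penalty is multiplicative, so 1.0, not 0, is the neutral "off" value. A minimal sketch of penalty-free, near-greedy sampler settings for isolating this kind of bug; the parameter names follow llama-cpp-python's `create_completion` and may need adjusting for other frontends.)

```python
# Neutral sampler settings for isolating sampler-related bugs in
# llama.cpp-based backends (names follow llama-cpp-python's
# create_completion; adjust for other frontends).
def neutral_sampler_settings(temperature: float = 0.1) -> dict:
    """Settings that disable every filter/penalty and keep decoding near-greedy.

    repeat_penalty=1.0 is the multiplicative identity, i.e. "off";
    setting it to 0 would be a strong (and invalid) penalty, not a no-op.
    """
    return {
        "temperature": temperature,  # low temperature -> near-greedy decoding
        "top_k": 0,                  # <= 0 disables top-k filtering
        "top_p": 1.0,                # 1.0 disables nucleus sampling
        "min_p": 0.0,                # 0.0 disables min-p filtering
        "repeat_penalty": 1.0,       # 1.0 = repetition penalty off
    }
```

If the model still produces gibberish under these settings, sampling can be ruled out as the cause.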

@schmorp

schmorp commented Apr 11, 2024

As an example, with IQ3_M, default settings (i.e. no repetition penalty) and a prompt of "Hi,", I get this output:

Hi, I’ in the 10th grade, and I was just wondering, is it a good idea to take 3 APs(AP US, AP Calc, AP Physics) as a 10th grader, or should I just take 2, and save 1 for 11th, so I can take 2 in 11th, and 2 in 12th?

I am currently a 10th-11th grader, and I would say that it depends on your 1) schedule, 2) ability to manage time, 3) (your) school's/s's) course(s) and 4) how many you plan to take in 11th and 12th.

(1) If you have a busy schedule, and you are not sure whether you will have time to study, I would not recommend it. (2) If you are not able to manage your time, you will be in a mess. (3) If the 3 APs are in your 2-3 "best" subjects, then you should be good. (4) If you plan to take 1-2, 1-2, , 1-2, or -1, 2-2, 2-1, 2-2, -1, 2-2, 2-3, , 2-1, 3-2, 3-3, , 2-2, 3-3, 3-4, 3-4, , 2-3, 3-4, 3-5, 5-5, 5-5, 4-4, 5-4, 5-5, 5-5, 4-4, 4-4, 3-3, 3-2, 3-1, 3-2, 3-3, 3-4, 3-4, 3-3, 4-3, 4-3, 4-3, 4-3, 4-3, 3-3, 3-3, -3, 3-3, -3, 3-3, 3-3, -3, 3-3, 3-3, 3-3, 3-3, 3-3, 3-3, 3-3, 3-3, 3-3, 3 all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all 
all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all 28 5 0 0 0 1 2 4 2 円 1 3 5 0 0 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 renditrenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditendit 4 0 0 renditrenditenditenditenditenditenditenditenditendit 4 0 0 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 renditrenditendit 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 rendit 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 1 1 1 2 4 0 0 1 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0

@Jeximo
Contributor

Jeximo commented Apr 11, 2024

> I've turned repetition penalty off, so that's not the issue. I can make an issue on text-generation-webui instead if that would be a better place.

I think the best way to figure out whether llama.cpp is the cause of the issue is to run Cmd R+ with the llama.cpp server (without the Python bindings or text-generation-webui) and check the output. Here's an example, see this comment: #6551 (comment)

It's probably related to the Cmd R+ prompt template rather than sampling. If you're using text-generation-webui, then maybe make an issue there.
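(Editor's note: the suggestion above — hitting the llama.cpp server directly to rule out the Python bindings — can be sketched with only the standard library. The model path in the comment is illustrative, and it assumes a server started locally with something like `./server -m c4ai-command-r-plus.gguf -c 4096` listening on port 8080; `/completion` is the server's plain completion endpoint.)

```python
# Query the llama.cpp server directly, bypassing llama-cpp-python and
# text-generation-webui, to see whether the gibberish reproduces there.
import json
import urllib.request

def build_completion_request(prompt: str, n_predict: int = 128) -> dict:
    """Payload for the server's /completion endpoint, penalties disabled."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,      # max tokens to generate
        "temperature": 0.1,          # near-greedy, to rule out sampling noise
        "repeat_penalty": 1.0,       # 1.0 = repetition penalty off
    }

def complete(prompt: str, url: str = "http://127.0.0.1:8080/completion") -> str:
    """POST the prompt and return the generated text. Call e.g. complete("Hi,")."""
    payload = json.dumps(build_completion_request(prompt)).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

If the raw server output is coherent but the webui output is not, the bug is in the bindings or the frontend rather than in llama.cpp itself.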

@christiandaley
Author

> As an example, with IQ3_M, default settings (i.e. no repetition penalty) and a prompt of "Hi,", I get this output: […]

This is very similar to what I've been seeing: it starts off with a semi-coherent sentence and then just devolves into rambling. The exl2 quants don't do this; they remain coherent.

@FNsi
Contributor

FNsi commented Apr 12, 2024

😂 That reminds me of how MPT-7B inference worked in the very beginning; it turned out there was something wrong.

In the MPT case it was a wrong ne[] calculation.

I didn't look inside, but I'd guess it's almost the same here.

@satyaloka93

> I think the best way to figure out if llama.cpp is the cause of the issue is to run Cmd R+ with server (without python bindings, or text-generation-webui), and check the output. […]

Can you run the server using OpenAI calls to the endpoint? I'm trying that, but I'm not sure how to implement the template; it's clear that sending messages through the traditional 'role': 'user' format doesn't work, and the model fails to follow instructions.
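(Editor's note: until the OpenAI-style chat route handles this model, one workaround is to format Command R+'s raw prompt template by hand and send it to the plain `/completion` endpoint. A sketch follows; the special tokens are taken from the model's tokenizer config, but double-check them against the model card before relying on this.)

```python
# Format an OpenAI-style message list into Command R+'s raw prompt template,
# suitable for the llama.cpp server's plain /completion endpoint.
def format_command_r_prompt(messages: list[dict]) -> str:
    """messages: [{"role": "user"|"assistant"|"system", "content": str}, ...]"""
    role_tokens = {
        "system": "<|SYSTEM_TOKEN|>",
        "user": "<|USER_TOKEN|>",
        "assistant": "<|CHATBOT_TOKEN|>",
    }
    parts = ["<BOS_TOKEN>"]
    for m in messages:
        parts.append(
            "<|START_OF_TURN_TOKEN|>"
            + role_tokens[m["role"]]
            + m["content"]
            + "<|END_OF_TURN_TOKEN|>"
        )
    # Open the assistant turn so the model generates the reply.
    parts.append("<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")
    return "".join(parts)
```

Note that the server (or frontend) must not prepend a second BOS token on top of this, or the template will be malformed.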

@Jeximo
Contributor

Jeximo commented Apr 13, 2024

> Can you run the server using OpenAI calls to the endpoint?

Sure, once it's implemented: #6650

It needs to go through the test process to eliminate errors.

@satyaloka93
Copy link

> Sure, once it's implemented: #6650
>
> It needs to go through the test process to eliminate errors.

Thanks, I found that right after I commented. I pulled and built that PR; at first I thought it was broken, but then I discovered I couldn't run Command R (IQ4_XS) on my 4090 with my normal 8k context. It works fine at 2k.

@github-actions github-actions bot added the stale label May 18, 2024

github-actions bot commented Jun 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Jun 2, 2024