Command R+ outputs gibberish when used with text-generation-webui #6596
Comments
Koboldcpp has been running it fine so far, but in our case the default settings have shown that the model is very sensitive to repetition penalty, so make sure that is low enough. Considering our Command-R implementation is inherited from llama.cpp, I'd assume this is not an issue related to llama.cpp.
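If anyone wants to verify that on the llama-cpp-python side, here is a minimal sketch (the model path is a placeholder) that disables the penalty outright:

```python
from llama_cpp import Llama

# Placeholder path: point this at whichever Command R+ GGUF quant you use.
llm = Llama(model_path="./c4ai-command-r-plus.Q5_K_S.gguf", n_ctx=4096)

# repeat_penalty penalizes recently generated tokens; 1.0 is a no-op,
# so any gibberish produced here cannot be blamed on the penalty.
out = llm("Hi,", max_tokens=128, repeat_penalty=1.0)
print(out["choices"][0]["text"])
```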
I've turned repetition penalty off, so that's not the issue. I can make an issue on text-generation-webui instead if that would be a better place.
As an example, with IQ3_M, default settings (i.e. no repetition penalty) and a prompt of "Hi,", I get this output:
Hi, I’ in the 10th grade, and I was just wondering, is it a good idea to take 3 APs(AP US, AP Calc, AP Physics) as a 10th grader, or should I just take 2, and save 1 for 11th, so I can take 2 in 11th, and 2 in 12th? I am currently a 10th-11th grader, and I would say that it depends on your 1) schedule, 2) ability to manage time, 3) (your) school's/s's) course(s) and 4) how many you plan to take in 11th and 12th. (1) If you have a busy schedule, and you are not sure whether you will have time to study, I would not recommend it. (2) If you are not able to manage your time, you will be in a mess. (3) If the 3 APs are in your 2-3 "best" subjects, then you should be good. (4) If you plan to take 1-2, 1-2, , 1-2, or -1, 2-2, 2-1, 2-2, -1, 2-2, 2-3, , 2-1, 3-2, 3-3, , 2-2, 3-3, 3-4, 3-4, , 2-3, 3-4, 3-5, 5-5, 5-5, 4-4, 5-4, 5-5, 5-5, 4-4, 4-4, 3-3, 3-2, 3-1, 3-2, 3-3, 3-4, 3-4, 3-3, 4-3, 4-3, 4-3, 4-3, 4-3, 3-3, 3-3, -3, 3-3, -3, 3-3, 3-3, -3, 3-3, 3-3, 3-3, 3-3, 3-3, 3-3, 3-3, 3-3, 3-3, 3 all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all all 28 5 0 0 0 1 2 4 2 円 1 3 5 0 0 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 renditrenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditenditendit 4 0 0 renditrenditenditenditenditenditenditenditenditendit 4 0 0 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 renditrenditendit 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 rendit 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4
0 0 1 1 1 1 1 1 1 2 4 0 0 1 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0 0 1 1 1 1 2 4 0
I think it's probably related to the Cmd R+ prompt template rather than sampling. If using text-generation-webui, then maybe make an issue there.
This is very similar to what I've been seeing, where it starts off with a semi-coherent sentence and then just devolves into rambling. The exl2 quants don't do this; they remain coherent.
😂 That reminds me of how MPT-7B inference worked in the very beginning; it turned out there was something wrong. In the MPT case it was a wrong ne[] calculation. I didn't look inside, but I guess it's almost the same.
Can you run the server using OpenAI calls to the endpoint? I'm trying that, but I'm not sure how to implement the template; it's clear that sending it through the traditional 'role': 'user' format doesn't work, and it fails to follow instructions.
Sure, once it's implemented: #6650. It needs to go through the test process to eliminate errors.
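In the meantime, one workaround sketch is to build Cohere's published turn tokens by hand and post to the server's raw /completion endpoint, which bypasses chat templating entirely; the host, port, and sampling values below are assumptions:

```python
import requests

# Command R+ turn tokens per Cohere's model card; the server normally
# prepends BOS during tokenization, so it is left out of the string here.
user_msg = "Write a haiku about autumn."
prompt = (
    "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>" + user_msg +
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
)

# Assumes a local llama.cpp server on its default port 8080.
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": prompt, "n_predict": 128, "temperature": 0.7},
)
print(resp.json()["content"])
```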
Thanks, I found that right after I commented. I pulled and built that PR; at first I thought it was broken, but then discovered I couldn't run Command-R (IQ4_XS) on my 4090 with my normal 8k context. It works fine at 2k.
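For context on why 8k fails where 2k works: the KV cache grows with the context window and competes with the offloaded weights for VRAM. A sketch of the relevant llama-cpp-python knobs (path and values are placeholders):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./c4ai-command-r-plus.IQ4_XS.gguf",  # placeholder path
    n_ctx=2048,       # KV cache scales with this; 8192 can overflow 24 GB
    n_gpu_layers=-1,  # -1 offloads all layers; lower it to spill to CPU RAM
)
```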
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Not sure if this is the appropriate place to file this issue, but I don't have much of an idea of what's going wrong, and the exllamav2 quants of Command R+ work fine with text-generation-webui after updating to the latest exllamav2.
Manually installing and building llama-cpp-python with the latest llama.cpp allows text-generation-webui to load GGUF quants of Command R+, but the output is always gibberish regardless of what sampler settings are used. From my understanding, llama-cpp-python is just bindings for llama.cpp, so I'm not sure if the issue could be there. The log output from text-generation-webui looks totally normal when loading and running inference.
I've tried the q5_k_s quant from here: https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF/tree/main as well as the q5_k_m quant from here: https://huggingface.co/mradermacher/c4ai-command-r-plus-GGUF/tree/main. Both behave in the same way. The output seems somewhat related to the prompt and sometimes contains coherent sentences, but there is clearly something very wrong. Even when turning the temperature way down and raising min_p, the output is for the most part nonsense. I can use a wide range of sampler settings with exllamav2 and get good results.
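One way to take text-generation-webui out of the loop is to drive llama-cpp-python directly with the same quant; a minimal sketch, assuming a placeholder path and a llama-cpp-python build recent enough to expose min_p:

```python
from llama_cpp import Llama

# Placeholder path: use the same GGUF file the webui loads.
llm = Llama(model_path="./c4ai-command-r-plus.Q5_K_M.gguf", n_ctx=4096)

# Conservative sampling: low temperature plus a min_p cutoff.
# Gibberish here too would point at llama.cpp/llama-cpp-python
# rather than the webui layer.
out = llm("Hi,", max_tokens=64, temperature=0.2, min_p=0.3)
print(out["choices"][0]["text"])
```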