Llama Ignoring Reverse Prompt Every Other Time #1224
Comments
This is the biggest problem right now with llama.cpp. Maybe it's not capable of recognizing the prompt when it arrives in disjoint tokens?
Happens to me quite often, although I know some people who almost never experience this.
@loukylor Do you experience this issue only when using the `--in-prefix` argument?
Sorry, I should've clarified this in my issue, but no, I experience it both while using it and while not using it. In the example in my issue where I don't use the argument, the only input I gave was `whats the tallest tower`.
Are you using Command Prompt? Can you try some other terminal - I think there is PowerShell or something for Windows.
Yeah, I was using Command Prompt. I just tested on PowerShell as well as a WSL shell and both still have the issue.
Did you take into consideration that the Windows end of line is CRLF, but model generation will produce new lines with only LF?
Nah, happens on Linux too.
I don't know if it's because I updated my llama.cpp, or that I'm now testing with Mirostat v2, but I haven't had this problem lately. I can now have very long conversations with the LLM without it filling in my side of the conversation for me. I just added …
I have the same issue and could not fix this, even with Mirostat v2...
@akumaburn You added a stop parameter that closes the entire program when a stop word is found.
I made a fix #1297 that works for me personally.
Make sure to put `Fixes #1224` in the pull request description so this issue gets linked and closed automatically.
The fix by @newTomas works for me as well. Thanks a lot!
Did I put it correctly? Haven't done pull requests before.
You have to put it in the description, not the title. (And I think it has to be one of GitHub's closing keywords, like `Fixes #1224`.)
* fix reverse prompt and multi line
* Code Formatting

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
I've recompiled llama.cpp with f647ce0 merged and the issue still seems to be present.
At no point here had I typed in anything; the model just continued with the prompt.
It only checks for the tokens, so the check looks for the reverse prompt's exact token sequence. However, the tokenizer prefers words with a prefixed space (a single " word" token) instead of a space token plus the word token, which is not an exact match. Not sure how all the prefix stuff works, I have not looked at the exact code in a while.
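To make the mismatch concrete, here is a rough sketch (illustrative only, not the actual llama.cpp code) of a token-level tail check versus a string-level tail check on the detokenized output; the latter sidesteps the space-prefixed-token problem:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Token-level check: compares raw token ids, which fails when the reverse
// prompt tokenizes differently from the generated stream (e.g. " Human" as
// one space-prefixed token instead of a space token plus "Human").
bool tokens_end_with(const std::vector<int> & generated,
                     const std::vector<int> & antiprompt_tokens) {
    if (generated.size() < antiprompt_tokens.size()) return false;
    return std::equal(antiprompt_tokens.begin(), antiprompt_tokens.end(),
                      generated.end() - antiprompt_tokens.size());
}

// String-level check: compares the visible text instead of token ids, so it
// does not care how the tokenizer happened to split the reverse prompt.
bool text_ends_with(const std::string & output, const std::string & antiprompt) {
    if (output.size() < antiprompt.size()) return false;
    return output.compare(output.size() - antiprompt.size(),
                          antiprompt.size(), antiprompt) == 0;
}
```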
Maybe we need a warning when the reverse prompt ends with a space.
Or we roll back the tokens.
ggerganov addressed this in the PR, suggesting it's not as trivial as it sounds. Maybe it would require restructuring too much of the main loop/flow. I think I might be able to make it work, but there might be edge cases I'm not thinking of.
The most naive solution I can think of is tracking the token/ctx-index for each char for a lookup.
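A minimal sketch of that idea (names and structure are hypothetical, not taken from llama.cpp): record which context position produced each output character, so a string-level match can be translated back into how far to roll the tokens back:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: map every character of the detokenized output back to
// the context index of the token that produced it.
struct OutputIndex {
    std::string text;              // detokenized output so far
    std::vector<int> char_to_ctx;  // char_to_ctx[i] = context index of the token that emitted text[i]

    void append(const std::string & piece, int ctx_index) {
        text += piece;
        char_to_ctx.insert(char_to_ctx.end(), piece.size(), ctx_index);
    }

    // Given the character offset where a reverse prompt match starts, return
    // the context index to roll the model state back to.
    int rollback_target(size_t match_start) const {
        return char_to_ctx.at(match_start);
    }
};
```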
Currently, we print a token immediately as it is generated and AFAIK, you cannot simply erase things you have already printed to `stdout`. Not sure if it is very worth going down this road, but if you can provide a concise implementation - we could probably add it.
Due to the streaming nature of tokens, it would probably need more than just the last generated token. The buffer would probably need to be large enough to hold the longest reverse prompt. The buffer can then be printed when it can no longer be the start of a reverse prompt, or when a space/newline token is encountered. The printing must be such that if the buffer was flushed because a space/newline token was encountered, the space/newline token is also printed out.
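A sketch of that buffering idea, under the assumption that the buffer is safe to print once its tail can no longer grow into a reverse prompt (illustrative code, not a drop-in patch for the main loop):

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Sketch: hold back only the suffix of `pending` that could still become a
// reverse prompt, print the rest. Returns true if a full reverse prompt matched.
bool flush_pending(std::string & pending, const std::vector<std::string> & antiprompts) {
    size_t hold = 0;
    for (const std::string & ap : antiprompts) {
        if (ap.empty()) continue;
        // Full match: stop generation and do not print the reverse prompt itself.
        if (pending.size() >= ap.size() &&
            pending.compare(pending.size() - ap.size(), ap.size(), ap) == 0) {
            return true;
        }
        // Otherwise, find the longest suffix of `pending` that is a prefix of `ap`.
        size_t max_len = std::min(pending.size(), ap.size() - 1);
        for (size_t len = max_len; len > hold; --len) {
            if (ap.compare(0, len, pending, pending.size() - len, len) == 0) {
                hold = len;
                break;
            }
        }
    }
    // Everything before the held-back suffix can never be part of a reverse
    // prompt, so it is safe to print immediately.
    std::cout << pending.substr(0, pending.size() - hold) << std::flush;
    pending.erase(0, pending.size() - hold);
    return false;
}
```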
Would it be faster to ignore the trailing space(s) in the reverse prompt check? i.e. "###" + " Human" + ":"
@kazord It should be trivial to trim the reverse prompt check, but the question is whether we actually should. If I define the reverse prompt as "SomeGuy: ", is it okay if the reverse prompt check actually checks for "SomeGuy:"? Then should we also trim tabs/newlines as well? I don't think sanitizing user input should be the responsibility of llama.cpp.
As Green-Sky mentioned, in token generation the space is special, as it's included in the token (" theword"), unlike tab, newline, etc.
@kazord I would be for a warning message; it seems the simplest way to ensure someone doesn't accidentally use a trailing space.
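Something along these lines would be enough for the warning (a hypothetical startup check; `antiprompts` here stands in for whatever holds the configured reverse prompts):

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical check: warn when a reverse prompt ends in whitespace, since the
// tokenizer folds a leading space into the next word, so a prompt with a
// trailing space may never match the generated token stream exactly.
void warn_trailing_whitespace(const std::vector<std::string> & antiprompts) {
    for (const std::string & ap : antiprompts) {
        if (!ap.empty() && std::isspace((unsigned char) ap.back())) {
            fprintf(stderr,
                    "warning: reverse prompt '%s' ends with whitespace and may never be detected\n",
                    ap.c_str());
        }
    }
}
```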
Excuse me, but is anyone already working on a proper solution to the problem? I think doing some processing before outputting to the console is a good idea and might come in handy elsewhere as well - for example, to censor certain words or secret data.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Generation is expected to stop once the reverse prompt is encountered.
Current Behavior
Generation continues until the reverse prompt is encountered twice.
Environment and Context
Windows 10 version 19045.2728
Intel i7 9700k
Python 3.10.7
Make and g++ installed from w64devkit version 1.18.0
Failure Information (for bugs)
Steps to Reproduce
Run with the reverse prompt `User:`, and the prompt `chat-with-bob.txt`.

For me, it happens to both my 7B and 13B models. I don't have the hardware to test the 32B and 65B models.
Just as reference, this issue started as discussion #1200.
Failure Logs
For context, the only user input was `whats the tallest tower`. The rest is the prompt or generated.

Here's what happens without the `--in-prefix` argument. Again, the only user input was `whats the tallest tower`; the rest is generated or the prompt.