feat: '--in-prefix STRING' option #426
Conversation
Prefix user inputs with a string
Personally I don't see much value in this change.
For your case specifically you could just add the space to the reverse prompt: "Bob:" -> "Bob: "
A reverse prompt with an extra space doesn't seem to work, for me at least; llama.cpp carries on as if there were no reverse prompt in that case.
That sounds like a bug.
It's because the reverse prompt check only tests the last output. "user:" and " " are two different tokens, so it doesn't work. I don't know if it should be changed, though. In any case there is different value in this; you would not want to use
Actually, I think it's because the space is part of the next token, so there is no trailing space to catch...
True, the space after "user:" can be either a token of its own or part of the next token. The reverse prompt code should be fixed to check more than the last output, so that it can match even when the reverse prompt spans multiple tokens. I also noticed another issue with it: main.cpp#L435, the antiprompt check should be wrapped in an antiprompt.empty() test, as currently that code runs even if no reverse prompt is used. Anyway, we are getting derailed here; the point I'm trying to make is that this functionality is not related to reverse prompts, that was just a usage example. This simply pre-injects text into each user input, which can be used to build various new interactions. It can be used with or without reverse prompts.
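A minimal sketch of what that fix could look like, assuming the text generated since the last user input is kept around as a plain string (the function and variable names here are hypothetical, not the actual main.cpp code):

```cpp
// Hypothetical sketch, not the actual llama.cpp code: test the text generated
// since the last user input against every reverse prompt, so the match works
// even when the reverse prompt is split across several tokens.
#include <string>
#include <vector>

static bool antiprompt_triggered(const std::string & output_since_input,
                                 const std::vector<std::string> & antiprompts) {
    // Skip the whole check when no reverse prompt was given (-r not used).
    if (antiprompts.empty()) {
        return false;
    }
    for (const std::string & ap : antiprompts) {
        if (output_since_input.size() >= ap.size() &&
            output_since_input.compare(output_since_input.size() - ap.size(),
                                       ap.size(), ap) == 0) {
            return true;  // accumulated output ends with this reverse prompt
        }
    }
    return false;
}
```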
@anzz1 The reverse prompt can span multiple tokens. However, there is no way for it to interrupt generation mid-token. (That's why I opted to use token vectors rather than strings for reverse prompts when I first wrote interactive mode.) Therefore, the best you could hope for is that if, say, the generation emits the reverse prompt followed by " Hello" (with the space fused into that next token), generation can only be stopped after "Hello" has already appeared.

The PR's idea seems like the best we can do to me if we want to simultaneously (1) always correctly detect when reverse prompts of the form "Name:" are emitted, (2) not force the user to enter the space after it manually, (3) not have the reverse prompt be followed by one model-imposed word like the "Hello" in the above example, and (4) not implement generation rollback.
Thanks for putting into words how it works better than I could. Yes, implementing rollback doesn't pass a cost-benefit analysis. However, it might be a good idea to put in the backlog scanning the text output between the last interaction and now (not only between the last token and now) after each generated token, to check whether the reverse prompt was found, since the computation required is insignificant. So like you said, a
My communication on what this aims to achieve was less than stellar. This is exactly what I was going for; I just couldn't put it into words properly. You can add whatever you want to the output with basically zero cost. In the future, with a sliding context window (and the infinite generation that can come with it), it could be great for testing things like "Please continue" and mashing enter without having to type it out.
Should be "pretty cheap": you just need to track the token index for each char.
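A rough sketch of that "track the token index for each char" idea, assuming the output is accumulated as a plain string (the struct and member names are made up for illustration, not taken from the PR):

```cpp
// Hypothetical illustration: remember which token produced every character of
// the accumulated output, so a string-level reverse-prompt match can later be
// mapped back to a token boundary.
#include <cstddef>
#include <string>
#include <vector>

struct output_tracker {
    std::string         text;           // all text generated since the last user input
    std::vector<size_t> token_of_char;  // token_of_char[i] = index of the token that emitted text[i]

    void append(const std::string & piece, size_t token_index) {
        text += piece;
        token_of_char.insert(token_of_char.end(), piece.size(), token_index);
    }

    // Token that produced the character at string offset `pos`.
    size_t token_at(size_t pos) const {
        return token_of_char.at(pos);
    }
};
```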
The --in-prefix STRING command line option prefixes user inputs with STRING.
For example, chatting with Bob:
./main -m ./models/llama-13B-ggml/ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 -f ./prompts/chat-with-bob.txt -i -r "User:" --in-prefix " "
adds a space after the reverse prompt "User:"
So instead of the user's input appearing as "User:your text", it appears as "User: your text", which matches the original prompt better.
It could be useful for other prompts too: alignment, or maybe testing multiple similar questions like "What do you think about X?" or whatever.
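A rough sketch of the mechanism, assuming an interactive loop that reads a line from the user before tokenizing it (illustrative code in the spirit of main.cpp, not the PR's exact diff):

```cpp
// Illustrative sketch (not the actual PR diff): whatever the user types is
// prepended with the --in-prefix string before being tokenized and fed back
// to the model, so e.g. --in-prefix " " inserts the space after "User:".
#include <iostream>
#include <string>

int main() {
    const std::string input_prefix = " ";  // value passed via --in-prefix
    std::string line;
    while (std::getline(std::cin, line)) {
        std::string buffer = input_prefix + line;  // inject the prefix at essentially zero cost
        // ... here the real program would tokenize `buffer` and append the
        //     tokens to the evaluation queue ...
        std::cout << "feeding to model: \"" << buffer << "\"\n";
    }
    return 0;
}
```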