Is the --ignore-eos flag redundant? #309
Hm. The difference in use is the
When I made the PR for --ignore-eos, the code that ignores EOS in interactive mode hadn't been added yet. However, I think my solution is better because it avoids sampling EOS at all in the first place; otherwise the EOS is going to end up in the context, and that may make the LLM do weird things. But it's just an assumption.
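For anyone following along, here is a minimal, self-contained sketch of what "avoid sampling EOS at all" means in practice. This is not the actual llama.cpp code; the token id 2 and the greedy sampler are placeholders for illustration only:

```cpp
#include <cstdio>
#include <limits>
#include <vector>

// Illustrative stand-in for whatever sampling the real code does.
// EOS_TOKEN_ID = 2 matches the end-of-document id mentioned later in this
// thread, but treat it as a placeholder, not a guaranteed constant.
constexpr int EOS_TOKEN_ID = 2;

int sample_greedy(std::vector<float> logits, bool ignore_eos) {
    if (ignore_eos) {
        // Remove EOS from consideration before sampling, so it can never be
        // picked and therefore never enters the context.
        logits[EOS_TOKEN_ID] = -std::numeric_limits<float>::infinity();
    }
    int best = 0;
    for (int i = 1; i < (int) logits.size(); ++i) {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}

int main() {
    std::vector<float> logits = {0.1f, 0.3f, 2.0f, 0.5f}; // EOS (id 2) is the argmax here
    std::printf("sampled without --ignore-eos: %d\n", sample_greedy(logits, false)); // 2
    std::printf("sampled with    --ignore-eos: %d\n", sample_greedy(logits, true));  // 3
    return 0;
}
```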
I can confirm that. I actually tried that approach before the flag got added. It doesn't work very well, because the LLM will basically go off the rails or start a completely new response that's likely not even related to the initial request. Also, in my opinion as a random person on the internet, one can just not pass the flag for ignoring EOS if that's the behavior the user desires. So why go through the trouble of adding special code to disable it even when explicitly specified?
I think we should address this. Perhaps the correct approach is to default to ignoring EOS in interactive mode instead of using it to switch to user input mode. Meaning --ignore-eos is on by default in interactive mode and off by default in non-interactive mode. What do you think? Without EOS, the model has no way to cut its generation short, am I mistaken? In that case a reverse prompt would be required or there would be no possibility of interaction. EDIT: Upon further testing, I'm seeing that --ignore-eos, even with a reverse prompt, seems to send the model into generating endless nonsensical output.
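A rough sketch of the defaulting rule being proposed, just to make it concrete. The parameter and helper names here are made up, not llama.cpp's; the tri-state option is simply one way to express "off unless explicitly set":

```cpp
#include <cstdio>
#include <optional>

struct Params {
    bool interactive = false;
    // Tri-state: unset means "use the mode-dependent default" proposed above.
    std::optional<bool> ignore_eos;
};

// Hypothetical helper: resolve the effective setting after argument parsing.
bool effective_ignore_eos(const Params & p) {
    if (p.ignore_eos.has_value()) {
        return *p.ignore_eos;   // the user passed the flag explicitly, respect it
    }
    return p.interactive;       // proposed default: on in interactive mode, off otherwise
}

int main() {
    Params interactive_run;
    interactive_run.interactive = true;
    Params batch_run; // non-interactive, flag not passed

    std::printf("interactive default: %d\n", effective_ignore_eos(interactive_run)); // 1
    std::printf("batch default:       %d\n", effective_ignore_eos(batch_run));       // 0
    return 0;
}
```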
This is a bit of a tangent, but I've been looking further into the weird behavior when the end of text token occurs and gives the user control (without the use of --ignore-eos).
Here are two (edit: three, I got a really good one showcasing the current default behavior) excerpts from a reverse prompt dialogue WITHOUT this addition, i.e. the current behavior when an end of text token is reached. (I edited in the [end of text] parts for clarity, to indicate when it gave me back control.)
A bit of second-hand embarrassment as it randomly started going on about the anime Naruto and fan-fiction.
Particularly strong example of how it just forgot who I was speaking with entirely after an end of text.
And here are two small excerpts WITH the above change when the end of text token is thrown.
I've tested this over the past day, and it seems pretty apparent that without this change the conversation goes off track whenever an end of text is reached. I would make this a PR myself, but I'm really not certain about it and don't want to introduce any unintended or bad behavior. But the change seems to fix the weird end of text behavior I get regularly when not stripping out the EOS token altogether with --ignore-eos.
@rabidcopy That was very thorough, thank you! Unfortunately, I'm not very knowledgeable myself (I don't even know what token 13 is), so I don't know why your examples work out the way they do either. What is token 13?
No idea. It was a snippet I saw floating around, posted anonymously. Going out on a limb, it somehow keeps the context on track or "restores" a state after an end of text is reached? Edit: I'll probably make a PR for this later and see if someone more knowledgeable can sign off on it. Though I don't see the harm, as it only affects end of text behavior in interactive mode and the former behavior doesn't seem particularly ideal.
Token 13 is a newline. Just for example (and I know there's a typo in the prompt, doesn't matter for this example :)
Token ID 2 is the end of document marker and token ID 1 is the start of document marker. You can see the prompt gets generated with a SOD at the beginning. If the LLM generates that, it could also cause weird stuff (I heard someone else mention this can happen, but I haven't seen it). Note, this is just based on the current tokenizer behavior and the models I've tried. I think it's the same for all llama and alpaca models, but I am far from an expert.
@rabidcopy If 13 is a newline, it makes sense that it would help smooth out the behavior when the model outputs an EOS but we essentially are telling it "I don't care that you are finished, we'll just keep going." 😏 What you are proposing is adding a newline (token 13) where there would normally be an EOS in interactive mode, right? That way you wouldn't need to use --ignore-eos. We could then remove this flag (or maybe it has other uses, so we could keep it), and allow the model to generate EOS as a way for us to know that we need to go to interaction mode and add a newline instead. Do you think this could work well?
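To make the proposal concrete, here is a small, self-contained sketch of the substitution being discussed: swap a sampled EOS for a newline token and hand control back to the user. The token ids and variable names are placeholders taken from this thread, not the actual main.cpp code:

```cpp
#include <cstdio>
#include <vector>

// Placeholder ids from the discussion above (model/tokenizer dependent).
constexpr int EOS_TOKEN_ID     = 2;
constexpr int NEWLINE_TOKEN_ID = 13;

int main() {
    std::vector<int> context;       // tokens fed back to the model
    bool interactive    = true;
    bool is_interacting = false;

    int sampled = EOS_TOKEN_ID;     // pretend the model just produced EOS

    if (interactive && sampled == EOS_TOKEN_ID) {
        // Instead of pushing EOS into the context (which the excerpts above
        // show can derail the conversation), substitute a newline and hand
        // control back to the user.
        sampled = NEWLINE_TOKEN_ID;
        is_interacting = true;
    }
    context.push_back(sampled);

    std::printf("token appended: %d, waiting for user: %s\n",
                sampled, is_interacting ? "yes" : "no");
    return 0;
}
```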
I still find it useful outside of interactive mode to force the model to generate longer text, even if sometimes it may cause it to go off the rails. For example, if using the beginning of a story as the prompt to get the LLM to finish the story, sometimes it will just generate an EOS after a paragraph or two, and this can be prevented with --ignore-eos.
I concur, there are scenarios where I think some users may prefer it to generate endlessly without being given back control unless they interject with CTRL+C.
Thank you for making this, @rabidcopy. I've actually encountered this issue before and I was totally perplexed by it. I thought it was some issue with my prompts. I'm going to merge this fix into my experimental branch right away.
As per https://github.com/ggerganov/llama.cpp/blob/da5303c1ea68aa19db829c634f1e10d08d409680/main.cpp#L1066, the EOS token in interactive mode simply causes is_interacting to switch on, and so it serves as a way to end the current series of tokens and wait for user input. Is there any reason to actually avoid sampling it in the first place, then?