main : restore old EOS behavior in interactive mode #2689
Conversation
Previously, the anti-prompt was also injected after a |
This was introduced in #333; there was some discussion at the time about the rationale. |
Thanks for the quick update. I'm testing now. |
I'm making sure to be thorough in testing. So far it's working as expected, even without e3da126; I'm now testing with it. ❤️ The structure is working, but testing shows Here's my format: The content of Vic.txt is:
I found Assistant is too chatty compared to It's unusual. After Assistant responds, I ask "Tell me a short story", and other than the initial chatter, it works. So far, I liked the initial implementation best, but I understand if e3da126 is necessary. |
For me the test that was causing the breakage is now fixed: Properly returns:
So for me you definitely nailed it! There is still this other problem we discussed in #2646 which is that the reverse-prompt sometimes appears in lowercase and is not matched, e.g. with the initial example I gave using llama-2-13b: It gives a chatty session because "Usr" became "usr". But that was already present before and is totally independent of the EOS/BOS issue:
So for me the regression caused by #2304 is fixed and we still have to figure out how to deal with the other reverse-prompt issues separately. Thank you! |
If we ignore EOS, then I can expect the assistant to get very chatty because there is nothing to break its turn. Can you test the following new commit without using |
I agree, I suspect |
With your latest stdout patch I'm not seeing any difference (i.e. everything's still OK for me). Also, when trying to figure out how to address the reverse-prompt case issue, I re-discovered that we actually accept multiple reverse-prompts, so using It looks good, I hadn't seen llama2 work that well till now, with lines properly broken ;-)
|
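(For context, the multiple reverse-prompt trick above is just passing `-r` more than once, e.g. `./main -m <model> -i -r "User:" -r "user:"`, where the model path is a placeholder; each `-r` adds another string that ends the model's turn when it is generated.)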
Okay, here's a problem (without
Assistant is cut off in both examples without Edit: to clarify, prior to e3da126, I tried Vicuna without |
Could this mean that some spurious EOS are injected in the response stream? |
I see that Vicuna uses |
I'm open to suggestions!
I will test now, thank you. |
With I'm getting mixed results. Sometimes Assistant can finish a short story, sometimes it's cut:
2/4 tests ended abruptly. |
The reverse prompt was also injected prior to #2304. Do this model and tests work if you check out a commit from before #2304? |
I can now reproduce it; it only happens with |
I used exactly this: And it gave me:
|
Actually, no.
With https://github.com/ggerganov/llama.cpp/releases/tag/master-07aaa0f (prior to #2304), |
I'll try without 2/3 tests failed: @wtarreau README explains |
Ok, it seems that We have to add functionality that prints detailed debug information to a log file, so that we can understand what the model is actually generating and what we give it as input during the generation process. Otherwise, there is too much guesswork and I feel like we are dealing with multiple bugs at once. Will focus on this after we finish with the GGUF work. Here is the issue for this: #2694 |
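As a rough illustration of the kind of logging meant here (a hypothetical sketch, not the logging facility that was actually added for #2694), something like the following could dump each token id and its text to a side file, in the same 'piece':id style as the dumps later in this thread:

```cpp
// Hypothetical sketch: dump token ids and their text to a side log file so a
// session can be reconstructed afterwards. Names are illustrative, not the
// actual llama.cpp logging API.
#include <cstdio>
#include <string>
#include <vector>

static FILE * g_log = nullptr;

static void log_open(const char * path) {
    g_log = std::fopen(path, "w");
}

// `pieces` holds the already-detokenized text of each id, gathered by the caller.
static void log_tokens(const char * tag,
                       const std::vector<int> & ids,
                       const std::vector<std::string> & pieces) {
    if (!g_log || ids.size() != pieces.size()) return;
    std::fprintf(g_log, "%s:", tag);
    for (size_t i = 0; i < ids.size(); ++i) {
        std::fprintf(g_log, " '%s':%d,", pieces[i].c_str(), ids[i]);
    }
    std::fprintf(g_log, "\n");
    std::fflush(g_log);
}
```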
I agree. Thanks for the direction as we figure this out. |
@ggerganov here's an example Here's the section: 'He':3868, ' tra':1020, 've':345, 'led':839, ' for':363, ' days':3841, ',':29892, ' bra':4105, 'ving':1747, '':2, ' User':4911, ':':29901, ' D':360, 'ear':799, ' Ass':4007, 'istant':22137, ',':29892, ' why':2020, ''':29915, 'd':29881, ' you':366, ' stop':5040, '?':29973, '':13, 'Ass':7900, 'istant':22137, ':':29901, ' Ap':6225, 'ologies':11763, ' for':363, ' the':278, ' ab':633, 'rupt':6685, ' ending':17140, '!':29991, '
Here's another example that failed in the exact same spot as vicCut3: vicCut4.log Edit: In case it's unclear, this is master
Please always think about trying some random initial seed values ( |
I'm thinking, brother. To understand you better, would me using |
Yes that's the idea. If you use, say, |
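(In concrete terms: a run started with an explicit seed, e.g. `./main -m <model> --seed 1234 -i ...` where the model path and remaining flags are placeholders, should be replayable by passing the same `--seed` value again, all else being equal; leaving `--seed` unset picks a random seed, which is why having it recorded in the logs is handy.)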
Regarding WizardMath, I removed @wtarreau Thank you. Good thing for me that the logs include the SEED! 😊 |
while the correct thing to "see" is:
Finally, when the generation stops seemingly abruptly after
The user prompt is not displayed due to Bug 1 and it looks confusing, so technically this worked as expected, given that we have those bugs. |
It turned out that the stoppage is technically expected - it appeared surprising because I noticed others had issues with EOS in Vicuna 1.1. It's unclear if it's resolved with 1.5. @wtarreau accurately questioned EOS last week. I wonder who knows for sure whether EOS should remain in context? Thanks for helping identify these issues; it's impressive what the logs reveal. |
In llama2-chat the EOS token is supposed to stay in the context, so there are at least some models in which it should stay. The initial implementation that removed the EOS in interactive was designed to work with base llama1, which will generally ignore everything before the EOS, but there are a lot more models now. |
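For reference, the published llama-2-chat turn layout keeps the EOS between turns, roughly as in the sketch below (simplified, system prompt omitted; in the real pipeline BOS/EOS are special token ids rather than literal text):

```cpp
// Simplified sketch of the llama-2-chat conversation layout: each finished
// assistant turn ends with EOS ("</s>") and the next user turn starts with a
// fresh BOS ("<s>"), so the EOS stays in the context. System prompt omitted.
#include <string>

std::string llama2_chat_history_example() {
    return
        "<s>[INST] Hello, who are you? [/INST] "
        "I'm an assistant. How can I help? </s>"
        "<s>[INST] Tell me a short story. [/INST]";
}
```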
It's going to depend a bit on the model. I've tried a bunch of them and some work well with them while others don't. The biggest issue I have is that if the model expects the EOS tokens, I can't just copy and paste a conversation into a new instance to pick up where I left off, because its first EOS will confuse it, making it think one side of the conversation just ended even though it contained both sides. This also seems to make it more easily confused when your initial prompt is missing EOS tokens where they'd normally be, like in the example above:
There are some example chat scripts in the repo with much larger intros, so it's worse in those situations. I'm using a local patch that removes them unconditionally. (Although I don't use the official llama2-chat model.) I'm not sure what the best solution is, perhaps just a command line switch to remove EOS tokens. It may also be useful to make an escape character like "/s" that's converted into EOS tokens since there's no real way to inject them into a prompt right now. |
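A very rough sketch of what that escape idea could look like (a hypothetical helper, not an existing llama.cpp option): split the prompt text on a chosen marker so the caller can tokenize each piece normally and splice an explicit EOS token id in between.

```cpp
// Hypothetical sketch of the "/s" escape idea: split the prompt on the marker
// and return the pieces, so the caller can tokenize each piece normally and
// push an explicit EOS token id between them. Not an existing llama.cpp API.
#include <string>
#include <vector>

std::vector<std::string> split_on_eos_marker(const std::string & prompt,
                                             const std::string & marker = "/s") {
    std::vector<std::string> pieces;
    size_t pos = 0;
    while (true) {
        const size_t hit = prompt.find(marker, pos);
        if (hit == std::string::npos) {
            pieces.push_back(prompt.substr(pos));
            break;
        }
        pieces.push_back(prompt.substr(pos, hit - pos));
        pos = hit + marker.size();
    }
    return pieces; // caller inserts the EOS token id between consecutive pieces
}
```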
If I'm understanding, then you want to be able to copy/paste a dialog to your system, then duplicate the results (assuming all else equal, Somehow (admittedly, I don't understand exactly how), the first EOS is divisive, thus producing undesirable results.
Yeah, playing with ggerganov explained there's an issue with SPM tokenizer and For my usage, usually question-answer type dialog in It's nice to discern 100% for-sure that |
I'll close this PR as we now have a better understanding of what needs to be done and the patch here is not really helpful. A proper solution will come in another PR - contributions welcome |
When the new --input-prefix-bos option is disabled, we treat EOS during interactive mode as "new line". This should match the behavior prior to #2304 while preserving the new functionality of keeping EOS when the flag is specified.
cc @JackJollimore, @wtarreau
I don't have a good way to test this - just tried restoring the old behavior. Hope it works for the models that you tested
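Roughly, the intended logic is sketched below (simplified, not the literal diff in this PR; the struct and variable names are illustrative, with the EOS/newline ids standing in for the llama.cpp token helpers):

```cpp
// Sketch of the intended behavior (not the literal diff): in interactive mode,
// when --input-prefix-bos is NOT set, a sampled EOS is replaced by a newline
// token and the first reverse prompt is queued for injection, matching how
// things worked before #2304. With the flag set, the EOS is kept as-is.
// `eos_id` / `nl_id` would come from the llama.cpp token helpers
// (llama_token_eos / llama_token_nl); the surrounding names are illustrative.
#include <string>
#include <vector>

struct interactive_params {
    bool interactive      = true;
    bool input_prefix_bos = false;          // the new flag
    std::vector<std::string> antiprompt;    // reverse prompts (-r)
};

void handle_sampled_token(int & id, const interactive_params & params,
                          int eos_id, int nl_id, std::string & pending_inject) {
    if (id == eos_id && params.interactive && !params.input_prefix_bos) {
        id = nl_id;                                     // treat EOS as "new line"
        if (!params.antiprompt.empty()) {
            pending_inject = params.antiprompt.front(); // re-inject reverse prompt
        }
    }
}
```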