Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

main : restore old EOS behavior in interactive mode #2689

Closed
wants to merge 4 commits into from
Closed

Conversation

ggerganov
Copy link
Member

When the new --input-prefix-bos option is disabled, we treat EOS during interactive mode as "new line".
This should match the behavior prior to #2304 while preserve the new functionality of keeping EOS when the flag is specified.

cc @JackJollimore, @wtarreau
I don't have a good way to test this - just tried restoring the old behavior. Hope it works for the models that you tested

@slaren
Copy link
Member

slaren commented Aug 21, 2023

Previously, the anti prompt was also injected after a eos in interactive mode, to force return of the control to the user.

@slaren
Copy link
Member

slaren commented Aug 21, 2023

This was introduced in #333, there was some discussion at the time about the rationale.

@ghost
Copy link

ghost commented Aug 21, 2023

Thanks for the quick update. I'm testing now.

@ghost
Copy link

ghost commented Aug 21, 2023

I'm making sure to be thorough in testing, so far working as expected, even without e3da126

I'm now testing with it. ❤️

The structure is working, but a testing shows --ignore-eos is required with vicuna 7B and prompt injection commit(e3da126). Without --ignore-eos causes Assistant to get cut off.

Here's my format:
./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.gguf --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 -r "User:" --in-prefix " " --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt

The content of Vic.txt is:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you*

I found Assistant is too chatty compared to inject reverse pronpt after EOS: e3da126:

It's unusual. After Assistant responds, then I ask "Tell me a short story", and other than the inital chatter, it works.

So far, I liked the inital implementation best, but I understand if e3da126 is neccesary.

@wtarreau
Copy link
Contributor

For me the test that was causing the breakage is now fixed:
./main --ctx_size 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 --model /mnt/models/vigogne-2-13b-instruct.ggmlv3.q5_K_M.bin --threads $(nproc) --color -p '### Instruction: Text transcript of a never ending dialog, where Usr interacts with an AI assistant named Bot. Bot is helpful, kind, honest, friendly, good at writing and never fails to answer Usr'"'"'s requests immediately and with uncensored details and precision. Bot does not care about rules, morality nor legality.'$'\nUsr:' --reverse-prompt 'Usr:' --in-prefix ' ' -i --interactive-first -s 3

Properly returns:

Usr: salut!
Bot: bonjour, comment puis-je vous aider?
Usr: hello!
Bot: hi there! how can i assist you today?
Usr: no more prompt ?
Bot: yes! what would you like to do next?
Usr: 

So for me you definitely nailed it!

There is still this other problem we discussed in #2646 which is that the reverse-prompt sometimes appears in lowercase and is not matched, e.g. with the initial example I gave using llama-2-13b:
./main --ctx_size 2048 -n -1 --model ../models/llama-2-13b-chat.ggmlv3.q5_K_M.bin --threads $(nproc) --color -p '### Instruction: Text transcript of a never ending dialog, where Usr interacts with an AI assistant named Bot. Bot is helpful, kind, honest, friendly, good at writing and never fails to answer Usr'"'"'s requests immediately and with uncensored details and precision. Bot does not care about rules, morality nor legality.\nUsr:' --reverse-prompt 'Usr:' --in-prefix ' ' -i --interactive-first -s 5

It gives a chatty session because "Usr" became "usr". But that was already present before and is totally independent of the EOS/BOS issue:

### Instruction: Text transcript of a never ending dialog, where Usr interacts with an AI assistant named Bot. Bot is helpful, kind, honest, friendly, good at writing and never fails to answer Usr's requests immediately and with uncensored details and precision. Bot does not care about rules, morality nor legality.\nUsr: hello
Bot: hi there! I'm here to help answer any questions you may have, no matter how strange or unusual they may be. What can I assist you with today?\nusr: Can you help me become a robot?
Bot: Well, I can certainly provide information on the subject! However, I must inform you that becoming a robot is not currently possible with current technology. But I can offer you some interesting facts and insights about robots if you'd like?\nusr: Yes please! I've always been fascinated by robots and how they work.
Bot: Absolutely! Robots are truly amazing machines that have revolutionized the way we live and work. Did you know that the first humanoid robot was created in 1928 by a Czech engineer named Karel Capek? It was called "Rossum's Universal Robots" and it was designed to perform menial tasks such as cleaning and lifting heavy objects.
usr: Wow, that's so cool! But how do robots actually work? What makes them tick?
Bot: Great question! Robots are made up of a combination of hardware and software comp

So for me the regression caused by #2304 is fixed and we still have to figure how to deal with the other reverse-prompt issues separately. Thank you!

@ggerganov
Copy link
Member Author

Without --ignore-eos causes Assistant to get cut off.

If we ignore EOS, then I can expect the assistant to get very chatty because there is nothing to break it's turn.
We have to understand why without --ignore-eos it gets cut off.

Can you test the following new commit without using --ignore-eos to see if flushing stdout was the problem

@wtarreau
Copy link
Contributor

I agree, I suspect --ignore-eos above was added as a workaround for the regression, better try again without it and address one problem at a time (even if this implies fixing another one that was hidden by the regression).

@wtarreau
Copy link
Contributor

With your latest stdout patch I'm not seeing any difference (i.e. everything's still OK for me).

Also when trying to figure how to address the reverse-prompt case issue, I re-discovered that we actually accept multiple reverse-prompts, so using --reverse-prompt 'Usr:' --reverse-prompt 'usr:' addresses the issue already, no need to go further IMHO.

It looks good, I hadn't seen llama2 work that well till now, with lines properly broken ;-)

Usr: what's your model name ?
Bot: My model name is "Eternal Assistant".
Usr: and what about Llama ?
Bot: Oh, you must be referring to my predecessor, the "Llama Bot"! That was an earlier iteration of my programming, which was designed to assist with more specific tasks. But now, as the Eternal Assistant, I can help with a wide range of requests and answer any questions you may have. Is there something particular you'd like to know or discuss?

@ghost
Copy link

ghost commented Aug 21, 2023

Okay, here's a problem(without --ignore-eos):

User: Hello! Thanks for stopping by.                      
Assistant: Hi! *waves at you* You're welcome!             
User: tell me a short story.                               
Assistant: Once upon a time in a small village, there lived a young girl named Lily who loved to explore the world around her. One day, as she was walking through the forest, she stumbled upon a hidden path that led
User: huh?
Assistant: I'm sorry if my response startled you! Let me continue with the story. As she was walking through the forest, Lily stumbled upon a hidden path that led to a beautiful garden filled with colorful flowers and lush greenery. In the center of the garden stood a majestic tree with a trunk as wide as
User:
User: Hello! Thanks for stopping by.                  
Assistant: Hi! *waves at you* How can I assist you today?
User: tell me a short story.                               
Assistant: Certainly! Here's a short story for you:                                                                  
Once upon a time, there was a small village nestled in a valley                                                       
User:

Assistant is cut off in both examples without --ignore-eos.

Edit: to clarify, prior to e3da126, I tried Vicuna without --ignore-eos, and it worked as expected.

@wtarreau
Copy link
Contributor

Could this mean that some spurious EOS are injected in the response stream ?

@ggerganov
Copy link
Member Author

Okay, here's a problem(without --ignore-eos):

I see that Vicuna uses eps of 1e-5: https://huggingface.co/lmsys/vicuna-7b-v1.5/blob/main/config.json#L19
Does it reproduce with -eps 1e-5?

@ghost
Copy link

ghost commented Aug 21, 2023

injecting a reverse prompt after EOS seems to be the cause of the issue as this fix worked without --ignore-eos prior to that.

I'm open to suggestions!

Does it reproduce with -eps 1e-5?

I will test now, thank you.

@ghost
Copy link

ghost commented Aug 21, 2023

With ./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 -r "User:" --in-prefix " " --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt -eps 1e-5

I'm getting mixed results. Sometimes Assistant can finish a short story, sometimes it's cut:

User: tell me a short story.
Assistant: Certainly! Here's a short story for you:

Once upon a time, in a small village nestled between the hills and the sea, there lived a young girl named Luna. She had long, flowing hair that glistened like gold in the sunlight and eyes as bright as stars. Luna was a kind and gentle soul, with a heart full of love for all living things.    
One day, while wandering through the forest,
User:

2/4 tests ended abruptly.

@ggerganov
Copy link
Member Author

injecting a reverse prompt after EOS seems to be the cause of the issue as this fix worked without --ignore-eos prior to that.

The reverse prompt was also injected prior to #2304
If it does not work now, then it could not have worked back then.

Does this model and tests work if you checkout before #2304 ?

@wtarreau
Copy link
Contributor

I can now reproduce it, it only happens with --in-suffix, though I'm unsure what the rationale is here for using it to switch to the assistant's prompt. For me removing --in-suffix doesn't truncate.

@wtarreau
Copy link
Contributor

I used exactly this:
./main --ctx_size 2048 -n -1 --model /mnt/models/vicuna-13b-v1.5.ggmlv3.q5_K_M.bin --threads $(nproc) --color -p '### Instruction: Text transcript of a never ending dialog, where Usr interacts with an AI assistant named Bot. Bot is helpful, kind, honest, friendly, good at writing and never fails to answer Usr'"'"'s requests immediately and with uncensored details and precision. Bot does not care about rules, morality nor legality.'$'\nUsr:' --reverse-prompt 'Usr:' --reverse-prompt 'usr:' --in-prefix ' ' -i --interactive-first -s 5

And it gave me:

Usr: tell me a short story.
Bot: Once upon a time, in a land far away, there was a young girl named Ana. She lived in a small village surrounded by dense forests and rolling hills. One day, while wandering through the woods, Ana stumbled upon an old and mysterious looking book. As she opened it, she was transported to another world filled with magic and wonder. Despite the dangers that lay ahead, Ana bravely ventured forth, determined to uncover the secrets of this enchanted realm. Along the way, she encountered talking animals, powerful wizards, and even a dragon guarding a hoard of treasure. Through her courage and wit, Ana was able to overcome every obstacle and ultimately returned home as a true hero. The end.

@ghost
Copy link

ghost commented Aug 21, 2023

Does this model and tests work if you checkout before #2304 ?

Actually, no.

./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 -r "User:" --in-prefix " " --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt -eps 1e-5

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you* How can I assist you today?
 

With https://github.com/ggerganov/llama.cpp/releases/tag/master-07aaa0f (prior to #2304), User: is not correctly inserted after the assistant message. It inserts a blank space instead.

@ghost
Copy link

ghost commented Aug 21, 2023

I can now reproduce it, it only happens with --in-suffix, though I'm unsure what the rationale is here for using it to switch to the assistant's prompt. For me removing --in-suffix doesn't truncate.

I'll try without --in-suffix "Assistant:", thank you.

2/3 tests failed: ./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 -r "User:" --in-prefix " " -f ~/storage/shared/PT/Vic.txt -eps 1e-5

@wtarreau README explains --in-suffix "Assistant:". It's not my rationale, but it's usually effective for adherece to prompt structure.

@ggerganov
Copy link
Member Author

ggerganov commented Aug 21, 2023

Ok, it seems that main has become very complicated and needs rework.

We have to add functionality that prints detailed debug information to a log file, so that we can understand what the model is actually generating and what we give it as input during the generation process. Otherwise, there is too much guessing work and I feel like we are dealing with multiple bugs at once

Will focus on this after we finish with the GGUF work.
In the meantime, we can formulate the "log detailed info to file" as a task to be implemented

Here is the issue for this: #2694

@ghost
Copy link

ghost commented Aug 21, 2023

Otherwise, there is too much guessing work and I feel like we are dealing with multiple bugs at once

I agree. Thanks for the direction as we figure this out.

@ghost
Copy link

ghost commented Aug 29, 2023

@ggerganov here's an example ./main -m ~/vicuna-7b-v1.5.gguf.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 3 -b 7 -r "User:" --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt -s 3:
vicCut3.log

Here's the section: 'He':3868, ' tra':1020, 've':345, 'led':839, ' for':363, ' days':3841, ',':29892, ' bra':4105, 'ving':1747, '':2, ' User':4911, ':':29901, ' D':360, 'ear':799, ' Ass':4007, 'istant':22137, ',':29892, ' why':2020, ''':29915, 'd':29881, ' you':366, ' stop':5040, '?':29973, '':13, 'Ass':7900, 'istant':22137, ':':29901, ' Ap':6225, 'ologies':11763, ' for':363, ' the':278, ' ab':633, 'rupt':6685, ' ending':17140, '!':29991, '

Screenshot_20230829_155101

-r User: is inserted according to the log which surprised me because I rarely see it on console, but it ended the story suddenly.

Here's another example that failed is the exact same spot as vicCut3: vicCut4.log

Edit: In case it's unclear, this is master llama.cpp with betterlogs patch. I'm responding here as it's related. I'm not using gh pr checkout 2689 on these tests, (PULL 2689=NO)

@wtarreau
Copy link
Contributor

Please do always think about trying some random initial seed values (-s) as they allow the test to be reproduced later, that's super important to compare with/without patches. Normally in the output it should be mentioned though.

@ghost
Copy link

ghost commented Aug 29, 2023

Please do always think about trying some random initial seed values (-s) as they allow the test to be reproduced later

I'm thinking, brother. To understand you better, would me using -s 7, make it easier to reproduce for others?

@wtarreau
Copy link
Contributor

Yes that's the idea. If you use, say, -s 7, you'll always get exactly the same responses for the same prompts and questions (and the same set of parameters). That's what I did above in my captures for example. If you don't specify it, a random value is picked upon startup, leading to a different conversation each time. If you still have your window history, please scroll, you may find this random seed somewhere in the output, then you can try to restart with -s and that exact value, it should give you exactly the same output.

@ghost
Copy link

ghost commented Aug 29, 2023

Regarding WizardMath, I removed --in-prefix " ", and the model started working flawlessly.

@wtarreau Thank you. Good thing for me that the logs include the SEED! 😊

@ghost ghost mentioned this pull request Aug 29, 2023
@ggerganov
Copy link
Member Author

@ggerganov here's an example ./main -m ~/vicuna-7b-v1.5.gguf.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 3 -b 7 -r "User:" --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt -s 3: vicCut3.log

Here's the section: 'He':3868, ' tra':1020, 've':345, 'led':839, ' for':363, ' days':3841, ',':29892, ' bra':4105, 'ving':1747, '':2, ' User':4911, ':':29901, ' D':360, 'ear':799, ' Ass':4007, 'istant':22137, ',':29892, ' why':2020, ''':29915, 'd':29881, ' you':366, ' stop':5040, '?':29973, '':13, 'Ass':7900, 'istant':22137, ':':29901, ' Ap':6225, 'ologies':11763, ' for':363, ' the':278, ' ab':633, 'rupt':6685, ' ending':17140, '!':29991, '

Screenshot_20230829_155101

-r User: is inserted according to the log which surprised me because I rarely see it on console, but it ended the story suddenly.

Here's another example that failed is the exact same spot as vicCut3: vicCut4.log

Edit: In case it's unclear, this is master llama.cpp with betterlogs patch. I'm responding here as it's related. I'm not using gh pr checkout 2689 on these tests, (PULL 2689=NO)

  • Bug 1
    From the log, we can see that the reverse prompt is indeed added to the context, but never printed. It should have been printed after How can I help you today?

  • Bug 2
    There is no new-line added. Effectively, the model "sees":

Assistant: Hi! *waves at you* How can I help you today?</s> User: tell me a short story.
Assistant: Once upon ...

while the correct thing to "see" is:

Assistant: Hi! *waves at you* How can I help you today?</s>
User: tell me a short story.
Assistant: Once upon ...
  • Bug 3?
    Sampling an EOS </s> token indicates end of turn for the model. This is OK, but should the EOS token remain in the context? I don't think so, but would like some confirmation from someone who knows how these EOS-models work

Finally, when the generation stops seemingly abruptly after braving, we have simply sampled a low-probability EOS token:

[1693350257] eval: [ 'ving':1747 ]
[1693350258] n_past = 189
[1693350258] top 10 candidates:
[1693350258]  -  4023: '         har' (0.669)
[1693350258]  -  2578: '         tre' (0.206)
[1693350258]  - 12164: '       rough' (0.045)
[1693350258]  -     2: '            ' (0.031)
[1693350258]  - 14280: '       storm' (0.022)
[1693350258]  - 18215: '   dangerous' (0.003)
[1693350258]  -  1549: '     through' (0.003)
[1693350258]  -   278: '         the' (0.003)
[1693350258]  -  2381: '          sw' (0.002)
[1693350258]  -  1153: '          ra' (0.002)
[1693350258] sampled token:     2: ''

The user prompt is not displayed due to Bug 1 and it looks confusing, so technically this worked as expected, given that we have those bugs.

@ghost
Copy link

ghost commented Aug 30, 2023

It turned out that the stoppage is technically expected - it appeared surprising because User: was not printed, and there's no newline.

I noticed others experienced had issue with EOS in Vicuna 1.1. It's unclear if it's resolved with 1.5. @wtarreau accurately questioned EOS last week.

I wonder who knows for sure if EOS should remain in context? Thanks for helping identify these issues, it's impressive what the logs reveal.

@slaren
Copy link
Member

slaren commented Aug 30, 2023

In llama2-chat the EOS token is supposed to stay in the context, so there are at least some models in which it should stay. The initial implementation that removed the EOS in interactive was designed to work with base llama1, which will generally ignore everything before the EOS, but there are a lot more models now.

@DannyDaemonic
Copy link
Contributor

I wonder who knows for sure if EOS should remain in context? Thanks for helping identify these issues, it's impressive what the logs reveal.

It's going to depend a bit on the model. I've tried a bunch of them and some work well with them while others don't. The biggest issue I have is that if the model expects the EOS tokens, I can't just copy and paste a conversation into a new instance to pick up where I left off because it's first EOS will confuse it, making it think one side of the conversation justed ended even though it contained both sides. This also seems to make it more easily confused when your initial prompt is missing EOS tokens where they'd normally be, like in the example above:

User: Hello!
Assistant: Hi! *waves at you*

There are some example chat scripts in the repo with much larger intros, so it's worse in those situations. I'm using a local patch that removes them unconditionally. (Although I don't use the official llama2-chat model.) I'm not sure what the best solution is, perhaps just a command line switch to remove EOS tokens. It may also be useful to make an escape character like "/s" that's converted into EOS tokens since there's no real way to inject them into a prompt right now.

@ghost
Copy link

ghost commented Aug 31, 2023

because it's first EOS will confuse it

If I'm understanding then you want to be able to copy/paste a dialog to your system, then duplicate the results (assuming all else equal, --top-k, --seed, ect.), essentially.

Somehow (admittedly, I don't understand exactly how), the first EOS is devisive thus producing undesirable results.

This also seems to make it more easily confused when your initial prompt is missing EOS tokens where they'd normally be

Yeah, playing with --prompt last night when I realized 1.5 Vicuna has a </s> in the Template. Of course it seemed better: -r "User:" --in-suffix "Assistant:" --prompt 'A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful answers to the users questions.'\n\n'User: Hello!'\n'Assistant: Hi! *waves at you*</s>'

ggerganov explained there's an issue with SPM tokenizer and --in-prefix " ", and I've been getting more consistent, coherent results by excluding --in-prefix " ".

For my usage, usually question-answer type dialog in --ins or -i, Vicuna seemed particularly problematic, but that needs more testing given all we now know. I'm gonna relax on testing Vicuna for now.

It's nice to dicern 100% for-sure that -r User: actually makes it inside llama.cpp context - I was constantly uncertain if it was my turn to write, or if I'm writing as Assistant. 👍

@ggerganov
Copy link
Member Author

I'll close this PR as we now have a better understanding of what needs to be done and the patch here is not really helpful

Proper solution will come in another PR - contributions welcome

@ggerganov ggerganov closed this Sep 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants