
add --in-prefix-bos to prefix BOS to user inputs; keep EOS #2304

Merged
merged 3 commits into master from prefix-bos on Jul 25, 2023

Conversation

jxy
Contributor

@jxy jxy commented Jul 21, 2023

The BOS precedes the string specified by --in-prefix. A model-generated EOS is now kept in the context.

This provides a way to strictly follow the prompt format used in Llama-2-chat.

The EOS handling also benefits existing finetunes that use EOS to mark the end of a turn.
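
For reference, the multi-turn Llama-2-chat format this is meant to reproduce looks roughly like the following (a sketch; the exact spacing of the official template may differ slightly):

<BOS>[INST] <<SYS>>
{system prompt}
<</SYS>>

{user message 1} [/INST] {model reply 1}<EOS><BOS>[INST] {user message 2} [/INST] ...

With --in-prefix-bos, main adds a BOS before each --in-prefix string, and the model's EOS stays in the context, so follow-up turns in interactive mode line up with this template.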

@jxy
Contributor Author

jxy commented Jul 21, 2023

For llama-2-chat, you want

$ ./main -m "$MODEL" -c 4096 -n -1 \
--in-prefix-bos --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i -p \
"[INST] <<SYS>>
$SYSTEM
<</SYS>>

$instruct [/INST]"

The spaces in in-prefix/suffix are important.

@jxy jxy mentioned this pull request Jul 21, 2023
@bullno1
Contributor

bullno1 commented Jul 21, 2023

Currently the prompt already has an implicit BOS, right?

@ggerganov
Owner

@bullno1

llama_tokenize() has a bool flag that controls whether to add a BOS or not. In main we set the flag to true when tokenizing the prompt.
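
A quick way to see this (a sketch; $MODEL is a placeholder for your model path) is to run main with --verbose-prompt, which dumps the prompt tokens before generation; for a LLaMA model the dump should begin with token id 1, the BOS that the tokenizer prepends:

./main -m "$MODEL" --verbose-prompt -n 1 -p "Hello"
# the token dump printed at startup begins with token 1 (BOS),
# followed by the tokens for the prompt text itself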

Owner

@ggerganov ggerganov left a comment


The change seems good, but it could use some more testing by more people to be sure we didn't mess up the main loop somehow.

Review comment on examples/common.h (outdated, resolved)
@ghost

ghost commented Jul 21, 2023

For llama-2-chat, you want

$ ./main -m "$MODEL" -c 4096 -n -1 \
--in-prefix-bos --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i -p \
"[INST] <<SYS>>
$SYSTEM
<</SYS>>

$instruct [/INST]"

The spaces in in-prefix/suffix are important.

I'm trying it, but my command line does not accept the format:
[Screenshot: Screenshot_20230721_111519]

I tried to clean it up, but then the format changed:
[Screenshot: Screenshot_20230721_111551]

I don't know how to force a new line. Also, am I expected to replace $SYSTEM and $instruct, or leave them alone?

Edit: Instead of -p, I used -f and loaded a .txt:

./main -m ~/llama-2-7b-chat.ggmlv3.q4_0.bin -c 2048 -n -1 -t 3 -b 7 --in-prefix-bos --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i -f ~/storage/downloads/Llama2.txt

main: build = 857 (47031e4)
main: seed  = 1689957008
llama.cpp: loading model from /data/data/com.termux/files/home/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 5287.72 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB

system_info: n_threads = 3 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
Input prefix with BOS
Input prefix: ' [INST] '
Input suffix: ' [/INST]'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 "[INST] <<SYS>>
You're an A.I.
<</SYS>>

Please list 3 movies with Mel Gibson [/INST]"When you say 'Mel Gibson', I immediately think of his iconic roles in these three movies:
1. Mad Max (1979) - In this cult classic, Gibson plays the titular character, a rugged and violent anti-hero who must navigate a post-apocalyptic world filled with danger and mayhem.
2. Lethal Weapon (1987) - In this buddy cop comedy-action film, Gibson stars as Martin Riggs, a reckless and unpredictable detective who teams up with a straight-laced partner (Danny Glover) to take down a drug lord.
3. Braveheart (1995) - In this epic historical drama, Gibson gives a tour de force performance as William Wallace, a Scottish warrior who leads a rebellion against English rule in the late 13th century. The film's intense battle scenes and emotional dramatics have made it a fan favorite for decades."
 [INST] What's 2 fun things to do at the beach?
 [/INST]  Sure! Here are 2 fun things to do at the beach:
1. Build Sandcastles: Building sandcastles is a classic beach activity that can be enjoyed by people of all ages. You can use buckets, shovels, and other tools to create your masterpiece. Don't forget to add some decorations like seashells, rocks, or even small toys to make it more interesting.
2. Go Swimming: Swimming is another popular beach activity that provides a great way to cool off and have fun in the sun. You can swim laps, play games like "Marco Polo" or "Sharks and Minnows," or simply splash around and enjoy the water. Just remember to always swim in designated areas and follow safety guidelines to avoid any accidents.
 [INST]

@jxy
Contributor Author

jxy commented Jul 22, 2023

@JackJollimore you're missing the backslashes \ at the end of the lines in your shell (and somehow got extra newlines), and you have extra quotation marks " in your prompt file.

@arch-btw
Contributor

Unfortunately it doesn't work for me either. It starts asking itself questions and then answers them.

@ghost

ghost commented Jul 22, 2023

you're missing the backslashes \ at the end of the lines in your shell (and somehow got extra newlines), and you have extra quotation marks " in your prompt file.

I overlooked the quotation marks after converting to .txt; thanks for pointing that out. I copy/pasted the command, so if there are extra newlines then the shell added them. Termux ignored everything after the backslashes, so I deleted them to make it work.

I don't have a clear understanding of this PR, so hopefully someone more experienced can test it.

@arch-btw
Contributor

Just following up that it works for me now. I made a few user errors and also ran into the copy/paste problem, but it's all solved now. 👍

@zacps

zacps commented Jul 24, 2023

Another 👍: this works well for me too on llama2-70b-chat (fp16) after rebasing on master.

@lionelchg

lionelchg commented Jul 24, 2023

Works for me as well! I have rewritten it as a small bash script that is copy/paste-able:

#!/bin/bash

# Usage: ./chat.sh models/llama-2-13b-chat.ggmlv3.q4_0.bin system_prompts/translation.txt Hello

# Load the system prompt from the file given as the second argument
SYSTEM_PROMPT=$(cat "$2")

# Run the model with the Llama-2-chat prompt format
./main -m "$1" -c 4096 -n -1 --in-prefix-bos --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i \
    -p "[INST] <<SYS>>\n$SYSTEM_PROMPT\n<</SYS>>\n\n$3 [/INST]"

with the following content for system_prompts/translation.txt:

Translate every sentence from English into French.

@ggerganov ggerganov merged commit 0c06204 into ggerganov:master Jul 25, 2023
@ggerganov
Owner

@lionelchg Would be a nice contribution to examples - feel free to PR it

@ejones
Collaborator

ejones commented Jul 26, 2023

This is great! FYI, as an additional benefit, this unblocks using --grammar in interactive mode as an alternative to several prompt options like --in-prefix, --in-suffix and --reverse-prompt (except for the unnecessary newline):

$ ./main -m $LLAMA2_13B_Q4_0  -i  \
  --grammar 'root ::= "### RESPONSE: *" [a-z]+ (" " [a-z]+)* "* " [^\r\n]+ "\n### HUMAN: "' \
  -p "### HUMAN: Hello, how are you?
"
...

 ### HUMAN: Hello, how are you?
### RESPONSE: *thinks for a minute* I think I'm fine.
### HUMAN: 
great!
### RESPONSE: *says something else* I have this thing that annoys me.
### HUMAN: 

@ejones ejones mentioned this pull request Jul 26, 2023
lionelchg pushed a commit to lionelchg/llama.cpp that referenced this pull request Jul 26, 2023
Builds on top of PR ggerganov#2304 to create a working script for system
prompt integration with interactive mode.
lionelchg added a commit to lionelchg/llama.cpp that referenced this pull request Jul 26, 2023
Builds on top of PR ggerganov#2304 to create a working script for system
prompt integration with interactive mode.
@ghost

ghost commented Jul 27, 2023

@jxy It appears Llama2 is the only model working as expected since this commit.

Is there something I need to do to get models other than Llama2 to follow their intended prompt structure?

#2417

@pugzly

pugzly commented Jul 27, 2023

Yes, I couldn't get one of my older models (WizardLM) working with the latest llama.cpp until I manually downloaded the repo at the commit prior to this one, and then everything started working as it used to.

@jxy
Contributor Author

jxy commented Jul 28, 2023

Sorry for breaking people's command lines with reverse prompt. Previously, if you specified a reverse prompt and the model generated an EOS, the EOS was replaced by a newline and the first reverse prompt was inserted. That was a bit unintuitive and overlapped with --in-prefix.

Now with this PR, if the model generates EOS, the EOS is kept in the context, and NO reverse prompt is inserted automatically. To have some text prefixed to your input, use --in-prefix as intended. For example, with Vicuna you previously might have used

-r 'USER:' --in-prefix ' '

after this PR, you only need

--in-prefix 'USER: '

because Vicuna is capable of generating EOS. In fact, the latter worked before this PR too. The same goes for WizardLM or any other model that uses EOS to signal the end of its turn.
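
Concretely, a Vicuna-style invocation changes roughly like this (a sketch; the model filename and prompt are placeholders):

# before this PR
./main -m vicuna-13b.ggmlv3.q4_0.bin -i -r 'USER:' --in-prefix ' ' -p "$PROMPT"

# after this PR
./main -m vicuna-13b.ggmlv3.q4_0.bin -i --in-prefix 'USER: ' -p "$PROMPT"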

aragula12 pushed a commit to aragula12/llama.cpp that referenced this pull request Aug 4, 2023
@jxy jxy deleted the prefix-bos branch April 10, 2024 02:46
ajs177 pushed a commit to ajs177/LLAMA-summarizer that referenced this pull request May 10, 2024