
Add OpenChat, Alpaca, Vicuna chat templates #6397

Merged: 16 commits into ggerganov:master on Apr 3, 2024
Conversation

@kaizau (Contributor) commented Mar 30, 2024

This PR adds chat templates for some of the more popular non-ChatML models (that I know of, at least!).

Named openchat, vicuna, and alpaca respectively.

I based OpenChat's on the official Jinja template, and Vicuna's on the one from text-generation-webui (I couldn't find it in any model's tokenizer_config.json, but it matches what I saw in model cards and HF discussions). Alpaca was done using DeepSeek's template, since the original also predates Jinja chat templates.

Caveat: Because none of the Vicuna models I've tested seem to include a chat template string, there doesn't seem to be a good way to heuristically detect the Orca variant. I've worked around this by creating a vicuna-orca template that's also handled by vicuna. Open to alternatives here.

New to C++ and this project, so please don't hesitate to mention any details I may have missed!

@kaizau changed the title from "Add OpenChat, Starling, Vicuna chat template support" to "Add OpenChat, Alpaca, Vicuna chat templates", Mar 30, 2024
@ggerganov requested a review from ngxson, March 30, 2024 12:27
@ngxson (Collaborator) left a comment:

Could you also test to see if the output really matches the Python version of these templates? You can use the Python code here: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

Please also let me know what must be added to the wiki page.

llama.cpp (outdated)
} else if (role == "user") {
ss << "### Instruction:\n" << message->content << "\n\n";
} else if (role == "assistant") {
ss << "### Response:\n" << message->content << "\n\n";
@ngxson (Collaborator):

The alpaca and deepseek templates look similar at first glance, but the main difference is that the alpaca template is only used for a single instruction-response turn, not for multiple turns like modern chat templates.

deepseek extends the notion of instruction-response to multi-turn by placing an <|EOT|> token between turns, so the formatted chat should look like:

### Instruction:
who are you?
### Response:
I am assistant
<|EOT|>
### Instruction:
1+1 is
### Response:
equal to 2
<|EOT|>

So what's missing here is the <|EOT|> token.
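For illustration, here is a rough Python sketch of that layout (an approximation of the DeepSeek format, not the actual llama.cpp C++ code):

    def format_deepseek_style(messages):
        # Approximate DeepSeek-style layout: system content passes through,
        # user turns become "### Instruction:", assistant turns become
        # "### Response:" and are closed with <|EOT|>.
        out = ""
        for msg in messages:
            if msg["role"] == "system":
                out += msg["content"]
            elif msg["role"] == "user":
                out += "### Instruction:\n" + msg["content"] + "\n"
            else:  # assistant
                out += "### Response:\n" + msg["content"] + "\n<|EOT|>\n"
        return out

    print(format_deepseek_style([
        {"role": "user", "content": "who are you?"},
        {"role": "assistant", "content": "I am assistant"},
    ]))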

@ngxson (Collaborator):

The chat above is produced by the Python code + Jinja template; it doesn't seem to have "\n\n" at the end of each message, so I think the "\n\n" should be replaced by "\n".

@kaizau (Contributor, Author):

Thanks for the Python script! Included the Jinja output of OpenChat and DeepSeek below. And as you mentioned, the other two fail because they don't have templates in tokenizer_config.json.

Will add <|EOT|> for DeepSeek when I have a moment tomorrow.

openchat/openchat-3.5-0106
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<s>GPT4 Correct System: You are a helpful assistant<|end_of_turn|>GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi there<|end_of_turn|>GPT4 Correct User: Who are you<|end_of_turn|>GPT4 Correct Assistant:    I am an assistant   <|end_of_turn|>GPT4 Correct User: Another question<|end_of_turn|>
------------------------------
deepseek-ai/deepseek-coder-33b-instruct
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<|begin▁of▁sentence|>You are a helpful assistant### Instruction:
Hello
### Response:
Hi there
<|EOT|>
### Instruction:
Who are you
### Response:
   I am an assistant   
<|EOT|>
### Instruction:
Another question

------------------------------

@ngxson (Collaborator):

Looks OK to me. Just a quick note: <|begin▁of▁sentence|> is not needed, because BOS is always added on the server.

tests/test-chat-template.cpp (review thread resolved)
llama.cpp (outdated, review thread resolved)
@Jeximo (Contributor) commented Mar 30, 2024

Mistral Instruct may be good for templating, https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2

github-actions bot commented Apr 1, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3: 492 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=9577.92ms p(90)=26590.07ms fails=0, finish reason: stop=492 truncated=0
  • Prompt processing (pp): avg=244.14tk/s p(90)=741.1tk/s total=195.69tk/s
  • Token generation (tg): avg=99.73tk/s p(90)=272.37tk/s total=131.07tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=master commit=021c6f50e1d354cb95a8187d4f6dd5b40f7e329f
Time series charts (collapsed): prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing

@kaizau (Contributor, Author) commented Apr 1, 2024

@ngxson @Jeximo Three updates and a question:

  1. I tweaked the Python script to match the history and add_generation_prompt settings from tests/test-chat-template.cpp. Also included a shortcut for copying the output as a test.

    Updated script
    from transformers import AutoTokenizer
    
    VARIANTS_TO_TEST = [
        # 'teknium/OpenHermes-2.5-Mistral-7B',
        # 'mistralai/Mistral-7B-Instruct-v0.2',
        # 'TheBloke/FusionNet_34Bx2_MoE-AWQ',
        # 'bofenghuang/vigogne-2-70b-chat',
        # 'mlabonne/AlphaMonarch-7B',
        # 'google/gemma-7b-it',
        # 'OrionStarAI/Orion-14B-Chat',
        # 'openbmb/MiniCPM-2B-dpo-fp32',
        'openchat/openchat-3.5-0106',
        'deepseek-ai/deepseek-coder-33b-instruct',
    ]
    
    HISTORY = [
        { 'role': 'system', 'content': 'You are a helpful assistant' },
        { 'role': 'user', 'content': 'Hello' },
        { 'role': 'assistant', 'content': 'Hi there' },
        { 'role': 'user', 'content': 'Who are you' },
        { 'role': 'assistant', 'content': '   I am an assistant   ' },
        { 'role': 'user', 'content': 'Another question' },
    ]
    
    for variant in VARIANTS_TO_TEST:
        history = [m for m in HISTORY] # copy
        if 'Mistral' in variant or 'gemma' in variant:
            history.pop(0) # no system prompt for mistral and gemma
        if 'gemma' in variant:
            # GemmaTokenizer is quite buggy, let's hard code the template here
            GEMMA_TMLP = "{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{ '<start_of_turn>' + role + '\n' + message['content'] | trim + '<end_of_turn>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"
            print('[Gemma]')
            output = AutoTokenizer.from_pretrained(VARIANTS_TO_TEST[0]).apply_chat_template(history, tokenize=False, add_generation_prompt=True, chat_template=GEMMA_TMLP)
            print(output)
            print(output.replace("\n", "\\n"))
            print('-' * 30)
        else:
            print('[' + variant + ']')
            tokenizer = AutoTokenizer.from_pretrained(variant)
            output = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
            print(output)
            print(output.replace("\n", "\\n"))
            print('-' * 30)
  2. Used this output as the ground truth tests for updating the templates. And yup, found differences — fixed them.

  3. Replaced alpaca entirely with deepseek, re: above comments.

My question is about vicuna and vicuna-orca, and more generally any models where automated detection isn't feasible. Would it make sense to support them, even if only through the --chat-template server flag? Or would you prefer I just cut them from this PR — maybe try to figure out an alternative later?
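(For context on "automated detection": llama.cpp picks a named template by matching substrings of the chat template string shipped with the model, conceptually like the Python sketch below. The marker strings are illustrative, not the exact ones llama.cpp checks; the point is that a model which ships no template string, like the Vicuna models mentioned above, gives the heuristic nothing to match.)

    def detect_template(jinja_template):
        # Illustrative substring heuristic; marker strings are examples only.
        if not jinja_template:
            return None  # nothing to match against, e.g. Vicuna models
        if "<|im_start|>" in jinja_template:
            return "chatml"
        if "<|end_of_turn|>" in jinja_template:
            return "openchat"
        if "### Instruction:" in jinja_template:
            return "deepseek"
        return None  # unknown: fall back to --chat-template or a default

    print(detect_template(None))  # -> None, so the user would have to pass --chat-template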

@Jeximo (Contributor) commented Apr 1, 2024

Would it make sense to support them, even if only through the --chat-template server flag?

When I search Orca, it's ChatML, but I also saw the older-style templates for Nous/Tess. I personally like having more options for chat templates, but I understand not wanting to complicate other development, so I'll leave it up to you and @ngxson.

@ngxson (Collaborator) commented Apr 1, 2024

Thanks for the efforts @kaizau. IMO chat/prompt templates have always been a quite messy topic (a rabbit hole, as you said). You can see at the beginning of #4216 there was a discussion about that.

After some more research I think it's OK to keep vicuna/vicuna-orca. While they do not have an official Jinja template, we could maybe ask the model's author to add one (or ask whoever converts it to GGUF to add one). One of the things I feared was that some templates do not have multi-turn capability from the beginning, like alpaca for example, but people try to retrofit it. Turns out that's not the case for vicuna, so it's safe to assume that all vicuna-based models support multi-turn.

@kaizau (Contributor, Author) commented Apr 2, 2024

@ngxson Makes sense. Any other code / formatting changes you'd like to see here? I'll draft up a readme update shortly.

Relatedly, a quirk I've noticed in using the OpenChat and Vicuna templates is that the first character of every assistant message is now always " ".

This is because these 3 templates all use ": " as the role separator — yet all of the official / reference add_generation_prompt examples exclude the space after the colon.

I can't tell if this is an oversight or intended. Adding the space after the colon in each add_ass block gets rid of the problem — which is what I would lean towards, and what I expect most users would prefer.

Did you encounter anything similar with previous chat templates?
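For concreteness, a minimal sketch of the two options for the OpenChat generation prompt (Python strings for illustration, not the actual C++ add_ass code):

    # Reference templates stop right after the colon, so the model itself emits
    # the leading space of its reply; adding the space moves it into the prompt.
    ADD_ASS_STRICT = "GPT4 Correct Assistant:"   # reply then starts with " ..."
    ADD_ASS_SPACED = "GPT4 Correct Assistant: "  # reply then starts with a bare word

    print(repr(ADD_ASS_STRICT))
    print(repr(ADD_ASS_SPACED))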

@Jeximo (Contributor) commented Apr 2, 2024

first character of every assistant message is now always " "

The readme, under prompt, states: "If the prompt is a string or an array with the first element given as a string, a bos token is inserted in the front like main does".

I'm not sure of the correct solution - I had a similar experience with the CLI in main: --in-prefix "GPT4 Correct User: " --in-suffix "<|end_of_turn|>GPT4 Correct Assistant:"

I included the space for User, and excluded it for Assistant, in order to strictly adhere to the template. I think it's intentional, but I may be wrong.

@kaizau (Contributor, Author) commented Apr 2, 2024

@ngxson Was about to paste the readme update here, but realized I already had edit access to the page?

Either way, added the 4 templates: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

I also added a "how to add a template" section that hopefully makes it incrementally easier for others. It includes the updated version of your script that outputs in a format identical to test-chat-template.cpp, to reduce room for human error.

@SamuelTallet (Contributor) commented Apr 2, 2024

@kaizau
Thanks for your efforts.

The part <s>GPT4 Correct System: in the Wiki seems incorrect.

The OpenChat author said the system prompt should be appended without a prefix.

Source: https://huggingface.co/openchat/openchat_3.5/discussions/5#65448109b4a3f3a2f486fd9d

@ngxson (Collaborator) commented Apr 2, 2024

Relatedly, a quirk I've noticed in using the OpenChat and Vicuna templates is that the first character of every assistant message is now always " ".

That's because tokenizers tend to encode a word and its leading space as a single token. For example, using https://platform.openai.com/tokenizer:

[screenshot: OpenAI tokenizer output]

Adding a trailing space to the assistant prompt GPT4 Correct Assistant: makes the model perceive the sentence differently, because the trailing space is now encoded as its own token instead of being attached to the next word:

[screenshot: OpenAI tokenizer output with the trailing space as a separate token]

Sadly, there's no other way to get rid of this problem. The root cause is that this class of templates has no special tokens like <|im_start|>; they rely on common characters like ":" or a space, whose tokenization depends on the next word.
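One way to see the effect is to tokenize the boundary both ways, for example with tiktoken (the library behind the OpenAI tokenizer page linked above). This is only a demonstration of the space-merging behaviour; SentencePiece-based tokenizers such as OpenChat's behave analogously:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # Same final text, but the prompt/reply boundary tokenizes differently:
    # without the trailing space, the space travels with the reply's first word.
    a = enc.encode("GPT4 Correct Assistant:") + enc.encode(" Hello")
    b = enc.encode("GPT4 Correct Assistant: ") + enc.encode("Hello")

    print(enc.decode(a) == enc.decode(b))  # True: identical text
    print(a == b)                          # False: different token sequences seen by the model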

@ngxson (Collaborator) commented Apr 2, 2024

I also added a "how to add a template" section that hopefully makes it incrementally easier for others. It includes the updated version of your script that outputs in a format identical to test-chat-template.cpp, to reduce room for human error.

Nice, thanks! That looks good to me. I don't know how the wiki permission system works, but I'm glad to know that you have write access to the wiki.

llama.cpp (outdated)
for (auto message : chat) {
std::string role(message->role);
if (message == chat.front()) {
ss << "<|begin▁of▁sentence|>";
@ngxson (Collaborator):

The only thing left to do is to remove this <|begin▁of▁sentence|>, because the server already adds the BOS token to the input prompt by default.

@ggerganov (Owner):

I don't know how the wiki permission system works, but I'm glad to know that you have write access to the wiki.

It might be a good idea to restrict wiki access to collaborators, agree?

[screenshot]

@kaizau (Contributor, Author) commented Apr 3, 2024

The part <s>GPT4 Correct System: in the Wiki seems incorrect.

The OpenChat author said the system prompt should be appended without a prefix.

Source: https://huggingface.co/openchat/openchat_3.5/discussions/5#65448109b4a3f3a2f486fd9d

@SamuelTallet Thanks! I saw that thread too and originally implemented the unprefixed version.

But running the actual Jinja template from the model's tokenizer_config.json produces <s>GPT4 Correct System: . So any implementation that actually uses the Jinja template would include a prefix... The readme also references tokenizer.chat_template as the correct one.

This is unfortunately the state of templates right now. 🥲

I've left a comment asking for clarification, but will default to the unprefixed version.

@kaizau (Contributor, Author) commented Apr 3, 2024

@ngxson Thanks for the explanation.

Just removed the prefixes for both OpenChat and DeepSeek.

If the BOS token is automatically added, then my Python script update probably oversells the extent to which copy-and-pasting the output as a test will work. The special tokens would have to be manually removed. But I can clarify that in the next wiki update.
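(A hypothetical post-processing step like the one below could go in the wiki script so the pasted output matches what the test expects; the token strings are just the BOS markers seen in the outputs earlier in this thread.)

    def strip_leading_bos(formatted, bos_tokens=("<s>", "<|begin▁of▁sentence|>")):
        # Hypothetical helper: drop a leading BOS-like token, since the server
        # adds BOS to the prompt on its own.
        for tok in bos_tokens:
            if formatted.startswith(tok):
                return formatted[len(tok):]
        return formatted

    print(strip_leading_bos("<s>GPT4 Correct User: Hello<|end_of_turn|>"))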

Aside: I was also surprised to find I could edit the wiki directly — was fully expecting a "your edits are pending approval" screen when I hit save. 😅

@kaizau requested a review from ngxson, April 3, 2024 14:41
@ngxson (Collaborator) commented Apr 3, 2024

@ggerganov Yes, it is important to restrict write access to the wiki. Ideally, IMO, we could allow only a list of people (not all contributors), but I'm not sure if that option is possible on GitHub. The reason is that changes to the wiki do not require review; bad actors might be able to exploit a contributor's write access to change wiki content.

@ngxson (Collaborator) left a comment:

LGTM. Thank you!

@ngxson merged commit 1ff4d9f into ggerganov:master, Apr 3, 2024
5 checks passed
@ggerganov (Owner):

Updated the wiki access:

[screenshot: updated wiki access settings]

Note, these are only the collaborators that have write access (i.e. not all contributors). Still, if we want to make this even stricter, the content should be moved to doc files and committed to the repo.

@Folko-Ven (Contributor):
@kaizau Hello, I apologize for disturbing you, but is there any hope for the addition of Mistral templates?

@ngxson (Collaborator) commented Apr 4, 2024

@ggerganov Thanks. That's OK for now, I think. We can consider moving the wiki to doc files later. Personally, I still feel the wiki UI is simpler to navigate.

@Folko-Ven Mistral uses the llama2 template. Maybe we can add mistral as an alias for llama2 to clarify that.

@Folko-Ven (Contributor):
@ngxson It seemed to me that they are slightly different, aren’t they? Usually, I look at chat templates here - [link]

@ngxson (Collaborator) commented Apr 4, 2024

We do support 3 variants of llama2. Mistral uses the variant with spaces around the message content. As long as the model has the correct Jinja template, it will be auto-detected and the correct template will be used.
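To illustrate what "spaces around message content" means, here is a rough rendering of the two styles (illustrative strings only, not the exact ones llama.cpp emits):

    user_msg, assistant_msg = "Hello", "Hi there"

    # llama2-style without spaces vs. the Mistral-style variant with spaces
    strict = "[INST]{u}[/INST]{a}</s>".format(u=user_msg, a=assistant_msg)
    spaced = "[INST] {u} [/INST] {a}</s>".format(u=user_msg, a=assistant_msg)

    print(strict)  # [INST]Hello[/INST]Hi there</s>
    print(spaced)  # [INST] Hello [/INST] Hi there</s>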

@Folko-Ven (Contributor):
@ngxson Got it, thanks for explaining!

@wtarreau (Contributor) commented Apr 6, 2024

Regarding limiting wiki access to a list of people, the only solution we found in haproxy was to create a dedicated project for the wiki and send invites to those who want to contribute. The main project's wiki simply redirects to the wiki project, and that solved the issue, but it's indeed annoying.

tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
* Add openchat chat template

* Add chat template test for openchat

* Add chat template for vicuna

* Add chat template for orca-vicuna

* Add EOS for vicuna templates

* Combine vicuna chat templates

* Add tests for openchat and vicuna chat templates

* Add chat template for alpaca

* Add separate template name for vicuna-orca

* Remove alpaca, match deepseek with jinja output

* Regenerate chat template test with add_generation_prompt

* Separate deepseek bos from system message

* Match openchat template with jinja output

* Remove BOS token from templates, unprefix openchat
7 participants