Add OpenChat, Alpaca, Vicuna chat templates #6397
Could you also test whether the output really matches the Python version of these templates? You can use the Python code here: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
Please also let me know what must be added to the wiki page.
llama.cpp
} else if (role == "user") {
    ss << "### Instruction:\n" << message->content << "\n\n";
} else if (role == "assistant") {
    ss << "### Response:\n" << message->content << "\n\n";
The Alpaca and DeepSeek templates look similar at first glance, but the main difference is that the Alpaca template is only used for instruction-response (one turn), not for multiple turns like modern chat templates.
DeepSeek extends instruction-response into multi-turn by placing an <|EOT|> token after each response, so the formatted chat should look like:
### Instruction:
who are you?
### Response:
I am assistant
<|EOT|>
### Instruction:
1+1 is
### Response:
equal to 2
<|EOT|>
So what's missing here is the <|EOT|> token.
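A minimal C++ sketch of the multi-turn format shown above, assuming a hypothetical `ChatMsg` struct in place of llama.cpp's own message type and a hypothetical `format_deepseek` function (not the code in this PR). It follows the example: one "\n" after each message, <|EOT|> after each response, and no BOS. The trailing "### Response:\n" generation prompt is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical message type for this sketch only.
struct ChatMsg {
    std::string role;     // "system", "user" or "assistant"
    std::string content;
};

// DeepSeek-style multi-turn formatting: user turns become "### Instruction:"
// blocks, assistant turns become "### Response:" blocks closed by <|EOT|>.
// BOS is intentionally not emitted.
std::string format_deepseek(const std::vector<ChatMsg>& chat, bool add_generation_prompt) {
    std::ostringstream ss;
    for (const auto& msg : chat) {
        if (msg.role == "system") {
            ss << msg.content;  // system text is prepended with no separator
        } else if (msg.role == "user") {
            ss << "### Instruction:\n" << msg.content << "\n";
        } else if (msg.role == "assistant") {
            ss << "### Response:\n" << msg.content << "\n<|EOT|>\n";
        }
    }
    if (add_generation_prompt) {
        ss << "### Response:\n";  // assumption: cue for the model's next turn
    }
    return ss.str();
}
```

With the two-turn chat above, `format_deepseek(chat, false)` reproduces the expected text line for line, including the final "<|EOT|>\n".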
The chat above is produced by the Python code + Jinja template. It doesn't seem to have "\n\n" at the end of each message, so I think "\n\n" should be replaced by "\n".
Thanks for the Python script! I've included the Jinja output for OpenChat and DeepSeek below. As you mentioned, the other two fail because they don't have templates in tokenizer_config.json.
I'll add <|EOT|> for DeepSeek when I have a moment tomorrow.
openchat/openchat-3.5-0106
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<s>GPT4 Correct System: You are a helpful assistant<|end_of_turn|>GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi there<|end_of_turn|>GPT4 Correct User: Who are you<|end_of_turn|>GPT4 Correct Assistant: I am an assistant <|end_of_turn|>GPT4 Correct User: Another question<|end_of_turn|>
------------------------------
deepseek-ai/deepseek-coder-33b-instruct
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<|begin▁of▁sentence|>You are a helpful assistant### Instruction:
Hello
### Response:
Hi there
<|EOT|>
### Instruction:
Who are you
### Response:
I am an assistant
<|EOT|>
### Instruction:
Another question
------------------------------
Looks OK to me. Just a quick note: <|begin▁of▁sentence|> is not needed, because BOS is always added on the server.
Mistral Instruct may be good for templating: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
@ngxson @Jeximo Three updates and a question:
Question is about
When I search Orca, it's
Thanks for the efforts, @kaizau. IMO, chat/prompt templates have always been a messy topic (a rabbit hole, as you said). You can see at the beginning of #4216 that there was a discussion about that. After some more research, I think it's OK to keep vicuna/vicuna-orca. While they do not have an official Jinja template, we can maybe ask the model's author (or whoever converts it to GGUF) to add one. One thing I feared was that some templates, like Alpaca, did not have multi-turn capability from the beginning, but people try to retrofit it. That turns out not to be the case for Vicuna, so it's safe to assume that all Vicuna-based models support multi-turn.
@ngxson Makes sense. Any other code / formatting changes you'd like to see here? I'll draft up a readme update shortly. Relatedly, a quirk I've noticed in using the OpenChat and Vicuna templates is that the first character of every assistant message is now always " ". This is because these 3 templates all use ": " as the role separator, yet all of the official / reference I can't tell if this is an oversight or intended. Adding the space after the colon in each Did you encounter anything similar with previous chat templates?
readme under I'm not sure of the correct solution - I had a similar experience with CLI in I included the space for User, and excluded it for Assistant in order to strictly adhere to the template. I think it's intentional, but I may be wrong.
@ngxson Was about to paste the readme update here, but realized I already had edit access to the page? Either way, I added the 4 templates: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template I also added a "how to add a template" section that hopefully makes it incrementally easier for others. It includes the updated version of your script that outputs in a format identical to
@kaizau The part OpenChat author said the system prompt should be appended without a prefix. Source: https://huggingface.co/openchat/openchat_3.5/discussions/5#65448109b4a3f3a2f486fd9d
That's because tokenizers tend to encode the word and the preceding space into the same token. For example, using https://platform.openai.com/tokenizer: Adding a trailing space in the assistant prompt Sadly, there's no other way to get rid of this problem. The root cause is in fact that this class of template does not have special tokens like
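To make the trailing-space point concrete, here is a sketch of an OpenChat formatter whose generation prompt ends in "GPT4 Correct Assistant:" with no space, so the model produces the space as part of its first token. `ChatMsg` and `format_openchat` are hypothetical; the turn layout follows the openchat-3.5-0106 Jinja output above, with the system prompt unprefixed (as the OpenChat author suggested) and BOS left to the server.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical message type for this sketch only.
struct ChatMsg {
    std::string role;     // "system", "user" or "assistant"
    std::string content;
};

// Each non-system turn is "GPT4 Correct <Role>: <content>" ended by <|end_of_turn|>.
std::string format_openchat(const std::vector<ChatMsg>& chat, bool add_generation_prompt) {
    std::ostringstream ss;
    for (const auto& msg : chat) {
        if (msg.role == "system") {
            ss << msg.content << "<|end_of_turn|>";  // no role prefix for system
        } else {
            std::string role = msg.role;
            if (!role.empty()) {
                // "user" -> "User", "assistant" -> "Assistant"
                role[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(role[0])));
            }
            ss << "GPT4 Correct " << role << ": " << msg.content << "<|end_of_turn|>";
        }
    }
    if (add_generation_prompt) {
        // No trailing space: the tokenizer usually merges the space into the
        // first word of the reply, so the model emits " Hi" as one token.
        ss << "GPT4 Correct Assistant:";
    }
    return ss.str();
}
```

The cost of this choice is the quirk described above: the decoded reply starts with " ", which a client may trim.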
Nice, thanks! That looks good to me. I don't know how the permission system for the wiki works, but I'm glad to know that you have write access to it.
llama.cpp
for (auto message : chat) {
    std::string role(message->role);
    if (message == chat.front()) {
        ss << "<|begin▁of▁sentence|>";
The only thing left to do is remove this <|begin▁of▁sentence|>, because the server already adds the BOS token to the input prompt by default.
@SamuelTallet Thanks! I saw that thread too and originally implemented the unprefixed version. But running the actual Jinja template from the model's tokenizer_config.json produces This is unfortunately the state of templates right now. 🥲 I've left a comment asking for clarification, but will default to the unprefixed version.
@ngxson Thanks for the explanation. Just removed the prefixes for both OpenChat and DeepSeek. If the BOS token is added automatically, then my Python script update probably oversells how well copy-and-pasting its output as a test will work: the special tokens would have to be removed manually. I can clarify that in the next wiki update. Aside: I was also surprised to find I could edit the wiki directly; I was fully expecting a "your edits are pending approval" screen when I hit save. 😅
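A hypothetical helper for the manual clean-up mentioned above: it drops a literal BOS string from the start of a prompt copied from the Python/Jinja output, since the server adds BOS on its own. `strip_leading_bos` is illustrative only, not an existing llama.cpp function, and the caller must supply the model's BOS text.

```cpp
#include <cassert>
#include <string>

// Remove bos_text from the front of prompt if present; otherwise return it unchanged.
std::string strip_leading_bos(const std::string& prompt, const std::string& bos_text) {
    // compare(0, n, s) == 0 only when prompt begins with s (safe if prompt is shorter)
    if (!bos_text.empty() && prompt.compare(0, bos_text.size(), bos_text) == 0) {
        return prompt.substr(bos_text.size());
    }
    return prompt;
}
```

For example, `strip_leading_bos(expected, "<|begin▁of▁sentence|>")` turns the DeepSeek Jinja output into text that can be compared against a template that omits BOS.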
@ggerganov Yes, it is important to restrict write access to the wiki. Ideally, IMO, we would allow only a list of people (not all contributors), but I'm not sure this option is possible on GitHub. The reason is that changes to the wiki do not require review, so bad actors may be able to exploit a contributor's write access to change its content.
LGTM. Thank you!
@kaizau Hello, I apologize for disturbing you, but is there any hope of Mistral templates being added?
@ggerganov Thanks. That's OK for now, I think. We can consider moving the wiki to doc files later. Personally, I still find the wiki UI simpler to navigate. @Folko-Ven Mistral uses the llama2 template. Maybe we can add
We do support 3 variants of llama2. Mistral uses the variant with spaces around the message content. As long as the model has the correct Jinja template, it will be auto-detected and the correct template will be used.
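Auto-detection keys off the Jinja template stored with the model. The sketch below shows the general idea as substring matching; `detect_template` and every marker in it are assumptions taken from the formatted outputs in this thread, not llama.cpp's actual detection rules. Order matters: more specific markers are checked before generic ones.

```cpp
#include <cassert>
#include <string>

// Hypothetical heuristic: guess the template family from marker substrings.
std::string detect_template(const std::string& jinja) {
    auto has = [&jinja](const char* needle) {
        return jinja.find(needle) != std::string::npos;
    };
    if (has("<|im_start|>"))                       return "chatml";
    if (has("GPT4 Correct "))                      return "openchat";
    if (has("### Instruction:") && has("<|EOT|>")) return "deepseek";
    if (has("[INST]"))                             return "llama2";  // Mistral falls here
    return "unknown";  // e.g. Vicuna models that ship no template string
}
```

Models without any template string (the Vicuna case in this PR) end up as "unknown", which is why an explicit template name is still needed for them.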
@ngxson Got it, thanks for explaining!
Regarding limiting wiki access to a list of people, the only solution we've found in haproxy was to create a dedicated project for the wiki and send invites to those who want to contribute. The main project's wiki simply redirects to the wiki project, and that solved the issues. But it's indeed annoying.
* Add openchat chat template
* Add chat template test for openchat
* Add chat template for vicuna
* Add chat template for orca-vicuna
* Add EOS for vicuna templates
* Combine vicuna chat templates
* Add tests for openchat and vicuna chat templates
* Add chat template for alpaca
* Add separate template name for vicuna-orca
* Remove alpaca, match deepseek with jinja output
* Regenerate chat template test with add_generation_prompt
* Separate deepseek bos from system message
* Match openchat template with jinja output
* Remove BOS token from templates, unprefix openchat
This PR adds chat templates for some of the more popular non-ChatML models (that I know of, at least!).
Named openchat, vicuna, and alpaca, respectively. I based OpenChat's on the official Jinja template, and Vicuna's on the one from text-generation-webui (I couldn't find it in any model's tokenizer_config.json, but it matches what I saw in model cards and HF discussions). Alpaca was done using DeepSeek's template, since the original also predates Jinja chat templates.
Caveat: Because none of the Vicuna models I've tested seem to include a chat template string, there doesn't seem to be a good way to heuristically detect the Orca variant. I've worked around this by creating a vicuna-orca template that's also handled by vicuna. Open to alternatives here.
New to C++ and this project, so please don't hesitate to mention any details I may have missed!