
Server: use llama_chat_apply_template #5593

Merged (5 commits) on Feb 20, 2024

Conversation

@ngxson (Collaborator) commented on Feb 19, 2024

Closes #5575

This PR changes the usage of --chat-template introduced in #5425. The parameter now accepts a Jinja template instead of a type name.

If --chat-template is not specified, the default template (taken from the model metadata) is used instead.

This PR also fixes an issue where llama_chat_apply_template did not read the metadata correctly.

CC @ggerganov and @cebtenzzre for review. Thank you!
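For context, this is roughly how the new code path drives llama_chat_apply_template. The sketch below is illustrative rather than the exact server code; it assumes the llama.h API as of this PR, where passing a null template pointer makes the function fall back to the template stored in the model metadata, and where the return value is the number of bytes the formatted output needs:

```cpp
#include <string>
#include <vector>

#include "llama.h"

// Sketch: format a fixed two-message chat using the model's default template.
// Passing nullptr as the template makes llama_chat_apply_template read
// tokenizer.chat_template from the model metadata.
static std::string apply_default_template(const llama_model * model) {
    llama_chat_message chat[] = {
        {"system", "You are a helpful assistant."},
        {"user",   "hi, how are you"},
    };

    std::vector<char> buf(1024);
    int32_t res = llama_chat_apply_template(model, nullptr, chat, 2, /* add_ass */ true,
                                            buf.data(), (int32_t) buf.size());
    if (res < 0) {
        return ""; // the template is not supported
    }
    if ((size_t) res > buf.size()) {
        // the return value is the required length, so retry with a larger buffer
        buf.resize(res);
        res = llama_chat_apply_template(model, nullptr, chat, 2, true,
                                        buf.data(), (int32_t) buf.size());
    }
    // construct the string with the exact length to avoid trailing '\0' bytes
    return std::string(buf.data(), res);
}
```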

@@ -2390,12 +2391,13 @@ static void server_params_parse(int argc, char **argv, server_params &sparams,
break;
}
std::string value(argv[i]);
Owner:

value seems unused now?

Collaborator (Author):

Yeah, it's unused; I forgot to remove it. It's now removed.

std::ostringstream output;
bool is_inside_turn = false;
// Check if the template supplied via "--chat-template" is supported or not. Returns true if it's valid
inline bool verify_custom_template(std::string tmpl) {
Owner:

Suggested change:
-inline bool verify_custom_template(std::string tmpl) {
+inline bool verify_custom_template(const std::string & tmpl) {
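As an aside, the check this function performs can be done by attempting to apply the template to a dummy message and inspecting the return code. A minimal sketch, assuming llama_chat_apply_template returns a negative value for templates it does not recognize and that the model pointer is not consulted when an explicit template string is supplied (the actual body in this PR may differ slightly):

```cpp
// Check whether the template supplied via "--chat-template" is supported.
// Sketch: apply the template to a single dummy message; a negative result
// is assumed to mean the template is not recognized.
inline bool verify_custom_template(const std::string & tmpl) {
    llama_chat_message chat[] = {{"user", "test"}};
    std::vector<char> buf(64);
    int32_t res = llama_chat_apply_template(nullptr, tmpl.c_str(), chat, 1, true,
                                            buf.data(), (int32_t) buf.size());
    return res >= 0;
}
```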

Comment on lines 186 to 192
for (size_t i = 0; i < messages.size(); ++i) {
auto &curr_msg = messages[i];
str[i] = json_value(curr_msg, "role", std::string(""));
str[i + 1] = json_value(curr_msg, "content", std::string(""));
alloc_size += str[i + 1].length();
chat[i].role = str[i].c_str();
chat[i].content = str[i + 1].c_str();
Owner:

There seems to be a bug here. Maybe change to str[2*i + 0] = ... and str[2*i + 1] = ...

Collaborator (Author):

Thanks for noticing that. That explains why the bot's response was quite weird when I tested this PR yesterday.

Fixed in c53b34d

Looking at the debug log, I can confirm that the formatted chat is correct:

{"timestamp":1708423580,"level":"VERBOSE","function":"format_chat","line":208,"message":"formatted_chat","text":"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nhi, how are you<|im_end|>\n<|im_start|>assistant\n"}

examples/server/server.cpp: review comment (outdated, resolved)
@ngxson ngxson marked this pull request as draft February 20, 2024 14:13
@ngxson (Collaborator, Author) commented on Feb 20, 2024

I still see a weird bug where the chat is formatted correctly, but then \u0000 characters are appended when it's tokenized. I've converted this to a draft while I investigate:

{"timestamp":1708438302,"level":"VERBOSE","function":"format_chat","line":208,"message":"formatted_chat","text":"[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\nhi, how are you [/INST]"}
{"timestamp":1708438302,"level":"VERBOSE","function":"start_loop","line":293,"message":"have new task"}
{"timestamp":1708438302,"level":"VERBOSE","function":"start_loop","line":305,"message":"callback_new_task"}
slot 0 is processing [task id: 0]
{"timestamp":1708438302,"level":"VERBOSE","function":"start_loop","line":308,"message":"callback_all_task_finished"}
slot 0 : kv cache rm - [0, end)
{"timestamp":1708438302,"level":"VERBOSE","function":"update_slots","line":1685,"message":"prompt ingested","n_past":0,"cached":"","to_eval":"[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\nhi, how are you [/INST]\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}

Edit: found it! I forgot to call buf.resize() after receiving the result from llama_chat_apply_template.

Works fine now (tested with both the chatml and llama2 templates).
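A sketch of the fix, with names (tmpl, model, chat, alloc_size) reused from the format_chat code above for illustration. The assumption is that llama_chat_apply_template returns the number of bytes actually needed, so the output string must be built from that length rather than the full buffer, otherwise the unused tail of the buffer leaks into the prompt as \u0000 padding:

```cpp
const char * ptr_tmpl = tmpl.empty() ? nullptr : tmpl.c_str();
std::vector<char> buf(alloc_size * 2); // heuristic initial size

// first pass: returns the required output length, which may exceed buf.size()
int32_t res = llama_chat_apply_template(model, ptr_tmpl, chat.data(), chat.size(), true,
                                        buf.data(), (int32_t) buf.size());
if ((size_t) res > buf.size()) {
    // the missing step: resize the buffer to the real length and run again
    buf.resize(res);
    res = llama_chat_apply_template(model, ptr_tmpl, chat.data(), chat.size(), true,
                                    buf.data(), (int32_t) buf.size());
}

// build the string with the exact length so no '\0' padding reaches tokenization
std::string formatted_chat(buf.data(), res);
```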

@ngxson ngxson marked this pull request as ready for review February 20, 2024 14:21
@ngxson ngxson merged commit 9c405c9 into ggerganov:master Feb 20, 2024
44 checks passed
@ibehnam (Contributor) commented on Feb 20, 2024

@ngxson

Since this is a breaking change, it'd be good to update the server README to mention that the chat-template arg is now different. An example would be nice too.

Also, I found the following message vague. What are the "common" templates?

--chat-template JINJA_TEMPLATE
                            set custom jinja chat template (default: template taken from model's metadata)
                            Note: only commonly used templates are accepted, since we don't have jinja parser

@ngxson (Collaborator, Author) commented on Feb 21, 2024

@ibehnam Yeah, I forgot about the docs. You're right; in fact, I was thinking about how to make it clear which templates we support when showing this help, but the problem is that it depends on llama_chat_apply_template. That function is the one that must be documented.

My idea is to add a section to the server's docs that shows how to use --chat-template, then include a link to llama_chat_apply_template where users can see a list of supported templates.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* server: use llama_chat_apply_template

* server: remove trailing space

* server: fix format_chat

* server: fix help message

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: fix formatted_chat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024