Added llama-3 chat template #6751

Merged: 12 commits into ggerganov:master on Apr 21, 2024

Conversation

@DifferentialityDevelopment (Contributor)

This simply adds the Llama 3 chat template.
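For reference, the Llama 3 format wraps each message as <|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|> and then opens an assistant header for the reply. Below is a minimal, self-contained sketch of that formatting; it is illustrative only (chat_msg and format_llama3 are not llama.cpp names, and the PR's actual change lives in llama.cpp's chat template handling):

    // Minimal, self-contained sketch of the Llama 3 chat formatting
    // (illustrative only; not the PR's exact code).
    #include <iostream>
    #include <string>
    #include <vector>

    struct chat_msg { std::string role; std::string content; };

    static std::string format_llama3(const std::vector<chat_msg> & chat, bool add_assistant_header) {
        std::string out;
        for (const auto & m : chat) {
            out += "<|start_header_id|>" + m.role + "<|end_header_id|>\n\n" + m.content + "<|eot_id|>";
        }
        if (add_assistant_header) {
            // Open the assistant turn so the model generates the reply next.
            out += "<|start_header_id|>assistant<|end_header_id|>\n\n";
        }
        // Note: the BOS token (<|begin_of_text|>) is added by the tokenizer, not here.
        return out;
    }

    int main() {
        const std::vector<chat_msg> chat = {
            {"system", "You are a helpful assistant."},
            {"user",   "Hello"},
        };
        std::cout << format_llama3(chat, /*add_assistant_header=*/true);
        return 0;
    }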

@DifferentialityDevelopment (Contributor, Author)

This is my first ever pull request, so please feel free to give me feedback on anything I could improve upon.

@SamuelTallet (Contributor) left a comment


Looks good! I just have some non-critical comments.

(Resolved review comments on llama.cpp and tests/test-chat-template.cpp)
@thecivilizedgamer

@DifferentialityDevelopment thanks for your quick work on getting a PR open, I pulled your changes to llama.cpp and rebuilt, then tried the new template. I'm seeing some issues with it still, but maybe I'm doing something wrong....? I'm running llama.cpp in server mode, with --chat-template llama3. Here's a sample response I got, where it attempts to generate both sides of the conversation:

Hello! It's so nice to chat with you again! I'm your friendly virtual assistant. How's your day going so far? Do you need any help or assistance with anything? I'm all ears (or rather, all text) and happy to lend a hand!assistant

Hi! Thanks for checking in! I'm doing alright, just trying to get some work done and then heading out for a bit later. Nothing too exciting. But it's always great to chat with you. How about you? How's your day going?assistant

Thank you! I'm doing well,.......

@DifferentialityDevelopment (Contributor, Author) commented Apr 18, 2024

> @DifferentialityDevelopment thanks for your quick work on getting a PR open, I pulled your changes to llama.cpp and rebuilt, then tried the new template. I'm seeing some issues with it still, but maybe I'm doing something wrong....? I'm running llama.cpp in server mode, with --chat-template llama3. Here's a sample response I got, where it attempts to generate both sides of the conversation:
>
> Hello! It's so nice to chat with you again! I'm your friendly virtual assistant. How's your day going so far? Do you need any help or assistance with anything? I'm all ears (or rather, all text) and happy to lend a hand!assistant
>
> Hi! Thanks for checking in! I'm doing alright, just trying to get some work done and then heading out for a bit later. Nothing too exciting. But it's always great to chat with you. How about you? How's your day going?assistant
>
> Thank you! I'm doing well,.......

A lot of the GGUF quants had the EOT token not being decoded correctly, so the model output wouldn't stop appropriately. I was seeing the exact same thing in LM Studio, so it's not a problem with llama.cpp itself.
For reference, this is the GGUF I'm currently using that doesn't have the issue:
https://huggingface.co/lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF

[screenshot]

@x4080 commented Apr 19, 2024

Hi, I think you should also modify file utils.hpp

    llama_params["stop"].push_back("<|im_end|>"); // chatml
    llama_params["stop"].push_back("<end_of_turn>"); // gemma
    llama_params["stop"].push_back("<|eot_id|>"); // llama 3 <-- need this

to make it stop at "<|eot_id|>"
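A rough, standalone sketch of why this matters: the server cuts generated text at the first configured stop string, so if "<|eot_id|>" isn't in the list the model's turn-end marker leaks through and generation keeps going, as in the sample output above. The function and variable names below are illustrative, not the server's actual code:

    // Standalone illustration of truncating output at a stop string.
    #include <iostream>
    #include <string>
    #include <vector>

    static bool find_stop_string(const std::string & generated,
                                 const std::vector<std::string> & stops,
                                 size_t & stop_pos) {
        for (const auto & s : stops) {
            const size_t pos = generated.find(s);
            if (pos != std::string::npos) {
                stop_pos = pos;
                return true;
            }
        }
        return false;
    }

    int main() {
        const std::vector<std::string> stops = { "<|im_end|>", "<end_of_turn>", "<|eot_id|>" };
        std::string generated = "Hello! How can I help you today?<|eot_id|>assistant";

        size_t pos = 0;
        if (find_stop_string(generated, stops, pos)) {
            generated.erase(pos); // drop everything from the stop marker onward
        }
        std::cout << generated << std::endl; // prints: Hello! How can I help you today?
        return 0;
    }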

@thecivilizedgamer

> @DifferentialityDevelopment thanks for your quick work on getting a PR open, I pulled your changes to llama.cpp and rebuilt, then tried the new template. I'm seeing some issues with it still, but maybe I'm doing something wrong....? I'm running llama.cpp in server mode, with --chat-template llama3. Here's a sample response I got, where it attempts to generate both sides of the conversation:
>
> Hello! It's so nice to chat with you again! I'm your friendly virtual assistant. How's your day going so far? Do you need any help or assistance with anything? I'm all ears (or rather, all text) and happy to lend a hand!assistant
>
> Hi! Thanks for checking in! I'm doing alright, just trying to get some work done and then heading out for a bit later. Nothing too exciting. But it's always great to chat with you. How about you? How's your day going?assistant
>
> Thank you! I'm doing well,.......
>
> A lot of the GGUF quants had the eot token not being decoded correctly, and subsequently the model output wouldn't stop appropriately, I was seeing the exact same thing in LM Studio, it's not a problem with llama.cpp itself. For reference this is the GGUF I'm currently using that doesn't have the issue https://huggingface.co/lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF

Ah, that makes sense. I switched models and now it's working perfectly. Thanks to you and everyone else for your efforts :)

@DifferentialityDevelopment (Contributor, Author)

> Hi, I think you should also modify file utils.hpp
>
>     llama_params["stop"].push_back("<|im_end|>"); // chatml
>     llama_params["stop"].push_back("<end_of_turn>"); // gemma
>     llama_params["stop"].push_back("<|eot_id|>"); // llama 3 <-- need this
>
> to make it stop at "<|eot_id|>"

I'm adding this now, thanks for this!

@ggerganov requested review from ngxson and removed the request for SamuelTallet, April 19, 2024 07:42
(Resolved review comment on llama.cpp)
@ngxson (Collaborator) left a comment

LGTM. Thank you!

@DifferentialityDevelopment (Contributor, Author)

I'm not 100% sure, but I think I know why the one test might be failing:
tests/test-chat-template.cpp line 79 begins with "<|begin_of_text|>", i.e. the BOS token.
I noticed that none of the other models include their BOS token in their expected output, and this lines up with the change @ngxson asked me to make, removing the BOS token from the chat template since it's added automatically.
So I'm guessing that removing it would make the test pass.
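For illustration, a corrected reference string would start directly at the first header; the messages below are assumptions, not the test's exact fixture:

    // Illustrative only: the llama-3 expected output in tests/test-chat-template.cpp
    // must not begin with the BOS token, since the tokenizer adds it automatically.
    #include <cassert>
    #include <string>

    int main() {
        const std::string expected =
            "<|start_header_id|>system<|end_header_id|>\n\n"
            "You are a helpful assistant<|eot_id|>"
            "<|start_header_id|>user<|end_header_id|>\n\n"
            "Hello<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n";

        // The string starts with a header, not "<|begin_of_text|>".
        assert(expected.rfind("<|start_header_id|>", 0) == 0);
        return 0;
    }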

@ggerganov (Owner)

Yes, you need to remove the BOS text from the reference string

@DifferentialityDevelopment (Contributor, Author)

> Yes, you need to remove the BOS text from the reference string

Done!

@foldl (Contributor) commented Apr 19, 2024

Is this missing? {% if loop.index0 == 0 %}{% set content = bos_token + content %}

@ngxson (Collaborator) commented Apr 19, 2024

> Is this missing? {% if loop.index0 == 0 %}{% set content = bos_token + content %}

As explained in #6751 (comment), the BOS token is added by the tokenizer, so it should not appear in the template.

@foldl (Contributor) commented Apr 19, 2024

@ngxson Oh, thanks. I got it wrong.

The template itself is badly designed. It can be simplified:

{bos_token}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}

@ngxson (Collaborator) commented Apr 19, 2024

FYI, I added llama3 to the list of supported templates on the wiki page. This PR looks good to me and should be merged now. The failed CI job (build server) doesn't seem to be relevant to the changes from this PR. To be extra safe, I'll ask @ggerganov to merge it. Thank you all for your efforts.

@ngxson (Collaborator) commented Apr 19, 2024

Yes, it should be removed. If we decide to add the EOS token as a stop sequence, we will also need to add it for the other templates (</s>, <|EOT|>, ...).

@teleprint-me (Contributor) commented Apr 20, 2024

<|eot_id|> is End of Turn.

Meta always includes the templates in their source code; they should always be referenced as a guide.

> The end of each message is marked by the <|eot_id|> token. (source)

@phymbert (Collaborator)

@ngxson @ggerganov Can we merge it?

@ngxson (Collaborator) commented Apr 20, 2024

Yes, it looks good to me. I'm just wondering if we want to wait for the other PR that allows converting the model, then test the converted model with this template, before actually merging it?

@phymbert (Collaborator)

Yes, I saw the other one afterwards; better to wait.

@reneleonhardt (Contributor) commented Apr 20, 2024

> Yes, it looks good to me. I'm just wondering if we want to wait for the other PR that allows converting the model, then test the converted model with this template, before actually merging it?

Will converting the model help fix the "broken IQ and imatrix quants" #6747 (comment), or do all the problems originate from the base model itself?

@ggerganov (Owner)

> Yes, it looks good to me. I'm just wondering if we want to wait for the other PR that allows converting the model, then test the converted model with this template, before actually merging it?

Yes, let's first merge #6745

@ggerganov merged commit 7dbdba5 into ggerganov:master on Apr 21, 2024
51 of 60 checks passed
github-actions bot pushed a commit to KerfuffleV2/ggml-sys-bleedingedge that referenced this pull request Apr 21, 2024
== Relevant log messages from source repo:

commit 40f74e4d739e9250431cf339ae7588b28d8d0663
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Apr 21 18:36:45 2024 +0300

    llama : add option to render special/control tokens (#6807)

    * make : fix common dep on llama.h

    * llama : add option to render special tokens

    * readme : add API change notice

    ggml-ci

    * swift : fix build

commit b9cc76d87e3d7ae5900f19d4fe8f8976d0a35888
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Apr 21 16:47:57 2024 +0300

    ggml : fix ggml_backend_cpu_supports_op() for CPY (#0)

commit 7dbdba5690ca61b3ee8c92cfac8e7e251042e787
Author: Wouter <9594229+DifferentialityDevelopment@users.noreply.github.com>
Date:   Sun Apr 21 15:03:39 2024 +0200

    llama : add llama-3 chat template (#6751)

    * Added llama-3 chat template

    * Update llama.cpp

    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>

    * Update llama.cpp

    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>

    * Update tests/test-chat-template.cpp

    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>

    * Added EOS stop sequence according to ggerganov/llama.cpp#6751 (comment)

    * Removed adding of BOS token before first message

    * Removed bos token from expected output from llama-3

    * Update tests/test-chat-template.cpp

    Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>

    * Update tests/test-chat-template.cpp

    Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>

    * Added <|end_of_text|> as another stop token

    * Reverted last change of adding the end_of_text stop word for llama 3

    ---------

    Co-authored-by: Wouter Tichelaar <tichelaarw@spar.net>
    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
    Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
okuvshynov pushed a commit to okuvshynov/llama.cpp that referenced this pull request Apr 22, 2024
@x4080 commented Apr 25, 2024

Hi, in the latest version "<|eot_id|>" appears at the end of the conversation; it seems utils.hpp doesn't have the stop token now.

@DifferentialityDevelopment (Contributor, Author) commented Apr 25, 2024

I think it's due to a different pull request that got merged:
b97bc39#diff-ad8b15a29dd7c625dd2688de421972baaa73494a72d7210d679efc5f2ec0d888

llama_token_is_eog is supposed to return true for <|eot_id|>, as far as I'm aware:

    bool llama_token_is_eog(const struct llama_model * model, llama_token token) {
        return token != -1 && (
            token == llama_token_eos(model) ||
            token == llama_token_eot(model)
        );
    }
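For context, a minimal standalone sketch of how an end-of-generation check like this is meant to gate the sampling loop (mocked code, not llama.cpp's actual API; the token IDs below are assumed to match Llama 3's <|end_of_text|> and <|eot_id|>):

    // Standalone sketch of the end-of-generation check (mocked; not llama.cpp's API).
    #include <iostream>
    #include <vector>

    using llama_token = int;

    constexpr llama_token TOKEN_EOS = 128001; // assumed id for <|end_of_text|>
    constexpr llama_token TOKEN_EOT = 128009; // assumed id for <|eot_id|>

    static bool token_is_eog(llama_token token) {
        return token != -1 && (token == TOKEN_EOS || token == TOKEN_EOT);
    }

    int main() {
        // Pretend stream of sampled token IDs; the last one is <|eot_id|>.
        const std::vector<llama_token> sampled = { 9906, 11, 1268, 649, 358, 1520, 30, 128009 };
        for (llama_token tok : sampled) {
            if (token_is_eog(tok)) {
                std::cout << "\n[stop: end-of-generation token]\n";
                break; // generation stops here instead of printing the raw marker
            }
            std::cout << tok << ' ';
        }
        return 0;
    }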

@thecivilizedgamer

I'm seeing the same issue, with not only Llama 3 but also Phi 3 as described in this issue #6903

Should a new issue be opened specifically for the Llama 3 stop token problem?

@DifferentialityDevelopment (Contributor, Author) commented Apr 25, 2024

> I'm seeing the same issue, with not only Llama 3 but also Phi 3 as described in this issue #6903
>
> Should a new issue be opened specifically for the Llama 3 stop token problem?

This pull request was supposed to fix the problem
#6860

I'm downloading a quant that was recently created to see if it exhibits the same behavior

@thecivilizedgamer

> I'm seeing the same issue, with not only Llama 3 but also Phi 3 as described in this issue #6903
> Should a new issue be opened specifically for the Llama 3 stop token problem?
>
> This pull request was supposed to fix the problem #6860
>
> I'm downloading a quant that was recently created to see if it exhibits the same behavior

No, you're right. I just pulled the latest commit and rebuilt, and it's now working again (for Llama 3, not for Phi 3) 😃

@DifferentialityDevelopment (Contributor, Author) commented Apr 25, 2024

The latest quant (for Llama 3) I just downloaded does not exhibit this behavior.
Specifically, I tested https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF
The latest GGUF for Phi 3 I could find was from 10 hours ago, but it could have been made with an older version of llama.cpp.
I'm going to make my own quant using the latest code and see if it still happens.

Update:
I'm unable to convert microsoft/Phi-3-mini-4k-instruct to GGUF using the latest llama.cpp convert.py.
Not sure why, as the pull request to support Phi 3 conversion was already merged..

@DifferentialityDevelopment (Contributor, Author)

So I had it wrong; you can now specify stop words from the client side.

Without adding the stop parameter:
[screenshot]

With the addition of the stop parameter:
[screenshot]

@x4080 commented Apr 26, 2024

@DifferentialityDevelopment do you mean adding "<|eot_id|>" as a stop token? Shouldn't it automatically stop without adding a stop token?

@DifferentialityDevelopment (Contributor, Author)

> @DifferentialityDevelopment do you mean adding "<|eot_id|>" as a stop token? Shouldn't it automatically stop without adding a stop token?

I'm not entirely sure; it depends on how the GGUF was created, what the tokenizer config was at the time, etc. For instance, the Llama 3 model I tested stopped just fine, but with Phi 3 I had to add it, otherwise it would output the EOT token. With the parameter, generation stops upon encountering any of the specified stop tokens.
It's a new thing that was added.

shg8 pushed a commit to shg8/llama-exp that referenced this pull request Apr 27, 2024
@teleprint-me (Contributor)

Special tokens are not handled by the general API, but they are handled by the templates, or at least they were. I've been really busy this past week, so I haven't had time to review what the specific changes were, since previously they were handled during chat.
