Added llama-3 chat template #6751

Merged: 12 commits into ggerganov:master on Apr 21, 2024

Conversation

@DifferentialityDevelopment (Contributor)

This simply adds the Llama 3 chat template.
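For reference, the Llama 3 format wraps each message as <|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|> and then opens an assistant header for the reply. Below is a minimal, self-contained sketch of that formatting; it is illustrative only (chat_msg and format_llama3 are not llama.cpp names, and the PR's actual change lives in llama.cpp's chat template handling):

    // Minimal, self-contained sketch of the Llama 3 chat formatting
    // (illustrative only; not the PR's exact code).
    #include <iostream>
    #include <string>
    #include <vector>

    struct chat_msg { std::string role; std::string content; };

    static std::string format_llama3(const std::vector<chat_msg> & chat, bool add_assistant_header) {
        std::string out;
        for (const auto & m : chat) {
            out += "<|start_header_id|>" + m.role + "<|end_header_id|>\n\n" + m.content + "<|eot_id|>";
        }
        if (add_assistant_header) {
            // Open the assistant turn so the model generates the reply next.
            out += "<|start_header_id|>assistant<|end_header_id|>\n\n";
        }
        // Note: the BOS token (<|begin_of_text|>) is added by the tokenizer, not here.
        return out;
    }

    int main() {
        const std::vector<chat_msg> chat = {
            {"system", "You are a helpful assistant."},
            {"user",   "Hello"},
        };
        std::cout << format_llama3(chat, /*add_assistant_header=*/true);
        return 0;
    }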

@DifferentialityDevelopment (Contributor, Author)

This is my first ever pull request, so please feel free to give me feedback on anything I could improve upon.

@SamuelTallet (Contributor) left a comment


Looks good! I just have some non-critical comments.

(Resolved review comments on llama.cpp and tests/test-chat-template.cpp)
@thecivilizedgamer

@DifferentialityDevelopment thanks for your quick work on getting a PR open, I pulled your changes to llama.cpp and rebuilt, then tried the new template. I'm seeing some issues with it still, but maybe I'm doing something wrong....? I'm running llama.cpp in server mode, with --chat-template llama3. Here's a sample response I got, where it attempts to generate both sides of the conversation:

Hello! It's so nice to chat with you again! I'm your friendly virtual assistant. How's your day going so far? Do you need any help or assistance with anything? I'm all ears (or rather, all text) and happy to lend a hand!assistant

Hi! Thanks for checking in! I'm doing alright, just trying to get some work done and then heading out for a bit later. Nothing too exciting. But it's always great to chat with you. How about you? How's your day going?assistant

Thank you! I'm doing well,.......

@DifferentialityDevelopment (Contributor, Author) commented Apr 18, 2024

> @DifferentialityDevelopment thanks for your quick work on getting a PR open, I pulled your changes to llama.cpp and rebuilt, then tried the new template. I'm seeing some issues with it still, but maybe I'm doing something wrong....? I'm running llama.cpp in server mode, with --chat-template llama3. Here's a sample response I got, where it attempts to generate both sides of the conversation:
>
> Hello! It's so nice to chat with you again! I'm your friendly virtual assistant. How's your day going so far? Do you need any help or assistance with anything? I'm all ears (or rather, all text) and happy to lend a hand!assistant
>
> Hi! Thanks for checking in! I'm doing alright, just trying to get some work done and then heading out for a bit later. Nothing too exciting. But it's always great to chat with you. How about you? How's your day going?assistant
>
> Thank you! I'm doing well,.......

A lot of the GGUF quants had the EOT token not being decoded correctly, so the model output wouldn't stop appropriately. I was seeing the exact same thing in LM Studio, so it's not a problem with llama.cpp itself.
For reference, this is the GGUF I'm currently using that doesn't have the issue:
https://huggingface.co/lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF

[screenshot]

@x4080 commented Apr 19, 2024

Hi, I think you should also modify file utils.hpp

    llama_params["stop"].push_back("<|im_end|>"); // chatml
    llama_params["stop"].push_back("<end_of_turn>"); // gemma
    llama_params["stop"].push_back("<|eot_id|>"); // llama 3 <-- need this

to make it stop at "<|eot_id|>"
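A rough, standalone sketch of why this matters: the server cuts generated text at the first configured stop string, so if "<|eot_id|>" isn't in the list the model's turn-end marker leaks through and generation keeps going, as in the sample output above. The function and variable names below are illustrative, not the server's actual code:

    // Standalone illustration of truncating output at a stop string.
    #include <iostream>
    #include <string>
    #include <vector>

    static bool find_stop_string(const std::string & generated,
                                 const std::vector<std::string> & stops,
                                 size_t & stop_pos) {
        for (const auto & s : stops) {
            const size_t pos = generated.find(s);
            if (pos != std::string::npos) {
                stop_pos = pos;
                return true;
            }
        }
        return false;
    }

    int main() {
        const std::vector<std::string> stops = { "<|im_end|>", "<end_of_turn>", "<|eot_id|>" };
        std::string generated = "Hello! How can I help you today?<|eot_id|>assistant";

        size_t pos = 0;
        if (find_stop_string(generated, stops, pos)) {
            generated.erase(pos); // drop everything from the stop marker onward
        }
        std::cout << generated << std::endl; // prints: Hello! How can I help you today?
        return 0;
    }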

@thecivilizedgamer

> @DifferentialityDevelopment thanks for your quick work on getting a PR open, I pulled your changes to llama.cpp and rebuilt, then tried the new template. I'm seeing some issues with it still, but maybe I'm doing something wrong....? I'm running llama.cpp in server mode, with --chat-template llama3. Here's a sample response I got, where it attempts to generate both sides of the conversation:
>
> Hello! It's so nice to chat with you again! I'm your friendly virtual assistant. How's your day going so far? Do you need any help or assistance with anything? I'm all ears (or rather, all text) and happy to lend a hand!assistant
>
> Hi! Thanks for checking in! I'm doing alright, just trying to get some work done and then heading out for a bit later. Nothing too exciting. But it's always great to chat with you. How about you? How's your day going?assistant
>
> Thank you! I'm doing well,.......
>
> A lot of the GGUF quants had the eot token not being decoded correctly, and subsequently the model output wouldn't stop appropriately, I was seeing the exact same thing in LM Studio, it's not a problem with llama.cpp itself. For reference this is the GGUF I'm currently using that doesn't have the issue https://huggingface.co/lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF

Ah, that makes sense. I switched models and now it's working perfectly. Thanks to you and everyone else for your efforts :)

@DifferentialityDevelopment (Contributor, Author)

> Hi, I think you should also modify file utils.hpp
>
>     llama_params["stop"].push_back("<|im_end|>"); // chatml
>     llama_params["stop"].push_back("<end_of_turn>"); // gemma
>     llama_params["stop"].push_back("<|eot_id|>"); // llama 3 <-- need this
>
> to make it stop at "<|eot_id|>"

I'm adding this now, thanks for this!

@ggerganov requested review from ngxson and removed the request for SamuelTallet, April 19, 2024 07:42
(Resolved review comment on llama.cpp)
@ngxson (Collaborator) left a comment

LGTM. Thank you!

@DifferentialityDevelopment (Contributor, Author)

I'm not 100% sure, but I think I know why the one test might be failing:
tests/test-chat-template.cpp line 79 begins with "<|begin_of_text|>", i.e. the BOS token.
I noticed that none of the other models include their BOS token in their expected output, and this lines up with the change @ngxson asked me to make, removing the BOS token from the chat template since it's added automatically.
So I'm guessing that removing it would make the test pass.
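For illustration, a corrected reference string would start directly at the first header; the messages below are assumptions, not the test's exact fixture:

    // Illustrative only: the llama-3 expected output in tests/test-chat-template.cpp
    // must not begin with the BOS token, since the tokenizer adds it automatically.
    #include <cassert>
    #include <string>

    int main() {
        const std::string expected =
            "<|start_header_id|>system<|end_header_id|>\n\n"
            "You are a helpful assistant<|eot_id|>"
            "<|start_header_id|>user<|end_header_id|>\n\n"
            "Hello<|eot_id|>"
            "<|start_header_id|>assistant<|end_header_id|>\n\n";

        // The string starts with a header, not "<|begin_of_text|>".
        assert(expected.rfind("<|start_header_id|>", 0) == 0);
        return 0;
    }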

@ggerganov (Owner)

Yes, you need to remove the BOS text from the reference string

@DifferentialityDevelopment (Contributor, Author)

> Yes, you need to remove the BOS text from the reference string

Done!

@foldl (Contributor) commented Apr 19, 2024

Is this missing? {% if loop.index0 == 0 %}{% set content = bos_token + content %}

@ngxson (Collaborator) commented Apr 19, 2024

> Is this missing? {% if loop.index0 == 0 %}{% set content = bos_token + content %}

As explained in #6751 (comment), the BOS token is added by the tokenizer, so it should not appear in the template.

@foldl (Contributor) commented Apr 19, 2024

@ngxson Oh, thanks. I got it wrong.

The template itself is badly designed. It can be simplified:

{bos_token}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}

@ngxson (Collaborator) commented Apr 19, 2024

FYI, I added llama3 to the list of supported templates on the wiki page. This PR looks good to me and should be merged now. The failed CI job (build server) doesn't seem to be relevant to the changes from this PR. To be extra safe, I'll ask @ggerganov to merge it. Thank you all for your efforts.

@ngxson (Collaborator) commented Apr 19, 2024

Yes, it should be removed. If we decide to add the EOS token as a stop sequence, we will also need to add it for the other templates (</s>, <|EOT|>, ...).

@teleprint-me (Contributor) commented Apr 20, 2024

<|eot_id|> is End of Turn.

Meta always includes the templates in their source code; they should always be referenced as a guide.

> The end of each message is marked by the <|eot_id|> token. (source)

@phymbert (Collaborator)

@ngxson @ggerganov Can we merge it?

@ngxson (Collaborator) commented Apr 20, 2024

Yes, it looks good to me. I'm just wondering if we want to wait for the other PR that allows converting the model, then test the converted model with this template, before actually merging it?

@phymbert (Collaborator)

Yes, I saw the other one afterwards; better to wait.

@reneleonhardt (Contributor) commented Apr 20, 2024

> Yes, it looks good to me. I'm just wondering if we want to wait for the other PR that allows converting the model, then test the converted model with this template, before actually merging it?

Will converting the model help fix the "broken IQ and imatrix quants" #6747 (comment), or do all the problems originate from the base model itself?

@ggerganov (Owner)

> Yes, it looks good to me. I'm just wondering if we want to wait for the other PR that allows converting the model, then test the converted model with this template, before actually merging it?

Yes, let's first merge #6745

@ggerganov merged commit 7dbdba5 into ggerganov:master on Apr 21, 2024
51 of 60 checks passed
github-actions bot pushed a commit to KerfuffleV2/ggml-sys-bleedingedge that referenced this pull request Apr 21, 2024
== Relevant log messages from source repo:

commit 40f74e4d739e9250431cf339ae7588b28d8d0663
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Apr 21 18:36:45 2024 +0300

    llama : add option to render special/control tokens (#6807)

    * make : fix common dep on llama.h

    * llama : add option to render special tokens

    * readme : add API change notice

    ggml-ci

    * swift : fix build

commit b9cc76d87e3d7ae5900f19d4fe8f8976d0a35888
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Apr 21 16:47:57 2024 +0300

    ggml : fix ggml_backend_cpu_supports_op() for CPY (#0)

commit 7dbdba5690ca61b3ee8c92cfac8e7e251042e787
Author: Wouter <9594229+DifferentialityDevelopment@users.noreply.github.com>
Date:   Sun Apr 21 15:03:39 2024 +0200

    llama : add llama-3 chat template (#6751)

    * Added llama-3 chat template

    * Update llama.cpp

    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>

    * Update llama.cpp

    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>

    * Update tests/test-chat-template.cpp

    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>

    * Added EOS stop sequence according to ggerganov/llama.cpp#6751 (comment)

    * Removed adding of BOS token before first message

    * Removed bos token from expected output from llama-3

    * Update tests/test-chat-template.cpp

    Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>

    * Update tests/test-chat-template.cpp

    Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>

    * Added <|end_of_text|> as another stop token

    * Reverted last change of adding the end_of_text stop word for llama 3

    ---------

    Co-authored-by: Wouter Tichelaar <tichelaarw@spar.net>
    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
    Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
okuvshynov pushed a commit to okuvshynov/llama.cpp that referenced this pull request Apr 22, 2024
@x4080 commented Apr 25, 2024

Hi, in the latest version "<|eot_id|>" appears at the end of the conversation; it seems utils.hpp doesn't have the stop token now.

@DifferentialityDevelopment (Contributor, Author) commented Apr 25, 2024

I think it's due to a different pull request that got merged:
b97bc39#diff-ad8b15a29dd7c625dd2688de421972baaa73494a72d7210d679efc5f2ec0d888

llama_token_is_eog is supposed to return true for <|eot_id|>, as far as I'm aware:

    bool llama_token_is_eog(const struct llama_model * model, llama_token token) {
        return token != -1 && (
            token == llama_token_eos(model) ||
            token == llama_token_eot(model)
        );
    }
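For context, a minimal standalone sketch of how an end-of-generation check like this is meant to gate the sampling loop (mocked code, not llama.cpp's actual API; the token IDs below are assumed to match Llama 3's <|end_of_text|> and <|eot_id|>):

    // Standalone sketch of the end-of-generation check (mocked; not llama.cpp's API).
    #include <iostream>
    #include <vector>

    using llama_token = int;

    constexpr llama_token TOKEN_EOS = 128001; // assumed id for <|end_of_text|>
    constexpr llama_token TOKEN_EOT = 128009; // assumed id for <|eot_id|>

    static bool token_is_eog(llama_token token) {
        return token != -1 && (token == TOKEN_EOS || token == TOKEN_EOT);
    }

    int main() {
        // Pretend stream of sampled token IDs; the last one is <|eot_id|>.
        const std::vector<llama_token> sampled = { 9906, 11, 1268, 649, 358, 1520, 30, 128009 };
        for (llama_token tok : sampled) {
            if (token_is_eog(tok)) {
                std::cout << "\n[stop: end-of-generation token]\n";
                break; // generation stops here instead of printing the raw marker
            }
            std::cout << tok << ' ';
        }
        return 0;
    }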

@thecivilizedgamer

I'm seeing the same issue, with not only Llama 3 but also Phi 3 as described in this issue #6903

Should a new issue be opened specifically for the Llama 3 stop token problem?

@DifferentialityDevelopment (Contributor, Author) commented Apr 25, 2024

> I'm seeing the same issue, with not only Llama 3 but also Phi 3 as described in this issue #6903
>
> Should a new issue be opened specifically for the Llama 3 stop token problem?

This pull request was supposed to fix the problem
#6860

I'm downloading a quant that was recently created to see if it exhibits the same behavior

@thecivilizedgamer

> I'm seeing the same issue, with not only Llama 3 but also Phi 3 as described in this issue #6903
> Should a new issue be opened specifically for the Llama 3 stop token problem?
>
> This pull request was supposed to fix the problem #6860
>
> I'm downloading a quant that was recently created to see if it exhibits the same behavior

No, you're right. I just pulled the latest commit and rebuilt, and it's now working again (for Llama 3, not for Phi 3) 😃

@DifferentialityDevelopment (Contributor, Author) commented Apr 25, 2024

The latest quant (for Llama 3) I just downloaded does not exhibit this behavior.
Specifically, I tested https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF
The latest GGUF for Phi 3 I could find was from 10 hours ago, but it could have been made with an older version of llama.cpp.
I'm going to make my own quant using the latest code and see if it still happens.

Update:
I'm unable to convert microsoft/Phi-3-mini-4k-instruct to GGUF using the latest llama.cpp convert.py.
Not sure why, as the pull request to support Phi 3 conversion was already merged..

@DifferentialityDevelopment (Contributor, Author)

So I had it wrong; you can now specify stop words from the client side.

Without adding the stop parameter:
[screenshot]

With the addition of the stop parameter:
[screenshot]

@x4080 commented Apr 26, 2024

@DifferentialityDevelopment do you mean adding "<|eot_id|>" as a stop token? Shouldn't it automatically stop without adding a stop token?

@DifferentialityDevelopment (Contributor, Author)

> @DifferentialityDevelopment do you mean adding "<|eot_id|>" as a stop token? Shouldn't it automatically stop without adding a stop token?

I'm not entirely sure; it depends on how the GGUF was created, what the tokenizer config was at the time, etc. For instance, the Llama 3 model I tested stopped just fine, but with Phi 3 I had to add it, otherwise it would output the EOT token. With the parameter, generation stops upon encountering any of the specified stop tokens.
It's a new thing that was added.

shg8 pushed a commit to shg8/llama-exp that referenced this pull request Apr 27, 2024
@teleprint-me (Contributor)

Special tokens are not handled by the general API, but they are handled by the templates, or at least they were. I've been really busy this past week, so I haven't had time to review what the specific changes were, since previously they were handled during chat.
