Added llama-3 chat template #6751
Conversation
This is my first ever pull request, so please feel free to give me feedback on anything I could improve upon.
Looks good! I just have some non-critical comments.
@DifferentialityDevelopment thanks for your quick work on getting a PR open. I pulled your changes to llama.cpp and rebuilt, then tried the new template. I'm still seeing some issues with it, but maybe I'm doing something wrong? I'm running llama.cpp in server mode, with
A lot of the GGUF quants had the EOT token not being decoded correctly, so the model output wouldn't stop appropriately. I was seeing the exact same thing in LM Studio; it's not a problem with llama.cpp itself.
Hi, I think you should also modify the file utils.hpp to make it stop at "<|eot_id|>".
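For readers following along, here is a minimal sketch of the idea being suggested: treating "<|eot_id|>" as an extra stop sequence in the generated text. The helper name and structure below are hypothetical and are not the actual utils.hpp code; on the server API, the same effect can be achieved by passing "<|eot_id|>" in the request's stop list.

// Minimal sketch (hypothetical helper, not the actual utils.hpp code):
// treat "<|eot_id|>" as an extra stop sequence while accumulating output.
#include <string>
#include <vector>

static bool hit_stop_sequence(const std::string & generated,
                              const std::vector<std::string> & stop_sequences) {
    for (const auto & stop : stop_sequences) {
        if (generated.find(stop) != std::string::npos) {
            return true;
        }
    }
    return false;
}

int main() {
    const std::vector<std::string> stops = { "<|eot_id|>" };
    std::string generated;
    // inside the sampling loop, decoded tokens would be appended to `generated`
    generated += "Hello!<|eot_id|>";
    if (hit_stop_sequence(generated, stops)) {
        // stop generation here and strip the marker before returning the reply
    }
    return 0;
}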
Ah, that makes sense. I switched models and now it's working perfectly. Thanks to you and everyone else for your efforts :)
I'm adding this now, thanks for this!
LGTM. Thank you!
I'm not 100% sure, but I think I know why the one test might be failing.
Yes, you need to remove the BOS text from the reference string.
Done!
Is this missing?
As explained in #6751 (comment), the BOS token is added by the tokenizer, so it should not appear in the template.
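As an illustration of the point above, a sketch only, assuming the llama_tokenize signature from around the time of this PR (the API has changed since): the tokenizer prepends BOS when asked to, so the template output must not contain it.

// Sketch: BOS (<|begin_of_text|>) is added by the tokenizer when
// add_special is true, so the chat-template string must not include it.
#include "llama.h"
#include <string>
#include <vector>

std::vector<llama_token> tokenize_prompt(const llama_model * model, const std::string & prompt) {
    std::vector<llama_token> tokens(prompt.size() + 8); // rough upper bound
    const int n = llama_tokenize(
        model,
        prompt.c_str(), (int) prompt.size(),
        tokens.data(), (int) tokens.size(),
        /*add_special=*/ true,   // BOS gets inserted here, not in the template
        /*parse_special=*/ true  // treat <|eot_id|> etc. as single tokens
    );
    tokens.resize(n > 0 ? n : 0);
    return tokens;
}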
@ngxson Oh, thanks. I got it wrong. The template itself is badly designed; it can be simplified:
FYI, I added llama3 to the list of supported templates on the wiki page. This PR looks good to me and should get merged now. The failed CI job (build server) doesn't seem to be relevant to the changes from this PR. To be extra safe, I'll ask @ggerganov to merge it. Thank you all for your efforts.
Yes, it should be removed. If we decide to add the EOS token as a stop sequence, we will also need to add it for other templates (
Meta always includes the templates in their source code; it should always be referenced as a guide.
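For reference, a small illustrative sketch of the prompt layout that Meta documents for Llama 3 (this is not the exact code added in this PR; the BOS token is deliberately left out because the tokenizer adds it):

// Illustrative only: build a Llama 3 style prompt from chat messages.
#include <iostream>
#include <string>
#include <vector>

struct chat_msg {
    std::string role;    // "system", "user" or "assistant"
    std::string content;
};

std::string format_llama3(const std::vector<chat_msg> & msgs) {
    std::string out;
    for (const auto & m : msgs) {
        out += "<|start_header_id|>" + m.role + "<|end_header_id|>\n\n";
        out += m.content + "<|eot_id|>";
    }
    // leave the prompt open for the assistant's reply
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n";
    return out;
}

int main() {
    std::cout << format_llama3({
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!" }
    });
    return 0;
}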
@ngxson @ggerganov Can we merge it?
Yes, it looks good to me. I'm just wondering if we want to wait for the other PR that allows converting the model, then test the converted model with this template before actually merging it?
Yes, I saw the other PR afterwards; better to wait.
Will converting the model help fix the
Yes, let's first merge #6745
== Relevant log messages from source repo:

commit 40f74e4d739e9250431cf339ae7588b28d8d0663
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Apr 21 18:36:45 2024 +0300

    llama : add option to render special/control tokens (#6807)

    * make : fix common dep on llama.h
    * llama : add option to render special tokens
    * readme : add API change notice
    * swift : fix build

    ggml-ci

commit b9cc76d87e3d7ae5900f19d4fe8f8976d0a35888
Author: Georgi Gerganov <ggerganov@gmail.com>
Date:   Sun Apr 21 16:47:57 2024 +0300

    ggml : fix ggml_backend_cpu_supports_op() for CPY (#0)

commit 7dbdba5690ca61b3ee8c92cfac8e7e251042e787
Author: Wouter <9594229+DifferentialityDevelopment@users.noreply.github.com>
Date:   Sun Apr 21 15:03:39 2024 +0200

    llama : add llama-3 chat template (#6751)

    * Added llama-3 chat template
    * Update llama.cpp
    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
    * Update llama.cpp
    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
    * Update tests/test-chat-template.cpp
    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
    * Added EOS stop sequence according to ggerganov/llama.cpp#6751 (comment)
    * Removed adding of BOS token before first message
    * Removed bos token from expected output from llama-3
    * Update tests/test-chat-template.cpp
    Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>
    * Update tests/test-chat-template.cpp
    Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>
    * Added <|end_of_text|> as another stop token
    * Reverted last change of adding the end_of_text stop word for llama 3

    ---------

    Co-authored-by: Wouter Tichelaar <tichelaarw@spar.net>
    Co-authored-by: Samuel Tallet <36248671+SamuelTallet@users.noreply.github.com>
    Co-authored-by: Rene Leonhardt <65483435+reneleonhardt@users.noreply.github.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Hi, in the latest version "<|eot_id|>" appears at the end of the conversation; it seems utils.hpp doesn't have the stop token now.
It's due to a different pull request that got merged, I think. llama_token_is_eog is supposed to return true for <|eot_id|>, as far as I'm aware.
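A rough sketch of what that means for a generation loop, assuming the llama_token_is_eog API as it existed around the time of this PR (it may differ in later versions):

// Sketch: stop generation on any end-of-generation token, such as
// <|eot_id|> for Llama 3, provided the GGUF metadata marks it as EOG.
#include "llama.h"

bool should_stop(const llama_model * model, llama_token sampled) {
    // true for EOS as well as other end-of-generation tokens
    return llama_token_is_eog(model, sampled);
}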
I'm seeing the same issue, not only with Llama 3 but also with Phi 3, as described in issue #6903. Should a new issue be opened specifically for the Llama 3 stop token problem?
This pull request was supposed to fix the problem. I'm downloading a quant that was recently created to see if it exhibits the same behavior.
No, you're right. I just pulled the latest commit and rebuilt, and it's now working again (for Llama 3, not for Phi 3) 😃
The latest quant (for Llama 3) I just downloaded does not exhibit this behavior. Update:
@DifferentialityDevelopment do you mean adding "<|eot_id|>" as a stop token? Shouldn't it stop automatically without adding a stop token?
I'm not entirely sure; it depends on how the GGUF was created, what the tokenizer config was at the time, etc. For instance, the Llama 3 model I tested stopped just fine, but with Phi I had to add it, otherwise it would output the EOT token. With the parameter set, generation stops upon encountering any of the specified stop tokens.
Special tokens are not handled by the general API; they are handled by the templates, or at least they were. I've been really busy this past week, so I haven't had time to review what the specific changes were, since these were previously handled during chat.
This is simply to add the Llama 3 chat template.