
bug: Duplicate BOS Token in Hugging Face Chat Templates #618

Closed
Tracked by #1151
Van-QA opened this issue May 28, 2024 · 4 comments
Assignees
Labels
  • category: engine management (Related to engine abstraction)
  • category: model running (Inference ux, handling context/parameters, runtime)
  • P1: important (Important feature / fix)
  • type: bug (Something isn't working)
Milestone

Comments


Van-QA commented May 28, 2024

Description:
When using chat templates from Hugging Face, the Beginning-of-Sentence (BOS) token is often already included in the template. However, llama.cpp also automatically adds the BOS token, resulting in a duplicate BOS token.

Expected behavior:
The system should automatically detect and remove any duplicate BOS tokens in the chat template. This would ensure proper functioning of the chat system without causing errors due to redundant tokens.

Additional context:
This issue may cause unexpected behavior or errors in the chat system. It is recommended that Cortex check for and deduplicate the BOS token when it is already present in the user's template, to maintain a consistent and error-free chat experience.
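As a rough illustration of the expected behavior (not Cortex's actual implementation), here is a minimal C++ sketch of stripping a leading BOS literal from a template string before the engine prepends its own. The `strip_leading_bos` helper and the hard-coded BOS literal are hypothetical:

```cpp
#include <iostream>
#include <string>

// Hypothetical helper: remove a leading BOS literal from a chat template so
// the engine's automatic BOS insertion does not produce a duplicate.
std::string strip_leading_bos(const std::string& tmpl, const std::string& bos) {
    if (tmpl.rfind(bos, 0) == 0) {  // template starts with the BOS literal
        return tmpl.substr(bos.size());
    }
    return tmpl;  // nothing to strip
}

int main() {
    const std::string bos  = "<|begin_of_text|>";  // Llama 3 style BOS, as in this report
    const std::string tmpl = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n";
    std::cout << strip_leading_bos(tmpl, bos) << "\n";  // BOS literal removed
}
```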

@Van-QA Van-QA added the type: bug Something isn't working label May 28, 2024
@louis-menlo louis-menlo self-assigned this May 31, 2024
@imtuyethan imtuyethan moved this from Icebox to Need Investigation in Menlo Sep 2, 2024
@dan-menlo dan-menlo moved this from Need Investigation to Scheduled in Menlo Sep 3, 2024
@dan-menlo dan-menlo added category: model running Inference ux, handling context/parameters, runtime category: engine management Related to engine abstraction labels Sep 6, 2024
@freelerobot freelerobot added the P1: important Important feature / fix label Sep 6, 2024
@dan-menlo

@nguyenhoangthuan99 I am putting this as a sub-issue of #1151. This issue may be stale if HF has fixed it upstream.


nguyenhoangthuan99 commented Sep 25, 2024

Problem
This is an example of a Hugging Face chat template that we are using: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n

The token <|begin_of_text|> is the BOS token. However, when we use this chat template, the llama.cpp engine automatically adds <|begin_of_text|> to the prompt, so there are two BOS tokens in the prompt, e.g. <|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>.

This issue doesn't affect the response much, and most models still work well with it. We are currently running cortex with a doubled BOS token on the llama.cpp engine without any errors.

Solution
There are two possible solutions right now:

  • Handle it in the cortex.llamacpp engine. On every inference run, check whether the model's BOS token appears in the chat template; the BOS token is parsed from the GGUF file by llama.cpp.
  • Handle it on the cortex.cpp side. Save the BOS token to model.yml and let cortex.cpp remove it before sending the request to the cortex.llamacpp engine.

I think the first solution is better because it avoids changes to cortex.cpp and the model.yml file, which are tied to many other parts of cortex.cpp. A sketch of the engine-side check follows.
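As a rough sketch of the engine-side check (the first solution), assuming llama.cpp's llama_token_bos and llama_token_to_piece APIs; their exact signatures have changed across llama.cpp versions, so treat this as illustrative rather than as the actual cortex.llamacpp code:

```cpp
#include <string>
#include "llama.h"  // llama.cpp public API

// Sketch: ask llama.cpp for the model's BOS token (parsed from the GGUF file),
// render it to text, and strip it from the front of the chat template so the
// engine's automatic BOS insertion does not duplicate it.
// Note: llama_token_bos / llama_token_to_piece exist in llama.cpp, but their
// signatures vary by version.
static std::string strip_model_bos(const llama_model * model, std::string tmpl) {
    const llama_token bos = llama_token_bos(model);
    if (bos < 0) {
        return tmpl;  // model defines no BOS token
    }

    char buf[64];
    const int n = llama_token_to_piece(model, bos, buf, sizeof(buf),
                                       /*lstrip=*/0, /*special=*/true);
    if (n <= 0) {
        return tmpl;  // could not render the BOS token to text
    }

    const std::string bos_text(buf, n);   // e.g. "<|begin_of_text|>"
    if (tmpl.rfind(bos_text, 0) == 0) {   // template starts with the BOS text
        tmpl.erase(0, bos_text.size());
    }
    return tmpl;
}
```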

This issue is not critical, and handling it requires some effort to read through the llama.cpp source.

@nguyenhoangthuan99 nguyenhoangthuan99 moved this from Scheduled to Need Investigation in Menlo Sep 25, 2024

nguyenhoangthuan99 commented Sep 30, 2024

I want to close this issue because llama.cpp sets tokenizer_add_bos to false by default as of this PR:
[screenshot of the llama.cpp PR]

For that reason, this bug no longer appears when running inference with llama.cpp.
Tested with cortex.cpp: when running inference, no warning log about a duplicated BOS token appeared.
[screenshot of the cortex.cpp inference log]
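For context, here is a minimal sketch of how the add-BOS behavior is controlled at tokenization time, assuming llama.cpp's llama_tokenize API with its add_special flag (signatures vary across llama.cpp versions); the tokenize wrapper below is hypothetical:

```cpp
#include <algorithm>
#include <string>
#include <vector>
#include "llama.h"

// Sketch: llama.cpp's tokenizer takes a flag controlling whether special
// tokens such as BOS are added automatically. When the chat template already
// contains the BOS text, tokenizing with add_special = false avoids the
// duplicate that this issue describes.
std::vector<llama_token> tokenize(const llama_model * model,
                                  const std::string & text,
                                  bool add_special) {
    std::vector<llama_token> tokens(text.size() + 8);  // generous upper bound
    const int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                                 tokens.data(), (int) tokens.size(),
                                 add_special, /*parse_special=*/true);
    tokens.resize(std::max(n, 0));
    return tokens;
}
```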

@nguyenhoangthuan99 nguyenhoangthuan99 moved this from Investigating to In Review in Menlo Sep 30, 2024
@nguyenhoangthuan99 nguyenhoangthuan99 moved this from In Review to Review + QA in Menlo Sep 30, 2024
@gabrielle-ong

Solved by llama.cpp upstream (thanks @nguyenhoangthuan99 for investigating)

@gabrielle-ong gabrielle-ong closed this as not planned Oct 3, 2024
@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Oct 3, 2024
@gabrielle-ong gabrielle-ong added this to the v1.0.0 milestone Oct 3, 2024