bug: Duplicate BOS Token in Hugging Face Chat Templates #618
Comments
@nguyenhoangthuan99 I am putting this as a sub-issue of #1151. This issue may be stale if HF has fixed it upstream.
Problem: The BOS token gets duplicated when a Hugging Face chat template already includes it and llama.cpp adds it again. This issue doesn't affect the response much, and most models still work well despite it. We are running cortex with a double BOS token on the llama.cpp engine without any error.
Solution: I think the first solution is better because we don't need to update the cortex.cpp part and the model.yml file, which are related to many other parts of cortex.cpp. This issue is not critical, and handling it would require effort to read the llama.cpp source.
I want to close this issue because llama.cpp now handles this case upstream. For that reason, this bug no longer appears when running inference with llama.cpp.
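The upstream behavior can be approximated as follows: skip the automatic BOS insertion when the rendered prompt already begins with the BOS token. This is an illustrative sketch only; the function and variable names are hypothetical and do not reflect llama.cpp's actual C API.

```python
# Hypothetical sketch of the upstream fix; "<s>" is assumed as the BOS
# literal of a Llama-style model (not taken from llama.cpp itself).
BOS = "<s>"

def tokenize_with_bos(prompt: str, add_bos: bool = True) -> str:
    """Prepend BOS only if the prompt does not already start with it."""
    if add_bos and not prompt.startswith(BOS):
        return BOS + prompt
    return prompt
```

With this guard in place, a template that already emits BOS passes through unchanged, so the engine never produces a doubled token.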
Solved by llama.cpp upstream (thanks @nguyenhoangthuan99 for investigating)
Description:
When using chat templates in Hugging Face, the Beginning-of-Sentence (BOS) token is often already included in the template. However, llama.cpp also automatically adds a BOS token, resulting in a duplicate BOS token.
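The duplication can be illustrated with a minimal sketch. This is a simplification (a real Hugging Face chat template is a Jinja template, and real tokenizers emit token IDs rather than strings); the names and the `"<s>"` BOS literal are assumptions for illustration.

```python
# Assumed BOS literal for a Llama-style model.
BOS = "<s>"

# A chat template that already emits BOS in its rendered output.
chat_template_output = BOS + "[INST] Hello [/INST]"

def naive_tokenize(text: str) -> str:
    """Unconditional automatic BOS insertion, as described above."""
    return BOS + text

prompt = naive_tokenize(chat_template_output)
# prompt now begins with "<s><s>" -- the duplicate BOS token.
```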
Expected behavior:
The system should automatically detect and remove any duplicate BOS tokens in the chat template. This would ensure proper functioning of the chat system without causing errors due to redundant tokens.
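The deduplication described above could be sketched as a small normalization pass over the rendered prompt. This is a hypothetical helper, not Cortex's actual implementation, and it assumes `"<s>"` as the BOS literal.

```python
BOS = "<s>"  # assumed BOS literal; a real model's BOS comes from its tokenizer config

def dedupe_bos(prompt: str) -> str:
    """Collapse a run of leading BOS tokens down to a single one."""
    while prompt.startswith(BOS + BOS):
        prompt = prompt[len(BOS):]
    return prompt
```

A prompt with a single leading BOS, or none at all, passes through unchanged, so the check is safe to apply to every template.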
Additional context:
This issue may cause unexpected behavior or errors in the chat system. It is recommended that Cortex check for and deduplicate the BOS token if it is already present in the user's template, to maintain a consistent and error-free chat experience.