
server: Error\nvector::_M_default_append when using certain models since "llama_chat_apply_template" #5627

Closed
infozzdatalabs opened this issue Feb 21, 2024 · 3 comments · Fixed by #5628

Comments

@infozzdatalabs

Description

The server returns "500 Internal Server Error\nvector::_M_default_append" with certain models when trying to use the model's built-in chat template, running the Docker CUDA image.

Steps to Reproduce

I'm using the OpenAI client in Python (inside a Streamlit app):

def api_openai(placeholder, system_prompt, user_prompt, temperature, logit_bias):
    full_response = ""
    # Stream the chat completion and render the partial response
    # in the Streamlit placeholder as tokens arrive.
    for response in openai_client.chat.completions.create(
        model=st.session_state["openai_model"],
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        stream=True,
        temperature=temperature,
        frequency_penalty=1,
        logit_bias=logit_bias,
    ):
        full_response += (response.choices[0].delta.content or "")
        placeholder.info(full_response + "▌")

    return full_response
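
For context, a minimal sketch of how openai_client is created; the base URL and API key below are placeholder assumptions (llama.cpp's server exposes an OpenAI-compatible /v1 endpoint, by default on port 8080, and does not validate the key), not the exact values from my setup:

from openai import OpenAI

# Placeholder client setup pointing at a local llama.cpp server;
# adjust the host/port to match the port published by the Docker CUDA image.
openai_client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",
)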

Actual Behavior

"500 Internal Server Error\nvector::_M_default_append"

Additional Information

Models that I've tried that don't work:
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF
https://huggingface.co/brittlewis12/NeuralDaredevil-7B-GGUF

Related Issues

I used #5593

Proposed Solution

I think the problem could be related to the extracted chat_template. On Hugging Face, "tokenizer.apply_chat_template" is used without problems, but I don't know whether the llama.cpp implementation works the same way.
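
For comparison, a minimal example of the Hugging Face call mentioned above (the model id is just an illustration):

from transformers import AutoTokenizer

# Renders the Jinja chat template stored with the tokenizer; the model id is
# illustrative, any model that ships a chat_template behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)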

@ggerganov
Owner

@ngxson Could you take a look at whether this is caused by the chat template changes?

I know we verify whether the custom template is supported (if provided on the command line), but I don't think we check whether the model's built-in template is supported, and this might be causing the crash.

We should check that and either fall back to some standard template and print a noticeable warning, and/or write short instructions for adding support for new chat templates in llama.cpp so that people can submit PRs.

@ngxson
Collaborator

ngxson commented Feb 21, 2024

Thanks for the detailed bug report, I'll look into this today. It seems it's really because we don't support these templates yet. In this case, we can fall back to chatml.

I'll add them to the list of supported templates too.
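
A rough sketch of the intended behavior (illustrative Python, not the actual llama.cpp server code; the crash presumably comes from llama_chat_apply_template returning -1 for an unknown template and that value being used unchecked as a buffer size):

SUPPORTED_TEMPLATES = {"chatml", "llama2", "zephyr"}  # illustrative subset

def resolve_chat_template(detected_template):
    # Sketch of the proposed fix: if the model's built-in template is
    # missing or unrecognized, warn loudly and fall back to chatml
    # instead of letting an error code propagate and crash the server.
    if detected_template in SUPPORTED_TEMPLATES:
        return detected_template
    print(f"warning: unsupported chat template {detected_template!r}; falling back to chatml")
    return "chatml"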

@infozzdatalabs
Author

This problem started happening yesterday, when I switched to the new image after that implementation. Before that, the models did not give good results because of the inadequate chat template, but they did not crash.
Thank you very much for the fast response :)
