@ngxson Could you take a look to see if this is caused by the chat template changes?
I know we verify whether a custom template is supported (if it's provided on the command line), but I don't think we check whether the model's built-in template is supported, and this might be causing the crash.
We should check that and either fall back to some standard template with a noticeable warning, and/or write short instructions for adding support for new chat templates in llama.cpp so that people can submit PRs.
Thanks for the detailed bug report, I'll look into this today. It seems like it's really because we don't support these templates yet. In this case, we can fall back to chatml.
I'll add them to the list of supported templates too.
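For reference, a rough sketch of the ChatML layout that the fallback would produce (illustrative Python, not the actual llama.cpp code; the helper name is made up):

```python
# Illustrative sketch of the ChatML prompt format used as a fallback.
def format_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Open the assistant turn so the model generates the reply next.
    prompt += "<|im_start|>assistant\n"
    return prompt

print(format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]))
```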
This problem started happening yesterday when I switched to the new image after that change was merged; before that, the models did not give good results because of the inadequate chat template, but the server did not crash.
Thank you very much for the fast response :)
Description
The server returns "500 Internal Server Error\nvector::_M_default_append" with certain models when trying to use the model's built-in chat template with the Docker CUDA image.
Steps to Reproduce
I'm using the OpenAI client in Python:
```python
def api_openai(placeholder, system_prompt, user_prompt, temperature, logit_bias):
    full_response = ""
    # Stream the chat completion and update the Streamlit placeholder as tokens arrive.
    for response in openai_client.chat.completions.create(
            model=st.session_state["openai_model"],
            messages=[{"role": "system", "content": system_prompt},
                      {"role": "user", "content": user_prompt}],
            stream=True,
            temperature=temperature,
            frequency_penalty=1,
            logit_bias=logit_bias):
        full_response += response.choices[0].delta.content or ""
        placeholder.info(full_response + "▌")
```
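For context, a minimal sketch of how `openai_client` and `st` are set up on my side (the base URL, model name, and API key here are assumptions matching the compose file below):

```python
import streamlit as st
from openai import OpenAI

# Assumed setup: the llama.cpp server exposes its OpenAI-compatible API on port 8080,
# and "key" matches the --api-key passed to the server in the compose file below.
openai_client = OpenAI(base_url="http://localhost:8080/v1", api_key="key")
st.session_state["openai_model"] = "alphamonarch-7b"  # model name is illustrative
```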
Actual Behavior
"500 Internal Server Error\nvector::_M_default_append"
Screenshots
Environment
Operating System: Docker
Docker compose:
```yaml
api-server:
  container_name: api-server
  image: ghcr.io/ggerganov/llama.cpp:server-cuda
  command: >
    -m models/alphamonarch-7b.Q5_K_M.gguf
    --ctx-size 8192
    --host 0.0.0.0
    --port 8080
    --n-gpu-layers 1000
    -np 1
    -cb
    --grp-attn-n 4
    --grp-attn-w 2048
    --api-key key
    --verbose
  ports:
    - "8080:8080"
```
Models that failed:
https://huggingface.co/mlabonne/AlphaMonarch-7B-GGUF
https://huggingface.co/CultriX/OmniBeagle-7B-GGUF
Additional Information
Models that I've tried that work:
https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF
https://huggingface.co/brittlewis12/NeuralDaredevil-7B-GGUF
Related Issues
I used #5593.
Proposed Solution
I think the problem could be related to the extracted chat_template. On the Hugging Face side, "tokenizer.apply_chat_template" works without problems, but I don't know if the llama.cpp implementation works the same way.
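For comparison, this is roughly what I mean on the Hugging Face side (a minimal sketch; I'm assuming the base model id mlabonne/AlphaMonarch-7B carries the same chat_template as the GGUF conversion):

```python
from transformers import AutoTokenizer

# Load the tokenizer of one of the failing models and render its built-in chat template.
tokenizer = AutoTokenizer.from_pretrained("mlabonne/AlphaMonarch-7B")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```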