Inference server not working with models tuned on <|system|>,<|prompter|>,<|assistant|> or <|im_start|>,<|im_end|> format #1000
Comments
On further investigation it seems related to the conversation template definitions. These definitions come from FastChat, so maybe I only need to tell vLLM to use a different conversation template.
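One way that idea could look (a sketch, assuming the server resolves templates through FastChat; the template name and system message below are placeholders, and the registration only helps if it runs in the same process as the API server, field names follow recent FastChat releases):

```python
from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    get_conv_template,
    register_conv_template,
)

# ChatML-style template for <|im_start|>/<|im_end|> models.
register_conv_template(
    Conversation(
        name="llongorca-chatml",  # hypothetical name, not an official one
        system_template="<|im_start|>system\n{system_message}",
        system_message="You are a helpful assistant.",  # placeholder
        roles=("<|im_start|>user", "<|im_start|>assistant"),
        sep_style=SeparatorStyle.CHATML,
        sep="<|im_end|>",
    ),
    override=True,
)

# Quick check of the prompt string this template produces.
conv = get_conv_template("llongorca-chatml")
conv.append_message(conv.roles[0], "What does <|im_end|> do?")
conv.append_message(conv.roles[1], None)
print(conv.get_prompt())
```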
How do we go about changing the system prompt?
@horiacristescu I have a similar problem. Can you try setting …
Trying to run the vLLM server with https://huggingface.co/Open-Orca/LlongOrca-13B-16k, but it returns just whitespace.
It uses messages formatted as:
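Roughly the ChatML layout named in the issue title, with placeholders for the actual text:

```
<|im_start|>system
{system message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```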
Also tried https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319, but it returns an empty response.
Message format:
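Roughly the <|system|>/<|prompter|>/<|assistant|> layout from the issue title, again with placeholders:

```
<|system|>{system message}</s><|prompter|>{prompt}</s><|assistant|>
```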
Is it possible to use models that require such different formatting? The vLLM request is abstracted away and only sends a list of messages. I tried wrapping the content with the special tokens.
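One way to sidestep the server-side template entirely (a sketch, assuming a default local launch on port 8000 with that model; the question and stop handling are illustrative): build the prompt by hand and call the plain `/v1/completions` endpoint instead of `/v1/chat/completions`, so vLLM never applies its own conversation template.

```python
import requests

# Hand-built ChatML prompt; the system message is a placeholder.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is the capital of France?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumes the default host/port
    json={
        "model": "Open-Orca/LlongOrca-13B-16k",  # must match what --model was given
        "prompt": prompt,
        "max_tokens": 128,
        "stop": ["<|im_end|>"],  # stop at the turn-end marker instead of EOS
    },
    timeout=120,
)
print(resp.json()["choices"][0]["text"])
```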
The only prompt format that works for me on the vLLM server is the one from Open-Orca/OpenOrca-Platypus2-13B:
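Presumably the Alpaca-style instruction layout from that model's card, roughly:

```
### Instruction:

{prompt}

### Response:
```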
Maybe I am missing an argument when running the server?