Inference server not working with models tuned on <|system|>,<|prompter|>,<|assistant|> or <|im_start|>,<|im_end|> format #1000
Comments
On further investigation it seems related to the conversation template definitions. These definitions come from FastChat, so maybe I only need to tell vLLM to use a different conversation template.
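One way that idea could look (a sketch, assuming the server resolves templates through FastChat; the template name and system message below are placeholders, and the registration only helps if it runs in the same process as the API server, field names follow recent FastChat releases):

```python
from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    get_conv_template,
    register_conv_template,
)

# ChatML-style template for <|im_start|>/<|im_end|> models.
register_conv_template(
    Conversation(
        name="llongorca-chatml",  # hypothetical name, not an official one
        system_template="<|im_start|>system\n{system_message}",
        system_message="You are a helpful assistant.",  # placeholder
        roles=("<|im_start|>user", "<|im_start|>assistant"),
        sep_style=SeparatorStyle.CHATML,
        sep="<|im_end|>",
    ),
    override=True,
)

# Quick check of the prompt string this template produces.
conv = get_conv_template("llongorca-chatml")
conv.append_message(conv.roles[0], "What does <|im_end|> do?")
conv.append_message(conv.roles[1], None)
print(conv.get_prompt())
```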
How do we go about changing the system prompt?
@horiacristescu I have a similar problem. Can you try setting …
Trying to run the vLLM server with https://huggingface.co/Open-Orca/LlongOrca-13B-16k, but it returns just whitespace.
It uses messages formatted as:
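Roughly the ChatML layout named in the issue title, with placeholders for the actual text:

```
<|im_start|>system
{system message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```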
Also tried https://huggingface.co/OpenAssistant/llama2-13b-orca-8k-3319, but it returns an empty response.
Message format:
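Roughly the <|system|>/<|prompter|>/<|assistant|> layout from the issue title, again with placeholders:

```
<|system|>{system message}</s><|prompter|>{prompt}</s><|assistant|>
```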
Is it possible to use models that require such different formatting? The vLLM request is abstracted away and only sends a list of messages. I tried wrapping the content with the special tokens.
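One way to sidestep the server-side template entirely (a sketch, assuming a default local launch on port 8000 with that model; the question and stop handling are illustrative): build the prompt by hand and call the plain `/v1/completions` endpoint instead of `/v1/chat/completions`, so vLLM never applies its own conversation template.

```python
import requests

# Hand-built ChatML prompt; the system message is a placeholder.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is the capital of France?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumes the default host/port
    json={
        "model": "Open-Orca/LlongOrca-13B-16k",  # must match what --model was given
        "prompt": prompt,
        "max_tokens": 128,
        "stop": ["<|im_end|>"],  # stop at the turn-end marker instead of EOS
    },
    timeout=120,
)
print(resp.json()["choices"][0]["text"])
```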
The only prompt format that works for me on the vLLM server is the one from Open-Orca/OpenOrca-Platypus2-13B:
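Presumably the Alpaca-style instruction layout from that model's card, roughly:

```
### Instruction:

{prompt}

### Response:
```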
Maybe I am missing an argument when running the server?