[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use #10164

Merged
merged 7 commits into vllm-project:main on Nov 23, 2024

Conversation

tjohnson31415
Contributor

@tjohnson31415 tjohnson31415 commented Nov 8, 2024

For our use case, we want to serve the Llama 3.2 Vision models while also supporting non-vision requests that use tools. The current recommended/example chat template assumes tool use: it injects a tool-use system prompt even when tools are not requested, and it does not support image inputs. This PR updates the template to support tool use, vision inputs, and plain chat generation depending on the input conversation.

The examples below show the templating results for a few different use cases, using the meta-llama/Llama-3.2-11B-Vision-Instruct model's tokenizer. "New" refers to the template in this PR, "Old" is the current vLLM example template from main, and "Base" uses the template from the tokenizer_config.json on HF Hub.
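
For reference, a minimal sketch of how these comparisons can be reproduced with the tokenizer's apply_chat_template; the local template path is illustrative:

```python
# Sketch of reproducing the "Base" vs. "New" renderings above.
# The local template path is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct")

messages = [{"role": "user", "content": "What is vLLM?\n"}]

# "Base": the template shipped in tokenizer_config.json on HF Hub
base = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False)

# "New": the template from this PR, loaded from a local file
with open("examples/tool_chat_template_llama3.2_json.jinja") as f:
    new = tokenizer.apply_chat_template(
        messages, chat_template=f.read(),
        add_generation_prompt=True, tokenize=False)

print(base, new, sep="\n---\n")
```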

FIX #10324

Basic Chat

Input

[
    {
      "role": "user",
      "content": "What is vLLM?\n"
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'text', 'text': 'What is vLLM?\n'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>


System Prompt

Input

[
    {
        "role": "system",
        "content": "You are an expert on vLLM.",
    },
    {
      "role": "user",
      "content": "What is vLLM?\n",dd
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

[{'type': 'text', 'text': 'You are an expert on vLLM.'}]<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'text', 'text': 'What is vLLM?\n'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>system<|end_header_id|>

You are an expert on vLLM.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

[{'type': 'text', 'text': 'You are an expert on vLLM.'}]<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>


NB: vLLM transforms the system prompt's string content into a JSON object for mllama, but the base template assumes it will always be a string.
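
To make that transformation concrete, here is a hypothetical sketch of the normalization (not vLLM's actual code): string content is wrapped into an OpenAI-style parts list, so a template must handle both shapes.

```python
# Hypothetical sketch of the content normalization described above;
# not vLLM's actual implementation.
def normalize_content(content):
    """Wrap plain string content into an OpenAI-style parts list."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return content  # already a list of parts

print(normalize_content("You are an expert on vLLM."))
# [{'type': 'text', 'text': 'You are an expert on vLLM.'}]
```

A base template that treats the system content as a string then renders this list's repr, which is exactly the stray [{'type': 'text', ...}] seen in the "Base" output above.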

Image

Input

[
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": "Describe the image."},
      ]
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'image'}, {'type': 'text', 'text': 'Describe the image.'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Image with System Prompt

Input

[
    {
        "role": "system",
        "content": "You are a helpful assistant model.",
    },
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": "Describe the image."},
      ]
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

[{'type': 'text', 'text': 'You are a helpful assistant model.'}]<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'image'}, {'type': 'text', 'text': 'Describe the image.'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

Throws exception:

TemplateError: Prompting with images is incompatible with system messages and tool use.

Base

Throws exception:

TemplateError: Prompting with images is incompatible with system messages and tool use.
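
This TemplateError is raised from inside the template itself. A minimal sketch of that kind of guard, rendered through jinja2 for illustration (variable names are placeholders, not the exact template source):

```python
# Illustrative jinja2 guard of the kind that raises the TemplateError
# above; not the exact template source from this PR.
import jinja2

TEMPLATE = """
{%- if has_images and (system_message or tools) -%}
{{- raise_exception("Prompting with images is incompatible with system messages and tool use.") -}}
{%- endif -%}
"""

def raise_exception(msg):
    raise jinja2.exceptions.TemplateError(msg)

env = jinja2.Environment()
env.globals["raise_exception"] = raise_exception
env.from_string(TEMPLATE).render(
    has_images=True, system_message="You are helpful.", tools=None)
# jinja2.exceptions.TemplateError: Prompting with images is incompatible ...
```
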
Tool Use Request

Input

messages = [
    {
      "role": "user",
      "content": "What is the weather in San Fransisco?",
    }
]

get_current_weather = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}
tools = [get_current_weather]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You have access to the following functions. To call a function, please respond with JSON for a function call.Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'text', 'text': 'What is the weather in San Francisco?'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You have access to the following functions. To call a function, please respond with JSON for a function call. Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the weather in San Francisco?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

[{'type': 'text', 'text': 'What is the weather in San Francisco?'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


NB: vLLM transforms the string content into a JSON object for mllama, but the base template assumes it will be a string when merging the user message with the tool info.
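
A quick way to see why the parts list shows up verbatim: when a template written for string content interpolates a list, Jinja renders the list's Python repr. A minimal demonstration (illustrative, not the base template itself):

```python
# Illustrative: a template that assumes string content renders the
# Python repr when handed a parts list instead.
import jinja2

snippet = jinja2.Environment().from_string("{{ tool_info }}\n\n{{ content }}")

print(snippet.render(
    tool_info="Given the following functions, ...",
    content=[{"type": "text",
              "text": "What is the weather in San Francisco?"}]))
# The last line printed is the repr:
# [{'type': 'text', 'text': 'What is the weather in San Francisco?'}]
```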


github-actions bot commented Nov 8, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@tjohnson31415
Contributor Author

cc: @K-Mistele and @maxdebayser for review. Thanks!

@DarkLight1337
Member

cc @K-Mistele see if this chat template still looks good to you for tool use.

@K-Mistele
Contributor

> cc @K-Mistele see if this chat template still looks good to you for tool use.

Thanks for the ping! I'm getting ready for some travel but can take a look while I'm on the plane tomorrow.

@K-Mistele
Contributor

Possibly related #9859

{%- if not image_ns.has_images %}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
{{- "Environment: ipython\n" }}
Contributor

I think this is fine for JSON tool calling, but is it also true that the pythonic tool calling is incompatible with images?

@tjohnson31415
Contributor Author

tjohnson31415 commented Nov 19, 2024

I pushed some additional changes:

  • rebased on current main
  • inverted the check from `is mapping` to `is string` (re: this comment; see the sketch below)
  • made changes to the 3.1 example tools template similar to the ones for 3.2 (i.e. don't inject the default tool prompt unless tools are requested)

I also see that #9919 was merged to handle detecting which content format the template expects. The updates in this PR have the example templates handle both formats, but that may be unnecessary now.
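
A minimal illustration of the inverted check, rendered through jinja2 (the variable name is a placeholder, not the template's actual one):

```python
# Illustration of branching on `is string` rather than `is mapping`;
# placeholder names, not the PR's actual template.
import jinja2

TEMPLATE = """
{%- if content is string -%}
{{- content -}}
{%- else -%}
{%- for part in content -%}
{%- if part.type == "text" -%}{{- part.text -}}{%- endif -%}
{%- endfor -%}
{%- endif -%}
"""

template = jinja2.Environment().from_string(TEMPLATE)
print(template.render(content="What is vLLM?"))
print(template.render(content=[{"type": "text", "text": "What is vLLM?"}]))
# Both calls print: What is vLLM?
```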

@DarkLight1337 DarkLight1337 left a comment
Member

The chat content format detection doesn't distinguish between text and vision inputs (only the format of `message["content"][int]`). This change LGTM!

@DarkLight1337
Member

Since the chat templates now support "openai" format, we should update the tests accordingly.

@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 21, 2024
auto-merge was automatically disabled November 22, 2024 19:29

Head branch was pushed to by a user without write access

@tjohnson31415 tjohnson31415 changed the title [Bugfix][Frontend] Update Llama 3.2 Chat Template to support Vision and Non-Tool use [Bugfix][Frontend] Update Llama Chat Templates to support Vision and Non-Tool use Nov 22, 2024
@tjohnson31415 tjohnson31415 changed the title [Bugfix][Frontend] Update Llama Chat Templates to support Vision and Non-Tool use [Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use Nov 22, 2024
@DarkLight1337 DarkLight1337 merged commit 9195dbd into vllm-project:main Nov 23, 2024
33 checks passed
@tjohnson31415 tjohnson31415 deleted the llama-chat-templates branch November 25, 2024 16:19
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 28, 2024
…ol use (vllm-project#10164)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
…ol use (vllm-project#10164)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Successfully merging this pull request may close these issues.

[Bug] custom chat template sends to model [{'type': 'text', 'text': '...'}]