[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use #10164

Merged
merged 7 commits into vllm-project:main on Nov 23, 2024

Conversation

tjohnson31415
Contributor

@tjohnson31415 tjohnson31415 commented Nov 8, 2024

For our use case, we want to serve the Llama 3.2 Vision models while also supporting non-vision requests that use tools. The current recommended/example chat template assumes tool use: it injects a tool-use system prompt even when tools are not requested, and it does not support image inputs. This PR updates the template to support tool use, vision inputs, and plain chat generation depending on the input conversation.

The examples below show the templating results for a few different use cases, using the meta-llama/Llama-3.2-11B-Vision-Instruct model's tokenizer. "New" refers to the template in this PR, "Old" is the current vLLM example template from main, and "Base" uses the template from the tokenizer_config.json on HF Hub.
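
For reference, a minimal sketch of how these comparisons can be reproduced with the tokenizer's apply_chat_template; the local template path is illustrative:

```python
# Sketch of reproducing the "Base" vs. "New" renderings above.
# The local template path is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct")

messages = [{"role": "user", "content": "What is vLLM?\n"}]

# "Base": the template shipped in tokenizer_config.json on HF Hub
base = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False)

# "New": the template from this PR, loaded from a local file
with open("examples/tool_chat_template_llama3.2_json.jinja") as f:
    new = tokenizer.apply_chat_template(
        messages, chat_template=f.read(),
        add_generation_prompt=True, tokenize=False)

print(base, new, sep="\n---\n")
```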

FIX #10324

Basic Chat

Input

[
    {
      "role": "user",
      "content": "What is vLLM?\n"
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'text', 'text': 'What is vLLM?\n'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>


System Prompt

Input

[
    {
        "role": "system",
        "content": "You are an expert on vLLM.",
    },
    {
      "role": "user",
      "content": "What is vLLM?\n",dd
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

[{'type': 'text', 'text': 'You are an expert on vLLM.'}]<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'text', 'text': 'What is vLLM?\n'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>system<|end_header_id|>

You are an expert on vLLM.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

[{'type': 'text', 'text': 'You are an expert on vLLM.'}]<|eot_id|><|start_header_id|>user<|end_header_id|>

What is vLLM?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>


NB: vLLM transforms the system prompt's string content into a JSON object for mllama, but the base template assumes it will always be a string.
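
To make that transformation concrete, here is a hypothetical sketch of the normalization (not vLLM's actual code): string content is wrapped into an OpenAI-style parts list, so a template must handle both shapes.

```python
# Hypothetical sketch of the content normalization described above;
# not vLLM's actual implementation.
def normalize_content(content):
    """Wrap plain string content into an OpenAI-style parts list."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return content  # already a list of parts

print(normalize_content("You are an expert on vLLM."))
# [{'type': 'text', 'text': 'You are an expert on vLLM.'}]
```

A base template that treats the system content as a string then renders this list's repr, which is exactly the stray [{'type': 'text', ...}] seen in the "Base" output above.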

Image

Input

[
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": "Describe the image."},
      ]
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'image'}, {'type': 'text', 'text': 'Describe the image.'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe the image.<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Image with System Prompt

Input

[
    {
        "role": "system",
        "content": "You are a helpful assistant model.",
    },
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": "Describe the image."},
      ]
    }
]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

[{'type': 'text', 'text': 'You are a helpful assistant model.'}]<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'image'}, {'type': 'text', 'text': 'Describe the image.'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

Throws exception:

TemplateError: Prompting with images is incompatible with system messages and tool use.

Base

Throws exception:

TemplateError: Prompting with images is incompatible with system messages and tool use.
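
This TemplateError is raised from inside the template itself. A minimal sketch of that kind of guard, rendered through jinja2 for illustration (variable names are placeholders, not the exact template source):

```python
# Illustrative jinja2 guard of the kind that raises the TemplateError
# above; not the exact template source from this PR.
import jinja2

TEMPLATE = """
{%- if has_images and (system_message or tools) -%}
{{- raise_exception("Prompting with images is incompatible with system messages and tool use.") -}}
{%- endif -%}
"""

def raise_exception(msg):
    raise jinja2.exceptions.TemplateError(msg)

env = jinja2.Environment()
env.globals["raise_exception"] = raise_exception
env.from_string(TEMPLATE).render(
    has_images=True, system_message="You are helpful.", tools=None)
# jinja2.exceptions.TemplateError: Prompting with images is incompatible ...
```
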
Tool Use Request

Input

messages = [
    {
      "role": "user",
      "content": "What is the weather in San Fransisco?",
    }
]

get_current_weather = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}
tools = [get_current_weather]

Old

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You have access to the following functions. To call a function, please respond with JSON for a function call.Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

[{'type': 'text', 'text': 'What is the weather in San Francisco?'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


New

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

You have access to the following functions. To call a function, please respond with JSON for a function call. Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

You are a helpful assistant with tool calling capabilities. Only reply with a tool call if the function exists in the library provided by the user. If it doesn't exist, just reply directly in natural language. When you receive a tool call response, use the output to format an answer to the original user question.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the weather in San Francisco?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Base

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: 08 Nov 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.

{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Gets the temperature at a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}

[{'type': 'text', 'text': 'What is the weather in San Francisco?'}]<|eot_id|><|start_header_id|>assistant<|end_header_id|>


NB: vLLM transforms the string content into a JSON object for mllama, but the base template assumes it will be a string when merging the user message with the tool info.
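
A quick way to see why the parts list shows up verbatim: when a template written for string content interpolates a list, Jinja renders the list's Python repr. A minimal demonstration (illustrative, not the base template itself):

```python
# Illustrative: a template that assumes string content renders the
# Python repr when handed a parts list instead.
import jinja2

snippet = jinja2.Environment().from_string("{{ tool_info }}\n\n{{ content }}")

print(snippet.render(
    tool_info="Given the following functions, ...",
    content=[{"type": "text",
              "text": "What is the weather in San Francisco?"}]))
# The last line printed is the repr:
# [{'type': 'text', 'text': 'What is the weather in San Francisco?'}]
```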


github-actions bot commented Nov 8, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@tjohnson31415
Contributor Author

cc: @K-Mistele and @maxdebayser for review. Thanks!

@DarkLight1337
Member

cc @K-Mistele see if this chat template still looks good to you for tool use.

@K-Mistele
Contributor

> cc @K-Mistele see if this chat template still looks good to you for tool use.

Thanks for the ping! I'm getting ready for some travel but can take a look while I'm on the plane tomorrow.

@K-Mistele
Contributor

Possibly related #9859

{%- if not image_ns.has_images %}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
{{- "Environment: ipython\n" }}
Contributor

I think this is fine for JSON tool calling, but is it also true that the pythonic tool calling is incompatible with images?

@tjohnson31415
Contributor Author

tjohnson31415 commented Nov 19, 2024

I pushed some additional changes:

  • rebased on current main
  • inverted the check from `is mapping` to `is string` (re: this comment; see the sketch below)
  • made changes to the 3.1 example tools template similar to the ones for 3.2 (i.e. don't inject the default tool prompt unless tools are requested)

I also see that #9919 was merged to handle detecting which content format the template expects. The updates in this PR have the example templates handle both formats, but that may be unnecessary now.
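
A minimal illustration of the inverted check, rendered through jinja2 (the variable name is a placeholder, not the template's actual one):

```python
# Illustration of branching on `is string` rather than `is mapping`;
# placeholder names, not the PR's actual template.
import jinja2

TEMPLATE = """
{%- if content is string -%}
{{- content -}}
{%- else -%}
{%- for part in content -%}
{%- if part.type == "text" -%}{{- part.text -}}{%- endif -%}
{%- endfor -%}
{%- endif -%}
"""

template = jinja2.Environment().from_string(TEMPLATE)
print(template.render(content="What is vLLM?"))
print(template.render(content=[{"type": "text", "text": "What is vLLM?"}]))
# Both calls print: What is vLLM?
```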

@DarkLight1337 DarkLight1337 left a comment
Member

The chat content format detection doesn't distinguish between text and vision inputs (only the format of `message["content"][int]`). This change LGTM!

@DarkLight1337
Member

Since the chat templates now support "openai" format, we should update the tests accordingly.

@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 21, 2024
auto-merge was automatically disabled November 22, 2024 19:29

Head branch was pushed to by a user without write access

@tjohnson31415 tjohnson31415 changed the title [Bugfix][Frontend] Update Llama 3.2 Chat Template to support Vision and Non-Tool use [Bugfix][Frontend] Update Llama Chat Templates to support Vision and Non-Tool use Nov 22, 2024
@tjohnson31415 tjohnson31415 changed the title [Bugfix][Frontend] Update Llama Chat Templates to support Vision and Non-Tool use [Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use Nov 22, 2024
@DarkLight1337 DarkLight1337 merged commit 9195dbd into vllm-project:main Nov 23, 2024
33 checks passed
@tjohnson31415 tjohnson31415 deleted the llama-chat-templates branch November 25, 2024 16:19
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 28, 2024
…ol use (vllm-project#10164)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
…ol use (vllm-project#10164)

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Successfully merging this pull request may close these issues.

[Bug] custom chat template sends to model [{'type': 'text', 'text': '...'}]