[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models #5649

Merged: 237 commits, Sep 4, 2024
Changes from 92 commits
Commits (237)
6b762e5
add hermes 2 pro function calling template
Jun 16, 2024
606ec64
feat(example): add example chat completion with tool usage
K-Mistele Jun 16, 2024
d27446f
feat: add CLI argument for OpenAI API-style tool use system prompt ji…
K-Mistele Jun 17, 2024
bdf48a1
feat: add better validation for tool_choice, support for tool_choice …
K-Mistele Jun 17, 2024
9493c67
fix(cli): set OpenAI tool args in vllm/entrypoints/openai/cli_args.py…
K-Mistele Jun 18, 2024
35c9aa7
fix(types): add "auto" as an option for tool_choice in pydantic models
K-Mistele Jun 18, 2024
8ff80fb
fix: validation - guided decoding not valid with tool_choice = auto
K-Mistele Jun 18, 2024
3c4acb1
feat: ensure guided deoding is only applied when tool_choice is NOT "…
K-Mistele Jun 18, 2024
9c5ef66
feat(cli): update CLI args for auto tool choice and OpenAIServingChat…
K-Mistele Jun 18, 2024
f1a1e7b
feat: add loading in the system prompt jinja template if specified; a…
K-Mistele Jun 18, 2024
07a67d2
wip: add a case for tool choice = auto when handling chat completion …
K-Mistele Jun 19, 2024
590a559
fix: hermes 2 pro prompt template to prevent newlines
K-Mistele Jun 19, 2024
6919acf
feat: handle building the system prompt via template with --enable-au…
K-Mistele Jun 19, 2024
db9c29a
fix: remove debugging log lines
K-Mistele Jun 19, 2024
c16aa9a
fix(template): update hermes 2 pro template with newlines to get newl…
K-Mistele Jun 20, 2024
96b2eb5
merge: main into tool use & resolve conflicts
K-Mistele Jul 1, 2024
9b9d861
feat: add hermes 2 pro and mistral full tool use chat templates
K-Mistele Jul 2, 2024
d221364
chore: delete old template
K-Mistele Jul 2, 2024
33c669b
feat: tool calls are now returned in the chat completion response
K-Mistele Jul 2, 2024
7214e70
fix: mistral chat template. replace huggingface-suggested one with th…
K-Mistele Jul 2, 2024
6a3c61e
feat: add mistral tool parser, and empty hermes tool parser. non-stre…
K-Mistele Jul 2, 2024
73046e6
feat: update example
K-Mistele Jul 2, 2024
d7311e6
chore: update FunctionCall type to allow arguments as a Dict. non-aut…
K-Mistele Jul 2, 2024
344b241
feat: clean up CLI arguments, engine. implement tool parser selection…
K-Mistele Jul 2, 2024
8740525
feat: add methods to FunctionCall and ToolCall in protocol to make it…
K-Mistele Jul 2, 2024
7265be5
fix: ensure finish_reason = "tool_calls" when a tool call is generated
K-Mistele Jul 2, 2024
a1207f2
unfinished: work on hermes 2 tool call parser
K-Mistele Jul 2, 2024
30ffa16
partial: hermes 2 pro tool parsing
K-Mistele Jul 3, 2024
294c99e
fix: hermes tool call parser, work on example
K-Mistele Jul 3, 2024
00be988
fix: tool call arguments should be returned as JSON string not as a l…
K-Mistele Jul 3, 2024
b70e7d7
feat: enable both content and tool_calls if the model allows
K-Mistele Jul 3, 2024
ece6182
feat: update example with tool call
K-Mistele Jul 3, 2024
705ca62
fix: typing-related issues for chat messages
K-Mistele Jul 3, 2024
b3a62e9
feat: fix lots of parsing & extraction issues to ensure tool calls & …
K-Mistele Jul 4, 2024
c2d1afc
chore: refactor tool extraction for non-streaming responses to be in …
K-Mistele Jul 4, 2024
2d4b302
fix: mistral tool calling chat template
K-Mistele Jul 5, 2024
1fcd4f5
fix: make ChatMessage content Optional since it could be an assistant…
K-Mistele Jul 5, 2024
2926c3e
refactor: move tool parsing into the right place in serving_chat, and…
K-Mistele Jul 5, 2024
fa082e0
fix: finish_reason should NEVER be None; OpenAI defualt is "stop"
K-Mistele Jul 7, 2024
8455786
fix: typo introduced in earlier commit
K-Mistele Jul 7, 2024
a97ccc7
feat: signature updates and refactoring to tool parser streaming; pre…
K-Mistele Jul 8, 2024
c697e9f
fix: kind of fixed mistral chat template
K-Mistele Jul 8, 2024
df877f6
feat: make non-streaming tool parsing a static method so that streami…
K-Mistele Jul 8, 2024
8fa57ae
partial: work on streaming tool call parser for mistral
K-Mistele Jul 8, 2024
301b02e
deps: add partial-json-parser for parsing streaming JSON
K-Mistele Jul 10, 2024
1364bc1
fix: protocol stuff, work on mistral streaming
K-Mistele Jul 10, 2024
cbd8919
feat: progress on mistral streaming parser
K-Mistele Jul 10, 2024
d480db6
fix: some tool parser stuff. best its working yet
K-Mistele Jul 10, 2024
305685e
fix: major parsing logic issue when overlapping prefix & suffix due t…
K-Mistele Jul 10, 2024
e47f70f
feat: implement mistral tool calling streaming for ONE TOOL ONLY RIGH…
K-Mistele Jul 10, 2024
d8f4487
feat: update openai client to showcase streaming
K-Mistele Jul 10, 2024
cfa6d03
fix: finish reason & debug logging
K-Mistele Jul 10, 2024
b2cb8fb
fix(docs): CLI argument description was bad
K-Mistele Jul 10, 2024
625584a
fix(parser): mistral tool parser issue that was giving me a stroke. A…
K-Mistele Jul 13, 2024
7a6f6ac
chore: update examples & logging
K-Mistele Jul 13, 2024
9eb2452
fix: accidentally broke non-tool streaming earlier; this fixes it
K-Mistele Jul 13, 2024
08bd8d0
fix: some stuff in the example, and some mistral stuff
K-Mistele Jul 13, 2024
62b9ad4
feat: work on hermes tool parser
K-Mistele Jul 13, 2024
6e53787
fix(serving_chat): issue with deep vs. shallow copy caused bug where …
K-Mistele Jul 17, 2024
26b97dc
feat: change ordering in hermes chat template so that function name i…
K-Mistele Jul 17, 2024
bfd1039
feat(tool_parsers): hermes 2 pro streaming parser
K-Mistele Jul 17, 2024
2ccb893
fix(docs): some type issue that the doc CI check did not like
K-Mistele Jul 18, 2024
45962f9
fix(docs): some type issue that the doc CI check did not like
K-Mistele Jul 18, 2024
de27e65
fix(types): try Optional[str] = None
K-Mistele Jul 18, 2024
862078b
fix: refactor tool chat template and add docs
K-Mistele Jul 18, 2024
f056927
chore: annotate that parallel_tool_calls will be ignored
K-Mistele Aug 1, 2024
9560591
fix: handle un-handled "theoretically unreachable" case because such …
K-Mistele Aug 1, 2024
b2ceb71
fix: hermes tool call template to omit tool-use system prompt if tool…
K-Mistele Aug 1, 2024
558461a
fix: implement access to tool call token IDs via tokenizer vocab in t…
K-Mistele Aug 1, 2024
f7f15fa
fix: hermes tool parser does not extract non-tool-call content the sa…
K-Mistele Aug 1, 2024
4356ec4
fix: grab the name properly from chat completions and fall back to em…
K-Mistele Aug 1, 2024
0abeb53
fix: type annotation
K-Mistele Aug 1, 2024
9c0e6d8
fix: mistral tool extraction when dealing with poor precision
K-Mistele Aug 1, 2024
45ecd68
doc: indicate that temperature should be set to 0 when doing mistral …
K-Mistele Aug 1, 2024
1e39aa0
fix(conflicts): resolve a metric f-ck ton of merge conflicts
K-Mistele Aug 2, 2024
f63efe8
fix: log levels
K-Mistele Aug 2, 2024
dc27bec
fix: more logging changes
K-Mistele Aug 2, 2024
85515c0
fix(ci): formatting
K-Mistele Aug 2, 2024
15aa9b4
fix: formatting
K-Mistele Aug 2, 2024
1cae627
fix: merge main+conflict resolution;fix --tool-call-parser being requ…
K-Mistele Aug 3, 2024
9380ad7
fix: remove unnecessary case that was artifact from previous approach
K-Mistele Aug 3, 2024
0390f8c
fix: validation errors
K-Mistele Aug 3, 2024
e29a62a
fix: hermes prompt template issue that occurred when passing multiple…
K-Mistele Aug 3, 2024
e393c66
fix: formatting & mypy fixes
K-Mistele Aug 3, 2024
8d1eac1
fix: more types
K-Mistele Aug 3, 2024
813c3c5
fix: more mypy fixes
K-Mistele Aug 3, 2024
f9da832
fix: formatting
K-Mistele Aug 3, 2024
08d54b1
fix: more mypy fixes
K-Mistele Aug 3, 2024
c87a6ec
fix: final mypy fixes
K-Mistele Aug 3, 2024
19eab7a
fix: finish_reason behavior was broken for non-streaming calls
K-Mistele Aug 3, 2024
0c72dc6
fix(test): ensure tool_choice="required" throws a BadRequestError, an…
K-Mistele Aug 5, 2024
ee3b6ad
fix: remove deprecated CLI argument
K-Mistele Aug 5, 2024
04ba399
fix: type
K-Mistele Aug 5, 2024
f2c1254
fix: remoev another assertion and replace with if/exception
K-Mistele Aug 5, 2024
eefbee5
fix: bad condition
K-Mistele Aug 5, 2024
d18e9c3
fix: cleaner concat
K-Mistele Aug 5, 2024
5d43a00
fix: clean up vode
K-Mistele Aug 5, 2024
092224c
chore: more cleanup
K-Mistele Aug 5, 2024
8f6029f
chore: clean up conditional and document better
K-Mistele Aug 5, 2024
1d856c7
fix(ci): broken tool streaming when using guided decoding
K-Mistele Aug 6, 2024
38635ad
fix: formatting
K-Mistele Aug 6, 2024
28da76c
chore: refactor tool parsers structure to make it more maintainable
K-Mistele Aug 6, 2024
76a27bd
fix(tests): raise a valueError that was being passed instead of raised
K-Mistele Aug 6, 2024
990a0e5
fix(PEP8): openai chat completion client with tools
K-Mistele Aug 6, 2024
751d5a8
fix(PEP8): chat_utils
K-Mistele Aug 6, 2024
834969f
fix(PEP8): protocol.py
K-Mistele Aug 6, 2024
fcd69d7
fix(PEP8): serving_chat & fix typo in protocol
K-Mistele Aug 7, 2024
c448637
fix(PEP8): Hermes Tool parser
K-Mistele Aug 7, 2024
bd0b3a7
fix(PEP8): format files with ./format --fix
K-Mistele Aug 7, 2024
66049d8
fix: docs; allow specifying the tool_use huggingface template
K-Mistele Aug 7, 2024
c106111
chore: formatting
K-Mistele Aug 7, 2024
643c792
fix: mistral chat template formatting
K-Mistele Aug 7, 2024
55ece00
feat: add official mistral 7B instruct v0.3 chat template
K-Mistele Aug 7, 2024
722501a
fix: patch official mistral template to handle vLLM-generated tool ca…
K-Mistele Aug 7, 2024
a70b013
fix: replace unofficial mistral chat template with official one
K-Mistele Aug 7, 2024
941bd03
chore(docs): update mistral tool calling docs to remove the notes abo…
K-Mistele Aug 7, 2024
0d0b556
fix(test): transformers no longer supports using a default chat templ…
K-Mistele Aug 7, 2024
8ec7588
fix: add chat template path for opt-125m since not specifying this is…
K-Mistele Aug 7, 2024
cd1c095
fix(test): cast posix path to string
K-Mistele Aug 7, 2024
eb8a1ea
fix(test): updated expected token count because of applying chatml te…
K-Mistele Aug 7, 2024
7fc67e5
chore: remove print
K-Mistele Aug 7, 2024
9b657a2
chore: merge main into constellate-ai:vllm/tool-use
K-Mistele Aug 7, 2024
f9ecd60
chore: fix more merge conflicts (I am going to jump off a bridge if I…
K-Mistele Aug 7, 2024
05b366f
fix: add chat template due to bumped transformers version
K-Mistele Aug 7, 2024
b417e2b
fix: tests
K-Mistele Aug 7, 2024
1bf96f7
fix: yapf
K-Mistele Aug 7, 2024
11dbdd7
fix: disable yapf for block conflicting with isort
K-Mistele Aug 7, 2024
869dc50
fix: merge conflicts & formatting
K-Mistele Aug 7, 2024
de33564
fix: tool_call_id was accidentally message content
K-Mistele Aug 7, 2024
122fdc3
fix: use double quotes in example
K-Mistele Aug 7, 2024
3b0589d
fix: use double quotes in chat_utils
K-Mistele Aug 7, 2024
8634184
fix: use double quotes in api_server
K-Mistele Aug 7, 2024
a952c15
fix: single quotes (that I added) in cli_args are now double quotes
K-Mistele Aug 7, 2024
cf85b1c
fix: double quotes in protocol.py
K-Mistele Aug 7, 2024
1f8ea1a
fix: double quotes in serving_chat
K-Mistele Aug 7, 2024
49fb3ae
fix: double quotes in abstracttoolparser
K-Mistele Aug 7, 2024
c45f824
fix: double quotes in hermes tool parser
K-Mistele Aug 7, 2024
9b7cbab
fix: double quotes in mistral tool parser
K-Mistele Aug 7, 2024
f63908f
fix: remove todo
K-Mistele Aug 7, 2024
8db2a0d
fix: remove deprecated to_dict method
K-Mistele Aug 7, 2024
3895fd9
fix: give comments their own line
K-Mistele Aug 7, 2024
d961519
fix: remove unused loader
K-Mistele Aug 7, 2024
43a6318
fix: formatting
K-Mistele Aug 7, 2024
7e90682
fix: indents in hermes tool parser by making cases better
K-Mistele Aug 8, 2024
c4c480c
fix: readability for hermes tool call parser
K-Mistele Aug 8, 2024
f7a0e76
fix: more hermes tool parser readability
K-Mistele Aug 8, 2024
34746fd
fix: more clarify updates to hermes parser
K-Mistele Aug 8, 2024
40dab79
fix: last hermes tool parser formatting & logic tweaks
K-Mistele Aug 8, 2024
ba26f22
fix: mistral tool call parser formatting
K-Mistele Aug 8, 2024
79e8bb3
fix: remove unnecessary else block in mistral tool call parser
K-Mistele Aug 8, 2024
844d265
fix: formatting and control flow for mistral tool parser
K-Mistele Aug 8, 2024
1fa2027
fix: formatting updates to mistral tool call parser
K-Mistele Aug 8, 2024
6fc1f75
fix: catch a silent exception in hermes tool parser that was generati…
K-Mistele Aug 8, 2024
a89e565
fix: refactoring & cleanup serving_chat
K-Mistele Aug 8, 2024
9f0a803
fix: two silent errors in mistral tool parser (not causing problems) …
K-Mistele Aug 8, 2024
b2a0884
fix: CLI args in docs & comments
K-Mistele Aug 9, 2024
698d112
fix: remove line about singlq quotes for mistral
K-Mistele Aug 9, 2024
2e6a48a
fix: update doc in mistral tool parser as well
K-Mistele Aug 9, 2024
9078126
fix: update mistral chat templates and docs for mistral tool calling
K-Mistele Aug 9, 2024
f627b48
format: serving_chat
K-Mistele Aug 9, 2024
ce9afeb
doc: in example, explain how to start server for usage
K-Mistele Aug 9, 2024
6a2757b
chore: docs in serving_chat and swap `or ""` for the correct type ann…
K-Mistele Aug 9, 2024
d83d357
merge vllm:main with constellate-ai/vllm:main with constellate-ai/vll…
K-Mistele Aug 9, 2024
66aa580
fix: patch the hermes chat template which was missing a quote (@Nous …
K-Mistele Aug 10, 2024
b43b8a9
tests: add parametrized pytest fixture, and begin adding tests
K-Mistele Aug 10, 2024
01d528c
fix: formatting
K-Mistele Aug 10, 2024
b9397e3
fix: test formatting that was causing format.sh to crash
K-Mistele Aug 10, 2024
d80ac42
test: add test to do tool choice
K-Mistele Aug 10, 2024
e9a857f
test: add test for tool calling with tool choice; very non-streaming …
K-Mistele Aug 12, 2024
1a2f8b2
fix(tests): download models before starting openai server to prevent …
K-Mistele Aug 13, 2024
d24ae67
try restructuring tests to make it work
K-Mistele Aug 13, 2024
2c8e82f
format: test_tools.py
K-Mistele Aug 13, 2024
8e61cb3
try fixing tests by removing my specific dtype=half and kv-cache-dtyp…
K-Mistele Aug 13, 2024
7605d9f
fix: download model with hf_hub to prevent mistral timeout
K-Mistele Aug 13, 2024
4c13a66
merge: main into tool-use
K-Mistele Aug 13, 2024
f39ae8f
test: parallel tool calls, re-trigger CI that was interrupted by hugg…
K-Mistele Aug 13, 2024
1f50cef
fix: print statements
K-Mistele Aug 14, 2024
b7de9de
fix(tests): allow passing in wait time to remote Open AI server and i…
K-Mistele Aug 14, 2024
bff5e18
test: add final tests for providing tool call responses with parallel…
K-Mistele Aug 14, 2024
1b417ba
refactor(tests): break tests out into multiple files for readability …
K-Mistele Aug 14, 2024
8365a25
fix: add consolidated.safetensor to ignore list for mistral in tool e…
K-Mistele Aug 15, 2024
445cf59
refactor(ci): move tool tests out of entrypoints fastcheck, and creat…
K-Mistele Aug 15, 2024
afc41d0
fix: refactor tool use with CI changes
K-Mistele Aug 15, 2024
1fd4648
doc: update docs to clarify recommended CLI options & available chat …
K-Mistele Aug 15, 2024
e2a1b79
fix: hermes tool template and set fixtures to be session-scoped
K-Mistele Aug 15, 2024
f7f8b92
fix: formatting
K-Mistele Aug 15, 2024
94047c7
chore(tests): move tool tests out of the fastcheck section
K-Mistele Aug 15, 2024
44b2e07
cleanup: range statement
K-Mistele Aug 15, 2024
6d36509
cleanup: unnecessary backslash
K-Mistele Aug 15, 2024
fb40e5f
cleanup: cli args
K-Mistele Aug 15, 2024
c1d3110
fix: examples
K-Mistele Aug 16, 2024
5fb1a41
fix(tests): set max tokens to a lower number that is about 30-50% abo…
K-Mistele Aug 17, 2024
bbb5b27
fix: exceptions in tool validation should be raised, not returned, so…
K-Mistele Aug 17, 2024
17292b0
chore: cleanup debug lines in tool parsers to remove less-relevant ones
K-Mistele Aug 17, 2024
6830f90
fix: remove comments
K-Mistele Aug 17, 2024
c87a81f
refactor: tool call parsers no longer use static methods
K-Mistele Aug 17, 2024
6f5e585
fix: remove cruft
K-Mistele Aug 17, 2024
f2b0ee8
fix: comment on its own line
K-Mistele Aug 17, 2024
fb2db3a
fix: remove print()
K-Mistele Aug 17, 2024
1f4eeff
refactor: types based on earlier refactor of hermes and mistral tool …
K-Mistele Aug 17, 2024
f14e3e5
fix(tests): make max wait timeout for RemoteOpenAIServer an instance …
K-Mistele Aug 18, 2024
e21cfa8
fix: refactor RemoteOpenAIServer to use a default class var to preven…
K-Mistele Aug 18, 2024
211b024
fix(test): dtype in openai/test_oot_registration
K-Mistele Aug 21, 2024
120bc22
merge: main into tool-use
K-Mistele Aug 21, 2024
f6fa6df
fix: problems caused by resolution of merge conflicts
K-Mistele Aug 22, 2024
e4222a5
fix(tests): set max model len
K-Mistele Aug 22, 2024
8caf6f8
fix: hermes system prompt in chat template was missing <|im_start|>sy…
K-Mistele Aug 22, 2024
ebdcef9
fix(tests): double RemoteOpenAIServer start timeout suggested by @mgoin
K-Mistele Aug 22, 2024
bdd01bb
fix: remove cruft
K-Mistele Aug 22, 2024
8f49d9e
fix: spacing in doc strings
K-Mistele Aug 22, 2024
11c751d
fix: make enable_auto_tools a boolean not optional
K-Mistele Aug 22, 2024
dc5db10
fix: whitespace
K-Mistele Aug 22, 2024
13a4fb1
fix: formatting
K-Mistele Aug 23, 2024
30238f2
delete: old file
K-Mistele Aug 23, 2024
e548b2d
refactor: tool parsers to use AnyTokenizer
K-Mistele Aug 23, 2024
ad9a8ff
refactor: util.py -> utils.py
K-Mistele Aug 23, 2024
79ab929
refactor: utils
K-Mistele Aug 23, 2024
bba7394
fix: cruft
K-Mistele Aug 23, 2024
e11536f
fix: unnecessary union in type
K-Mistele Aug 23, 2024
bea0f56
fix: type
K-Mistele Aug 23, 2024
d5b169b
fix: yapf might not need to be disabled
K-Mistele Aug 23, 2024
f73e089
fix: exception in hermes chat template thats unnecessary cc @interste…
K-Mistele Aug 25, 2024
477003c
fix: need to check if tool calls is present regardless of "content" t…
K-Mistele Aug 25, 2024
df85e12
fix: type narrowing & remove unused var
K-Mistele Aug 30, 2024
c6d1bf1
fix: format
K-Mistele Aug 30, 2024
e70160d
refactor: tool arguments in assistant-role messages with tool_calls s…
K-Mistele Aug 31, 2024
d14f42d
fix: merge conflicts in main
K-Mistele Aug 31, 2024
6db7f7b
fix: hermes tool parser issue
K-Mistele Aug 31, 2024
67cd2e1
Merge branch 'main' into tool-use
K-Mistele Aug 31, 2024
cb55c08
fix: mypy stuff in tool parsers
K-Mistele Aug 31, 2024
a70e826
fix: remove cruft
K-Mistele Aug 31, 2024
2022c1e
fix: merge
K-Mistele Sep 3, 2024
165c026
fix: merge conflict + type issues
K-Mistele Sep 3, 2024
ca8c14b
Merge branch 'main' into tool-use
K-Mistele Sep 4, 2024
4972a89
fix(tests): update pytest fixture for client based on #7565
K-Mistele Sep 4, 2024
d6728b2
fix: docs
K-Mistele Sep 4, 2024
a2ed57c
merge: main
K-Mistele Sep 4, 2024
61 changes: 55 additions & 6 deletions docs/source/serving/openai_compatible_server.md
@@ -110,14 +110,63 @@ directory [here](https://github.com/vllm-project/vllm/tree/main/examples/)
:func: create_parser_for_docs
:prog: vllm serve
```
## Tool Calling in the Chat Completion API
### Named Function Calling
vLLM supports only named function calling in the chat completion API by default. It does so using Outlines, so this is
enabled by default and will work with any supported model. You are guaranteed a validly-parsable function call, though not
necessarily a high-quality one.

To use a named function, you need to define the functions in the `tools` parameter of the chat completion request, and
specify the `name` of one of the tools in the `tool_choice` parameter of the chat completion request.

It is the caller's responsibility to prompt the model with the tool information; vLLM will not automatically manipulate the prompt.

vLLM will use guided decoding to ensure the response matches the tool parameter object defined by the JSON schema in the `tools` parameter.

Please refer to the OpenAI API reference documentation for more information.
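
Below is a minimal sketch of a named function call against a locally running vLLM OpenAI-compatible server. The server address, the model, and the `get_current_weather` schema are illustrative assumptions, not part of the documented API.

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is already running on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = client.models.list().data[0].id

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "state": {"type": "string"},
            },
            "required": ["city", "state"],
        },
    },
}]

# Named function calling: force the model to call `get_current_weather`.
# Guided decoding constrains the output to this tool's JSON schema.
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "What is the weather in Dallas, TX?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```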

### Automatic Function Calling
_This feature is in **beta**. It has limited model support, is not guaranteed to be stable, and does not have
well-defined failure modes._ As such, it must be explicitly enabled when desired.

To enable this feature, you must set the following flags:
* `--enable-api-tools` -- **mandatory** for auto tool choice. Tells vLLM that you want to enable tool templating and extraction.
* `--enable-auto-toolchoice` -- **mandatory** for auto tool choice. Tells vLLM that you want to allow the model to generate its own tool calls when it
deems appropriate.
* `--chat-template` -- **optional** for auto tool choice. The path to the chat template which handles `tool`-role messages and `assistant`-role messages
that contain previously generated tool calls. This argument can be set to `tool_use` if your model has a tool use chat
template configured in the `tokenizer_config.json`. In this case, it will be used per the `transformers` specification. More on this [here](https://huggingface.co/docs/transformers/en/chat_templating#why-do-some-models-have-multiple-templates)
from HuggingFace; you can find an example of this in a `tokenizer_config.json` [here]()
* `--tool-parser` -- select the tool parser to use: currently either `hermes` or `mistral`.
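
With those flags set, tool calls can be requested without naming a specific tool. The following is a rough sketch, not an official snippet: the server address, model, and tool schema are assumptions, and it simply shows how to check whether the model chose to call a tool.

```python
from openai import OpenAI

# Assumes a vLLM server started with the flags above and a supported
# tool-calling model (e.g. a Hermes 2 Pro model); adjust to your setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = client.models.list().data[0].id

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "state": {"type": "string"}},
            "required": ["city", "state"],
        },
    },
}]

# No tool_choice here: with auto tool choice enabled, the model decides
# whether to answer in plain text or to emit tool calls.
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "What is the weather in Dallas, TX?"}],
    tools=tools,
)

choice = response.choices[0]
if choice.finish_reason == "tool_calls":
    for call in choice.message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    print(choice.message.content)
```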

If your favorite tool-calling model is not supported, please feel free to contribute a parser & tool use chat template!

#### Hermes Models
Supported models in this series:
* `NousResearch/Hermes-2-Pro-Llama-3-8B`
* `NousResearch/Hermes-2-Theta-Llama-3-70B`
* `NousResearch/Hermes-2-Pro-Llama-3-70B`
* `NousResearch/Hermes-2-Theta-Llama-3-8B`
* `NousResearch/Hermes-2-Pro-Mistral-7B`

_Note that the Hermes 2 **Theta** models are known to have degraded tool call quality & capabilities due to the merge
step in their creation_. It is recommended to use the Hermes 2 **Pro** models.

Recommended flags: `--tool-parser hermes --chat-template examples/tool_chat_template_hermes.jinja`

#### Mistral Models
Supported models:
* `mistralai/Mistral-7B-Instruct-v0.3`

There are several known issues with tool calling in Mistral models:
* Attempting to generate more than one tool call at a time usually results in a parser failure, since the model generates the calls
in an unpredictable format due to quirks of its chat template. **This can be mitigated by setting the
`temperature` to `0` in the OpenAI-style API call**; do this, and tool calls (including parallel ones) are **far** more
consistent.
* Mistral function calling / tool use generates calls with _single_ quotes `'` instead of double quotes `"`. As a
result, tool call generations can't be handled as JSON by the parser automatically without using `eval`, which would
present security issues for vLLM users. To support Mistral tool calls, we therefore find-and-replace single quotes
with double quotes in Mistral-generated tool calls. Because of this, **it is important to ensure that your tool call
arguments do not contain single quotes.** Escaped double quotes may be handled properly, but otherwise you should
expect parser issues.
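
To make the quote caveat concrete, here is a toy sketch. It is not vLLM's actual parser code, and the strings are invented examples; it only illustrates why a blanket single-to-double-quote substitution breaks when an argument value itself contains a single quote.

```python
import json

# A Mistral-style tool call rendered with single quotes (invented example).
ok = "[{'name': 'get_current_weather', 'arguments': {'city': 'Dallas', 'state': 'TX'}}]"
print(json.loads(ok.replace("'", '"')))  # parses fine after naive substitution

# If an argument value contains a single quote, the substitution corrupts
# the string and JSON parsing fails.
bad = "[{'name': 'get_current_weather', 'arguments': {'city': 'Coeur d'Alene', 'state': 'ID'}}]"
try:
    json.loads(bad.replace("'", '"'))
except json.JSONDecodeError as exc:
    print(f"parser failure: {exc}")
```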

Recommended flags: `--tool-parser mistral --chat-template examples/tool_chat_template_mistral.jinja`
143 changes: 143 additions & 0 deletions examples/openai_chat_completion_client_with_tools.py
@@ -0,0 +1,143 @@
from openai import OpenAI
import json

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description":
                    "The city to find the weather for, e.g. 'San Francisco'"
                },
                "state": {
                    "type": "string",
                    "description":
                    "the two-letter abbreviation for the state that the city is"
                    " in, e.g. 'CA' which would mean 'California'"
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["city", "state", "unit"]
        }
    }
}]

messages = [{
    "role": "user",
    "content": "Hi! How are you doing today?"
}, {
    "role": "assistant",
    "content": "I'm doing well! How can I help you?"
}, {
    "role": "user",
    "content":
    "Can you tell me what the temperature will be in Dallas and San Francisco, in fahrenheit?"
}]

chat_completion = client.chat.completions.create(messages=messages,
                                                 model=model,
                                                 tools=tools)

print("Chat completion results:")
print(chat_completion)
print("\n\n")

tool_calls_stream = client.chat.completions.create(messages=messages,
                                                   model=model,
                                                   tools=tools,
                                                   stream=True)

chunks = []
for chunk in tool_calls_stream:
    chunks.append(chunk)
    if chunk.choices[0].delta.tool_calls:
        print(chunk.choices[0].delta.tool_calls[0])
    else:
        print(chunk.choices[0].delta)

# Reassemble the streamed tool call arguments, one entry per tool call index.
arguments = []
tool_call_idx = -1
for chunk in chunks:
    if chunk.choices[0].delta.tool_calls:
        tool_call = chunk.choices[0].delta.tool_calls[0]
        if tool_call.index != tool_call_idx:
            if tool_call_idx >= 0:
                print(f"streamed tool call arguments: "
                      f"{arguments[tool_call_idx]}\n\n")
            tool_call_idx = tool_call.index
            arguments.append("")
        if tool_call.id:
            print(f"streamed tool call id: {tool_call.id}")
        if tool_call.function:
            if tool_call.function.name:
                print(f"streamed tool call name: {tool_call.function.name}")
            if tool_call.function.arguments:
                arguments[tool_call_idx] += tool_call.function.arguments

if len(arguments):
    print(f"streamed tool call arguments: {arguments[-1]}")

print("\n\n")

messages.append({
    "role": "assistant",
    "tool_calls": chat_completion.choices[0].message.tool_calls
})


# Now, simulate a tool call
def get_current_weather(city: str, state: str, unit: str):
    return ("The weather in Dallas, Texas is 85 degrees fahrenheit. "
            "It is partly cloudy, with highs in the 90's.")


available_tools = {"get_current_weather": get_current_weather}

completion_tool_calls = chat_completion.choices[0].message.tool_calls
for call in completion_tool_calls:
    tool_to_call = available_tools[call.function.name]
    args = json.loads(call.function.arguments)
    result = tool_to_call(**args)
    print(result)
    messages.append({
        "role": "tool",
        "content": result,
        "tool_call_id": call.id,
        "name": call.function.name
    })

chat_completion_2 = client.chat.completions.create(messages=messages,
                                                   model=model,
                                                   tools=tools,
                                                   stream=False)
print("\n\n")
print(chat_completion_2)
123 changes: 123 additions & 0 deletions examples/tool_chat_template_hermes.jinja
@@ -0,0 +1,123 @@
{%- macro json_to_python_type(json_spec) %}
{%- set basic_type_map = {
"string": "str",
"number": "float",
"integer": "int",
"boolean": "bool"
} %}

{%- if basic_type_map[json_spec.type] is defined %}
{{- basic_type_map[json_spec.type] }}
{%- elif json_spec.type == "array" %}
{{- "list[" + json_to_python_type(json_spec|items) + "]" }}
{%- elif json_spec.type == "object" %}
{%- if json_spec.additionalProperties is defined %}
{{- "dict[str, " + json_to_python_type(json_spec.additionalProperties) + ']' }}
{%- else %}
{{- "dict" }}
{%- endif %}
{%- elif json_spec.type is iterable %}
{{- "Union[" }}
{%- for t in json_spec.type %}
{{- json_to_python_type({"type": t}) }}
{%- if not loop.last %}
{{- "," }}
{%- endif %}
{%- endfor %}
{{- "]" }}
{%- else %}
{{- "Any" }}
{%- endif %}
{%- endmacro %}


{{- bos_token }}
{%- if tools is iterable and tools | length > 0 %}
{{- "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> " }}
{%- for tool in tools %}
{%- if tool.function is defined %}
{%- set tool = tool.function %}
{%- endif %}
{{- '{"type": "function", "function": ' }}
{{- '{"name": "' + tool.name + '", ' }}
{{- '"description": "' + tool.name + '(' }}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{{- param_name + ": " + json_to_python_type(param_fields) }}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- ")" }}
{%- if tool.return is defined %}
{{- " -> " + json_to_python_type(tool.return) }}
{%- endif %}
{{- " - " + tool.description + "\n\n" }}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{%- if loop.first %}
{{- " Args:\n" }}
{%- endif %}
{{- " " + param_name + "(" + json_to_python_type(param_fields) + "): " + param_fields.description|trim }}
{%- endfor %}
{%- if tool.return is defined and tool.return.description is defined %}
{{- "\n Returns:\n " + tool.return.description }}
{%- endif %}
{{- '"' }}
{{- ', "parameters": ' }}
{%- if tool.parameters.properties | length == 0 %}
{{- "{}" }}
{%- else %}
{{- tool.parameters|tojson }}
{%- endif %}
{{- "}" }}
{%- if not loop.last %}
{{- "\n" }}
{%- endif %}
{%- endfor %}
{{- " </tools>" }}
{{- 'Use the following pydantic model json schema for each tool call you will make: {"properties": {"arguments": {"title": "Arguments", "type": "object"}, "name": {"title": "Name", "type": "string"}}, "required": ["arguments", "name"], "title": "FunctionCall", "type": "object"}
' }}
{{- "For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
" }}
{{- "<tool_call>
" }}
{{- '{"name": <function-name>, "arguments": <args-dict>}
' }}
{{- '</tool_call><|im_end|>' }}
{%- endif %}
{%- for message in messages %}
{%- if message.role == "user" or message.role == "system" or (message.role == "assistant" and message.tool_calls is not defined) %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{{- '<|im_start|>' + message.role }}
{%- for tool_call in message.tool_calls %}
{{- '\n<tool_call>\n' }}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '{ ' }}
{%- if tool_call.arguments is defined %}
{{- '"arguments": ' }}
{{- tool_call.arguments|tojson }}
{{- ', ' }}
{%- endif %}
{{- '"name": "' }}
{{- tool_call.name }}
{{- '"}' }}
{{- '\n</tool_call> ' }}
{%- endfor %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if not message.name is defined %}
{{- raise_exception("Tool response dicts require a 'name' key indicating the name of the called function!") }}
{%- endif %}
{{- '<|im_start|>' + message.role + '\n<tool_response>\n' }}
{{- '{"name": "' }}
{{- message.name }}
{{- '", "content": ' }}
{{- message.content|tojson + '}' }}
{{- '\n</tool_response> <|im_end|>\n' }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
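
As a quick sanity check, the template above can be rendered locally with `transformers` before pointing vLLM at it. This is a hedged sketch: it assumes the template is saved at `examples/tool_chat_template_hermes.jinja`, that a Hermes 2 Pro tokenizer is available, and that the installed `transformers` version is recent enough for the `tools=` argument of `apply_chat_template`.

```python
from pathlib import Path

from transformers import AutoTokenizer

# Assumed model and path; adjust to your environment.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")
template = Path("examples/tool_chat_template_hermes.jinja").read_text()

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to find the weather for"
                }
            },
            "required": ["city"]
        }
    }
}]
messages = [{"role": "user", "content": "What is the weather in Dallas?"}]

# Render the prompt roughly as vLLM would when this file is passed via --chat-template.
prompt = tokenizer.apply_chat_template(messages,
                                       tools=tools,
                                       chat_template=template,
                                       tokenize=False,
                                       add_generation_prompt=True)
print(prompt)
```
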
1 change: 1 addition & 0 deletions examples/tool_chat_template_mistral.jinja
@@ -0,0 +1 @@
{{ bos_token }}{% set user_messages = messages | selectattr('role', 'equalto', 'user') | list %}{% for message in messages %}{% if message['role'] == 'user' %}{% if message == user_messages[-1] %}{% if tools %}{{ '[AVAILABLE_TOOLS]'+ tools|string + '[/AVAILABLE_TOOLS]' }}{% endif %}{{ '[INST]' + message['content'] + '[/INST]' }}{% else %}{{ '[INST]' + message['content'] + '[/INST]' }}{% endif %}{% elif message['role'] == 'assistant' and message['tool_calls'] and message['tool_calls']|length > 0 %}{{ '[TOOL_CALLS]' + message['tool_calls']|string + eos_token }}{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + ' ' + eos_token }}{% elif message['role'] == 'tool' %}{{ '[TOOL_RESULTS]' + message['content']|string + '[/TOOL_RESULTS]' }}{% endif %}{% endfor %}
1 change: 1 addition & 0 deletions requirements-common.txt
@@ -21,4 +21,5 @@ lm-format-enforcer == 0.10.3
outlines >= 0.0.43, < 0.1 # Requires torch >= 2.1.0
typing_extensions
filelock >= 3.10.4 # filelock starts to support `mode` argument from 3.10.4
partial-json-parser # used for parsing partial JSON outputs
pyzmq