[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models #5649

Merged
merged 237 commits into from
Sep 4, 2024
Changes from 236 commits
Commits
6b762e5
add hermes 2 pro function calling template
Jun 16, 2024
606ec64
feat(example): add example chat completion with tool usage
Jun 16, 2024
d27446f
feat: add CLI argument for OpenAI API-style tool use system prompt ji…
Jun 17, 2024
bdf48a1
feat: add better validation for tool_choice, support for tool_choice …
Jun 17, 2024
9493c67
fix(cli): set OpenAI tool args in vllm/entrypoints/openai/cli_args.py…
Jun 18, 2024
35c9aa7
fix(types): add "auto" as an option for tool_choice in pydantic models
Jun 18, 2024
8ff80fb
fix: validation - guided decoding not valid with tool_choice = auto
Jun 18, 2024
3c4acb1
feat: ensure guided deoding is only applied when tool_choice is NOT "…
Jun 18, 2024
9c5ef66
feat(cli): update CLI args for auto tool choice and OpenAIServingChat…
Jun 18, 2024
f1a1e7b
feat: add loading in the system prompt jinja template if specified; a…
Jun 18, 2024
07a67d2
wip: add a case for tool choice = auto when handling chat completion …
Jun 19, 2024
590a559
fix: hermes 2 pro prompt template to prevent newlines
Jun 19, 2024
6919acf
feat: handle building the system prompt via template with --enable-au…
Jun 19, 2024
db9c29a
fix: remove debugging log lines
Jun 19, 2024
c16aa9a
fix(template): update hermes 2 pro template with newlines to get newl…
Jun 20, 2024
96b2eb5
merge: main into tool use & resolve conflicts
Jul 1, 2024
9b9d861
feat: add hermes 2 pro and mistral full tool use chat templates
Jul 2, 2024
d221364
chore: delete old template
Jul 2, 2024
33c669b
feat: tool calls are now returned in the chat completion response
Jul 2, 2024
7214e70
fix: mistral chat template. replace huggingface-suggested one with th…
Jul 2, 2024
6a3c61e
feat: add mistral tool parser, and empty hermes tool parser. non-stre…
Jul 2, 2024
73046e6
feat: update example
Jul 2, 2024
d7311e6
chore: update FunctionCall type to allow arguments as a Dict. non-aut…
Jul 2, 2024
344b241
feat: clean up CLI arguments, engine. implement tool parser selection…
Jul 2, 2024
8740525
feat: add methods to FunctionCall and ToolCall in protocol to make it…
Jul 2, 2024
7265be5
fix: ensure finish_reason = "tool_calls" when a tool call is generated
Jul 2, 2024
a1207f2
unfinished: work on hermes 2 tool call parser
Jul 2, 2024
30ffa16
partial: hermes 2 pro tool parsing
Jul 3, 2024
294c99e
fix: hermes tool call parser, work on example
Jul 3, 2024
00be988
fix: tool call arguments should be returned as JSON string not as a l…
Jul 3, 2024
b70e7d7
feat: enable both content and tool_calls if the model allows
Jul 3, 2024
ece6182
feat: update example with tool call
Jul 3, 2024
705ca62
fix: typing-related issues for chat messages
Jul 3, 2024
b3a62e9
feat: fix lots of parsing & extraction issues to ensure tool calls & …
Jul 4, 2024
c2d1afc
chore: refactor tool extraction for non-streaming responses to be in …
Jul 4, 2024
2d4b302
fix: mistral tool calling chat template
Jul 5, 2024
1fcd4f5
fix: make ChatMessage content Optional since it could be an assistant…
Jul 5, 2024
2926c3e
refactor: move tool parsing into the right place in serving_chat, and…
Jul 5, 2024
fa082e0
fix: finish_reason should NEVER be None; OpenAI defualt is "stop"
Jul 7, 2024
8455786
fix: typo introduced in earlier commit
Jul 7, 2024
a97ccc7
feat: signature updates and refactoring to tool parser streaming; pre…
Jul 8, 2024
c697e9f
fix: kind of fixed mistral chat template
Jul 8, 2024
df877f6
feat: make non-streaming tool parsing a static method so that streami…
Jul 8, 2024
8fa57ae
partial: work on streaming tool call parser for mistral
Jul 8, 2024
301b02e
deps: add partial-json-parser for parsing streaming JSON
Jul 10, 2024
1364bc1
fix: protocol stuff, work on mistral streaming
Jul 10, 2024
cbd8919
feat: progress on mistral streaming parser
Jul 10, 2024
d480db6
fix: some tool parser stuff. best its working yet
Jul 10, 2024
305685e
fix: major parsing logic issue when overlapping prefix & suffix due t…
Jul 10, 2024
e47f70f
feat: implement mistral tool calling streaming for ONE TOOL ONLY RIGH…
Jul 10, 2024
d8f4487
feat: update openai client to showcase streaming
Jul 10, 2024
cfa6d03
fix: finish reason & debug logging
Jul 10, 2024
b2cb8fb
fix(docs): CLI argument description was bad
Jul 10, 2024
625584a
fix(parser): mistral tool parser issue that was giving me a stroke. A…
Jul 13, 2024
7a6f6ac
chore: update examples & logging
Jul 13, 2024
9eb2452
fix: accidentally broke non-tool streaming earlier; this fixes it
Jul 13, 2024
08bd8d0
fix: some stuff in the example, and some mistral stuff
Jul 13, 2024
62b9ad4
feat: work on hermes tool parser
Jul 13, 2024
6e53787
fix(serving_chat): issue with deep vs. shallow copy caused bug where …
Jul 17, 2024
26b97dc
feat: change ordering in hermes chat template so that function name i…
Jul 17, 2024
bfd1039
feat(tool_parsers): hermes 2 pro streaming parser
Jul 17, 2024
2ccb893
fix(docs): some type issue that the doc CI check did not like
Jul 18, 2024
45962f9
fix(docs): some type issue that the doc CI check did not like
Jul 18, 2024
de27e65
fix(types): try Optional[str] = None
Jul 18, 2024
862078b
fix: refactor tool chat template and add docs
Jul 18, 2024
f056927
chore: annotate that parallel_tool_calls will be ignored
Aug 1, 2024
9560591
fix: handle un-handled "theoretically unreachable" case because such …
Aug 1, 2024
b2ceb71
fix: hermes tool call template to omit tool-use system prompt if tool…
Aug 1, 2024
558461a
fix: implement access to tool call token IDs via tokenizer vocab in t…
Aug 1, 2024
f7f15fa
fix: hermes tool parser does not extract non-tool-call content the sa…
Aug 1, 2024
4356ec4
fix: grab the name properly from chat completions and fall back to em…
Aug 1, 2024
0abeb53
fix: type annotation
Aug 1, 2024
9c0e6d8
fix: mistral tool extraction when dealing with poor precision
Aug 1, 2024
45ecd68
doc: indicate that temperature should be set to 0 when doing mistral …
Aug 1, 2024
1e39aa0
fix(conflicts): resolve a metric f-ck ton of merge conflicts
Aug 2, 2024
f63efe8
fix: log levels
Aug 2, 2024
dc27bec
fix: more logging changes
Aug 2, 2024
85515c0
fix(ci): formatting
Aug 2, 2024
15aa9b4
fix: formatting
Aug 2, 2024
1cae627
fix: merge main+conflict resolution;fix --tool-call-parser being requ…
Aug 3, 2024
9380ad7
fix: remove unnecessary case that was artifact from previous approach
Aug 3, 2024
0390f8c
fix: validation errors
Aug 3, 2024
e29a62a
fix: hermes prompt template issue that occurred when passing multiple…
Aug 3, 2024
e393c66
fix: formatting & mypy fixes
Aug 3, 2024
8d1eac1
fix: more types
Aug 3, 2024
813c3c5
fix: more mypy fixes
Aug 3, 2024
f9da832
fix: formatting
Aug 3, 2024
08d54b1
fix: more mypy fixes
Aug 3, 2024
c87a6ec
fix: final mypy fixes
Aug 3, 2024
19eab7a
fix: finish_reason behavior was broken for non-streaming calls
Aug 3, 2024
0c72dc6
fix(test): ensure tool_choice="required" throws a BadRequestError, an…
Aug 5, 2024
ee3b6ad
fix: remove deprecated CLI argument
Aug 5, 2024
04ba399
fix: type
Aug 5, 2024
f2c1254
fix: remoev another assertion and replace with if/exception
Aug 5, 2024
eefbee5
fix: bad condition
Aug 5, 2024
d18e9c3
fix: cleaner concat
Aug 5, 2024
5d43a00
fix: clean up vode
Aug 5, 2024
092224c
chore: more cleanup
Aug 5, 2024
8f6029f
chore: clean up conditional and document better
Aug 5, 2024
1d856c7
fix(ci): broken tool streaming when using guided decoding
Aug 6, 2024
38635ad
fix: formatting
Aug 6, 2024
28da76c
chore: refactor tool parsers structure to make it more maintainable
Aug 6, 2024
76a27bd
fix(tests): raise a valueError that was being passed instead of raised
Aug 6, 2024
990a0e5
fix(PEP8): openai chat completion client with tools
Aug 6, 2024
751d5a8
fix(PEP8): chat_utils
Aug 6, 2024
834969f
fix(PEP8): protocol.py
Aug 6, 2024
fcd69d7
fix(PEP8): serving_chat & fix typo in protocol
Aug 7, 2024
c448637
fix(PEP8): Hermes Tool parser
Aug 7, 2024
bd0b3a7
fix(PEP8): format files with ./format --fix
Aug 7, 2024
66049d8
fix: docs; allow specifying the tool_use huggingface template
Aug 7, 2024
c106111
chore: formatting
Aug 7, 2024
643c792
fix: mistral chat template formatting
Aug 7, 2024
55ece00
feat: add official mistral 7B instruct v0.3 chat template
Aug 7, 2024
722501a
fix: patch official mistral template to handle vLLM-generated tool ca…
Aug 7, 2024
a70b013
fix: replace unofficial mistral chat template with official one
Aug 7, 2024
941bd03
chore(docs): update mistral tool calling docs to remove the notes abo…
Aug 7, 2024
0d0b556
fix(test): transformers no longer supports using a default chat templ…
Aug 7, 2024
8ec7588
fix: add chat template path for opt-125m since not specifying this is…
Aug 7, 2024
cd1c095
fix(test): cast posix path to string
Aug 7, 2024
eb8a1ea
fix(test): updated expected token count because of applying chatml te…
Aug 7, 2024
7fc67e5
chore: remove print
Aug 7, 2024
9b657a2
chore: merge main into constellate-ai:vllm/tool-use
Aug 7, 2024
f9ecd60
chore: fix more merge conflicts (I am going to jump off a bridge if I…
Aug 7, 2024
05b366f
fix: add chat template due to bumped transformers version
Aug 7, 2024
b417e2b
fix: tests
Aug 7, 2024
1bf96f7
fix: yapf
Aug 7, 2024
11dbdd7
fix: disable yapf for block conflicting with isort
Aug 7, 2024
869dc50
fix: merge conflicts & formatting
Aug 7, 2024
de33564
fix: tool_call_id was accidentally message content
Aug 7, 2024
122fdc3
fix: use double quotes in example
Aug 7, 2024
3b0589d
fix: use double quotes in chat_utils
Aug 7, 2024
8634184
fix: use double quotes in api_server
Aug 7, 2024
a952c15
fix: single quotes (that I added) in cli_args are now double quotes
Aug 7, 2024
cf85b1c
fix: double quotes in protocol.py
Aug 7, 2024
1f8ea1a
fix: double quotes in serving_chat
Aug 7, 2024
49fb3ae
fix: double quotes in abstracttoolparser
Aug 7, 2024
c45f824
fix: double quotes in hermes tool parser
Aug 7, 2024
9b7cbab
fix: double quotes in mistral tool parser
Aug 7, 2024
f63908f
fix: remove todo
Aug 7, 2024
8db2a0d
fix: remove deprecated to_dict method
Aug 7, 2024
3895fd9
fix: give comments their own line
Aug 7, 2024
d961519
fix: remove unused loader
Aug 7, 2024
43a6318
fix: formatting
Aug 7, 2024
7e90682
fix: indents in hermes tool parser by making cases better
Aug 8, 2024
c4c480c
fix: readability for hermes tool call parser
Aug 8, 2024
f7a0e76
fix: more hermes tool parser readability
Aug 8, 2024
34746fd
fix: more clarify updates to hermes parser
Aug 8, 2024
40dab79
fix: last hermes tool parser formatting & logic tweaks
Aug 8, 2024
ba26f22
fix: mistral tool call parser formatting
Aug 8, 2024
79e8bb3
fix: remove unnecessary else block in mistral tool call parser
Aug 8, 2024
844d265
fix: formatting and control flow for mistral tool parser
Aug 8, 2024
1fa2027
fix: formatting updates to mistral tool call parser
Aug 8, 2024
6fc1f75
fix: catch a silent exception in hermes tool parser that was generati…
Aug 8, 2024
a89e565
fix: refactoring & cleanup serving_chat
Aug 8, 2024
9f0a803
fix: two silent errors in mistral tool parser (not causing problems) …
Aug 8, 2024
b2a0884
fix: CLI args in docs & comments
Aug 9, 2024
698d112
fix: remove line about singlq quotes for mistral
Aug 9, 2024
2e6a48a
fix: update doc in mistral tool parser as well
Aug 9, 2024
9078126
fix: update mistral chat templates and docs for mistral tool calling
Aug 9, 2024
f627b48
format: serving_chat
Aug 9, 2024
ce9afeb
doc: in example, explain how to start server for usage
Aug 9, 2024
6a2757b
chore: docs in serving_chat and swap `or ""` for the correct type ann…
Aug 9, 2024
d83d357
merge vllm:main with constellate-ai/vllm:main with constellate-ai/vll…
Aug 9, 2024
66aa580
fix: patch the hermes chat template which was missing a quote (@Nous …
Aug 10, 2024
b43b8a9
tests: add parametrized pytest fixture, and begin adding tests
Aug 10, 2024
01d528c
fix: formatting
Aug 10, 2024
b9397e3
fix: test formatting that was causing format.sh to crash
Aug 10, 2024
d80ac42
test: add test to do tool choice
Aug 10, 2024
e9a857f
test: add test for tool calling with tool choice; very non-streaming …
Aug 12, 2024
1a2f8b2
fix(tests): download models before starting openai server to prevent …
Aug 13, 2024
d24ae67
try restructuring tests to make it work
Aug 13, 2024
2c8e82f
format: test_tools.py
Aug 13, 2024
8e61cb3
try fixing tests by removing my specific dtype=half and kv-cache-dtyp…
Aug 13, 2024
7605d9f
fix: download model with hf_hub to prevent mistral timeout
Aug 13, 2024
4c13a66
merge: main into tool-use
Aug 13, 2024
f39ae8f
test: parallel tool calls, re-trigger CI that was interrupted by hugg…
Aug 13, 2024
1f50cef
fix: print statements
Aug 14, 2024
b7de9de
fix(tests): allow passing in wait time to remote Open AI server and i…
Aug 14, 2024
bff5e18
test: add final tests for providing tool call responses with parallel…
Aug 14, 2024
1b417ba
refactor(tests): break tests out into multiple files for readability …
Aug 14, 2024
8365a25
fix: add consolidated.safetensor to ignore list for mistral in tool e…
Aug 15, 2024
445cf59
refactor(ci): move tool tests out of entrypoints fastcheck, and creat…
Aug 15, 2024
afc41d0
fix: refactor tool use with CI changes
Aug 15, 2024
1fd4648
doc: update docs to clarify recommended CLI options & available chat …
Aug 15, 2024
e2a1b79
fix: hermes tool template and set fixtures to be session-scoped
Aug 15, 2024
f7f8b92
fix: formatting
Aug 15, 2024
94047c7
chore(tests): move tool tests out of the fastcheck section
Aug 15, 2024
44b2e07
cleanup: range statement
Aug 15, 2024
6d36509
cleanup: unnecessary backslash
Aug 15, 2024
fb40e5f
cleanup: cli args
Aug 15, 2024
c1d3110
fix: examples
Aug 16, 2024
5fb1a41
fix(tests): set max tokens to a lower number that is about 30-50% abo…
Aug 17, 2024
bbb5b27
fix: exceptions in tool validation should be raised, not returned, so…
Aug 17, 2024
17292b0
chore: cleanup debug lines in tool parsers to remove less-relevant ones
Aug 17, 2024
6830f90
fix: remove comments
Aug 17, 2024
c87a81f
refactor: tool call parsers no longer use static methods
Aug 17, 2024
6f5e585
fix: remove cruft
Aug 17, 2024
f2b0ee8
fix: comment on its own line
Aug 17, 2024
fb2db3a
fix: remove print()
Aug 17, 2024
1f4eeff
refactor: types based on earlier refactor of hermes and mistral tool …
Aug 17, 2024
f14e3e5
fix(tests): make max wait timeout for RemoteOpenAIServer an instance …
Aug 18, 2024
e21cfa8
fix: refactor RemoteOpenAIServer to use a default class var to preven…
Aug 18, 2024
211b024
fix(test): dtype in openai/test_oot_registration
Aug 21, 2024
120bc22
merge: main into tool-use
Aug 21, 2024
f6fa6df
fix: problems caused by resolution of merge conflicts
Aug 22, 2024
e4222a5
fix(tests): set max model len
Aug 22, 2024
8caf6f8
fix: hermes system prompt in chat template was missing <|im_start|>sy…
Aug 22, 2024
ebdcef9
fix(tests): double RemoteOpenAIServer start timeout suggested by @mgoin
Aug 22, 2024
bdd01bb
fix: remove cruft
Aug 22, 2024
8f49d9e
fix: spacing in doc strings
Aug 22, 2024
11c751d
fix: make enable_auto_tools a boolean not optional
Aug 22, 2024
dc5db10
fix: whitespace
Aug 22, 2024
13a4fb1
fix: formatting
Aug 23, 2024
30238f2
delete: old file
Aug 23, 2024
e548b2d
refactor: tool parsers to use AnyTokenizer
Aug 23, 2024
ad9a8ff
refactor: util.py -> utils.py
Aug 23, 2024
79ab929
refactor: utils
Aug 23, 2024
bba7394
fix: cruft
Aug 23, 2024
e11536f
fix: unnecessary union in type
Aug 23, 2024
bea0f56
fix: type
Aug 23, 2024
d5b169b
fix: yapf might not need to be disabled
Aug 23, 2024
f73e089
fix: exception in hermes chat template thats unnecessary cc @interste…
Aug 25, 2024
477003c
fix: need to check if tool calls is present regardless of "content" t…
Aug 25, 2024
df85e12
fix: type narrowing & remove unused var
Aug 30, 2024
c6d1bf1
fix: format
Aug 30, 2024
e70160d
refactor: tool arguments in assistant-role messages with tool_calls s…
Aug 31, 2024
d14f42d
fix: merge conflicts in main
Aug 31, 2024
6db7f7b
fix: hermes tool parser issue
Aug 31, 2024
67cd2e1
Merge branch 'main' into tool-use
Aug 31, 2024
cb55c08
fix: mypy stuff in tool parsers
Aug 31, 2024
a70e826
fix: remove cruft
Aug 31, 2024
2022c1e
fix: merge
Sep 3, 2024
165c026
fix: merge conflict + type issues
Sep 3, 2024
ca8c14b
Merge branch 'main' into tool-use
Sep 4, 2024
4972a89
fix(tests): update pytest fixture for client based on #7565
Sep 4, 2024
d6728b2
fix: docs
Sep 4, 2024
a2ed57c
merge: main
Sep 4, 2024
10 changes: 10 additions & 0 deletions .buildkite/test-pipeline.yaml
@@ -92,6 +92,7 @@ steps:
  - pytest -v -s entrypoints/openai
  - pytest -v -s entrypoints/test_chat_utils.py


- label: Distributed Tests (4 GPUs) # 10min
  working_dir: "/vllm-workspace/tests"
  num_gpus: 4
@@ -271,6 +272,15 @@ steps:
  - export VLLM_WORKER_MULTIPROC_METHOD=spawn
  - bash ./run-tests.sh -c configs/models-small.txt -t 1

- label: OpenAI-Compatible Tool Use # 20 min
  fast_check: false
  mirror_hardwares: [ amd ]
  source_file_dependencies:
    - vllm/
    - tests/tool_use
  commands:
    - pytest -v -s tool_use

##### 1 GPU test #####
##### multi gpus test #####

58 changes: 54 additions & 4 deletions docs/source/serving/openai_compatible_server.md
@@ -110,6 +110,14 @@ directory [here](https://github.com/vllm-project/vllm/tree/main/examples/)
:func: create_parser_for_docs
:prog: vllm serve
```
## Tool Calling in the Chat Completion API
### Named Function Calling
By default, vLLM supports only named function calling in the chat completion API. It does so using Outlines, so it is
enabled by default and works with any supported model. You are guaranteed a validly parsable function call, not a
high-quality one.

To use a named function, you need to define the functions in the `tools` parameter of the chat completion request, and
specify the `name` of one of the tools in the `tool_choice` parameter of the chat completion request.
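
As a rough illustration of the request shape described above, the sketch below builds the keyword arguments for a named-function chat completion. The helper `named_tool_request` and the model name `my-model` are hypothetical and exist only for this example; the `tool_choice` object follows the OpenAI chat completion format.

```python
from typing import Any


def named_tool_request(model: str, messages: list[dict[str, Any]],
                       tools: list[dict[str, Any]],
                       tool_name: str) -> dict[str, Any]:
    """Build chat-completion kwargs that force a call to `tool_name`.

    Hypothetical helper for illustration; not part of vLLM or the OpenAI client.
    """
    known = {tool["function"]["name"] for tool in tools}
    if tool_name not in known:
        raise ValueError(f"tool_choice names an unknown tool: {tool_name!r}")
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        # OpenAI-style named tool choice: the model must call this function.
        "tool_choice": {"type": "function", "function": {"name": tool_name}},
    }


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = named_tool_request(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "What is the weather in Dallas?"}],
    tools=[weather_tool],
    tool_name="get_current_weather",
)
# With a running server: client.chat.completions.create(**request)
print(request["tool_choice"])
```

Validating the name on the client side is a design choice for this sketch; the server performs its own validation of `tool_choice`.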

### Config file

@@ -140,10 +148,52 @@ The order of priorities is `command line > config file values > defaults`.
## Tool calling in the chat completion API
vLLM supports only named function calling in the chat completion API. The `tool_choice` options `auto` and `required` are **not yet supported** but on the roadmap.

To use a named function you need to define the function in the `tools` parameter and call it in the `tool_choice` parameter.

It is the caller's responsibility to prompt the model with the tool information; vLLM will not automatically manipulate the prompt. **This may change in the future.**
It is the caller's responsibility to prompt the model with the tool information; vLLM will not automatically manipulate the prompt.

vLLM will use guided decoding to ensure the response matches the tool parameter object defined by the JSON schema in the `tools` parameter.

Please refer to the OpenAI API reference documentation for more information.

### Automatic Function Calling
To enable this feature, set the following flags:
* `--enable-auto-tool-choice` -- **mandatory** for auto tool choice. Tells vLLM that you want to enable the model to generate its own tool calls when it
deems appropriate.
* `--tool-call-parser` -- selects the tool parser to use, currently either `hermes` or `mistral`. Additional tool parsers
will be added in the future.
* `--chat-template` -- **optional** for auto tool choice. The path to the chat template that handles `tool`-role messages and `assistant`-role messages
that contain previously generated tool calls. Hermes and Mistral models have tool-compatible chat templates in their
`tokenizer_config.json` files, but you can specify a custom template. This argument can be set to `tool_use` if your model has a tool use-specific chat
template configured in its `tokenizer_config.json`; in that case, it is used per the `transformers` specification. More on this [here](https://huggingface.co/docs/transformers/en/chat_templating#why-do-some-models-have-multiple-templates)
from HuggingFace, and you can find an example of this in a `tokenizer_config.json` [here](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B/blob/main/tokenizer_config.json).

If your favorite tool-calling model is not supported, please feel free to contribute a parser & tool use chat template!
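
The flags listed above can be combined into a launch command. The following is a minimal sketch that assembles such a command as an argument list; `build_serve_command` is a hypothetical helper, and the `vllm serve --model ...` form simply mirrors the docstring of the example script in this PR.

```python
import shlex
from typing import Optional

# Parsers named in the docs above; more may be added later.
SUPPORTED_PARSERS = ("hermes", "mistral")


def build_serve_command(model: str, parser: str,
                        chat_template: Optional[str] = None) -> list[str]:
    """Assemble a `vllm serve` command with auto tool choice enabled.

    Hypothetical helper for illustration only.
    """
    if parser not in SUPPORTED_PARSERS:
        raise ValueError(f"unsupported tool call parser: {parser!r}")
    cmd = [
        "vllm", "serve", "--model", model,
        "--enable-auto-tool-choice",
        "--tool-call-parser", parser,
    ]
    # --chat-template is optional; it may be a path or the value `tool_use`.
    if chat_template is not None:
        cmd += ["--chat-template", chat_template]
    return cmd


cmd = build_serve_command("NousResearch/Hermes-2-Pro-Llama-3-8B", "hermes")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd)` would start the server without any shell quoting concerns.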

#### Hermes Models
All Nous Research Hermes-series models from Hermes 2 Pro onward should be supported.
* `NousResearch/Hermes-2-Pro-*`
* `NousResearch/Hermes-2-Theta-*`
* `NousResearch/Hermes-3-*`


_Note that the Hermes 2 **Theta** models are known to have degraded tool call quality & capabilities due to the merge
step in their creation_.

Flags: `--tool-call-parser hermes`

#### Mistral Models
Supported models:
* `mistralai/Mistral-7B-Instruct-v0.3` (confirmed)
* Additional mistral function-calling models are compatible as well.

Known issues:
1. Mistral 7B struggles to generate parallel tool calls correctly.
2. Mistral's `tokenizer_config.json` chat template requires tool call IDs that are exactly 9 digits, which is
much shorter than what vLLM generates. Since an exception is thrown when this condition
is not met, the following additional chat templates are provided:

* `examples/tool_chat_template_mistral.jinja` - this is the "official" Mistral chat template, but tweaked so that
it works with vLLM's tool call IDs (provided `tool_call_id` fields are truncated to the last 9 digits)
* `examples/tool_chat_template_mistral_parallel.jinja` - this is a "better" version that adds a tool-use system prompt
when tools are provided, that results in much better reliability when working with parallel tool calling.
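
To make the ID constraint concrete, here is a small sketch of truncating tool call IDs to their last 9 characters before they reach a template that enforces that length. The function names and the sample ID format are assumptions made for illustration; the provided templates are described above as handling this themselves, so a client-side step like this would matter mainly with a custom template.

```python
import copy
from typing import Any

MISTRAL_ID_LENGTH = 9  # length required by Mistral's default chat template


def truncate_tool_call_id(tool_call_id: str,
                          length: int = MISTRAL_ID_LENGTH) -> str:
    """Keep only the last `length` characters of a tool call ID."""
    return tool_call_id[-length:]


def normalize_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `messages` with every tool call ID truncated."""
    normalized = copy.deepcopy(messages)
    for message in normalized:
        # Tool results refer back to the call through `tool_call_id`.
        if message.get("role") == "tool" and "tool_call_id" in message:
            message["tool_call_id"] = truncate_tool_call_id(
                message["tool_call_id"])
        # Assistant messages carry the calls themselves under `tool_calls`.
        for call in message.get("tool_calls") or []:
            call["id"] = truncate_tool_call_id(call["id"])
    return normalized


history = [
    {"role": "assistant",
     "tool_calls": [{"id": "chatcmpl-tool-0123456789abcdef",  # made-up ID
                     "type": "function",
                     "function": {"name": "get_current_weather",
                                  "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "chatcmpl-tool-0123456789abcdef",
     "content": "85 degrees"},
]
cleaned = normalize_messages(history)
print(cleaned[1]["tool_call_id"])  # prints 789abcdef
```

Truncating both the call and its response the same way keeps the two IDs matched.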


Recommended flags: `--tool-call-parser mistral --chat-template examples/tool_chat_template_mistral_parallel.jinja`
162 changes: 162 additions & 0 deletions examples/openai_chat_completion_client_with_tools.py
@@ -0,0 +1,162 @@
"""
Set up this example by starting a vLLM OpenAI-compatible server with tool call
options enabled. For example:
IMPORTANT: for mistral, you must use one of the provided mistral tool call
templates, or your own - the model default doesn't work for tool calls with vLLM
See the vLLM docs on OpenAI server & tool calling for more details.
vllm serve --model mistralai/Mistral-7B-Instruct-v0.3 \
--chat-template examples/tool_chat_template_mistral.jinja \
--enable-auto-tool-choice --tool-call-parser mistral
OR
vllm serve --model NousResearch/Hermes-2-Pro-Llama-3-8B \
--chat-template examples/tool_chat_template_hermes.jinja \
--enable-auto-tool-choice --tool-call-parser hermes
"""
import json

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
# defaults to os.environ.get("OPENAI_API_KEY")
api_key=openai_api_key,
base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

tools = [{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"city": {
"type":
"string",
"description":
"The city to find the weather for, e.g. 'San Francisco'"
},
"state": {
"type":
"string",
"description":
"the two-letter abbreviation for the state that the city is"
" in, e.g. 'CA' which would mean 'California'"
},
"unit": {
"type": "string",
"description": "The unit to fetch the temperature in",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["city", "state", "unit"]
}
}
}]

messages = [{
"role": "user",
"content": "Hi! How are you doing today?"
}, {
"role": "assistant",
"content": "I'm doing well! How can I help you?"
}, {
"role":
"user",
"content":
"Can you tell me what the temperature will be in Dallas, in fahrenheit?"
}]

chat_completion = client.chat.completions.create(messages=messages,
model=model,
tools=tools)

print("Chat completion results:")
print(chat_completion)
print("\n\n")

tool_calls_stream = client.chat.completions.create(messages=messages,
model=model,
tools=tools,
stream=True)

chunks = []
for chunk in tool_calls_stream:
    chunks.append(chunk)
    if chunk.choices[0].delta.tool_calls:
        print(chunk.choices[0].delta.tool_calls[0])
    else:
        print(chunk.choices[0].delta)

arguments = []
tool_call_idx = -1
for chunk in chunks:

    if chunk.choices[0].delta.tool_calls:
        tool_call = chunk.choices[0].delta.tool_calls[0]

        if tool_call.index != tool_call_idx:
            if tool_call_idx >= 0:
                print(
                    f"streamed tool call arguments: {arguments[tool_call_idx]}"
                )
            tool_call_idx = chunk.choices[0].delta.tool_calls[0].index
            arguments.append("")
        if tool_call.id:
            print(f"streamed tool call id: {tool_call.id} ")

        if tool_call.function:
            if tool_call.function.name:
                print(f"streamed tool call name: {tool_call.function.name}")

            if tool_call.function.arguments:
                arguments[tool_call_idx] += tool_call.function.arguments

if len(arguments):
    print(f"streamed tool call arguments: {arguments[-1]}")

print("\n\n")

messages.append({
"role": "assistant",
"tool_calls": chat_completion.choices[0].message.tool_calls
})


# Now, simulate a tool call
def get_current_weather(city: str, state: str, unit: str):
    return ("The weather in Dallas, Texas is 85 degrees fahrenheit. It is "
            "partly cloudy, with highs in the 90's.")


available_tools = {"get_current_weather": get_current_weather}

completion_tool_calls = chat_completion.choices[0].message.tool_calls
for call in completion_tool_calls:
    tool_to_call = available_tools[call.function.name]
    args = json.loads(call.function.arguments)
    result = tool_to_call(**args)
    print(result)
    messages.append({
        "role": "tool",
        "content": result,
        "tool_call_id": call.id,
        "name": call.function.name
    })

chat_completion_2 = client.chat.completions.create(messages=messages,
model=model,
tools=tools,
stream=False)
print("\n\n")
print(chat_completion_2)
129 changes: 129 additions & 0 deletions examples/tool_chat_template_hermes.jinja
@@ -0,0 +1,129 @@
{%- macro json_to_python_type(json_spec) %}
{%- set basic_type_map = {
"string": "str",
"number": "float",
"integer": "int",
"boolean": "bool"
} %}

{%- if basic_type_map[json_spec.type] is defined %}
{{- basic_type_map[json_spec.type] }}
{%- elif json_spec.type == "array" %}
{{- "list[" + json_to_python_type(json_spec["items"]) + "]" }}
{%- elif json_spec.type == "object" %}
{%- if json_spec.additionalProperties is defined %}
{{- "dict[str, " + json_to_python_type(json_spec.additionalProperties) + ']' }}
{%- else %}
{{- "dict" }}
{%- endif %}
{%- elif json_spec.type is iterable %}
{{- "Union[" }}
{%- for t in json_spec.type %}
{{- json_to_python_type({"type": t}) }}
{%- if not loop.last %}
{{- "," }}
{%- endif %}
{%- endfor %}
{{- "]" }}
{%- else %}
{{- "Any" }}
{%- endif %}
{%- endmacro %}


{{- bos_token }}
{{- "<|im_start|>system\nYou are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> " }}
{%- if tools is iterable and tools | length > 0 %}
{%- for tool in tools %}
{%- if tool.function is defined %}
{%- set tool = tool.function %}
{%- endif %}
{{- '{"type": "function", "function": ' }}
{{- '{"name": "' + tool.name + '", ' }}
{{- '"description": "' + tool.name + '(' }}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{{- param_name + ": " + json_to_python_type(param_fields) }}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- ")" }}
{%- if tool.return is defined %}
{{- " -> " + json_to_python_type(tool.return) }}
{%- endif %}
{{- " - " + tool.description + "\n\n" }}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{%- if loop.first %}
{{- " Args:\n" }}
{%- endif %}
{{- " " + param_name + "(" + json_to_python_type(param_fields) + "): " + param_fields.description|trim }}
{%- endfor %}
{%- if tool.return is defined and tool.return.description is defined %}
{{- "\n Returns:\n " + tool.return.description }}
{%- endif %}
{{- '"' }}
{{- ', "parameters": ' }}
{%- if tool.parameters.properties | length == 0 %}
{{- "{}" }}
{%- else %}
{{- tool.parameters|tojson }}
{%- endif %}
{{- "}" }}
{%- if not loop.last %}
{{- "\n" }}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- " </tools>" }}
{{- 'Use the following pydantic model json schema for each tool call you will make: {"properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"], "title": "FunctionCall", "type": "object"}}
' }}
{{- "For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
" }}
{{- "<tool_call>
" }}
{{- '{"name": <function-name>, "arguments": <args-dict>}
' }}
{{- '</tool_call><|im_end|>' }}
{%- for message in messages %}
{%- if message.role == "user" or message.role == "system" or (message.role == "assistant" and message.tool_calls is not defined) %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" and message.tool_calls is defined %}
{{- '<|im_start|>' + message.role }}
{%- for tool_call in message.tool_calls %}
{{- '\n<tool_call>\n' }}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '{' }}
{{- '"name": "' }}
{{- tool_call.name }}
{{- '"' }}
{%- if tool_call.arguments is defined %}
{{- ', ' }}
{{- '"arguments": ' }}
{{- tool_call.arguments|tojson }}
{%- endif %}
{{- '}' }}
{{- '\n</tool_call>' }}
{%- endfor %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.previtem and loop.previtem.role != "tool" %}
{{- '<|im_start|>tool\n' }}
{%- endif %}
{{- '<tool_response>\n' }}
{{- message.content }}
{%- if not loop.last %}
{{- '\n</tool_response>\n' }}
{%- else %}
{{- '\n</tool_response>' }}
{%- endif %}
{%- if not loop.last and loop.nextitem.role != "tool" %}
{{- '<|im_end|>' }}
{%- elif loop.last %}
{{- '<|im_end|>' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
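
The template above asks the model to wrap each call in `<tool_call></tool_call>` tags. As a minimal sketch of how such output could be turned back into structured calls, the function below extracts those blocks with a regular expression. This is not the parser added in this PR; `extract_tool_calls` and the sample output are assumptions for illustration, and streaming is not handled.

```python
import json
import re
from typing import Any

# Non-greedy match so that several <tool_call> blocks are captured separately.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> tuple[str, list[dict[str, Any]]]:
    """Split model output into plain content and parsed tool calls."""
    calls: list[dict[str, Any]] = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            # Skip blocks whose body is not valid JSON.
            continue
        if isinstance(payload, dict) and "name" in payload:
            calls.append({
                "name": payload["name"],
                "arguments": payload.get("arguments", {}),
            })
    # Everything outside the tags is treated as ordinary content.
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, calls


sample_output = (
    "Let me check.\n"
    "<tool_call>\n"
    '{"name": "get_current_weather", "arguments": '
    '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}}\n'
    "</tool_call>"
)
content, parsed_calls = extract_tool_calls(sample_output)
print(content)
print(parsed_calls)
```

Returning both the leftover content and the calls reflects the case where a model emits text alongside a tool call.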