[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models #5649

Merged: 237 commits, Sep 4, 2024
Changes from 92 commits
Commits (237)
6b762e5
add hermes 2 pro function calling template
Jun 16, 2024
606ec64
feat(example): add example chat completion with tool usage
K-Mistele Jun 16, 2024
d27446f
feat: add CLI argument for OpenAI API-style tool use system prompt ji…
K-Mistele Jun 17, 2024
bdf48a1
feat: add better validation for tool_choice, support for tool_choice …
K-Mistele Jun 17, 2024
9493c67
fix(cli): set OpenAI tool args in vllm/entrypoints/openai/cli_args.py…
K-Mistele Jun 18, 2024
35c9aa7
fix(types): add "auto" as an option for tool_choice in pydantic models
K-Mistele Jun 18, 2024
8ff80fb
fix: validation - guided decoding not valid with tool_choice = auto
K-Mistele Jun 18, 2024
3c4acb1
feat: ensure guided deoding is only applied when tool_choice is NOT "…
K-Mistele Jun 18, 2024
9c5ef66
feat(cli): update CLI args for auto tool choice and OpenAIServingChat…
K-Mistele Jun 18, 2024
f1a1e7b
feat: add loading in the system prompt jinja template if specified; a…
K-Mistele Jun 18, 2024
07a67d2
wip: add a case for tool choice = auto when handling chat completion …
K-Mistele Jun 19, 2024
590a559
fix: hermes 2 pro prompt template to prevent newlines
K-Mistele Jun 19, 2024
6919acf
feat: handle building the system prompt via template with --enable-au…
K-Mistele Jun 19, 2024
db9c29a
fix: remove debugging log lines
K-Mistele Jun 19, 2024
c16aa9a
fix(template): update hermes 2 pro template with newlines to get newl…
K-Mistele Jun 20, 2024
96b2eb5
merge: main into tool use & resolve conflicts
K-Mistele Jul 1, 2024
9b9d861
feat: add hermes 2 pro and mistral full tool use chat templates
K-Mistele Jul 2, 2024
d221364
chore: delete old template
K-Mistele Jul 2, 2024
33c669b
feat: tool calls are now returned in the chat completion response
K-Mistele Jul 2, 2024
7214e70
fix: mistral chat template. replace huggingface-suggested one with th…
K-Mistele Jul 2, 2024
6a3c61e
feat: add mistral tool parser, and empty hermes tool parser. non-stre…
K-Mistele Jul 2, 2024
73046e6
feat: update example
K-Mistele Jul 2, 2024
d7311e6
chore: update FunctionCall type to allow arguments as a Dict. non-aut…
K-Mistele Jul 2, 2024
344b241
feat: clean up CLI arguments, engine. implement tool parser selection…
K-Mistele Jul 2, 2024
8740525
feat: add methods to FunctionCall and ToolCall in protocol to make it…
K-Mistele Jul 2, 2024
7265be5
fix: ensure finish_reason = "tool_calls" when a tool call is generated
K-Mistele Jul 2, 2024
a1207f2
unfinished: work on hermes 2 tool call parser
K-Mistele Jul 2, 2024
30ffa16
partial: hermes 2 pro tool parsing
K-Mistele Jul 3, 2024
294c99e
fix: hermes tool call parser, work on example
K-Mistele Jul 3, 2024
00be988
fix: tool call arguments should be returned as JSON string not as a l…
K-Mistele Jul 3, 2024
b70e7d7
feat: enable both content and tool_calls if the model allows
K-Mistele Jul 3, 2024
ece6182
feat: update example with tool call
K-Mistele Jul 3, 2024
705ca62
fix: typing-related issues for chat messages
K-Mistele Jul 3, 2024
b3a62e9
feat: fix lots of parsing & extraction issues to ensure tool calls & …
K-Mistele Jul 4, 2024
c2d1afc
chore: refactor tool extraction for non-streaming responses to be in …
K-Mistele Jul 4, 2024
2d4b302
fix: mistral tool calling chat template
K-Mistele Jul 5, 2024
1fcd4f5
fix: make ChatMessage content Optional since it could be an assistant…
K-Mistele Jul 5, 2024
2926c3e
refactor: move tool parsing into the right place in serving_chat, and…
K-Mistele Jul 5, 2024
fa082e0
fix: finish_reason should NEVER be None; OpenAI defualt is "stop"
K-Mistele Jul 7, 2024
8455786
fix: typo introduced in earlier commit
K-Mistele Jul 7, 2024
a97ccc7
feat: signature updates and refactoring to tool parser streaming; pre…
K-Mistele Jul 8, 2024
c697e9f
fix: kind of fixed mistral chat template
K-Mistele Jul 8, 2024
df877f6
feat: make non-streaming tool parsing a static method so that streami…
K-Mistele Jul 8, 2024
8fa57ae
partial: work on streaming tool call parser for mistral
K-Mistele Jul 8, 2024
301b02e
deps: add partial-json-parser for parsing streaming JSON
K-Mistele Jul 10, 2024
1364bc1
fix: protocol stuff, work on mistral streaming
K-Mistele Jul 10, 2024
cbd8919
feat: progress on mistral streaming parser
K-Mistele Jul 10, 2024
d480db6
fix: some tool parser stuff. best its working yet
K-Mistele Jul 10, 2024
305685e
fix: major parsing logic issue when overlapping prefix & suffix due t…
K-Mistele Jul 10, 2024
e47f70f
feat: implement mistral tool calling streaming for ONE TOOL ONLY RIGH…
K-Mistele Jul 10, 2024
d8f4487
feat: update openai client to showcase streaming
K-Mistele Jul 10, 2024
cfa6d03
fix: finish reason & debug logging
K-Mistele Jul 10, 2024
b2cb8fb
fix(docs): CLI argument description was bad
K-Mistele Jul 10, 2024
625584a
fix(parser): mistral tool parser issue that was giving me a stroke. A…
K-Mistele Jul 13, 2024
7a6f6ac
chore: update examples & logging
K-Mistele Jul 13, 2024
9eb2452
fix: accidentally broke non-tool streaming earlier; this fixes it
K-Mistele Jul 13, 2024
08bd8d0
fix: some stuff in the example, and some mistral stuff
K-Mistele Jul 13, 2024
62b9ad4
feat: work on hermes tool parser
K-Mistele Jul 13, 2024
6e53787
fix(serving_chat): issue with deep vs. shallow copy caused bug where …
K-Mistele Jul 17, 2024
26b97dc
feat: change ordering in hermes chat template so that function name i…
K-Mistele Jul 17, 2024
bfd1039
feat(tool_parsers): hermes 2 pro streaming parser
K-Mistele Jul 17, 2024
2ccb893
fix(docs): some type issue that the doc CI check did not like
K-Mistele Jul 18, 2024
45962f9
fix(docs): some type issue that the doc CI check did not like
K-Mistele Jul 18, 2024
de27e65
fix(types): try Optional[str] = None
K-Mistele Jul 18, 2024
862078b
fix: refactor tool chat template and add docs
K-Mistele Jul 18, 2024
f056927
chore: annotate that parallel_tool_calls will be ignored
K-Mistele Aug 1, 2024
9560591
fix: handle un-handled "theoretically unreachable" case because such …
K-Mistele Aug 1, 2024
b2ceb71
fix: hermes tool call template to omit tool-use system prompt if tool…
K-Mistele Aug 1, 2024
558461a
fix: implement access to tool call token IDs via tokenizer vocab in t…
K-Mistele Aug 1, 2024
f7f15fa
fix: hermes tool parser does not extract non-tool-call content the sa…
K-Mistele Aug 1, 2024
4356ec4
fix: grab the name properly from chat completions and fall back to em…
K-Mistele Aug 1, 2024
0abeb53
fix: type annotation
K-Mistele Aug 1, 2024
9c0e6d8
fix: mistral tool extraction when dealing with poor precision
K-Mistele Aug 1, 2024
45ecd68
doc: indicate that temperature should be set to 0 when doing mistral …
K-Mistele Aug 1, 2024
1e39aa0
fix(conflicts): resolve a metric f-ck ton of merge conflicts
K-Mistele Aug 2, 2024
f63efe8
fix: log levels
K-Mistele Aug 2, 2024
dc27bec
fix: more logging changes
K-Mistele Aug 2, 2024
85515c0
fix(ci): formatting
K-Mistele Aug 2, 2024
15aa9b4
fix: formatting
K-Mistele Aug 2, 2024
1cae627
fix: merge main+conflict resolution;fix --tool-call-parser being requ…
K-Mistele Aug 3, 2024
9380ad7
fix: remove unnecessary case that was artifact from previous approach
K-Mistele Aug 3, 2024
0390f8c
fix: validation errors
K-Mistele Aug 3, 2024
e29a62a
fix: hermes prompt template issue that occurred when passing multiple…
K-Mistele Aug 3, 2024
e393c66
fix: formatting & mypy fixes
K-Mistele Aug 3, 2024
8d1eac1
fix: more types
K-Mistele Aug 3, 2024
813c3c5
fix: more mypy fixes
K-Mistele Aug 3, 2024
f9da832
fix: formatting
K-Mistele Aug 3, 2024
08d54b1
fix: more mypy fixes
K-Mistele Aug 3, 2024
c87a6ec
fix: final mypy fixes
K-Mistele Aug 3, 2024
19eab7a
fix: finish_reason behavior was broken for non-streaming calls
K-Mistele Aug 3, 2024
0c72dc6
fix(test): ensure tool_choice="required" throws a BadRequestError, an…
K-Mistele Aug 5, 2024
ee3b6ad
fix: remove deprecated CLI argument
K-Mistele Aug 5, 2024
04ba399
fix: type
K-Mistele Aug 5, 2024
f2c1254
fix: remoev another assertion and replace with if/exception
K-Mistele Aug 5, 2024
eefbee5
fix: bad condition
K-Mistele Aug 5, 2024
d18e9c3
fix: cleaner concat
K-Mistele Aug 5, 2024
5d43a00
fix: clean up vode
K-Mistele Aug 5, 2024
092224c
chore: more cleanup
K-Mistele Aug 5, 2024
8f6029f
chore: clean up conditional and document better
K-Mistele Aug 5, 2024
1d856c7
fix(ci): broken tool streaming when using guided decoding
K-Mistele Aug 6, 2024
38635ad
fix: formatting
K-Mistele Aug 6, 2024
28da76c
chore: refactor tool parsers structure to make it more maintainable
K-Mistele Aug 6, 2024
76a27bd
fix(tests): raise a valueError that was being passed instead of raised
K-Mistele Aug 6, 2024
990a0e5
fix(PEP8): openai chat completion client with tools
K-Mistele Aug 6, 2024
751d5a8
fix(PEP8): chat_utils
K-Mistele Aug 6, 2024
834969f
fix(PEP8): protocol.py
K-Mistele Aug 6, 2024
fcd69d7
fix(PEP8): serving_chat & fix typo in protocol
K-Mistele Aug 7, 2024
c448637
fix(PEP8): Hermes Tool parser
K-Mistele Aug 7, 2024
bd0b3a7
fix(PEP8): format files with ./format --fix
K-Mistele Aug 7, 2024
66049d8
fix: docs; allow specifying the tool_use huggingface template
K-Mistele Aug 7, 2024
c106111
chore: formatting
K-Mistele Aug 7, 2024
643c792
fix: mistral chat template formatting
K-Mistele Aug 7, 2024
55ece00
feat: add official mistral 7B instruct v0.3 chat template
K-Mistele Aug 7, 2024
722501a
fix: patch official mistral template to handle vLLM-generated tool ca…
K-Mistele Aug 7, 2024
a70b013
fix: replace unofficial mistral chat template with official one
K-Mistele Aug 7, 2024
941bd03
chore(docs): update mistral tool calling docs to remove the notes abo…
K-Mistele Aug 7, 2024
0d0b556
fix(test): transformers no longer supports using a default chat templ…
K-Mistele Aug 7, 2024
8ec7588
fix: add chat template path for opt-125m since not specifying this is…
K-Mistele Aug 7, 2024
cd1c095
fix(test): cast posix path to string
K-Mistele Aug 7, 2024
eb8a1ea
fix(test): updated expected token count because of applying chatml te…
K-Mistele Aug 7, 2024
7fc67e5
chore: remove print
K-Mistele Aug 7, 2024
9b657a2
chore: merge main into constellate-ai:vllm/tool-use
K-Mistele Aug 7, 2024
f9ecd60
chore: fix more merge conflicts (I am going to jump off a bridge if I…
K-Mistele Aug 7, 2024
05b366f
fix: add chat template due to bumped transformers version
K-Mistele Aug 7, 2024
b417e2b
fix: tests
K-Mistele Aug 7, 2024
1bf96f7
fix: yapf
K-Mistele Aug 7, 2024
11dbdd7
fix: disable yapf for block conflicting with isort
K-Mistele Aug 7, 2024
869dc50
fix: merge conflicts & formatting
K-Mistele Aug 7, 2024
de33564
fix: tool_call_id was accidentally message content
K-Mistele Aug 7, 2024
122fdc3
fix: use double quotes in example
K-Mistele Aug 7, 2024
3b0589d
fix: use double quotes in chat_utils
K-Mistele Aug 7, 2024
8634184
fix: use double quotes in api_server
K-Mistele Aug 7, 2024
a952c15
fix: single quotes (that I added) in cli_args are now double quotes
K-Mistele Aug 7, 2024
cf85b1c
fix: double quotes in protocol.py
K-Mistele Aug 7, 2024
1f8ea1a
fix: double quotes in serving_chat
K-Mistele Aug 7, 2024
49fb3ae
fix: double quotes in abstracttoolparser
K-Mistele Aug 7, 2024
c45f824
fix: double quotes in hermes tool parser
K-Mistele Aug 7, 2024
9b7cbab
fix: double quotes in mistral tool parser
K-Mistele Aug 7, 2024
f63908f
fix: remove todo
K-Mistele Aug 7, 2024
8db2a0d
fix: remove deprecated to_dict method
K-Mistele Aug 7, 2024
3895fd9
fix: give comments their own line
K-Mistele Aug 7, 2024
d961519
fix: remove unused loader
K-Mistele Aug 7, 2024
43a6318
fix: formatting
K-Mistele Aug 7, 2024
7e90682
fix: indents in hermes tool parser by making cases better
K-Mistele Aug 8, 2024
c4c480c
fix: readability for hermes tool call parser
K-Mistele Aug 8, 2024
f7a0e76
fix: more hermes tool parser readability
K-Mistele Aug 8, 2024
34746fd
fix: more clarify updates to hermes parser
K-Mistele Aug 8, 2024
40dab79
fix: last hermes tool parser formatting & logic tweaks
K-Mistele Aug 8, 2024
ba26f22
fix: mistral tool call parser formatting
K-Mistele Aug 8, 2024
79e8bb3
fix: remove unnecessary else block in mistral tool call parser
K-Mistele Aug 8, 2024
844d265
fix: formatting and control flow for mistral tool parser
K-Mistele Aug 8, 2024
1fa2027
fix: formatting updates to mistral tool call parser
K-Mistele Aug 8, 2024
6fc1f75
fix: catch a silent exception in hermes tool parser that was generati…
K-Mistele Aug 8, 2024
a89e565
fix: refactoring & cleanup serving_chat
K-Mistele Aug 8, 2024
9f0a803
fix: two silent errors in mistral tool parser (not causing problems) …
K-Mistele Aug 8, 2024
b2a0884
fix: CLI args in docs & comments
K-Mistele Aug 9, 2024
698d112
fix: remove line about singlq quotes for mistral
K-Mistele Aug 9, 2024
2e6a48a
fix: update doc in mistral tool parser as well
K-Mistele Aug 9, 2024
9078126
fix: update mistral chat templates and docs for mistral tool calling
K-Mistele Aug 9, 2024
f627b48
format: serving_chat
K-Mistele Aug 9, 2024
ce9afeb
doc: in example, explain how to start server for usage
K-Mistele Aug 9, 2024
6a2757b
chore: docs in serving_chat and swap `or ""` for the correct type ann…
K-Mistele Aug 9, 2024
d83d357
merge vllm:main with constellate-ai/vllm:main with constellate-ai/vll…
K-Mistele Aug 9, 2024
66aa580
fix: patch the hermes chat template which was missing a quote (@Nous …
K-Mistele Aug 10, 2024
b43b8a9
tests: add parametrized pytest fixture, and begin adding tests
K-Mistele Aug 10, 2024
01d528c
fix: formatting
K-Mistele Aug 10, 2024
b9397e3
fix: test formatting that was causing format.sh to crash
K-Mistele Aug 10, 2024
d80ac42
test: add test to do tool choice
K-Mistele Aug 10, 2024
e9a857f
test: add test for tool calling with tool choice; very non-streaming …
K-Mistele Aug 12, 2024
1a2f8b2
fix(tests): download models before starting openai server to prevent …
K-Mistele Aug 13, 2024
d24ae67
try restructuring tests to make it work
K-Mistele Aug 13, 2024
2c8e82f
format: test_tools.py
K-Mistele Aug 13, 2024
8e61cb3
try fixing tests by removing my specific dtype=half and kv-cache-dtyp…
K-Mistele Aug 13, 2024
7605d9f
fix: download model with hf_hub to prevent mistral timeout
K-Mistele Aug 13, 2024
4c13a66
merge: main into tool-use
K-Mistele Aug 13, 2024
f39ae8f
test: parallel tool calls, re-trigger CI that was interrupted by hugg…
K-Mistele Aug 13, 2024
1f50cef
fix: print statements
K-Mistele Aug 14, 2024
b7de9de
fix(tests): allow passing in wait time to remote Open AI server and i…
K-Mistele Aug 14, 2024
bff5e18
test: add final tests for providing tool call responses with parallel…
K-Mistele Aug 14, 2024
1b417ba
refactor(tests): break tests out into multiple files for readability …
K-Mistele Aug 14, 2024
8365a25
fix: add consolidated.safetensor to ignore list for mistral in tool e…
K-Mistele Aug 15, 2024
445cf59
refactor(ci): move tool tests out of entrypoints fastcheck, and creat…
K-Mistele Aug 15, 2024
afc41d0
fix: refactor tool use with CI changes
K-Mistele Aug 15, 2024
1fd4648
doc: update docs to clarify recommended CLI options & available chat …
K-Mistele Aug 15, 2024
e2a1b79
fix: hermes tool template and set fixtures to be session-scoped
K-Mistele Aug 15, 2024
f7f8b92
fix: formatting
K-Mistele Aug 15, 2024
94047c7
chore(tests): move tool tests out of the fastcheck section
K-Mistele Aug 15, 2024
44b2e07
cleanup: range statement
K-Mistele Aug 15, 2024
6d36509
cleanup: unnecessary backslash
K-Mistele Aug 15, 2024
fb40e5f
cleanup: cli args
K-Mistele Aug 15, 2024
c1d3110
fix: examples
K-Mistele Aug 16, 2024
5fb1a41
fix(tests): set max tokens to a lower number that is about 30-50% abo…
K-Mistele Aug 17, 2024
bbb5b27
fix: exceptions in tool validation should be raised, not returned, so…
K-Mistele Aug 17, 2024
17292b0
chore: cleanup debug lines in tool parsers to remove less-relevant ones
K-Mistele Aug 17, 2024
6830f90
fix: remove comments
K-Mistele Aug 17, 2024
c87a81f
refactor: tool call parsers no longer use static methods
K-Mistele Aug 17, 2024
6f5e585
fix: remove cruft
K-Mistele Aug 17, 2024
f2b0ee8
fix: comment on its own line
K-Mistele Aug 17, 2024
fb2db3a
fix: remove print()
K-Mistele Aug 17, 2024
1f4eeff
refactor: types based on earlier refactor of hermes and mistral tool …
K-Mistele Aug 17, 2024
f14e3e5
fix(tests): make max wait timeout for RemoteOpenAIServer an instance …
K-Mistele Aug 18, 2024
e21cfa8
fix: refactor RemoteOpenAIServer to use a default class var to preven…
K-Mistele Aug 18, 2024
211b024
fix(test): dtype in openai/test_oot_registration
K-Mistele Aug 21, 2024
120bc22
merge: main into tool-use
K-Mistele Aug 21, 2024
f6fa6df
fix: problems caused by resolution of merge conflicts
K-Mistele Aug 22, 2024
e4222a5
fix(tests): set max model len
K-Mistele Aug 22, 2024
8caf6f8
fix: hermes system prompt in chat template was missing <|im_start|>sy…
K-Mistele Aug 22, 2024
ebdcef9
fix(tests): double RemoteOpenAIServer start timeout suggested by @mgoin
K-Mistele Aug 22, 2024
bdd01bb
fix: remove cruft
K-Mistele Aug 22, 2024
8f49d9e
fix: spacing in doc strings
K-Mistele Aug 22, 2024
11c751d
fix: make enable_auto_tools a boolean not optional
K-Mistele Aug 22, 2024
dc5db10
fix: whitespace
K-Mistele Aug 22, 2024
13a4fb1
fix: formatting
K-Mistele Aug 23, 2024
30238f2
delete: old file
K-Mistele Aug 23, 2024
e548b2d
refactor: tool parsers to use AnyTokenizer
K-Mistele Aug 23, 2024
ad9a8ff
refactor: util.py -> utils.py
K-Mistele Aug 23, 2024
79ab929
refactor: utils
K-Mistele Aug 23, 2024
bba7394
fix: cruft
K-Mistele Aug 23, 2024
e11536f
fix: unnecessary union in type
K-Mistele Aug 23, 2024
bea0f56
fix: type
K-Mistele Aug 23, 2024
d5b169b
fix: yapf might not need to be disabled
K-Mistele Aug 23, 2024
f73e089
fix: exception in hermes chat template thats unnecessary cc @interste…
K-Mistele Aug 25, 2024
477003c
fix: need to check if tool calls is present regardless of "content" t…
K-Mistele Aug 25, 2024
df85e12
fix: type narrowing & remove unused var
K-Mistele Aug 30, 2024
c6d1bf1
fix: format
K-Mistele Aug 30, 2024
e70160d
refactor: tool arguments in assistant-role messages with tool_calls s…
K-Mistele Aug 31, 2024
d14f42d
fix: merge conflicts in main
K-Mistele Aug 31, 2024
6db7f7b
fix: hermes tool parser issue
K-Mistele Aug 31, 2024
67cd2e1
Merge branch 'main' into tool-use
K-Mistele Aug 31, 2024
cb55c08
fix: mypy stuff in tool parsers
K-Mistele Aug 31, 2024
a70e826
fix: remove cruft
K-Mistele Aug 31, 2024
2022c1e
fix: merge
K-Mistele Sep 3, 2024
165c026
fix: merge conflict + type issues
K-Mistele Sep 3, 2024
ca8c14b
Merge branch 'main' into tool-use
K-Mistele Sep 4, 2024
4972a89
fix(tests): update pytest fixture for client based on #7565
K-Mistele Sep 4, 2024
d6728b2
fix: docs
K-Mistele Sep 4, 2024
a2ed57c
merge: main
K-Mistele Sep 4, 2024
61 changes: 55 additions & 6 deletions docs/source/serving/openai_compatible_server.md
@@ -110,14 +110,63 @@ directory [here](https://github.com/vllm-project/vllm/tree/main/examples/)
:func: create_parser_for_docs
:prog: vllm serve
```
## Tool Calling in the Chat Completion API
### Named Function Calling
vLLM supports only named function calling in the chat completion API by default. It does so using Outlines, so this is
enabled by default and will work with any supported model. You are guaranteed a validly-parsable function call, though not
necessarily a high-quality one.

To use a named function, you need to define the functions in the `tools` parameter of the chat completion request, and
specify the `name` of one of the tools in the `tool_choice` parameter of the chat completion request.

It is the caller's responsibility to prompt the model with the tool information; vLLM will not automatically manipulate the prompt.

vLLM will use guided decoding to ensure the response matches the tool parameter object defined by the JSON schema in the `tools` parameter.

Please refer to the OpenAI API reference documentation for more information.
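
Below is a minimal sketch of a named function call against a locally running vLLM OpenAI-compatible server. The server address, the model, and the `get_current_weather` schema are illustrative assumptions, not part of the documented API.

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is already running on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = client.models.list().data[0].id

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "state": {"type": "string"},
            },
            "required": ["city", "state"],
        },
    },
}]

# Named function calling: force the model to call `get_current_weather`.
# Guided decoding constrains the output to this tool's JSON schema.
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "What is the weather in Dallas, TX?"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```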

### Automatic Function Calling
_This feature is in **beta**. It has limited model support, is not guaranteed to be stable, and does not have
well-defined failure modes._ As such, it must be explicitly enabled when desired.

To enable this feature, you must set the following flags:
* `--enable-api-tools` -- **mandatory** for auto tool choice. Tells vLLM that you want to enable tool templating and extraction.
* `--enable-auto-toolchoice` -- **mandatory** for auto tool choice. Tells vLLM that you want to allow the model to generate its own tool calls when it
deems appropriate.
* `--chat-template` -- **optional** for auto tool choice. The path to the chat template which handles `tool`-role messages and `assistant`-role messages
that contain previously generated tool calls. This argument can be set to `tool_use` if your model has a tool use chat
template configured in the `tokenizer_config.json`. In this case, it will be used per the `transformers` specification. More on this [here](https://huggingface.co/docs/transformers/en/chat_templating#why-do-some-models-have-multiple-templates)
from HuggingFace; you can find an example of this in a `tokenizer_config.json` [here]()
* `--tool-parser` -- select the tool parser to use: currently either `hermes` or `mistral`.
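
With those flags set, tool calls can be requested without naming a specific tool. The following is a rough sketch, not an official snippet: the server address, model, and tool schema are assumptions, and it simply shows how to check whether the model chose to call a tool.

```python
from openai import OpenAI

# Assumes a vLLM server started with the flags above and a supported
# tool-calling model (e.g. a Hermes 2 Pro model); adjust to your setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model = client.models.list().data[0].id

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "state": {"type": "string"}},
            "required": ["city", "state"],
        },
    },
}]

# No tool_choice here: with auto tool choice enabled, the model decides
# whether to answer in plain text or to emit tool calls.
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "What is the weather in Dallas, TX?"}],
    tools=tools,
)

choice = response.choices[0]
if choice.finish_reason == "tool_calls":
    for call in choice.message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    print(choice.message.content)
```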

If your favorite tool-calling model is not supported, please feel free to contribute a parser & tool use chat template!

#### Hermes Models
Supported models in this series:
* `NousResearch/Hermes-2-Pro-Llama-3-8B`
* `NousResearch/Hermes-2-Theta-Llama-3-70B`
* `NousResearch/Hermes-2-Pro-Llama-3-70B`
* `NousResearch/Hermes-2-Theta-Llama-3-8B`
* `NousResearch/Hermes-2-Pro-Mistral-7B`

_Note that the Hermes 2 **Theta** models are known to have degraded tool call quality & capabilities due to the merge
step in their creation_. It is recommended to use the Hermes 2 **Pro** models.

Recommended flags: `--tool-parser hermes --chat-template examples/tool_chat_template_hermes.jinja`

#### Mistral Models
Supported models:
* `mistralai/Mistral-7B-Instruct-v0.3`

There are several known issues with tool calling in Mistral models:
* Attempting to generate more than one tool call at a time usually results in a parser failure, since the model generates the calls
in an unpredictable format due to quirks of its chat template. **This can be mitigated by setting the
`temperature` to `0` in the OpenAI-style API call**; do this, and tool calls (including parallel ones) are **far** more
consistent.
* Mistral function calling / tool use generates calls with _single_ quotes `'` instead of double quotes `"`. As a
result, tool call generations can't be handled as JSON by the parser automatically without using `eval`, which would
present security issues for vLLM users. To support Mistral tool calls, we therefore find-and-replace single quotes
with double quotes in Mistral-generated tool calls. Because of this, **it is important to ensure that your tool call
arguments do not contain single quotes.** Escaped double quotes may be handled properly, but otherwise you should
expect parser issues.
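
To make the quote caveat concrete, here is a toy sketch. It is not vLLM's actual parser code, and the strings are invented examples; it only illustrates why a blanket single-to-double-quote substitution breaks when an argument value itself contains a single quote.

```python
import json

# A Mistral-style tool call rendered with single quotes (invented example).
ok = "[{'name': 'get_current_weather', 'arguments': {'city': 'Dallas', 'state': 'TX'}}]"
print(json.loads(ok.replace("'", '"')))  # parses fine after naive substitution

# If an argument value contains a single quote, the substitution corrupts
# the string and JSON parsing fails.
bad = "[{'name': 'get_current_weather', 'arguments': {'city': 'Coeur d'Alene', 'state': 'ID'}}]"
try:
    json.loads(bad.replace("'", '"'))
except json.JSONDecodeError as exc:
    print(f"parser failure: {exc}")
```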

Recommended flags: `--tool-parser mistral --chat-template examples/tool_chat_template_mistral.jinja`
143 changes: 143 additions & 0 deletions examples/openai_chat_completion_client_with_tools.py
@@ -0,0 +1,143 @@
from openai import OpenAI
import json

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description":
                    "The city to find the weather for, e.g. 'San Francisco'"
                },
                "state": {
                    "type": "string",
                    "description":
                    "the two-letter abbreviation for the state that the city is"
                    " in, e.g. 'CA' which would mean 'California'"
                },
                "unit": {
                    "type": "string",
                    "description": "The unit to fetch the temperature in",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["city", "state", "unit"]
        }
    }
}]

messages = [{
    "role": "user",
    "content": "Hi! How are you doing today?"
}, {
    "role": "assistant",
    "content": "I'm doing well! How can I help you?"
}, {
    "role": "user",
    "content":
    "Can you tell me what the temperature will be in Dallas and San Francisco, in fahrenheit?"
}]

chat_completion = client.chat.completions.create(messages=messages,
                                                 model=model,
                                                 tools=tools)

print("Chat completion results:")
print(chat_completion)
print("\n\n")

tool_calls_stream = client.chat.completions.create(messages=messages,
                                                   model=model,
                                                   tools=tools,
                                                   stream=True)

chunks = []
for chunk in tool_calls_stream:
    chunks.append(chunk)
    if chunk.choices[0].delta.tool_calls:
        print(chunk.choices[0].delta.tool_calls[0])
    else:
        print(chunk.choices[0].delta)

# Reassemble the streamed tool call arguments, one entry per tool call index.
arguments = []
tool_call_idx = -1
for chunk in chunks:
    if chunk.choices[0].delta.tool_calls:
        tool_call = chunk.choices[0].delta.tool_calls[0]
        if tool_call.index != tool_call_idx:
            if tool_call_idx >= 0:
                print(f"streamed tool call arguments: "
                      f"{arguments[tool_call_idx]}\n\n")
            tool_call_idx = tool_call.index
            arguments.append("")
        if tool_call.id:
            print(f"streamed tool call id: {tool_call.id}")
        if tool_call.function:
            if tool_call.function.name:
                print(f"streamed tool call name: {tool_call.function.name}")
            if tool_call.function.arguments:
                arguments[tool_call_idx] += tool_call.function.arguments

if len(arguments):
    print(f"streamed tool call arguments: {arguments[-1]}")

print("\n\n")

messages.append({
    "role": "assistant",
    "tool_calls": chat_completion.choices[0].message.tool_calls
})


# Now, simulate a tool call
def get_current_weather(city: str, state: str, unit: str):
    return ("The weather in Dallas, Texas is 85 degrees fahrenheit. "
            "It is partly cloudy, with highs in the 90's.")


available_tools = {"get_current_weather": get_current_weather}

completion_tool_calls = chat_completion.choices[0].message.tool_calls
for call in completion_tool_calls:
    tool_to_call = available_tools[call.function.name]
    args = json.loads(call.function.arguments)
    result = tool_to_call(**args)
    print(result)
    messages.append({
        "role": "tool",
        "content": result,
        "tool_call_id": call.id,
        "name": call.function.name
    })

chat_completion_2 = client.chat.completions.create(messages=messages,
                                                   model=model,
                                                   tools=tools,
                                                   stream=False)
print("\n\n")
print(chat_completion_2)
123 changes: 123 additions & 0 deletions examples/tool_chat_template_hermes.jinja
@@ -0,0 +1,123 @@
{%- macro json_to_python_type(json_spec) %}
{%- set basic_type_map = {
"string": "str",
"number": "float",
"integer": "int",
"boolean": "bool"
} %}

{%- if basic_type_map[json_spec.type] is defined %}
{{- basic_type_map[json_spec.type] }}
{%- elif json_spec.type == "array" %}
{{- "list[" + json_to_python_type(json_spec|items) + "]" }}
{%- elif json_spec.type == "object" %}
{%- if json_spec.additionalProperties is defined %}
{{- "dict[str, " + json_to_python_type(json_spec.additionalProperties) + ']' }}
{%- else %}
{{- "dict" }}
{%- endif %}
{%- elif json_spec.type is iterable %}
{{- "Union[" }}
{%- for t in json_spec.type %}
{{- json_to_python_type({"type": t}) }}
{%- if not loop.last %}
{{- "," }}
{%- endif %}
{%- endfor %}
{{- "]" }}
{%- else %}
{{- "Any" }}
{%- endif %}
{%- endmacro %}


{{- bos_token }}
{%- if tools is iterable and tools | length > 0 %}
{{- "You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools: <tools> " }}
{%- for tool in tools %}
{%- if tool.function is defined %}
{%- set tool = tool.function %}
{%- endif %}
{{- '{"type": "function", "function": ' }}
{{- '{"name": "' + tool.name + '", ' }}
{{- '"description": "' + tool.name + '(' }}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{{- param_name + ": " + json_to_python_type(param_fields) }}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- ")" }}
{%- if tool.return is defined %}
{{- " -> " + json_to_python_type(tool.return) }}
{%- endif %}
{{- " - " + tool.description + "\n\n" }}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{%- if loop.first %}
{{- " Args:\n" }}
{%- endif %}
{{- " " + param_name + "(" + json_to_python_type(param_fields) + "): " + param_fields.description|trim }}
{%- endfor %}
{%- if tool.return is defined and tool.return.description is defined %}
{{- "\n Returns:\n " + tool.return.description }}
{%- endif %}
{{- '"' }}
{{- ', "parameters": ' }}
{%- if tool.parameters.properties | length == 0 %}
{{- "{}" }}
{%- else %}
{{- tool.parameters|tojson }}
{%- endif %}
{{- "}" }}
{%- if not loop.last %}
{{- "\n" }}
{%- endif %}
{%- endfor %}
{{- " </tools>" }}
{{- 'Use the following pydantic model json schema for each tool call you will make: {"properties": {"arguments": {"title": "Arguments", "type": "object"}, "name": {"title": "Name", "type": "string"}}, "required": ["arguments", "name"], "title": "FunctionCall", "type": "object"}
' }}
{{- "For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
" }}
{{- "<tool_call>
" }}
{{- '{"name": <function-name>, "arguments": <args-dict>}
' }}
{{- '</tool_call><|im_end|>' }}
{%- endif %}
{%- for message in messages %}
{%- if message.role == "user" or message.role == "system" or (message.role == "assistant" and message.tool_calls is not defined) %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{{- '<|im_start|>' + message.role }}
{%- for tool_call in message.tool_calls %}
{{- '\n<tool_call>\n' }}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '{ ' }}
{%- if tool_call.arguments is defined %}
{{- '"arguments": ' }}
{{- tool_call.arguments|tojson }}
{{- ', ' }}
{%- endif %}
{{- '"name": "' }}
{{- tool_call.name }}
{{- '"}' }}
{{- '\n</tool_call> ' }}
{%- endfor %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if not message.name is defined %}
{{- raise_exception("Tool response dicts require a 'name' key indicating the name of the called function!") }}
{%- endif %}
{{- '<|im_start|>' + message.role + '\n<tool_response>\n' }}
{{- '{"name": "' }}
{{- message.name }}
{{- '", "content": ' }}
{{- message.content|tojson + '}' }}
{{- '\n</tool_response> <|im_end|>\n' }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
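
As a quick sanity check, the template above can be rendered locally with `transformers` before pointing vLLM at it. This is a hedged sketch: it assumes the template is saved at `examples/tool_chat_template_hermes.jinja`, that a Hermes 2 Pro tokenizer is available, and that the installed `transformers` version is recent enough for the `tools=` argument of `apply_chat_template`.

```python
from pathlib import Path

from transformers import AutoTokenizer

# Assumed model and path; adjust to your environment.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")
template = Path("examples/tool_chat_template_hermes.jinja").read_text()

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to find the weather for"
                }
            },
            "required": ["city"]
        }
    }
}]
messages = [{"role": "user", "content": "What is the weather in Dallas?"}]

# Render the prompt roughly as vLLM would when this file is passed via --chat-template.
prompt = tokenizer.apply_chat_template(messages,
                                       tools=tools,
                                       chat_template=template,
                                       tokenize=False,
                                       add_generation_prompt=True)
print(prompt)
```
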
1 change: 1 addition & 0 deletions examples/tool_chat_template_mistral.jinja
@@ -0,0 +1 @@
{{ bos_token }}{% set user_messages = messages | selectattr('role', 'equalto', 'user') | list %}{% for message in messages %}{% if message['role'] == 'user' %}{% if message == user_messages[-1] %}{% if tools %}{{ '[AVAILABLE_TOOLS]'+ tools|string + '[/AVAILABLE_TOOLS]' }}{% endif %}{{ '[INST]' + message['content'] + '[/INST]' }}{% else %}{{ '[INST]' + message['content'] + '[/INST]' }}{% endif %}{% elif message['role'] == 'assistant' and message['tool_calls'] and message['tool_calls']|length > 0 %}{{ '[TOOL_CALLS]' + message['tool_calls']|string + eos_token }}{% elif message['role'] == 'assistant' %}{{ ' ' + message['content'] + ' ' + eos_token }}{% elif message['role'] == 'tool' %}{{ '[TOOL_RESULTS]' + message['content']|string + '[/TOOL_RESULTS]' }}{% endif %}{% endfor %}
1 change: 1 addition & 0 deletions requirements-common.txt
@@ -21,4 +21,5 @@ lm-format-enforcer == 0.10.3
outlines >= 0.0.43, < 0.1 # Requires torch >= 2.1.0
typing_extensions
filelock >= 3.10.4 # filelock starts to support `mode` argument from 3.10.4
partial-json-parser # used for parsing partial JSON outputs
pyzmq