# -rw-rw-r-- 1 mgoin mgoin 205M Jun 9 18:03 flashinfer_python-0.2.6.post1-cp39-abi3-linux_x86_64.whl
# $ # upload the wheel to a public location, e.g. https://wheels.vllm.ai/flashinfer/v0.2.6.post1/flashinfer_python-0.2.6.post1-cp39-abi3-linux_x86_64.whl
docs/features/tool_calling.md (9 additions, 2 deletions)
@@ -145,7 +145,7 @@ Supported models:

 Known issues:

 1. Mistral 7B struggles to generate parallel tool calls correctly.
-2. Mistral's `tokenizer_config.json` chat template requires tool call IDs that are exactly 9 digits, which is
+2. **For Transformers tokenization backend only**: Mistral's `tokenizer_config.json` chat template requires tool call IDs that are exactly 9 digits, which is
    much shorter than what vLLM generates. Since an exception is thrown when this condition
    is not met, the following additional chat templates are provided:
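The 9-character ID constraint above can be illustrated with a small sketch. This is not vLLM's actual implementation; `make_mistral_tool_call_id` is a hypothetical helper showing how an ID of exactly the required length could be generated instead of a longer UUID-style ID:

```python
import random
import string

def make_mistral_tool_call_id() -> str:
    """Hypothetical helper: generate a tool call ID of exactly 9
    alphanumeric characters, the length Mistral's chat template
    validates before it raises an exception."""
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choices(alphabet, k=9))

call_id = make_mistral_tool_call_id()
assert len(call_id) == 9
```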
@@ -154,7 +154,14 @@ Known issues:

 * <gh-file:examples/tool_chat_template_mistral_parallel.jinja> - this is a "better" version that adds a tool-use system prompt
   when tools are provided, that results in much better reliability when working with parallel tool calling.
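As a sketch of how such a chat template would be used, the following launch command passes it to the server via `--chat-template` (the model name is an assumption for illustration; adjust paths and flags to your deployment):

```shell
# Hypothetical example: serve a Mistral model with the parallel
# tool-calling chat template and the Mistral tool call parser.
vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
  --enable-auto-tool-choice \
  --tool-call-parser mistral \
  --chat-template examples/tool_chat_template_mistral_parallel.jinja
```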
requirements/cuda.txt (2 additions, 0 deletions)
@@ -11,3 +11,5 @@ torchaudio==2.8.0

 torchvision==0.23.0 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version