Open
Description
Repro steps:
- Install AI Toolkit, download qwen2.5 7b cpu model
- Run the testing scripts
response.choices[0].delta.tool_calls is always empty, but response.choices[0].delta.content contains the tool call information as XML.
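As a stopgap while tool_calls stays empty, the call can be recovered from the content text. The sketch below is not part of Foundry Local or the OpenAI SDK: extract_tool_calls is a hypothetical helper, and it assumes the model wraps each call as <tool_call>{JSON}</tool_call>, which this report does not confirm. Adjust the tag name if the observed output differs.

```python
import json
import re

# Assumption: each call is emitted as <tool_call>{...JSON...}</tool_call>.
# The tag name is a guess; change it to match the XML actually observed.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(content: str) -> list:
    """Return the JSON payload of every well-formed tool-call block in content."""
    calls = []
    for match in TOOL_CALL_RE.finditer(content):
        try:
            calls.append(json.loads(match.group(1).strip()))
        except json.JSONDecodeError:
            # Skip blocks that are truncated or not valid JSON.
            continue
    return calls
```

With streaming enabled, the delta.content fragments have to be concatenated across all chunks before calling the helper, because a single tag can be split over several chunks.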
Expected:
According to the OpenAI documentation, the chat completion API should return a streamed tool call in response.choices[0].delta.tool_calls.
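For reference, this is roughly how a client consumes the expected format. In the OpenAI streaming format each delta.tool_calls entry carries an index, an optional id, and a function object whose arguments string arrives in fragments. The helper name collect_tool_calls is hypothetical, and the sketch only assumes chunk objects shaped like the ones the openai Python client yields.

```python
def collect_tool_calls(chunks):
    """Merge streamed tool-call fragments into complete calls, ordered by index."""
    calls = {}
    for chunk in chunks:
        if not chunk.choices:
            # Some servers send chunks without choices (e.g. usage-only chunks).
            continue
        delta = chunk.choices[0].delta
        for fragment in getattr(delta, "tool_calls", None) or []:
            call = calls.setdefault(
                fragment.index, {"id": None, "name": "", "arguments": ""}
            )
            if fragment.id:
                call["id"] = fragment.id
            if fragment.function:
                if fragment.function.name:
                    call["name"] = fragment.function.name
                if fragment.function.arguments:
                    # Arguments arrive as pieces of one JSON string.
                    call["arguments"] += fragment.function.arguments
    return [calls[index] for index in sorted(calls)]
```

Against a compliant server, passing the stream to collect_tool_calls should yield one entry for get_current_weather. With the behavior reported here it returns an empty list, since no chunk ever carries tool_calls.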
Testing script:
import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5272/v1",
    api_key="unused",
)

# Define the function schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Make the initial streaming request
stream = client.chat.completions.create(
    model='qwen2.5-7b-instruct-generic-cpu:4',
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather like in San Francisco?"}
    ],
    tools=tools,
    tool_choice="auto",
    stream=True
)

print("Streaming Response:")
for chunk in stream:
    # Dump each streamed choice so delta.tool_calls and delta.content are visible
    print(chunk.choices[0].model_dump_json())
Version:
Foundry Local in AI Toolkit: 0.26.4
0.7.122
1.23.0-dev-20250918-0932-9c85c395b0
0.9.1-rc1
0.10.0
0.8.4