Status: Open
Labels: bug (Something isn't working), gpt-oss (Related to GPT-OSS models)
Description
Your current environment
The output of python collect_env.py
==============================
Python Environment
==============================
Python version : 3.12.11 (main, Jun 4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform : Linux-3.10.0-1160.92.1.el7.x86_64-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : 12.8.93
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA L20
GPU 1: NVIDIA L20
GPU 2: NVIDIA L20
GPU 3: NVIDIA L20
GPU 4: NVIDIA L20
GPU 5: NVIDIA L20
GPU 6: NVIDIA L20
GPU 7: NVIDIA L20
Nvidia driver version : 550.90.07
cuDNN version : Could not collect
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
vLLM Info
==============================
ROCM Version : Could not collect
Neuron SDK Version : N/A
vLLM Version : 0.10.1.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X PIX NODE NODE SYS SYS SYS SYS 0-31,64-95 0 N/A
GPU1 PIX X NODE NODE SYS SYS SYS SYS 0-31,64-95 0 N/A
GPU2 NODE NODE X PIX SYS SYS SYS SYS 0-31,64-95 0 N/A
GPU3 NODE NODE PIX X SYS SYS SYS SYS 0-31,64-95 0 N/A
GPU4 SYS SYS SYS SYS X PIX NODE NODE 32-63,96-127 1 N/A
GPU5 SYS SYS SYS SYS PIX X NODE NODE 32-63,96-127 1 N/A
GPU6 SYS SYS SYS SYS NODE NODE X PIX 32-63,96-127 1 N/A
GPU7 SYS SYS SYS SYS NODE NODE PIX X 32-63,96-127 1 N/A
🐛 Describe the bug
After serving gpt-oss-120b with vLLM, I tried the streaming function-call example from the OpenAI cookbook (streaming function calls).
- If stream=True is set in client.responses.create(...), the output events contain the reasoning_text, but no function tool call is included. For example:
ResponseCreatedEvent(response=Response(id='resp_f76035824b624b3da83ce3cb6eefdf8f', created_at=1756783814.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-oss-120b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=0, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
ResponseInProgressEvent(response=Response(id='resp_f76035824b624b3da83ce3cb6eefdf8f', created_at=1756783814.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-oss-120b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=0, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='The user asks: "What\'s the weather like in Paris today?" Need to fetch weather via function get_weather with location "Paris, France". Use function.', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=1, sequence_number=36, type='response.output_item.done')
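For comparison, the cookbook's streaming flow expects the tool call to arrive as dedicated events. Below is a minimal sketch of the consumer side, assuming the standard Responses streaming event types (a `response.output_item.added` event carrying a `function_call` item, followed by `response.function_call_arguments.delta` chunks); these event types are assumptions from the OpenAI Responses API, not events observed from vLLM in this report:

```python
def collect_tool_calls(events):
    """Assemble streamed function tool calls from Responses API events.

    Event shapes here are assumptions based on the OpenAI Responses
    streaming API; this is the kind of loop the cookbook example runs,
    and it yields nothing when the server never emits function_call events.
    """
    calls = {}  # output_index -> {"name": ..., "arguments": accumulated str}
    for event in events:
        if (event.type == "response.output_item.added"
                and event.item.type == "function_call"):
            # A new function tool call starts at this output slot.
            calls[event.output_index] = {"name": event.item.name, "arguments": ""}
        elif event.type == "response.function_call_arguments.delta":
            # Argument JSON arrives in incremental text chunks.
            calls[event.output_index]["arguments"] += event.delta
    return list(calls.values())
```

Running this over the event stream shown above would return an empty list, since only reasoning events are emitted.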
- If stream=False is set in client.responses.create(...), the model outputs both the reasoning_text (ResponseReasoningItem) and the function tool call (ResponseFunctionToolCall). For example:
Response(id='resp_d51ea05de80048f39ce97fa56f88d9c4', created_at=1756783924.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-oss-120b', object='response', output=[ResponseReasoningItem(id='rs_88edd5e0ff8143068093e9eb2bd3fdf1', summary=[], type='reasoning', content=[Content(text='We need to get weather. Use function get_weather with location "Paris, France".', type='reasoning_text')], encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{\n "location": "Paris, France"\n}', call_id='call_bc1a92fab0b44fcf8874ec261e5b06f2', name='get_weather', type='function_call', id='ft_bc1a92fab0b44fcf8874ec261e5b06f2', status=None)], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'City and country e.g. Bogotá, Colombia'}}, 'required': ['location'], 'additionalProperties': False}, strict=None, type='function', description='Get current temperature for a given location.')], top_p=1.0, background=False, max_output_tokens=130933, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=0, truncation='disabled', usage=ResponseUsage(input_tokens=0, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=0, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=0), user=None)
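On the non-streaming path, the tool call can be pulled straight out of `response.output`. A small helper sketch, with item attributes matching the ResponseFunctionToolCall shape shown above:

```python
import json


def extract_function_calls(output_items):
    """Return (name, parsed-arguments) pairs from a non-streaming
    Responses result, skipping reasoning items. Attribute names follow
    the ResponseFunctionToolCall objects printed above."""
    calls = []
    for item in output_items:
        if item.type == "function_call":
            # arguments is a JSON string, e.g. '{"location": "Paris, France"}'
            calls.append((item.name, json.loads(item.arguments)))
    return calls
```

Applied to the non-streaming response above, this yields the get_weather call with its parsed location argument.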
Here is the Python code:
from openai import OpenAI

client = OpenAI(
    base_url='',
    api_key=''
)

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": ["location"],
        "additionalProperties": False
    }
}]

# when stream=True
stream = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)
for event in stream:
    print(event)

# when stream=False
response = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
)
print(response)
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.