Confirm this is a feature request for the Python library and not the underlying OpenAI API.
- This is a feature request for the Python library
Describe the feature or improvement you're requesting
Hi, I am working on supporting streaming for GPT-OSS in vLLM. With GPT-OSS we can expose the full reasoning text (as opposed to the summarized version), but I think there are some gaps in the OpenAI Python library's streaming event types for this.
With the GPT-5 API, the stream of reasoning-related events looks like:

```
ReasoningSummaryPartAdded
ReasoningSummaryTextDelta
ReasoningSummaryTextDelta
…
ReasoningSummaryTextDone
ReasoningSummaryPartDone
```
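For context, this is roughly how I consume those summary events with the current library. This is only a minimal sketch: the model name, the `reasoning={"summary": "auto"}` setting, and the exact event type strings are my assumptions about the current API, not something this issue depends on.

```python
from openai import OpenAI

client = OpenAI()

# Minimal sketch: stream a response and print only the reasoning-summary
# events, which is all the current event model exposes for reasoning.
with client.responses.stream(
    model="gpt-5",
    input="Why is the sky blue?",
    reasoning={"summary": "auto"},
) as stream:
    for event in stream:
        if event.type == "response.reasoning_summary_part.added":
            print("\n[summary part added]")
        elif event.type == "response.reasoning_summary_text.delta":
            print(event.delta, end="", flush=True)
        elif event.type == "response.reasoning_summary_text.done":
            print("\n[summary text done]")
        elif event.type == "response.reasoning_summary_part.done":
            print("[summary part done]")
```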
But with GPT-OSS we get ResponseReasoningText, so the sequence should presumably look like:

```
ResponseReasoningPartAdded   # or should it be ResponseContentPartAdded?
ResponseReasoningTextDelta
ResponseReasoningTextDelta
…
ResponseReasoningTextDone
ResponseReasoningPartDone
```
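For concreteness, here is a rough sketch of what the two missing event types might look like, modeled on the shape of the existing summary-part events. The field names and the `response.reasoning_part.*` type strings are just my guesses, and I am using plain Pydantic here rather than the library's internal base model:

```python
from typing import Literal

from pydantic import BaseModel


class ReasoningTextPart(BaseModel):
    """Hypothetical content part carrying the raw reasoning text."""

    text: str
    type: Literal["reasoning_text"]


class ResponseReasoningPartAddedEvent(BaseModel):
    """Hypothetical event: a reasoning content part was opened."""

    item_id: str
    output_index: int
    content_index: int
    part: ReasoningTextPart
    type: Literal["response.reasoning_part.added"]


class ResponseReasoningPartDoneEvent(BaseModel):
    """Hypothetical event: a reasoning content part was finalized."""

    item_id: str
    output_index: int
    content_index: int
    part: ReasoningTextPart
    type: Literal["response.reasoning_part.done"]
```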
The types ResponseReasoningPartAdded and ResponseReasoningPartDone don't currently exist, and ResponseContentPartDone does not currently allow its Part to be a ResponseReasoningItem (https://github.com/openai/openai-python/blob/main/src/openai/types/responses/response_content_part_done_event.py#L13).
So I'm wondering: would the path forward be to add ResponseReasoningPartAdded and ResponseReasoningPartDone, or to amend ResponseContentPartDone's Part union?
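To illustrate the second option, here is a hypothetical widening of the Part union used by ResponseContentPartDoneEvent. Again this is a self-contained Pydantic sketch with simplified stand-ins for the existing part types, not the library's actual definitions:

```python
from typing import Literal, Union

from pydantic import BaseModel, Field
from typing_extensions import Annotated, TypeAlias


class ResponseOutputText(BaseModel):
    """Simplified stand-in for the existing output_text part."""

    text: str
    type: Literal["output_text"]


class ResponseOutputRefusal(BaseModel):
    """Simplified stand-in for the existing refusal part."""

    refusal: str
    type: Literal["refusal"]


class ReasoningTextPart(BaseModel):
    """Hypothetical reasoning_text part that the union would gain."""

    text: str
    type: Literal["reasoning_text"]


# Today the union covers only output_text and refusal parts;
# the amendment would add the reasoning part as a third member.
Part: TypeAlias = Annotated[
    Union[ResponseOutputText, ResponseOutputRefusal, ReasoningTextPart],
    Field(discriminator="type"),
]


class ResponseContentPartDoneEvent(BaseModel):
    item_id: str
    output_index: int
    content_index: int
    part: Part
    type: Literal["response.content_part.done"]
```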
Thanks in advance!
Additional context
vllm-project/vllm#24938
https://community.openai.com/t/purpose-of-response-reasoning-text/1344924/3