Conversation

@qandrew (Contributor) commented Sep 8, 2025

Purpose

  • Currently, `response.output` in streaming mode never carries a final output because the `_messages` logic differs between `StreamingHarmonyContext` and `HarmonyContext`. This PR changes `StreamingHarmonyContext` so it also respects `num_init_messages`.
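
To illustrate the invariant involved, here is a minimal sketch of the `num_init_messages` bookkeeping the fix makes consistent between the streaming and non-streaming paths. The class below is a hypothetical stand-in, not the actual vLLM implementation:

```python
# Hypothetical stand-in for the harmony context classes: `messages` must
# expose only what was generated after the initial prompt, so the final
# `response.output` array is populated but excludes the prompt itself.
class ContextSketch:
    def __init__(self, init_messages):
        # Messages already present before generation (system/user prompt).
        self._messages = list(init_messages)
        self.num_init_messages = len(init_messages)

    def append_output(self, message):
        self._messages.append(message)

    @property
    def messages(self):
        # Slice off the initial prompt messages; only generated items
        # belong in the final output.
        return self._messages[self.num_init_messages:]

ctx = ContextSketch(["system prompt", "user prompt"])
ctx.append_output("reasoning item")
ctx.append_output("assistant message")
print(ctx.messages)  # ['reasoning item', 'assistant message']
```

The bug amounted to the streaming context not applying this slice, so the final event's `output` came back empty.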

Per @chaunceyjiang's comments, I've also split this PR into a couple of follow-ups.

Test Plan

  • Spun up a server and sent curl requests to confirm the behavior above works as expected
  • Edited the unit tests

Test Result

Before

[axia@devvm30969.cln0 ~]$ curl http://localhost:20001/v1/responses \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "input": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Multiply 234.234 * 2342.123."
        }
    ],
    "temperature": 0.7,
    "reasoning": {
        "effort": "medium",
        "summary": null
    },
    "stream": true
}' | jq

{
  "response": {
    "id": "resp_d0aa5a27926f4e87aef046739cbe30c0",
    "created_at": 1757365874.0,
    "error": null,
    "incomplete_details": null,
    "instructions": null,
    "metadata": null,
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "object": "response",
    "output": [],
    "parallel_tool_calls": true,
    "temperature": 0.7,
    "tool_choice": "auto",
    "tools": [],
    "top_p": 1.0,
    "background": false,
    "conversation": null,
    "max_output_tokens": 130977,
    "max_tool_calls": null,
    "previous_response_id": null,
    "prompt": null,
    "prompt_cache_key": null,
    "reasoning": {
      "effort": "medium",
      "generate_summary": null,
      "summary": null
    },
    "safety_identifier": null,
    "service_tier": "auto",
    "status": "completed",
    "text": null,
    "top_logprobs": null,
    "truncation": "disabled",
    "usage": {
      "input_tokens": 95,
      "input_tokens_details": {
        "cached_tokens": 80
      },
      "output_tokens": 706,
      "output_tokens_details": {
        "reasoning_tokens": 668,
        "tool_output_tokens": 0
      },
      "total_tokens": 801
    },
    "user": null
  },
  "sequence_number": 706,
  "type": "response.completed"
}

^ Note that in this final response, `output` is an empty array. This is not what we want.

After

{
  "id": "resp_c538e484fa114961b45cec59d0244673",
  "created_at": 1757437205,
  "instructions": null,
  "metadata": null,
  "model": "/data/users/axia/checkpoints/gpt-oss-120b",
  "object": "response",
  "output": [
    {
      "id": "rs_232ea61291404b888e705fd0c5d13f5b",
      "summary": [],
      "type": "reasoning",
      "content": [
        {
          "text": "User asks: \"Write a short sentence about a robot learning to dance.\" Straightforward. Provide a short sentence.",
          "type": "reasoning_text"
        }
      ],
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_1c136508c9844941be70926b34681538",
      "content": [
        {
          "annotations": [],
          "text": "The robot twirled its metal limbs, discovering rhythm with every clank as it learned to dance.",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 0.7,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": 256,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "reasoning": {
    "effort": "medium",
    "generate_summary": null,
    "summary": null
  },
  "service_tier": "auto",
  "status": "completed",
  "text": null,
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 88,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 54,
    "output_tokens_details": {
      "reasoning_tokens": 24,
      "tool_output_tokens": 0
    },
    "total_tokens": 142
  },
  "user": null
}
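
To make the stream shape concrete, here is a hedged sketch of picking the final event out of the parsed stream client-side. The event payloads are abbreviated stand-ins, not the full Responses API schema:

```python
import json

# Each streamed `data:` line is a JSON event; the complete output arrives
# on the final `response.completed` event. These payloads are abbreviated
# stand-ins for illustration only.
raw_events = [
    '{"type": "response.output_text.delta", "delta": "The robot"}',
    '{"type": "response.completed",'
    ' "response": {"output": [{"type": "reasoning"}, {"type": "message"}]}}',
]

events = [json.loads(line) for line in raw_events]
final = next(e for e in events if e["type"] == "response.completed")

# After this fix, `output` on the completed event is populated.
print([item["type"] for item in final["response"]["output"]])
# ['reasoning', 'message']
```

Before the fix, the same lookup on the `response.completed` event would have yielded an empty list.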

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify bot added the frontend and gpt-oss (Related to GPT-OSS models) labels Sep 8, 2025
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 0fd7f79 to 016bafd Compare September 9, 2025 16:23
@qandrew qandrew marked this pull request as ready for review September 9, 2025 17:48
@qandrew qandrew requested a review from aarnphm as a code owner September 9, 2025 17:48
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 02e414b to 0c31b0b Compare September 9, 2025 22:34
@qandrew qandrew changed the title [gpt-oss] streaming uses request_id, fix return format and final output [gpt-oss][1] streaming uses request_id, fix return format and final output Sep 10, 2025
@qandrew (Contributor, Author) commented Sep 10, 2025

@aarnphm @DarkLight1337 @robertgshaw2-redhat @simon-mo this PR is ready for review :)

@heheda12345 (Collaborator) commented:

Also CC @yeqcharlotte

@lacora (Contributor) left a comment:

LG! I saw you define `StreamingResponsesResponse` in a later PR; do we plan to update `BaseModel` in this diff as well?

@qandrew (Contributor, Author) commented Sep 10, 2025

> LG! I saw you define `StreamingResponsesResponse` in a later PR; do we plan to update `BaseModel` in this diff as well?

I have a follow-up PR here: #24556.
I wanted to stack the PRs, but it seems I can't do that with forked repos; this is the relevant commit that replaces `BaseModel` with the `StreamingResponses` types: 7c642b0

I thought it would be easier to review split into two PRs, but I could combine them too :)

A collaborator left a comment:

This PR addresses three different changes. I recommend splitting it into multiple separate PRs. This PR should focus only on fixing the issue where the output of the last event is empty.

@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 82fd949 to c1098b4 Compare September 12, 2025 16:37
@qandrew qandrew requested a review from NickLucche as a code owner September 12, 2025 16:37
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch 2 times, most recently from 7ec9669 to 309699c Compare September 12, 2025 16:40
@qandrew qandrew changed the title [gpt-oss][1] streaming uses request_id, fix return format and final output [gpt-oss][1] fix streaming final output Sep 12, 2025
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 309699c to f4e284d Compare September 12, 2025 16:54
@qandrew qandrew changed the title [gpt-oss][1] fix streaming final output [gpt-oss][1][bugfix] fix streaming final output Sep 12, 2025
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from f4e284d to 9a71956 Compare September 12, 2025 17:27
This reverts commit c87ca3325edbd5e80800df6e4151cee6a9c8c923.

Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 9a71956 to 9b19217 Compare September 13, 2025 17:32
# Check if the current token is part of reasoning content
self._update_num_reasoning_tokens()
self.last_tok = tok
if len(self._messages) - self.num_init_messages < len(
A collaborator commented:

Let's also add a unit test covering this behavior. The test can be constructed similarly to https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/test_context.py#L313

A collaborator replied:

+1

@qandrew (Author) replied:

Thanks for the suggestion, just added

@qandrew (Author):

ready for re-review @chaunceyjiang

Signed-off-by: Andrew Xia <axia@meta.com>
@chaunceyjiang (Collaborator) left a comment:

Thanks~

@github-project-automation github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Sep 16, 2025
@chaunceyjiang added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 16, 2025
@mgoin mgoin merged commit 86daa87 into vllm-project:main Sep 16, 2025
42 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: charlifu <charlifu@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

frontend, gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

6 participants