Conversation

@qandrew (Contributor) commented Sep 8, 2025

Purpose

  • Currently, `response.output` in streaming mode never carries a final output because the `_messages` logic differs between `StreamingHarmonyContext` and `HarmonyContext`. This PR changes `StreamingHarmonyContext` so it also respects `num_init_messages`.
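
To illustrate the invariant involved, here is a minimal sketch of the `num_init_messages` bookkeeping the fix makes consistent between the streaming and non-streaming paths. The class below is a hypothetical stand-in, not the actual vLLM implementation:

```python
# Hypothetical stand-in for the harmony context classes: `messages` must
# expose only what was generated after the initial prompt, so the final
# `response.output` array is populated but excludes the prompt itself.
class ContextSketch:
    def __init__(self, init_messages):
        # Messages already present before generation (system/user prompt).
        self._messages = list(init_messages)
        self.num_init_messages = len(init_messages)

    def append_output(self, message):
        self._messages.append(message)

    @property
    def messages(self):
        # Slice off the initial prompt messages; only generated items
        # belong in the final output.
        return self._messages[self.num_init_messages:]

ctx = ContextSketch(["system prompt", "user prompt"])
ctx.append_output("reasoning item")
ctx.append_output("assistant message")
print(ctx.messages)  # ['reasoning item', 'assistant message']
```

The bug amounted to the streaming context not applying this slice, so the final event's `output` came back empty.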

Per @chaunceyjiang's comments, I've also split this PR into a couple of follow-ups.

Test Plan

  • Spun up a server and sent curl requests to confirm the behavior above works as expected
  • Edited the unit tests

Test Result

Before

[axia@devvm30969.cln0 ~]$ curl http://localhost:20001/v1/responses \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "input": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Multiply 234.234 * 2342.123."
        }
    ],
    "temperature": 0.7,
    "reasoning": {
        "effort": "medium",
        "summary": null
    },
    "stream": true
}' | jq

{
  "response": {
    "id": "resp_d0aa5a27926f4e87aef046739cbe30c0",
    "created_at": 1757365874.0,
    "error": null,
    "incomplete_details": null,
    "instructions": null,
    "metadata": null,
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "object": "response",
    "output": [],
    "parallel_tool_calls": true,
    "temperature": 0.7,
    "tool_choice": "auto",
    "tools": [],
    "top_p": 1.0,
    "background": false,
    "conversation": null,
    "max_output_tokens": 130977,
    "max_tool_calls": null,
    "previous_response_id": null,
    "prompt": null,
    "prompt_cache_key": null,
    "reasoning": {
      "effort": "medium",
      "generate_summary": null,
      "summary": null
    },
    "safety_identifier": null,
    "service_tier": "auto",
    "status": "completed",
    "text": null,
    "top_logprobs": null,
    "truncation": "disabled",
    "usage": {
      "input_tokens": 95,
      "input_tokens_details": {
        "cached_tokens": 80
      },
      "output_tokens": 706,
      "output_tokens_details": {
        "reasoning_tokens": 668,
        "tool_output_tokens": 0
      },
      "total_tokens": 801
    },
    "user": null
  },
  "sequence_number": 706,
  "type": "response.completed"
}

^ Note that in this final response, `output` is an empty array. This is not what we want.

After

{
  "id": "resp_c538e484fa114961b45cec59d0244673",
  "created_at": 1757437205,
  "instructions": null,
  "metadata": null,
  "model": "/data/users/axia/checkpoints/gpt-oss-120b",
  "object": "response",
  "output": [
    {
      "id": "rs_232ea61291404b888e705fd0c5d13f5b",
      "summary": [],
      "type": "reasoning",
      "content": [
        {
          "text": "User asks: \"Write a short sentence about a robot learning to dance.\" Straightforward. Provide a short sentence.",
          "type": "reasoning_text"
        }
      ],
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_1c136508c9844941be70926b34681538",
      "content": [
        {
          "annotations": [],
          "text": "The robot twirled its metal limbs, discovering rhythm with every clank as it learned to dance.",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 0.7,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": 256,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "reasoning": {
    "effort": "medium",
    "generate_summary": null,
    "summary": null
  },
  "service_tier": "auto",
  "status": "completed",
  "text": null,
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 88,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 54,
    "output_tokens_details": {
      "reasoning_tokens": 24,
      "tool_output_tokens": 0
    },
    "total_tokens": 142
  },
  "user": null
}
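
To make the stream shape concrete, here is a hedged sketch of picking the final event out of the parsed stream client-side. The event payloads are abbreviated stand-ins, not the full Responses API schema:

```python
import json

# Each streamed `data:` line is a JSON event; the complete output arrives
# on the final `response.completed` event. These payloads are abbreviated
# stand-ins for illustration only.
raw_events = [
    '{"type": "response.output_text.delta", "delta": "The robot"}',
    '{"type": "response.completed",'
    ' "response": {"output": [{"type": "reasoning"}, {"type": "message"}]}}',
]

events = [json.loads(line) for line in raw_events]
final = next(e for e in events if e["type"] == "response.completed")

# After this fix, `output` on the completed event is populated.
print([item["type"] for item in final["response"]["output"]])
# ['reasoning', 'message']
```

Before the fix, the same lookup on the `response.completed` event would have yielded an empty list.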

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify bot added the frontend and gpt-oss (Related to GPT-OSS models) labels Sep 8, 2025
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 0fd7f79 to 016bafd Compare September 9, 2025 16:23
@qandrew qandrew marked this pull request as ready for review September 9, 2025 17:48
@qandrew qandrew requested a review from aarnphm as a code owner September 9, 2025 17:48
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 02e414b to 0c31b0b Compare September 9, 2025 22:34
@qandrew qandrew changed the title [gpt-oss] streaming uses request_id, fix return format and final output [gpt-oss][1] streaming uses request_id, fix return format and final output Sep 10, 2025
@qandrew (Contributor, Author) commented Sep 10, 2025

@aarnphm @DarkLight1337 @robertgshaw2-redhat @simon-mo this PR is ready for review :)

@heheda12345 (Collaborator) commented:

Also CC @yeqcharlotte

@lacora (Contributor) left a comment:

LG! I saw you define `StreamingResponsesResponse` in a later PR; do we plan to update `BaseModel` in this diff as well?

@qandrew (Contributor, Author) commented Sep 10, 2025

> LG! I saw you define `StreamingResponsesResponse` in a later PR; do we plan to update `BaseModel` in this diff as well?

I have a follow-up PR here: #24556.
I wanted to stack the PRs, but it seems I can't do that with forked repos; this is the relevant commit that replaces `BaseModel` with the `StreamingResponses` types: 7c642b0

I thought it would be easier to review split into two PRs, but I could combine them too :)

A collaborator left a comment:

This PR addresses three different changes. I recommend splitting it into multiple separate PRs. This PR should focus only on fixing the issue where the output of the last event is empty.

@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 82fd949 to c1098b4 Compare September 12, 2025 16:37
@qandrew qandrew requested a review from NickLucche as a code owner September 12, 2025 16:37
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch 2 times, most recently from 7ec9669 to 309699c Compare September 12, 2025 16:40
@qandrew qandrew changed the title [gpt-oss][1] streaming uses request_id, fix return format and final output [gpt-oss][1] fix streaming final output Sep 12, 2025
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 309699c to f4e284d Compare September 12, 2025 16:54
@qandrew qandrew changed the title [gpt-oss][1] fix streaming final output [gpt-oss][1][bugfix] fix streaming final output Sep 12, 2025
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from f4e284d to 9a71956 Compare September 12, 2025 17:27
This reverts commit c87ca3325edbd5e80800df6e4151cee6a9c8c923.

Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew force-pushed the andrew/gpt-oss-streaming-1 branch from 9a71956 to 9b19217 Compare September 13, 2025 17:32
# Check if the current token is part of reasoning content
self._update_num_reasoning_tokens()
self.last_tok = tok
if len(self._messages) - self.num_init_messages < len(
A collaborator commented:

Let's also add a unit test covering this behavior. The test can be constructed similarly to https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/test_context.py#L313

A collaborator replied:

+1

@qandrew (Author) replied:

Thanks for the suggestion, just added

@qandrew (Author):

ready for re-review @chaunceyjiang

Signed-off-by: Andrew Xia <axia@meta.com>
@chaunceyjiang (Collaborator) left a comment:

Thanks~

@github-project-automation github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Sep 16, 2025
@chaunceyjiang added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 16, 2025
@mgoin mgoin merged commit 86daa87 into vllm-project:main Sep 16, 2025
42 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: charlifu <charlifu@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

frontend, gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

6 participants