
Conversation

@LuYanFCP (Contributor) commented Sep 4, 2025

Purpose

  1. Added native reasoning parser support for the SeedOss model in vLLM.
  2. Refactored and added BaseThinkingReasoningParser, which abstracts the common implementation shared by the Qwen3/DeepseekR1/SeedOss parsers, so a new reasoning parser can be implemented quickly by inheriting from BaseThinkingReasoningParser and defining the start_token/end_token variables (see the sketch below).
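
For illustration, a minimal sketch of what such a subclass could look like, assuming the abstract hooks are the start_token/end_token pair described above (the token strings, decorator, and registration key here are assumptions for illustration, not verbatim from this PR):

from vllm.reasoning import BaseThinkingReasoningParser, ReasoningParserManager

@ReasoningParserManager.register_module("seed_oss")
class SeedOSSReasoningParser(BaseThinkingReasoningParser):
    """A reasoning parser that only needs to declare its delimiter tokens."""

    @property
    def start_token(self) -> str:
        # Token that opens the reasoning block (assumed value).
        return "<seed:think>"

    @property
    def end_token(self) -> str:
        # Token that closes the reasoning block (assumed value).
        return "</seed:think>"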

Test Plan

  1. Added unit tests for SeedOss and BaseThinkingReasoningParser.
  2. The existing Qwen3/DeepseekR1 parser unit tests still pass.

Test Result

All tests pass:

root@d240c896f494:/workspaces/vllm-backup# pytest  tests/reasoning/test_qwen3_reasoning_parser.py tests/reasoning/test_deepseekr1_reasoning_parser.py tests/reasoning/test_base_thinking_reasoning_parser.py 
/usr/local/lib/python3.12/dist-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
======================================================================================== test session starts ========================================================================================
platform linux -- Python 3.12.11, pytest-8.3.5, pluggy-1.5.0
rootdir: /workspaces/vllm-backup
configfile: pyproject.toml
plugins: hypothesis-6.131.0, rerunfailures-14.0, asyncio-0.24.0, schemathesis-3.39.15, shard-0.1.2, mock-3.14.0, hydra-core-1.3.2, forked-1.6.0, timeout-2.3.1, subtests-0.14.1, buildkite-test-collector-0.1.9, anyio-4.6.2.post1
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 55 items                                                                                                                                                                                  
Running 55 items in this shard

tests/reasoning/test_qwen3_reasoning_parser.py ..........                                                                                                                                     [ 18%]
tests/reasoning/test_deepseekr1_reasoning_parser.py ........................                                                                                                                  [ 61%]
tests/reasoning/test_base_thinking_reasoning_parser.py .....................                                                                                                                  [100%]

========================================================================================= warnings summary ==========================================================================================
../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
  /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

tests/reasoning/test_base_thinking_reasoning_parser.py:13
  /workspaces/vllm-backup/tests/reasoning/test_base_thinking_reasoning_parser.py:13: PytestCollectionWarning: cannot collect test class 'TestThinkingReasoningParser' because it has a __init__ constructor (from: tests/reasoning/test_base_thinking_reasoning_parser.py)
    class TestThinkingReasoningParser(BaseThinkingReasoningParser):

tests/reasoning/test_base_thinking_reasoning_parser.py:19
  /workspaces/vllm-backup/tests/reasoning/test_base_thinking_reasoning_parser.py:19: PytestCollectionWarning: cannot collect test class 'TestThinkingReasoningParserAlt' because it has a __init__ constructor (from: tests/reasoning/test_base_thinking_reasoning_parser.py)
    class TestThinkingReasoningParserAlt(BaseThinkingReasoningParser):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================== 55 passed, 3 warnings in 14.68s ==================================================================================
root@d240c896f494:/workspaces/vllm-backup# 

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@LuYanFCP LuYanFCP requested a review from aarnphm as a code owner September 4, 2025 17:02
@github-actions bot commented Sep 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the deepseek (Related to DeepSeek models) and qwen (Related to Qwen models) labels Sep 4, 2025
@gemini-code-assist bot left a comment

Code Review

This pull request introduces a BaseThinkingReasoningParser to abstract common logic for parsing reasoning content, which is a great refactoring. It also adds support for the SeedOss model.

The refactoring simplifies the DeepSeekR1ReasoningParser and Qwen3ReasoningParser by having them inherit from the new base class. However, I've found a critical issue in the implementation of BaseThinkingReasoningParser's streaming logic that would break existing functionality for deepseek_r1 and the new seed_oss parser. Please see my detailed comment.

After fixing the base class, the streaming behavior for qwen3 might change and become inconsistent with its non-streaming behavior. You may want to consider overriding extract_reasoning_content_streaming in Qwen3ReasoningParser to maintain its specific logic (treating everything as content if no start token is present); a sketch of such an override follows.
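
A minimal sketch of that kind of override, assuming the streaming hook takes the previous/current/delta text and token IDs as in vLLM's ReasoningParser interface (the start_token_id attribute and the fallback logic are assumptions for illustration, not the code in this PR):

from collections.abc import Sequence
from typing import Optional

from vllm.entrypoints.openai.protocol import DeltaMessage
from vllm.reasoning import BaseThinkingReasoningParser


class Qwen3ReasoningParser(BaseThinkingReasoningParser):
    def extract_reasoning_content_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
    ) -> Optional[DeltaMessage]:
        # Qwen3-specific behavior: if no start token has been seen yet,
        # treat everything as normal content rather than reasoning.
        if self.start_token_id not in current_token_ids:
            return DeltaMessage(content=delta_text)
        return super().extract_reasoning_content_streaming(
            previous_text, current_text, delta_text,
            previous_token_ids, current_token_ids, delta_token_ids)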

Signed-off-by: Yan Lu <luyan@nvidia.com>
@LuYanFCP LuYanFCP force-pushed the feat/seed_oss_parse_support branch 3 times, most recently from b04c83a to 32e4fd4 on September 5, 2025 01:47
@LuYanFCP LuYanFCP changed the title from "Support SeedOss Reason Parser" to "[feat] Support SeedOss Reason Parser" Sep 5, 2025
@LuYanFCP LuYanFCP changed the title from "[feat] Support SeedOss Reason Parser" to "[Model] Support SeedOss Reason Parser" Sep 5, 2025
@LuYanFCP LuYanFCP force-pushed the feat/seed_oss_parse_support branch 5 times, most recently from 3bb52fc to 77a4ec1 on September 5, 2025 13:31
@WojtekMatula

git clone git@github.com:LuYanFCP/vllm.git
cd vllm
git checkout feat/seed_oss_parse_support
VLLM_USE_PRECOMPILED=1 uv pip install --editable .

VLLM_WORKER_MULTIPROC_METHOD=spawn \
VLLM_LOGGING_LEVEL=DEBUG \
vllm serve Intel/Seed-OSS-36B-Instruct-int4-AutoRound \
  --enable-auto-tool-choice \
  --tool-call-parser seed_oss \
  --trust-remote-code \
  --tensor-parallel-size 2 \
  --dtype bfloat16 \
  --max_model_len 68000 \
  --port 1234 \
  --served-model-name seed-oss \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser seed_oss

(APIServer pid=20370) INFO: Started server process [20370]
(APIServer pid=20370) INFO: Waiting for application startup.
(APIServer pid=20370) INFO: Application startup complete.
(APIServer pid=20370) INFO 09-06 08:54:41 [chat_utils.py:507] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
(APIServer pid=20370) INFO 09-06 08:54:41 [seed_oss_tool_parser.py:79] vLLM Seed-Oss XML tool parser loaded (SeedOssToolParser).
(APIServer pid=20370) INFO: 127.0.0.1:45372 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=20370) INFO 09-06 08:54:41 [seed_oss_tool_parser.py:79] vLLM Seed-Oss XML tool parser loaded (SeedOssToolParser).
(EngineCore_0 pid=20507) DEBUG 09-06 08:54:41 [core.py:753] EngineCore loop active.
(APIServer pid=20370) DEBUG 09-06 08:54:49 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 11.9%, Prefix cache hit rate: 0.0%
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] Error in chat completion stream generator.
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] Traceback (most recent call last):
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] File "/home/wojtek/Applications/vllm/vllm/entrypoints/openai/serving_chat.py", line 845, in chat_completion_stream_generator
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] extract_reasoning_content_streaming(
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] File "/home/wojtek/Applications/vllm/vllm/reasoning/abs_reasoning_parsers.py", line 206, in extract_reasoning_content_streaming
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] return DeltaMessage(reasoning_content=delta_text)
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] File "/home/wojtek/miniconda3/lib/python3.12/typing.py", line 532, in __new__
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] raise TypeError("Any cannot be instantiated")
(APIServer pid=20370) ERROR 09-06 08:54:49 [serving_chat.py:1136] TypeError: Any cannot be instantiated
(EngineCore_0 pid=20507) DEBUG 09-06 08:54:49 [core.py:747] EngineCore waiting for work.
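
(For context on the traceback: since Python 3.11, calling typing.Any as a constructor raises exactly this TypeError. A minimal reproduction, assuming the DeltaMessage name effectively resolved to Any at runtime, e.g. via a type-checking-only import; this is illustrative, not vLLM's code:)

from typing import Any

DeltaMessage = Any  # what the name effectively resolved to at runtime

# Raises "TypeError: Any cannot be instantiated", matching the log above.
DeltaMessage(reasoning_content="...")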

@LuYanFCP LuYanFCP closed this Sep 6, 2025
@LuYanFCP LuYanFCP reopened this Sep 6, 2025
@LuYanFCP (Contributor, Author) commented Sep 6, 2025

@WojtekMatula This issue has been resolved. You can try the latest commit.

[Quoted the reproduction commands and error log from @WojtekMatula's comment above.]

@LuYanFCP LuYanFCP force-pushed the feat/seed_oss_parse_support branch from 245d6de to 3065687 on September 6, 2025 14:50
…BaseThinkingReasoningParser base implementation.

Signed-off-by: Yan Lu <luyan@nvidia.com>
@LuYanFCP LuYanFCP force-pushed the feat/seed_oss_parse_support branch from a1e6c1e to a733746 on September 6, 2025 15:15
@LuYanFCP LuYanFCP force-pushed the feat/seed_oss_parse_support branch from bfaed5d to 110e1eb on September 6, 2025 16:37
@WojtekMatula

Looks like there is a problem with tool use when reasoning parsing is enabled.

Without reasoning parsing:
VLLM_WORKER_MULTIPROC_METHOD=spawn \
VLLM_LOGGING_LEVEL=DEBUG \
vllm serve Intel/Seed-OSS-36B-Instruct-int4-AutoRound \
  --enable-auto-tool-choice \
  --tool-call-parser seed_oss \
  --trust-remote-code \
  --tensor-parallel-size 2 \
  --dtype bfloat16 \
  --max_model_len 68000 \
  --port 1234 \
  --served-model-name seed-oss \
  --gpu-memory-utilization 0.85
Started server process [74664]
(APIServer pid=74664) INFO: Waiting for application startup.
(APIServer pid=74664) INFO: Application startup complete.
(APIServer pid=74664) DEBUG 09-06 19:37:00 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=74664) INFO 09-06 19:37:02 [chat_utils.py:507] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
(APIServer pid=74664) INFO 09-06 19:37:02 [seed_oss_tool_parser.py:79] vLLM Seed-Oss XML tool parser loaded (SeedOssToolParser).
(EngineCore_0 pid=74742) DEBUG 09-06 19:37:02 [core.py:753] EngineCore loop active.
(APIServer pid=74664) INFO: 127.0.0.1:35946 - "POST /v1/chat/completions HTTP/1.1" 200 OK


With reasoning parsing:
vllm feat/seed_oss_parse_support ❯❯❯ VLLM_WORKER_MULTIPROC_METHOD=spawn \
VLLM_LOGGING_LEVEL=DEBUG \
vllm serve Intel/Seed-OSS-36B-Instruct-int4-AutoRound \
  --enable-auto-tool-choice \
  --tool-call-parser seed_oss \
  --trust-remote-code \
  --tensor-parallel-size 2 \
  --dtype bfloat16 \
  --max_model_len 68000 \
  --port 1234 \
  --served-model-name seed-oss \
  --gpu-memory-utilization 0.85 \
  --reasoning-parser seed_oss
Started server process [90350]
(APIServer pid=90350) INFO: Waiting for application startup.
(APIServer pid=90350) INFO: Application startup complete.
(APIServer pid=90350) DEBUG 09-06 19:54:18 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=90350) DEBUG 09-06 19:54:28 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=90350) INFO 09-06 19:54:35 [chat_utils.py:507] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
(APIServer pid=90350) INFO 09-06 19:54:35 [seed_oss_tool_parser.py:79] vLLM Seed-Oss XML tool parser loaded (SeedOssToolParser).
(EngineCore_0 pid=90427) DEBUG 09-06 19:54:35 [core.py:753] EngineCore loop active.
(APIServer pid=90350) INFO: 127.0.0.1:35900 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Btw this model is amazing, great work.

@LuYanFCP (Contributor, Author) commented Sep 7, 2025

Can you give me some example prompts? I suspect it's a problem with the tool parser, and I will solve it in another PR.

[Quoted @WojtekMatula's comment above reporting the tool-use problem with reasoning parsing enabled, including the serve commands and logs.]

@WojtekMatula commented Sep 7, 2025

curl -X POST http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seed-oss",
    "max_tokens": 32000,
    "messages": [
      { "role": "system", "content": "You are helpful assistant. Use tools to assist user. Answer concisely (<4 lines)." },
      { "role": "user", "content": "execute ls -la" }
    ],
    "tools": [ {
      "type": "function",
      "function": {
        "name": "bash",
        "description": "Run bash commands. Quote paths with spaces. Prefer rg over grep. Describe command in 5-10 words.",
        "parameters": {
          "type": "object",
          "properties": {
            "command": {"type": "string", "description": "Command to execute"},
            "timeout": {"type": "number", "description": "Timeout in ms"},
            "description": {"type": "string", "description": "5-10 word description"}
          },
          "required": ["command", "description"]
        }
      }
    } ],
    "tool_choice": "auto",
    "stream": true
  }'

It looks like tool calls are not parsed only when both reasoning parsing and streaming are enabled. With streaming disabled, everything is fine.
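
(One way to observe the streamed deltas directly, assuming the standard OpenAI Python client pointed at the server above; the model name and port follow the serve command, everything else here is illustrative:)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="seed-oss",
    messages=[{"role": "user", "content": "execute ls -la"}],
    tools=[{"type": "function", "function": {
        "name": "bash",
        "description": "Run bash commands.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    }}],
    stream=True,
)

# With the bug, tool-call text arrives in delta.content instead of
# delta.tool_calls when the reasoning parser is enabled.
for chunk in stream:
    delta = chunk.choices[0].delta
    print("content:", delta.content, "| tool_calls:", delta.tool_calls)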

@LuYanFCP (Contributor, Author) commented Sep 7, 2025

Thanks for your reply, I will resolve this issue in the current PR.

[Quoted the curl example and summary from @WojtekMatula's comment above.]

@WojtekMatula commented Sep 8, 2025

I think there is one more issue with the tool parsing. I told you that tool parsing works, even with streaming enabled, as long as reasoning is disabled, but now I think that is not 100% true.
Yes, tool usage is parsed, but I think all parameters are always wrapped in double quotes when streaming is enabled.

In my curl example (the same request as above):

Tool invocation always fails in stream mode because the model output is:

{
  "timeout": "10000",
}

instead of:

{
  "timeout": 10000
}

It behaves like this across all JSON types; for example, arrays are also wrapped in double quotes.

At first I thought this was a problem with the model, but when I tested with curl without the "stream": true flag, the tool call JSON was fine.
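
(To see why this breaks tool invocation: a parameter declared as a JSON Schema "number" fails type checks when it arrives as a string. A tiny illustration, not code from this PR:)

import json

good = json.loads('{"timeout": 10000}')
bad = json.loads('{"timeout": "10000"}')

print(isinstance(good["timeout"], (int, float)))  # True  -> matches "number"
print(isinstance(bad["timeout"], (int, float)))   # False -> schema mismatch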

So to sum up:
Tool parsing is not working in stream mode if reasoning parsing is on.
Tool parsing works only partially (it wraps all parameters in double quotes) in stream mode if reasoning parsing is off.

Signed-off-by: Yan Lu <luyan@nvidia.com>
@chaunceyjiang (Collaborator) left a comment

Overall, LGTM.

/cc @aarnphm @gaocegege PTAL.

A Collaborator left a review comment:

Could you paste your local test results, especially with Stream=True?

The PR author replied:

OK, I will submit some local test case results.

The PR author replied:

The unit test file already covers this case.

@LuYanFCP (Contributor, Author) commented Sep 15, 2025

I found where this issue occurs (screenshot omitted). When stream=True, if delta_text includes both the tool start and end tokens, an error is returned. I also noticed this issue when the reasoning parser was turned off.
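
(A sketch of the kind of handling a streaming parser needs for this case, with hypothetical token strings; illustrative only, not the parser code in this PR:)

def split_delta(delta_text: str, start: str, end: str) -> tuple[str, str]:
    """Split one streamed delta into (reasoning, content) when it may
    contain both the start and end tokens at once."""
    if start in delta_text and end in delta_text:
        after_start = delta_text.split(start, 1)[1]
        reasoning, content = after_start.split(end, 1)
        return reasoning, content
    # ... remaining cases: only start seen, only end seen, neither ...
    return "", delta_text

# Example: a single chunk carrying a complete reasoning block.
print(split_delta("<think>plan steps</think>ls -la", "<think>", "</think>"))
# -> ('plan steps', 'ls -la')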

[Quoted @WojtekMatula's Sep 8 comment above about parameters being wrapped in double quotes in stream mode.]

@mgoin mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label Sep 17, 2025
@gaocegege (Contributor) left a comment

It includes a refactor and a new feature (SeedOSS support), which adds complexity and modifies the Mistral, DeepSeek, and Qwen3 code paths. Still, we have reasoning tests for Mistral, DeepSeek, and Qwen3, so I think it should work.

@mgoin mgoin merged commit be0bb56 into vllm-project:main Sep 24, 2025
40 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
@vanshilshah97

Hi @LuYanFCP
Great work!
Are the issues above solved, or is there follow-up work to be done, e.g. an open PR or issue?

[Quoted @WojtekMatula's Sep 8 comment above about the double-quote wrapping issue.]

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: gaojc <1055866782@qq.com>
@CallmeZhangChenchen

When stream=True, the Seed-OSS tool call returns an incorrect structure

ChatCompletionChunk(id='chatcmpl-19b9a38b01f8455eb92c413125dad057', choices=[Choice(delta=ChoiceDelta(content='</seed:tool_call>', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, token_ids=None)], created=1760060258, model='Seed-OSS-36B-Instruct-AWQ', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)

It should be in tool_calls, but it ended up in content

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: Yan Lu <luyan@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
