
Conversation

@ExtReMLapin
Contributor

@ExtReMLapin ExtReMLapin commented Sep 2, 2025

Purpose

Fixed reasoning content not being sent to the client when tool_choice="required".

closes #14429

Test Plan

Added one test to ensure reasoning is returned in the streamed data: pytest ./tests/entrypoints/openai/test_completion_with_function_calling.py
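
A rough sketch of the kind of streaming assertion this adds (the base URL, model name, and tool schema are illustrative assumptions, not the committed test):

    import openai
    import pytest

    @pytest.mark.asyncio
    async def test_streaming_reasoning_with_required_tool_choice():
        # Assumes a running vLLM OpenAI-compatible server with a reasoning
        # parser enabled; endpoint, model, and tool are placeholders.
        client = openai.AsyncOpenAI(base_url="http://localhost:8000/v1",
                                    api_key="EMPTY")
        stream = await client.chat.completions.create(
            model="Qwen/Qwen3-8B",
            messages=[{"role": "user",
                       "content": "What is the weather like in Paris today?"}],
            tools=[{
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }],
            tool_choice="required",
            stream=True,
        )
        reasoning, tool_calls = [], []
        async for chunk in stream:
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta
            if getattr(delta, "reasoning_content", None):
                reasoning.append(delta.reasoning_content)
            if delta.tool_calls:
                tool_calls.extend(delta.tool_calls)
        # Both the tool call and the reasoning should arrive in the stream.
        assert len(tool_calls) > 0
        assert len(reasoning) > 0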

Test Result



@mergify mergify bot added the frontend label Sep 2, 2025
@ExtReMLapin ExtReMLapin force-pushed the streaming_tool_required_true branch from a0c7ec3 to e95416c on September 3, 2025 13:12
@ExtReMLapin ExtReMLapin marked this pull request as ready for review September 3, 2025 13:13
@ExtReMLapin ExtReMLapin requested a review from aarnphm as a code owner September 3, 2025 13:13
@ExtReMLapin
Contributor Author

Tested on multiple Qwen models + tools:

  • Qwen 3 with reasoning
  • Qwen 3 with reasoning disabled
  • Qwen 2.5

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@ExtReMLapin ExtReMLapin force-pushed the streaming_tool_required_true branch from 58cfd8a to 17853a1 on September 4, 2025 13:47
@mergify

mergify bot commented Sep 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ExtReMLapin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 8, 2025
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
CNE Pierre FICHEPOIL added 2 commits September 12, 2025 12:08
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@ExtReMLapin ExtReMLapin force-pushed the streaming_tool_required_true branch from 02a8dde to 4d8d81c on September 12, 2025 12:09
@ExtReMLapin
Contributor Author

@DarkLight1337 @heheda12345
@simon-mo

I'm not sure exactly who to ping to get this reviewed.

@DarkLight1337
Member

cc @aarnphm @chaunceyjiang

Collaborator

@chaunceyjiang chaunceyjiang left a comment

reasoning not being sent to client when tool_choice="required"

Could you provide a reproduction step?

The combination of stream + enable_thinking + required has been continuously tested in e2e.

https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_completion_with_function_calling.py#L165-L177

@ExtReMLapin
Contributor Author

ExtReMLapin commented Sep 12, 2025

@chaunceyjiang there is no assert/check/test for reasoning in stream mode:

https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_completion_with_function_calling.py#L218

master/HEAD:

(screenshots) See how it goes directly into the tool call, with something like 10 seconds between the first message and the start of the tool call.

query.js (attachment)

This branch:

(screenshot)
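
For readers without the screenshots, a minimal client-side check along the lines of the attached query.js could look like this (written in Python rather than JavaScript; the endpoint, model, and tool are assumptions):

    import time
    from openai import OpenAI

    # Placeholder endpoint, model, and tool; adapt to your deployment.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    start = time.time()
    stream = client.chat.completions.create(
        model="Qwen/Qwen3-8B",
        messages=[{"role": "user", "content": "What is the weather in Paris?"}],
        tools=[{"type": "function",
                "function": {"name": "get_weather",
                             "parameters": {"type": "object",
                                            "properties": {"city": {"type": "string"}}}}}],
        tool_choice="required",
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            print(f"[{time.time() - start:5.1f}s] reasoning: {delta.reasoning_content!r}")
        if delta.tool_calls:
            print(f"[{time.time() - start:5.1f}s] tool call delta: {delta.tool_calls}")
    # On main nothing is printed until the tool-call deltas show up; on this
    # branch the reasoning deltas appear as soon as the model starts thinking.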

@ExtReMLapin
Contributor Author

ExtReMLapin commented Sep 12, 2025

Also, this PR covers both forced-reasoning models (like Qwen3 2507, which doesn't output an opening reasoning tag) and the original ones that output both opening and closing reasoning tags.
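
To make that distinction concrete, here is a rough illustration (not vLLM's actual reasoning parser) of the two stream shapes, assuming Qwen-style <think>/</think> tags:

    # Illustration only, not vLLM's parser. Two stream shapes must be handled:
    #   1. "original" reasoning models emit <think> ... </think> explicitly;
    #   2. "forced" reasoning models (Qwen3 2507 style) start thinking right
    #      away and only ever emit the closing </think>.
    THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

    def split_reasoning(text_so_far: str, forced_reasoning: bool) -> tuple[str, str]:
        """Return (reasoning, content) for the text streamed so far."""
        if THINK_CLOSE in text_so_far:
            reasoning, _, content = text_so_far.partition(THINK_CLOSE)
            return reasoning.removeprefix(THINK_OPEN), content
        if forced_reasoning or text_so_far.startswith(THINK_OPEN):
            # Still inside the reasoning block: everything so far is reasoning.
            return text_so_far.removeprefix(THINK_OPEN), ""
        return "", text_so_far

    # Both shapes resolve to the same split:
    assert split_reasoning("<think>plan</think>call tool", False) == ("plan", "call tool")
    assert split_reasoning("plan</think>call tool", True) == ("plan", "call tool")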

Collaborator

@chaunceyjiang chaunceyjiang left a comment

CNE Pierre FICHEPOIL added 2 commits September 15, 2025 15:20
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@ExtReMLapin
Contributor Author

Got it for the changes.

In the tests I'm having a weird issue where

        output = []
        reasoning = []
        async for chunk in output_stream:
            if chunk.choices:
                if enable_thinking and chunk.choices[0].delta.reasoning_content:
                    reasoning.append(chunk.choices[0].delta.reasoning_content)
                if chunk.choices[0].delta.tool_calls:
                    output.extend(chunk.choices[0].delta.tool_calls)

        assert len(output) > 0
        if enable_thinking:
            assert len(reasoning) > 0

This doesn't work because the OpenAI client class doesn't have this attribute, and I don't understand why it doesn't error in the non-stream part.

So instead I switched to checking if enable_thinking and getattr(chunk.choices[0].delta, "reasoning_content", None), though I'm not sure that's the right approach.

@ExtReMLapin ExtReMLapin marked this pull request as draft September 15, 2025 16:14
@ExtReMLapin
Contributor Author

And something's broken with non-reasoning models, so I'll fix it when I get back from vacation.

@ExtReMLapin
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes an issue where reasoning content was not streamed correctly when tool_choice="required". The fix involves using the correct streaming-aware function for extracting reasoning content. The associated test is also updated to verify this behavior.

My review focuses on the maintainability of the fix. While the fix is correct, it introduces code duplication for handling reasoning streaming across different tool_choice scenarios. I've suggested refactoring this duplicated logic into a helper function to improve code clarity and reduce the risk of future inconsistencies.
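
For illustration, such a helper could look roughly like the sketch below, assuming the reasoning parser exposes extract_reasoning_content_streaming and is_reasoning_end with roughly these signatures; the helper name and plumbing are hypothetical, not the refactor actually proposed:

    # Hypothetical helper; assumes the reasoning parser provides
    # extract_reasoning_content_streaming(...) and is_reasoning_end(...).
    def stream_reasoning_delta(
        reasoning_parser,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: list[int],
        current_token_ids: list[int],
        delta_token_ids: list[int],
    ):
        """Shared by the "auto" and "required" tool_choice branches so the
        streaming-aware reasoning extraction is called in exactly one place."""
        delta_message = reasoning_parser.extract_reasoning_content_streaming(
            previous_text,
            current_text,
            delta_text,
            previous_token_ids,
            current_token_ids,
            delta_token_ids,
        )
        reasoning_ended = reasoning_parser.is_reasoning_end(current_token_ids)
        return delta_message, reasoning_ended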

@ExtReMLapin
Contributor Author

Considering the pre-commit warning about the values of reasoning_end_arr in:

        if tool_choice_auto or self.reasoning_parser:
            # These are only required in "auto" tool choice case
            all_previous_token_ids = [[]] * num_choices
            # For reasoning parser and tool call all enabled
            added_content_delta_arr = [False] * num_choices
            reasoning_end_arr = [False] * num_choices
        else:
            all_previous_token_ids = None
            reasoning_end_arr = None

Would you be fine with reasoning_end_arr = [False] * num_choices being initialized either way outside of the if? @chaunceyjiang
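
Concretely, the proposal amounts to something like this sketch (not the exact diff), keeping the rest of the branch unchanged:

        # Sketch of the proposed shape: reasoning_end_arr is always created so
        # the tool_choice="required" path can track reasoning state too, instead
        # of being None outside the "auto" branch.
        reasoning_end_arr = [False] * num_choices
        if tool_choice_auto or self.reasoning_parser:
            # These are only required in "auto" tool choice case
            all_previous_token_ids = [[]] * num_choices
            # For reasoning parser and tool call all enabled
            added_content_delta_arr = [False] * num_choices
        else:
            all_previous_token_ids = None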

@chaunceyjiang
Collaborator

Would you be fine with reasoning_end_arr = [False] * num_choices being initialized either way outside of the if? @chaunceyjiang

I haven't reviewed your PR carefully yet, but my understanding is that reasoning_end_arr should only be used when self.reasoning_parser is set.

CNE Pierre FICHEPOIL and others added 3 commits September 18, 2025 07:11
@mergify

mergify bot commented Oct 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ExtReMLapin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 8, 2025
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
@mergify mergify bot removed the needs-rebase label Oct 8, 2025
CNE Pierre FICHEPOIL and others added 2 commits October 9, 2025 06:44
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@ExtReMLapin
Contributor Author

Could this be merged? It's honestly not that complicated to verify the fix, and right now, with a reasoning model, it's impossible to tell what's going on during a long generation: all you see in the server console is that tokens are being generated (Avg generation throughput: 51.6 tokens/s), but you can't tell whether those are legitimate reasoning tokens or the model is stuck in an infinite loop.

And obviously with non-streaming queries you don't know anything; you can't even tell whether generation has moved past the reasoning part.

cc @chaunceyjiang

Collaborator

@chaunceyjiang chaunceyjiang left a comment

Thanks~

@chaunceyjiang chaunceyjiang added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 21, 2025
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) October 21, 2025 06:11
@chaunceyjiang
Collaborator

Hi @ExtReMLapin, there are currently some issues with the CI on the main branch. Let's wait for them to be fixed before proceeding with this PR.

@chaunceyjiang chaunceyjiang merged commit a4c29e6 into vllm-project:main Oct 22, 2025
48 checks passed
@ExtReMLapin
Contributor Author

Hooray!

usberkeley pushed a commit to usberkeley/vllm that referenced this pull request Oct 23, 2025
…4108)

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
…4108)

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
…4108)

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…4108)

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

Development

Successfully merging this pull request may close these issues.

[Feature]: support tool and reasoning together

3 participants