Conversation

@cedonley (Contributor) commented Dec 7, 2024

REDO of PR #10782

FIX #10781
FIX #10589
FIX #10821
FIX #10900

Key fixed issues:

  • UTF-8 escaping mismatches result in argument corruption
  • Final delta often not returned when chunks are small
  • Short initial arguments can corrupt entire arguments return
  • Mistral tool call id generation is inconsistent between stream/non-stream and causes failures
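For context on the last item: Mistral validates tool call ids as exactly nine alphanumeric characters, so the streaming and non-streaming paths need to produce ids in the same format. A minimal sketch of such a generator (illustrative names only, not vLLM's actual implementation):

```python
import random
import string

ALPHANUMERIC = string.ascii_letters + string.digits

def generate_mistral_tool_call_id() -> str:
    """Generate a tool call id in the 9-character alphanumeric format
    Mistral expects. Sharing one generator between the streaming and
    non-streaming paths keeps the id format consistent."""
    return "".join(random.choices(ALPHANUMERIC, k=9))
```

Using a format-correct id in both paths avoids the validation failures described above when a streamed response is later fed back to the model.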

To be reviewed in future PRs:

  • Simplifying the overall tool streaming logic -- reduce reliance on JSON rewriting and diffs
  • Identify similar issues in other tool parsers
  • When using speculative decoding, final delta can sometimes still be dropped

@K-Mistele @gcalmettes

Signed-off-by: cedonley <clayton@donley.io>
github-actions bot commented Dec 7, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those from your fastcheck build in the Buildkite UI (linked in the PR checks section) by unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the frontend label Dec 7, 2024
@cedonley (Contributor, Author) commented Dec 7, 2024

Regarding failure in entrypoints-test:

  • Pythonic tool test failure is not a regression
  • Appears to be an intermittent issue with that parser similar to the ones I've fixed with hermes and mistral
    • Final delta of arguments seems to be missing
  • The issue appears in fastcheck runs from other recent non-tool PRs, so it is not caused by this PR
  • My local run of pytest on this branch passes for the same tests

I looked at that parser to see if there was an easy fix, but the logic is somewhat different from the others and I'd rather not try to overload this PR.

@gcalmettes (Contributor) left a review comment

Thanks for your work on this.

@gcalmettes (Contributor)

I agree that the overall tool streaming extraction logic could/should be simplified, and should be addressed in another PR.

@K-Mistele (Contributor)

> I agree that the overall tool streaming extraction logic could/should be simplified, and should be addressed in another PR.

Yes, I have been of half a mind to rewrite it from scratch using finite state machines, which seems like the most appropriate and clean approach; I just haven't had time to dig into it yet.
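As a rough illustration of the FSM idea, a tiny state machine that extracts tool-call spans from streamed chunks without any JSON rewriting or diffing might look like the sketch below (illustrative only; the class, states, and tag names are assumptions, not vLLM code):

```python
class ToolCallFSM:
    """Minimal two-state machine: scan streamed text for
    <tool_call>...</tool_call> spans and collect their payloads."""

    NORMAL, IN_TOOL_CALL = "normal", "in_tool_call"
    OPEN, CLOSE = "<tool_call>", "</tool_call>"

    def __init__(self):
        self.state = self.NORMAL
        self.buffer = ""          # unconsumed streamed text
        self.tool_calls = []      # completed tool-call payloads

    def feed(self, delta: str) -> None:
        """Consume one streamed chunk and advance the state machine."""
        self.buffer += delta
        while True:
            if self.state == self.NORMAL:
                start = self.buffer.find(self.OPEN)
                if start == -1:
                    return  # no opening tag yet; wait for more text
                self.buffer = self.buffer[start + len(self.OPEN):]
                self.state = self.IN_TOOL_CALL
            else:
                end = self.buffer.find(self.CLOSE)
                if end == -1:
                    return  # tool call still in progress
                self.tool_calls.append(self.buffer[:end])
                self.buffer = self.buffer[end + len(self.CLOSE):]
                self.state = self.NORMAL
```

The appeal of this style is that each chunk only advances explicit states, so small chunk boundaries (the source of several bugs fixed here) cannot corrupt the extracted arguments.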

@K-Mistele K-Mistele mentioned this pull request Dec 9, 2024
@cedonley (Contributor, Author) commented Dec 9, 2024

@K-Mistele Yeah. I was thinking the same thing.

It doesn't help that the OpenAI tool interfaces have changed multiple times and the underlying models don't even stay consistent within families. Going to be hard to completely future-proof this, so may want to let things settle a bit before the rewrite.

@cedonley (Contributor, Author)

@DarkLight1337 @mgoin This is good to merge per reviews from @K-Mistele and @gcalmettes

Thanks!

@DarkLight1337 (Member) left a review comment

Thanks for fixing!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 11, 2024 22:37
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 11, 2024
@DarkLight1337 (Member)

Can you merge from main to fix the CI failures?

@DarkLight1337 DarkLight1337 merged commit 7439a8b into vllm-project:main Dec 12, 2024
51 checks passed
@cedonley cedonley deleted the fix_tool_streaming branch December 12, 2024 01:12
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
@erkintelnyx commented Jun 3, 2025

Hi @cedonley

Can I ask which of the issues this part solves?

This introduced a bug in the streaming parser for Qwen/Qwen3-235B-A22B, where:
current_text : <tool_call>{"name": "shuffle_deck", "arguments": {"deck_count": 1
delta_text : 1

prev_arguments : {}
cur_arguments : {"deck_count": 1}

and because slicing the cur_arguments JSON string with [:-2] removes the trailing 1, the code falls into that if statement and returns None.

I'm trying to understand what [:-2] is supposed to eliminate. Maybe it was meant to remove the "} that the partial JSON parser adds? If so, why not just check for "}?

Would you perhaps remember the context here, @DarkLight1337?

@cedonley (Contributor, Author) commented Jun 3, 2025

Good question. At the time, the surrounding code was always feeding "}, so I probably didn't check for it. Something seems to have changed since then, or perhaps I didn't test with a numeric (non-string) value in the final position.

@erkintelnyx

I see, so as far as I can tell it makes sense to check whether it ends with "} first.
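To make the proposed fix concrete, here is a minimal sketch of the suffix check (an illustrative function, not the actual parser code): guarding the slice with an endswith('"}') test strips only the closing that a partial-JSON completer appends to an unterminated string value, while leaving arguments whose final value is numeric, like {"deck_count": 1}, untouched.

```python
def strip_partial_json_closing(cur_arguments_json: str) -> str:
    """Remove the '"}' closing that a partial-JSON completer appends
    when the last value is an unterminated string, without corrupting
    arguments whose last value is numeric (e.g. {"deck_count": 1}),
    where an unconditional [:-2] would eat a digit."""
    if cur_arguments_json.endswith('"}'):
        return cur_arguments_json[:-2]
    return cur_arguments_json
```

With the guard in place, the Qwen3 case from the report above passes through unchanged instead of being truncated to {"deck_count": and dropped.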
