[Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled #18879
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Force-pushed from f4267d7 to 4f20677 (Compare)

…Thinking is disabled
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

Force-pushed from 4f20677 to 2156cfa (Compare)
Qwen3 and DeepSeek_R1:

# vllm serve /home/jovyan/public-models/Deepseek-R1-Distill-Qwen-14B --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser deepseek_r1
# vllm serve /home/jovyan/qwen3-32b-awq --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser qwen3
python test.py
/cc @aarnphm PTAL.
Thanks.
@DarkLight1337 PTAL.
Stamp
…Thinking is disabled (vllm-project#18879) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: amit <amit.man@gmail.com>
Hello! How can I fix this problem in the V0 engine? I would like to fix it there.
@AlphaINF V0 has already been deprecated. I suggest upgrading to V1.
          
However, V1 has some reliability problems: it can crash when you send more requests.
PART 1: Eagle + Structured Output FSM Validation Fix
=====================================================

ISSUE:
Eagle speculative decoding with structured output crashes with an
AssertionError when the FSM rejects tokens in the
scheduled_spec_decode_tokens list.

SYMPTOMS:
- Error: "Failed to advance FSM for request ... for tokens XXX"
- Followed by: AssertionError at vllm/v1/structured_output/__init__.py:263
- Crashes the entire vLLM engine under load with Eagle + tool calling

ROOT CAUSE (partially identified):
- The FSM can terminate mid-validation when accepting a stop token
- Remaining spec tokens are still attempted for validation
- The original code asserts that all scheduled tokens must be valid
- The assertion fails when the FSM rejects tokens after termination

SOLUTION:
Implemented a defensive fix in the grammar_bitmask() method:
- Replace the assertion with a conditional check
- If a token is rejected, log a debug message and continue
- Still fill bitmasks for all positions (required by downstream code)
- Makes the code resilient to FSM state mismatches

IMPLEMENTATION:
- New patch: mantle_extensions/patches/eagle_structured_output_fix.py
- Monkey-patches StructuredOutputManager.grammar_bitmask()
- Registered as the 12th patch in the plugin system
- Enabled by default in patch_config.json

TESTING:
✓ Plugin loads successfully with all 12 patches
✓ No more AssertionError crashes
✓ No more 500 Internal Server errors
✓ Eagle + structured output + penalties works correctly
⚠ Expected warnings from xgrammar about a terminated FSM (benign)

NOTES:
- This is a defensive fix without full root-cause understanding
- Possible causes: FSM state mismatch, xgrammar rollback bug, concurrency
- Upstreamable: yes, should be contributed to vLLM upstream
- Bug exists since PR vllm-project#18879 (May 2025)

PART 2: Clean Up Unused Patch Files
====================================

Removed 3 unused patch files:
1. pr26291_streaming_method.py - unused reference implementation
2. streaming_patches.py - unused streaming patch loader
3. qwen3_tool_parser_fix_complete.py - now implemented in-tree

Updated files:
- mantle_extensions/patches/__init__.py - removed streaming_patches export
- mantle_extensions/plugin.py - added a note about the qwen3 in-tree fix

Rationale:
- pr26291 and streaming_patches were never used in production
- The qwen3 fix moved in-tree (line 523) due to an APIServer plugin limitation
- Keeping unused files adds maintenance burden and confusion

SUMMARY:
- Added: 1 new critical fix (eagle_structured_output_fix)
- Removed: 3 unused patch files
- Total active patches: 12 (all enabled and working)

Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
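The defensive pattern and monkey-patch delivery described in the commit message above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: `ToyFSM`, `ToyStructuredOutputManager`, and `patched_grammar_bitmask` are hypothetical stand-ins, not vLLM's actual classes or signatures.

```python
import logging

logger = logging.getLogger("eagle_structured_output_fix")


class ToyFSM:
    """Hypothetical stand-in for a grammar FSM that terminates on a stop token."""

    def __init__(self, stop_token):
        self.stop_token = stop_token
        self.terminated = False

    def accept_token(self, token):
        if self.terminated:
            return False  # tokens arriving after termination are rejected
        if token == self.stop_token:
            self.terminated = True
        return True


class ToyStructuredOutputManager:
    """Hypothetical stand-in for the manager whose method gets patched."""

    def grammar_bitmask(self, fsm, spec_tokens):
        # Original (crash-prone) behavior: assert every scheduled token
        # is accepted, which raises AssertionError if the FSM terminated.
        bitmask = []
        for token in spec_tokens:
            accepted = fsm.accept_token(token)
            assert accepted, f"Failed to advance FSM for token {token}"
            bitmask.append(accepted)
        return bitmask


def patched_grammar_bitmask(self, fsm, spec_tokens):
    # Defensive replacement: log rejected tokens instead of asserting,
    # but still fill one bitmask slot per position so downstream code
    # sees an entry for every scheduled token.
    bitmask = []
    for token in spec_tokens:
        accepted = fsm.accept_token(token)
        if not accepted:
            logger.debug("FSM rejected token %d after termination", token)
        bitmask.append(accepted)
    return bitmask


# Monkey-patch the method, mirroring how a plugin patch file might apply it.
ToyStructuredOutputManager.grammar_bitmask = patched_grammar_bitmask

manager = ToyStructuredOutputManager()
mask = manager.grammar_bitmask(ToyFSM(stop_token=2), [1, 2, 3, 4])
print(mask)
```

With the patch applied, the tokens scheduled after the stop token are marked rejected rather than crashing the engine; without it, the same input would raise the AssertionError described above.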
FIX #18821 (comment)
Introduced by PR #16577.