Description
Your current environment
- vllm 0.8.3
- xgrammar
🐛 Describe the bug
Hi team,
I’d like to propose removing the current fallback to Outlines for integer/number range constraints and pattern-based validation in Guided JSON mode.
vllm/vllm/model_executor/guided_decoding/utils.py
Lines 6 to 23 in ee378f3
def has_xgrammar_unsupported_json_features(schema: dict) -> bool:
    """Check if JSON schema contains features unsupported by xgrammar."""
    def check_object(obj: dict) -> bool:
        if not isinstance(obj, dict):
            return False
        # Check for pattern restrictions
        if "pattern" in obj:
            return True
        # Check for numeric ranges
        if obj.get("type") in ("integer", "number") and any(
                key in obj for key in [
                    "minimum", "maximum", "exclusiveMinimum",
                    "exclusiveMaximum", "multipleOf"
                ]):
            return True
Previously, vLLM fell back to Outlines whenever a JSON schema contained constraints such as pattern, minimum, or maximum, because xgrammar did not support them (see #10899).
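To make the effect of this check concrete, here is a minimal, self-contained sketch of which schemas it flags. The function `triggers_outlines_fallback` is a hypothetical stand-in written for illustration, not the vLLM function itself: it reuses the two conditions from the excerpt above and assumes that nested subschemas are visited by a simple recursive walk, since the quoted excerpt is cut off before that part.

```python
from typing import Any

# Keywords the quoted excerpt treats as numeric-range constraints.
NUMERIC_RANGE_KEYS = ("minimum", "maximum", "exclusiveMinimum",
                      "exclusiveMaximum", "multipleOf")


def triggers_outlines_fallback(schema: Any) -> bool:
    """Hypothetical stand-in for the quoted check (not the vLLM function)."""
    if isinstance(schema, list):
        return any(triggers_outlines_fallback(item) for item in schema)
    if not isinstance(schema, dict):
        return False
    # Same two conditions as in the excerpt: regex patterns and numeric ranges.
    if "pattern" in schema:
        return True
    if schema.get("type") in ("integer", "number") and any(
            key in schema for key in NUMERIC_RANGE_KEYS):
        return True
    # Assumption: nested subschemas are reached by walking all values.
    return any(triggers_outlines_fallback(value) for value in schema.values())


# Example schemas (made up for illustration).
plain_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
}
range_schema = {
    "type": "object",
    "properties": {"age": {"type": "integer", "minimum": 0, "maximum": 120}},
}
pattern_schema = {
    "type": "object",
    "properties": {"zip": {"type": "string", "pattern": "^[0-9]{5}$"}},
}

for name, schema in [("plain", plain_schema), ("range", range_schema),
                     ("pattern", pattern_schema)]:
    print(name, triggers_outlines_fallback(schema))
```

In this sketch, a single `minimum` or `pattern` keyword anywhere in the schema is enough to route the whole request away from xgrammar, which is why fairly ordinary schemas end up on the Outlines path.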
However, xgrammar has now added full support for these schema constraints, which makes this fallback no longer necessary.
- [Feature] Support Regex for GrammarCompiler mlc-ai/xgrammar#185
- Float range support for xgrammar mlc-ai/xgrammar#289
- https://github.com/mlc-ai/xgrammar/blob/8fa47978e37970865a6630a9533f2e1db7dc8f46/cpp/json_schema_converter.cc#L1645-L1655
The main motivation behind this proposal is not only to simplify and unify backend behavior, but also to address a serious performance problem. When the fallback to Outlines is triggered in production, we have experienced:
- Significantly higher memory usage, leading to OOM errors, and
- Much slower response times, which in some cases rendered Guided JSON unusable for real-time services.
Additionally, this proposal is strongly inspired by the recent PR [Bugfix][v1] xgrammar structured output supports Enum, which brought long-awaited support for enum in xgrammar. Just as that change allowed us to move away from unnecessary fallbacks for enum types, we believe it's now equally reasonable and timely to eliminate fallbacks related to numeric ranges and regex patterns as well.
This change would allow Guided JSON to be more reliable and performant under real-world workloads, especially when schemas include numeric ranges or regex patterns.
Looking forward to your feedback!