[https://nvbugs/5361178][fix]: Json schema support in trtllm-serve using xgrammar #6197
Conversation
Adding the changes to support json_schema as one of the supported types Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Adding flags related to the lora_request, else it will give a 400 response code Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- tensorrt_llm/serve/openai_protocol.py (3 hunks)
- tensorrt_llm/serve/openai_server.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.12.2)

tensorrt_llm/serve/openai_server.py

294-294: Local variable `lora_request` is assigned to but never used. Remove assignment to unused variable `lora_request` (F841)
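The F841 finding flags the following pattern; this is a hypothetical illustration of the lint rule, not the actual code in `tensorrt_llm/serve/openai_server.py`:

```python
# Hypothetical illustration of Ruff F841 (local variable assigned but never used).
def handler_before(body: dict) -> dict:
    lora_request = body.get("lora_request")  # F841: assigned but never used
    return {"prompt": body["prompt"]}

def handler_after(body: dict) -> dict:
    # Fix: drop the dead assignment (or actually forward the value).
    return {"prompt": body["prompt"]}

print(handler_after({"prompt": "hi"}))  # → {'prompt': 'hi'}
```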
🔇 Additional comments (3)

tensorrt_llm/serve/openai_protocol.py (3)

- 55-59: LGTM! Clean extension of ResponseFormat for JSON schema support. The implementation correctly extends the existing response format types to include `"json_schema"` and adds the appropriate optional field to hold the schema data. The type annotations and field definitions follow the established patterns.
- 148-149: LGTM! Correct implementation of JSON schema guided decoding. The conversion logic properly handles the new `"json_schema"` type by passing the schema data to `GuidedDecodingParams(json=response_format.json_schema)`. This follows the expected pattern and integrates well with the existing guided decoding framework.
- 211-211: LGTM! Documentation accurately reflects the new capability. The description correctly includes `{'type': 'json_schema'}` alongside the existing supported format types, maintaining consistency with the documentation style.
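The conversion the review describes can be illustrated with a stdlib-only sketch; `GuidedDecodingParams` and `ResponseFormat` here are simplified stand-ins for the real tensorrt_llm classes, not their actual definitions:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class GuidedDecodingParams:  # stand-in for the real class
    json: Optional[Dict[str, Any]] = None

@dataclass
class ResponseFormat:  # stand-in for the pydantic model in openai_protocol.py
    type: str
    json_schema: Optional[Dict[str, Any]] = None

def to_guided_decoding(rf: ResponseFormat) -> Optional[GuidedDecodingParams]:
    # The branch this PR adds: pass the schema dict through to guided decoding.
    if rf.type == "json_schema":
        return GuidedDecodingParams(json=rf.json_schema)
    return None

schema = {"type": "object", "properties": {"name": {"type": "string"}}}
print(to_guided_decoding(ResponseFormat("json_schema", schema)).json)
```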
fixing typo with `lora_request` Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
@mayani-nv #6000 added overlap scheduler support for guided decoding. Would that be sufficient to run |
Fixing the `lora_request` typo Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
PR description has both json_schema and structural tag example. Since this PR only adds json_schema, can we remove the structural_tag example to keep the description concise? @mayani-nv |
```python
structures: Optional[List[StructuralTag]] = None
triggers: Optional[List[str]] = None
json_schema: Optional[Dict[str, Any]] = None
```
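A stdlib-only stand-in for the model fields shown in the diff above; the real class is a pydantic model, and `StructuralTag`'s own fields here are placeholders, not its real definition:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class StructuralTag:  # placeholder stand-in, not the real field layout
    begin: str = ""
    end: str = ""

@dataclass
class ResponseFormatSketch:
    structures: Optional[List[StructuralTag]] = None
    triggers: Optional[List[str]] = None
    json_schema: Optional[Dict[str, Any]] = None  # new optional field from this PR

rf = ResponseFormatSketch(json_schema={"type": "object"})
print(rf.json_schema)  # → {'type': 'object'}
```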
Please add a simple test case for `json_schema`. It can be based on the test for `structural_tag`.
Done, please review.
Removing the `lora_request` change, as the latest main branch already has this defined. Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Adding the unit test for the json_schema support in xgrammar Signed-off-by: mayani-nv <67936769+mayani-nv@users.noreply.github.com>
Actionable comments posted: 0
🧹 Nitpick comments (2)

tests/unittest/llmapi/apps/_test_openai_json_schema.py (2)

- 15-17: Fix inconsistent fixture ID. The fixture ID "TinyLlama-1.1B-Chat" doesn't match the actual model name "llama-3.1-model/Llama-3.1-8B-Instruct". This inconsistency could be confusing.

```diff
-@pytest.fixture(scope="module", ids=["TinyLlama-1.1B-Chat"])
+@pytest.fixture(scope="module", ids=["Llama-3.1-8B-Instruct"])
```

- 57-67: Consider a more inclusive regex pattern for city names. The regex pattern `^\w+$` for the capital name field might be too restrictive, as it only allows word characters and would reject valid city names with spaces, hyphens, or apostrophes (e.g., "New York", "Saint-Denis"). Consider a more inclusive pattern:

```diff
- name: str = Field(...,
-                   pattern=r"^\w+$",
-                   description="The name of the capital city")
+ name: str = Field(...,
+                   pattern=r"^[\w\s\-'\.]+$",
+                   description="The name of the capital city")
```

However, since this test specifically asks for Paris (which matches the current pattern), the current implementation works for this test case.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)

- tests/unittest/llmapi/apps/_test_openai_json_schema.py (1 hunks)
🔇 Additional comments (5)
tests/unittest/llmapi/apps/_test_openai_json_schema.py (5)
- 1-12: LGTM! Well-organized imports and appropriate test configuration. The imports are logically grouped and disabling thread leak detection is appropriate for OpenAI server integration tests.
- 20-34: LGTM! Proper resource management and configuration. The fixture correctly creates a temporary configuration file with appropriate cleanup in the finally block. The xgrammar backend configuration aligns with the JSON schema support requirements.
- 36-44: LGTM! Clean server setup with proper resource management. The server fixture correctly configures the RemoteOpenAIServer with the necessary backend and options.
- 47-54: LGTM! Clean client fixture setup. Both synchronous and asynchronous client fixtures are properly configured with appropriate scope.
- 70-103: LGTM! Comprehensive test with good validation. The test function thoroughly validates both the OpenAI API integration and JSON schema compliance. The assertions cover response structure, content parsing, and expected values. Consider setting `temperature=0.0` for more deterministic results in unit tests, though the current configuration should work fine for the specific prompt used.
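The kind of response validation the review praises can be illustrated with a stdlib-only sketch; the reply content here is a hypothetical stand-in shaped like the expected values, whereas the real test parses the server's chat completion output:

```python
import json

# Hypothetical assistant reply matching the schema the test enforces.
content = '{"name": "Paris", "population": 2148271}'
reply = json.loads(content)

# Validate structure and types, as the test's assertions do.
assert isinstance(reply["name"], str)
assert isinstance(reply["population"], int)
print(f"name={reply['name']!r} population={reply['population']}")
# → name='Paris' population=2148271
```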
The main commit
Description
The PR will help leverage `json_schema` support in `trtllm-serve`. Currently, in order to run this successfully, two steps are needed: the `extra-llm-api-config.yaml` needs to contain the `guided_decoding_backend`, and `disable_overlap_scheduler` needs to be `True` in order for this to work. Running a chat completion request with a `response_format` of type `json_schema` then gives the output `name='Paris' population=2148271`
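A hedged sketch of such a request body; the `response_format` shape (type `"json_schema"` plus a schema dict) follows this PR's protocol change, but the model name and schema contents are illustrative only:

```python
import json

# Hypothetical payload for the OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "TinyLlama-1.1B-Chat",  # assumption: any served model name
    "messages": [
        {"role": "user",
         "content": "Return the capital of France and its population as JSON."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "population": {"type": "integer"},
            },
            "required": ["name", "population"],
        },
    },
}
print(json.dumps(payload, indent=2))
```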
. On the server side, the corresponding logs can be seen as well.

Summary by CodeRabbit
- New Features
- Tests
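For reference, a sketch of what the `extra-llm-api-config.yaml` mentioned in the description might contain; the two option names come from the PR text, but their exact placement within the file is an assumption:

```yaml
# Sketch only: key placement not verified against the trtllm-serve docs.
guided_decoding_backend: xgrammar
disable_overlap_scheduler: true
```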