feat: Add frontend support for min_tokens and ignore_eos (outside of nvext) and Structured Output / Guided Decoding
#2380
Conversation
Walkthrough

This change introduces a new `CommonExt` struct whose `ignore_eos` and `min_tokens` fields are flattened into the root of the OpenAI-compatible request types, with `nvext.ignore_eos` taking precedence over the root-level value.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API
    participant RequestStruct
    participant CommonExt
    participant NvExt
    Client->>API: Sends request with ignore_eos/min_tokens (root/nvext)
    API->>RequestStruct: Deserialize request
    RequestStruct->>CommonExt: Access ignore_eos/min_tokens
    RequestStruct->>NvExt: Access ignore_eos (if present)
    CommonExt->>RequestStruct: Provide effective values (nvext takes precedence for ignore_eos)
    RequestStruct->>API: Return effective ignore_eos/min_tokens
    API->>Client: Responds using effective values
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
Actionable comments posted: 2
🧹 Nitpick comments (8)
lib/llm/src/entrypoint/input/text.rs (1)
121-123: Explicit CommonExt initialization: LGTM; consider wiring min_tokens via common here

Setting `common: Default::default()` is correct and keeps behavior unchanged while `nvext.ignore_eos` remains authoritative per precedence rules.

Optionally, you can capitalize on the new root-level support and remove the prior limitation about `min_tokens` by setting it via `common` (no need to touch the async-openai builder). For example, if you want to apply a large min_tokens when inspecting templates:

```diff
- let req = NvCreateChatCompletionRequest {
-     inner,
-     common: Default::default(),
-     nvext: Some(nvext),
- };
+ // Use a pre-built `common` if you want to affect root-level fields like `min_tokens`
+ let req = NvCreateChatCompletionRequest {
+     inner,
+     common,
+     nvext: Some(nvext),
+ };
```

Then, outside this range (illustrative only), compute `common` conditionally:

```rust
// Above, after building `inner`
let mut common = Default::default();
// If you want to force longer streaming regardless of EOS for inspection:
if /* inspect_template */ false {
    // e.g., common.min_tokens = Some(8192);
}
```

lib/llm/src/entrypoint/input/batch.rs (1)
225-229: CommonExt initialization in batch evaluate: LGTM; consider optional min_tokens for batch

Initialization is correct. If batch runs benefit from discouraging early stops, consider setting `common.min_tokens` (e.g., to an evaluation-specific value) to avoid premature termination by EOS without relying on provider-specific `nvext`.

lib/llm/src/protocols/openai.rs (1)
66-69: Doc mismatch: default impl only checks NvExt

The comment says the method considers both CommonExt and NvExt, but the default impl reads only from NvExt. Either adjust the comment or have implementors override (as you do for completions).

```diff
- /// Get the effective ignore_eos value, considering both CommonExt and NvExt.
+ /// Get the effective ignore_eos value.
+ ///
+ /// Default implementation reads from NvExt only. Types embedding CommonExt
+ /// should override to apply precedence (e.g., NvExt overrides CommonExt).
  fn get_ignore_eos(&self) -> Option<bool> {
      self.nvext().and_then(|nv| nv.ignore_eos)
  }
```

lib/llm/tests/test_common_ext.rs (1)
109-146: Serialization shape check (chat) — strong

Verifies flattened JSON placement and effective precedence. Consider adding a similar serialization round-trip test for completions for symmetry, though optional.
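A self-contained sketch of what such a round-trip test could look like, using stand-in types rather than the real request structs (serde/serde_json assumed as dev-dependencies; names here are hypothetical):

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Default, PartialEq, Serialize, Deserialize)]
struct CommonExt {
    #[serde(skip_serializing_if = "Option::is_none")]
    ignore_eos: Option<bool>,
    #[serde(skip_serializing_if = "Option::is_none")]
    min_tokens: Option<u32>,
}

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct CompletionRequestStub {
    model: String,
    prompt: String,
    #[serde(flatten)]
    common: CommonExt,
}

#[test]
fn completions_common_ext_round_trip() {
    let req = CompletionRequestStub {
        model: "test-model".into(),
        prompt: "hello".into(),
        common: CommonExt { ignore_eos: Some(true), min_tokens: Some(16) },
    };
    let json = serde_json::to_value(&req).unwrap();
    // Flattened fields must appear at the JSON root, not under a "common" key.
    assert_eq!(json["ignore_eos"], true);
    assert_eq!(json["min_tokens"], 16);
    let back: CompletionRequestStub = serde_json::from_value(json).unwrap();
    assert_eq!(back, req);
}
```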
lib/llm/src/protocols/openai/common_ext.rs (3)
34-35: Redundant range validation on a `u32`

`min_tokens` is already a `u32`, so it cannot be negative. The `#[validate(range(min = 0))]` check adds no value and just slows down validation.
45-54: Consider returning a struct instead of a tuple for clarity
`merge_with_nvext` returns a naked `(Option<bool>, Option<u32>)`, which is easy to misuse (argument order confusion). Returning a small struct such as `MergedCommon { ignore_eos, min_tokens }` makes call sites self-documenting and extensible.
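One possible shape for that result type (a sketch only; `MergedCommon` is taken from this comment, not from the actual code):

```rust
/// Hypothetical named result for `merge_with_nvext`, replacing the bare
/// `(Option<bool>, Option<u32>)` tuple so call sites read unambiguously.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct MergedCommon {
    pub ignore_eos: Option<bool>,
    pub min_tokens: Option<u32>,
}

fn main() {
    // Call sites name the fields instead of relying on tuple position.
    let merged = MergedCommon { ignore_eos: Some(true), min_tokens: Some(32) };
    assert_eq!(merged.ignore_eos, Some(true));
    assert_eq!(merged.min_tokens, Some(32));
}
```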
57-67: `common_ext()` need not return `Option`

Every implementor so far owns a real `CommonExt` value, never `None`. Returning `&CommonExt` directly would remove one level of `Option` handling everywhere.

lib/llm/src/protocols/openai/chat_completions.rs (1)
147-167: Avoid duplicating precedence logic – delegate to `merge_with_nvext`

`effective_ignore_eos`/`effective_min_tokens` re-implement the merging already provided by `CommonExt::merge_with_nvext`. Call that helper instead to keep a single source of truth.
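A self-contained sketch of that delegation, using stand-in types (the real structs and the exact `merge_with_nvext` signature in the PR may differ):

```rust
// Stand-in types, not the real Dynamo structs.
#[derive(Default)]
struct NvExt {
    ignore_eos: Option<bool>,
}

#[derive(Default)]
struct CommonExt {
    ignore_eos: Option<bool>,
    min_tokens: Option<u32>,
}

impl CommonExt {
    /// Assumed merge helper: an `ignore_eos` set in nvext wins over the
    /// root-level value; `min_tokens` is root-level only.
    fn merge_with_nvext(&self, nvext: Option<&NvExt>) -> (Option<bool>, Option<u32>) {
        let nv_ignore = nvext.and_then(|nv| nv.ignore_eos);
        (nv_ignore.or(self.ignore_eos), self.min_tokens)
    }
}

struct ChatRequest {
    common: CommonExt,
    nvext: Option<NvExt>,
}

impl ChatRequest {
    // Both "effective" getters defer to the single merge helper instead of
    // re-implementing the precedence rules inline.
    fn effective_ignore_eos(&self) -> Option<bool> {
        self.common.merge_with_nvext(self.nvext.as_ref()).0
    }

    fn effective_min_tokens(&self) -> Option<u32> {
        self.common.merge_with_nvext(self.nvext.as_ref()).1
    }
}
```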
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- lib/llm/src/entrypoint/input/batch.rs (1 hunks)
- lib/llm/src/entrypoint/input/text.rs (1 hunks)
- lib/llm/src/http/service/openai.rs (2 hunks)
- lib/llm/src/protocols/openai.rs (3 hunks)
- lib/llm/src/protocols/openai/chat_completions.rs (5 hunks)
- lib/llm/src/protocols/openai/common_ext.rs (1 hunks)
- lib/llm/src/protocols/openai/completions.rs (4 hunks)
- lib/llm/src/protocols/openai/responses.rs (1 hunks)
- lib/llm/tests/http-service.rs (3 hunks)
- lib/llm/tests/openai_completions.rs (1 hunks)
- lib/llm/tests/preprocessor.rs (1 hunks)
- lib/llm/tests/test_common_ext.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-06-24T20:59:35.725Z
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
Applied to files:
- lib/llm/src/entrypoint/input/batch.rs
- lib/llm/tests/preprocessor.rs
- lib/llm/src/protocols/openai/common_ext.rs
- lib/llm/src/protocols/openai/completions.rs
- lib/llm/src/protocols/openai/chat_completions.rs
📚 Learning: 2025-06-16T20:02:54.935Z
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/protocols.rs:85-112
Timestamp: 2025-06-16T20:02:54.935Z
Learning: When using derive_builder::Builder macro, the macro generates the builder struct and its methods, but does NOT generate a `builder()` method on the original struct. A manual `impl StructName { pub fn builder() -> StructNameBuilder { StructNameBuilder::default() } }` is required to provide the convenient `StructName::builder()` API pattern.
Applied to files:
lib/llm/tests/openai_completions.rs
📚 Learning: 2025-07-14T21:25:56.930Z
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Applied to files:
- lib/llm/tests/http-service.rs
- lib/llm/tests/preprocessor.rs
📚 Learning: 2025-06-13T22:07:24.843Z
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Applied to files:
lib/llm/tests/http-service.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (20)
lib/llm/src/http/service/openai.rs (2)
1240-1242: Tests updated for CommonExt presence: LGTM

Adding `common: Default::default()` aligns tests with the new struct shape and preserves behavior.
1267-1269: Tests updated for CommonExt presence: LGTM

Consistent initialization of `common` across tests ensures future fields (e.g., `min_tokens`) remain tested/serializable without breaking shapes.

lib/llm/tests/openai_completions.rs (1)
39-43: Include CommonExt in test request construction: LGTM

This keeps samples aligned with the new request schema and avoids future breakage when asserting JSON snapshots.
lib/llm/tests/http-service.rs (3)
768-770: NvCustom client test: CommonExt initialization is correct

Good consistency with the updated struct. No behavior changes.
806-808: NvCustom client test (failing model path): LGTM

Explicit `common` init keeps the schema stable across success/error flows.
846-847: NvCustom client test with context: LGTM

Maintains parity across the context-aware path as well.
lib/llm/src/protocols/openai/responses.rs (1)
176-191: Initialize `common` on conversion — good

Explicitly setting `common: Default::default()` keeps the struct well-formed after the CommonExt addition and avoids serde surprises later.

lib/llm/tests/preprocessor.rs (1)
269-273: Test struct construction kept in sync

Adding `common: Default::default()` ensures the test `Request::from` helper builds a valid request post-refactor. Looks good.

lib/llm/src/protocols/openai.rs (2)
25-25: Exporting the `common_ext` module

Good addition; keeps the module graph coherent with the new CommonExt plumbing.
151-153: Chat-completions override for get_ignore_eos already in place

I’ve confirmed that `chat_completions.rs` implements:

```rust
fn get_ignore_eos(&self) -> Option<bool> {
    self.effective_ignore_eos()
}
```

so the indirection via `self.get_ignore_eos()` is honored and no further changes are needed.

lib/llm/tests/test_common_ext.rs (6)
12-29: Covers root-level fields (chat) — good

Deserialization and effective getters for `ignore_eos` and `min_tokens` are validated. Solid baseline.
30-49: Precedence test (chat) is correct

Confirms NvExt overrides CommonExt for `ignore_eos` and preserves `min_tokens`. Exactly the intended behavior.
51-69: Backward compatibility (chat) validated

Ensures legacy `nvext.ignore_eos` still works without root fields. Nice.
70-87: Root-level fields (completions) — good

Mirrors the chat tests for completions; correct assertions.
88-107: Precedence test (completions) — good

NvExt override is enforced; `min_tokens` remains from CommonExt. Looks correct.
148-162: `min_tokens` root-only behavior — good

Confirms `min_tokens` is surfaced via CommonExt only. Matches design.

lib/llm/src/protocols/openai/completions.rs (4)
138-153: Precedence logic is correct
`nvext.ignore_eos` overrides `common.ignore_eos`; `min_tokens` is sourced from `common`. Matches PR intent and tests.
160-163: Stop conditions now surface `min_tokens`

Delegating to `effective_min_tokens()` wires `min_tokens` through to StopConditions. Good.
172-174: Override `get_ignore_eos` to honor precedence

Returning `effective_ignore_eos()` ensures StopConditions respect NvExt-over-CommonExt. Good.
41-46: serde(flatten) integration is correct and CommonExt implements Default
- In lib/llm/src/protocols/openai/common_ext.rs (lines 21–23), `CommonExt` is annotated with `#[derive(..., Default)]`, so default construction paths are supported as expected.
- The `min_tokens: Option<u32>` field and its precedence logic are fully tested in `merge_with_nvext`, covering both `None` and `Some` cases.

No changes required here.
Flipping precedence to top level -> nvext based on offline discussion, then should be ready for re-review (and making clippy and related checks pass)
Changed the title: “min_tokens and ignore_eos (outside of nvext)” → “min_tokens and ignore_eos (outside of nvext) and Structured Output / Guided Decoding”
… of `nvext`) and Structured Output / Guided Decoding (#2380)
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Ayush Agarwal <ayushag@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Overview:
For the structured output / guided decoding change specifically, see #2404, which was included in this PR.
For min_tokens/ignore_eos at top level, see the rest of this PR and description.
With the changes proposed in this PR, we aim to support passing the `ignore_eos` and `min_tokens` fields at the root level of the JSON body of an inference request. This should aid compatibility with third-party benchmarking tools and improve the UX of our API.

Current State:
We currently send these parameters in the JSON body as:
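The original example payload is not preserved here; an illustrative chat request in the current style, with `ignore_eos` tucked inside `nvext`, would look like:

```json
{
  "model": "example-model",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "max_tokens": 128,
  "nvext": {
    "ignore_eos": true
  }
}
```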
Proposal:
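With this PR, the same options can be expressed at the root level of the request body (again illustrative):

```json
{
  "model": "example-model",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "max_tokens": 128,
  "ignore_eos": true,
  "min_tokens": 32
}
```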
Details:
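A sketch of the request-side layout this PR describes: the shared fields live in a `CommonExt` struct that is `#[serde(flatten)]`-ed into the request alongside the existing optional `nvext` block. The struct and field names below are a simplified stand-in, not the real `NvCreateChatCompletionRequest` definition:

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Default, Serialize, Deserialize)]
struct CommonExt {
    #[serde(skip_serializing_if = "Option::is_none")]
    ignore_eos: Option<bool>,
    #[serde(skip_serializing_if = "Option::is_none")]
    min_tokens: Option<u32>,
}

#[derive(Debug, Default, Serialize, Deserialize)]
struct NvExt {
    #[serde(skip_serializing_if = "Option::is_none")]
    ignore_eos: Option<bool>,
}

/// Simplified stand-in for the real request type: `common` is flattened, so
/// `ignore_eos`/`min_tokens` appear at the JSON root; `nvext` stays nested.
#[derive(Debug, Default, Serialize, Deserialize)]
struct ChatCompletionRequestSketch {
    model: String,
    #[serde(flatten)]
    common: CommonExt,
    #[serde(skip_serializing_if = "Option::is_none")]
    nvext: Option<NvExt>,
}
```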
Note:
- `min_tokens` isn't currently supported, so there is no conflict in adding support for it.
- `ignore_eos` is already supported within `nvext`; the existing workflow keeps working, while root-level access for `ignore_eos` is also enabled.

Since `ignore_eos` can be specified in two locations, we currently allow the `ignore_eos` in `nvext` to override any value set by the root-level field.
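For instance, given an illustrative request that sets both locations:

```json
{
  "model": "example-model",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "ignore_eos": false,
  "nvext": {
    "ignore_eos": true
  }
}
```

the effective value is `ignore_eos = true`, since the `nvext` setting overrides the root-level field as described above.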
Summary by CodeRabbit

New Features
- `ignore_eos` and `min_tokens` parameters in API requests, with clear precedence rules between common and extended fields.

Bug Fixes

Tests