feat: enable --dyn-reasoning-parser flag to set reasoning parser for … #2700
Conversation
Walkthrough
Introduces runtime-config-driven behavior for chat completion delta generation. Adds ModelRuntimeConfig to ModelDeploymentCard and OpenAIPreprocessor, updates response_generator to accept runtime_config, and selects the reasoning parser by name at runtime. Adjusts engine/test call sites. Also expands CLI help text for --dyn-reasoning-parser.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Preprocessor as OpenAIPreprocessor
participant Card as ModelDeploymentCard
participant Request as NvCreateChatCompletionRequest
participant DeltaGen as DeltaGenerator
participant Parsers as ReasoningParserType
Client->>Preprocessor: init with ModelDeploymentCard
Preprocessor->>Card: read runtime_config
Card-->>Preprocessor: runtime_config (includes reasoning_parser)
Client->>Preprocessor: create chat completion request
Preprocessor->>Request: build request
Preprocessor->>Request: response_generator(runtime_config)
Request->>DeltaGen: new(options{ runtime_config, ... })
DeltaGen->>Parsers: get_reasoning_parser_from_name(runtime_config.reasoning_parser)
Parsers-->>DeltaGen: ReasoningParserWrapper (fallback=Basic if unknown)
DeltaGen-->>Client: stream deltas
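For orientation, here is a minimal standalone sketch of the selection step at the end of the diagram (simplified stand-in types, not the actual dynamo crate API; only the reasoning_parser field name and the "basic" fallback are taken from this PR):

```rust
// Simplified model of runtime-config-driven parser selection.
// `RuntimeConfig` and `Parser` are stand-ins for the real types.
#[derive(Default)]
struct RuntimeConfig {
    reasoning_parser: Option<String>,
}

#[derive(Debug, PartialEq)]
enum Parser {
    Basic,
    DeepseekR1,
    GptOss,
}

fn select_parser(cfg: &RuntimeConfig) -> Parser {
    // Missing or unknown names fall back to the Basic parser.
    match cfg.reasoning_parser.as_deref().unwrap_or("basic") {
        "deepseek_r1" => Parser::DeepseekR1,
        "gpt_oss" => Parser::GptOss,
        _ => Parser::Basic,
    }
}

fn main() {
    let cfg = RuntimeConfig {
        reasoning_parser: Some("gpt_oss".to_string()),
    };
    assert_eq!(select_parser(&cfg), Parser::GptOss);
    assert_eq!(select_parser(&RuntimeConfig::default()), Parser::Basic);
}
```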
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
237-244: Role emission gating may be off (msg_counter is never incremented).
create_choice sets role = Some(Assistant) when self.msg_counter == 0, but self.msg_counter is never incremented in this file. That means every chunk could include role, which diverges from OpenAI’s streaming shape (role only on the first delta). Suggest incrementing a counter on first emission.
One option (local to this method):
-            role: if self.msg_counter == 0 {
+            role: if self.msg_counter == 0 {
                 Some(dynamo_async_openai::types::Role::Assistant)
             } else {
                 None
             },

and then after building the response:

-        dynamo_async_openai::types::CreateChatCompletionStreamResponse {
+        let resp = dynamo_async_openai::types::CreateChatCompletionStreamResponse {
             /* ...existing fields... */
-        }
+        };
+        // Increment message counter after emitting the first chunk.
+        self.msg_counter = self.msg_counter.saturating_add(1);
+        resp

If the counter is intentionally incremented elsewhere, please point me to it; otherwise, this change will align with the expected stream schema.
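For illustration, a self-contained sketch of the gating this suggests, using hypothetical minimal types rather than the real DeltaGenerator: the role is attached only to the first chunk, and the counter is bumped after each emission.

```rust
#[derive(Debug)]
struct Chunk {
    role: Option<&'static str>,
    content: String,
}

struct DeltaGen {
    msg_counter: u64,
}

impl DeltaGen {
    fn create_chunk(&mut self, content: &str) -> Chunk {
        let chunk = Chunk {
            // Only the first streamed delta carries the assistant role.
            role: if self.msg_counter == 0 { Some("assistant") } else { None },
            content: content.to_string(),
        };
        // Increment after emitting so subsequent chunks omit the role.
        self.msg_counter = self.msg_counter.saturating_add(1);
        chunk
    }
}

fn main() {
    let mut gen = DeltaGen { msg_counter: 0 };
    assert_eq!(gen.create_chunk("Hello").role, Some("assistant"));
    assert_eq!(gen.create_chunk(" world").role, None);
}
```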
🧹 Nitpick comments (10)
components/backends/vllm/src/dynamo/vllm/args.py (1)
117-121: Add argparse choices to guard against typos
Verified that in lib/parsers/src/reasoning/mod.rs the only supported parser names in get_reasoning_parser_from_name are "deepseek_r1", "basic", and "gpt_oss" (lines 121–124). To fail fast on invalid input and keep the CLI aligned with Rust, add the choices parameter:

 parser.add_argument(
     "--dyn-reasoning-parser",
     type=str,
     default=None,
+    choices=["basic", "deepseek_r1", "gpt_oss"],
     help="Reasoning parser name for the model. Available options: 'basic', 'deepseek_r1', 'gpt_oss'.",
 )

lib/parsers/src/reasoning/mod.rs (1)
119-133: Normalize parser name (trim, case, hyphen/underscore) before matching
Accept minor variants like "DeepSeek-R1" or " basic " to reduce footguns. Keep the same fallback behavior.
Apply this diff:
-    pub fn get_reasoning_parser_from_name(name: &str) -> ReasoningParserWrapper {
-        tracing::debug!("Selected reasoning parser: {}", name);
-        match name.to_lowercase().as_str() {
+    /// Return a ReasoningParser by string name.
+    ///
+    /// Accepted values (case/whitespace-insensitive; '-' and '_' equivalent):
+    /// - "basic"
+    /// - "deepseek_r1" (or "deepseek-r1")
+    /// - "gpt_oss" (or "gpt-oss")
+    pub fn get_reasoning_parser_from_name(name: &str) -> ReasoningParserWrapper {
+        tracing::debug!("Selected reasoning parser: {}", name);
+        let key = name.trim().to_ascii_lowercase().replace('-', "_");
+        match key.as_str() {
             "deepseek_r1" => Self::DeepseekR1.get_reasoning_parser(),
             "basic" => Self::Basic.get_reasoning_parser(),
             "gpt_oss" => Self::GptOss.get_reasoning_parser(),
             _ => {
                 tracing::warn!(
                     "Unknown reasoning parser type '{}', falling back to Basic Reasoning Parser",
-                    name
+                    name
                 );
                 Self::Basic.get_reasoning_parser()
             }
         }
     }

I can add unit tests covering the normalization and fallback path if you want.
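As a sketch of what such a test could look like, assuming the normalization is factored into a local helper (the names below are illustrative, not the crate's):

```rust
// Proposed key normalization: trim, ASCII-lowercase, treat '-' and '_' as equivalent.
fn normalize_parser_name(name: &str) -> String {
    name.trim().to_ascii_lowercase().replace('-', "_")
}

#[cfg(test)]
mod tests {
    use super::normalize_parser_name;

    #[test]
    fn accepts_common_variants() {
        assert_eq!(normalize_parser_name("DeepSeek-R1"), "deepseek_r1");
        assert_eq!(normalize_parser_name("  basic "), "basic");
        assert_eq!(normalize_parser_name("GPT_OSS"), "gpt_oss");
    }

    #[test]
    fn unknown_names_pass_through_for_fallback() {
        // Anything else would hit the `_` arm and fall back to Basic.
        assert_eq!(normalize_parser_name("mystery"), "mystery");
    }
}
```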
lib/llm/src/model_card.rs (2)
142-144: New MDC field runtime_config with serde(default) — good for backwards compat
This ensures older, serialized MDCs deserialize cleanly. Consider skipping serialization when it is the default to reduce etcd/NATS payload size if you expect most deployments to use defaults.
If ModelRuntimeConfig can expose an is_default(&self) -> bool, you could annotate as:
- #[serde(default, skip_serializing_if = "ModelRuntimeConfig::is_default")]
I can wire up an is_default helper in runtime_config.rs and add a serde round-trip test.
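A minimal serde sketch of that shape, assuming a simplified ModelRuntimeConfig with only the reasoning_parser field and a serde/serde_json dependency (the real structs have more fields):

```rust
use serde::{Deserialize, Serialize};

// Simplified stand-in for ModelRuntimeConfig.
#[derive(Clone, Default, PartialEq, Serialize, Deserialize)]
struct ModelRuntimeConfig {
    reasoning_parser: Option<String>,
}

impl ModelRuntimeConfig {
    fn is_default(&self) -> bool {
        *self == Self::default()
    }
}

#[derive(Default, Serialize, Deserialize)]
struct ModelDeploymentCard {
    display_name: String,
    // Omit the field from serialized cards when it is all defaults,
    // and fill it in with defaults when deserializing older cards.
    #[serde(default, skip_serializing_if = "ModelRuntimeConfig::is_default")]
    runtime_config: ModelRuntimeConfig,
}

fn main() -> Result<(), serde_json::Error> {
    let card = ModelDeploymentCard::default();
    // Round trip: the default runtime_config is skipped on the wire...
    let json = serde_json::to_string(&card)?;
    assert!(!json.contains("runtime_config"));
    // ...and restored to defaults when reading it back.
    let restored: ModelDeploymentCard = serde_json::from_str(&json)?;
    assert!(restored.runtime_config.is_default());
    Ok(())
}
```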
448-449: Explicit default init is fine; consider a helper to centralize defaults
Both constructors set
runtime_config: ModelRuntimeConfig::default(). That’s clear and explicit. If you find more call sites doing the same, consider a small constructor/helper (e.g., ModelDeploymentCard::with_runtime_defaults(...)) to reduce duplication.
Also applies to: 490-491
lib/llm/src/local_model.rs (2)
205-218: Avoid double clone of runtime_config in echo_full path
Minor nit: you clone into both the MDC and LocalModel. Since the builder is consumed, you can move once and clone only once.
Apply this diff:
-        card.migration_limit = self.migration_limit;
-        card.user_data = self.user_data.take();
-        card.runtime_config = self.runtime_config.clone();
+        card.migration_limit = self.migration_limit;
+        card.user_data = self.user_data.take();
+        let runtime_config = std::mem::take(&mut self.runtime_config);
+        card.runtime_config = runtime_config.clone();
@@
-        return Ok(LocalModel {
+        return Ok(LocalModel {
             card,
             full_path: PathBuf::new(),
             endpoint_id,
             template,
             http_host: self.http_host.take(),
             http_port: self.http_port,
             tls_cert_path: self.tls_cert_path.take(),
             tls_key_path: self.tls_key_path.take(),
             router_config: self.router_config.take().unwrap_or_default(),
-            runtime_config: self.runtime_config.clone(),
+            runtime_config,
         });
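For reference, a tiny standalone demonstration of the move-once-then-clone-once pattern with std::mem::take, using simplified stand-in types rather than the real builder:

```rust
#[derive(Clone, Default, Debug, PartialEq)]
struct RuntimeConfig {
    reasoning_parser: Option<String>,
}

#[derive(Default)]
struct Builder {
    runtime_config: RuntimeConfig,
}

fn main() {
    let mut builder = Builder {
        runtime_config: RuntimeConfig {
            reasoning_parser: Some("basic".into()),
        },
    };
    // Move the value out of the builder (leaving a default behind),
    // then clone only once for the second consumer.
    let runtime_config = std::mem::take(&mut builder.runtime_config);
    let for_card = runtime_config.clone();
    let for_model = runtime_config;
    assert_eq!(for_card, for_model);
    assert_eq!(builder.runtime_config, RuntimeConfig::default());
}
```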
278-293: Same nit in main path: move once, reuse
Mirror the optimization for the “main” (non-echo) path.
Apply this diff:
-        card.migration_limit = self.migration_limit;
-        card.user_data = self.user_data.take();
-        card.runtime_config = self.runtime_config.clone();
+        card.migration_limit = self.migration_limit;
+        card.user_data = self.user_data.take();
+        let runtime_config = std::mem::take(&mut self.runtime_config);
+        card.runtime_config = runtime_config.clone();
@@
-        Ok(LocalModel {
+        Ok(LocalModel {
             card,
             full_path,
             endpoint_id,
             template,
             http_host: self.http_host.take(),
             http_port: self.http_port,
             tls_cert_path: self.tls_cert_path.take(),
             tls_key_path: self.tls_key_path.take(),
             router_config: self.router_config.take().unwrap_or_default(),
-            runtime_config: self.runtime_config.clone(),
+            runtime_config,
         })

If you want to ensure we don’t drift between the two copies, we can also remove runtime_config from LocalModel and read from self.card.runtime_config at use sites. I can open a follow-up refactor PR if that direction works for you.

lib/llm/src/engines.rs (1)
187-188: Echo chat path uses default runtime_config; consider threading actual config when available
For a pure echo engine this is okay. If you ever use EchoEngineFull to validate reasoning parser selection end-to-end, prefer passing the model’s runtime_config instead of a default so misconfigurations don’t get masked.
Potential follow-ups:
- Thread ModelRuntimeConfig via PreprocessedRequest/ExecutionContext, and use it here if present; otherwise fall back to default.
- Add a simple unit test that asserts the passed reasoning_parser name influences the delta selection path.

Want me to draft a small patch to plumb runtime_config through the context for the echo path?
lib/llm/tests/http-service.rs (1)
102-104: Updated generator call matches the new API; consider exercising a non-default parser in at least one test.
The switch to request.response_generator(runtime_config::ModelRuntimeConfig::default()) is correct. To guard against regressions in runtime parser selection, add a test that passes a non-default reasoning_parser (e.g., "gpt_oss" or "deepseek_r1") and asserts the stream is produced successfully (even a smoke test is fine).
You could do this inline as a one-off in CounterEngine’s generate for a dedicated test:
-        let mut generator =
-            request.response_generator(runtime_config::ModelRuntimeConfig::default());
+        let mut generator = request.response_generator(runtime_config::ModelRuntimeConfig {
+            reasoning_parser: Some("gpt_oss".to_string()),
+            ..Default::default()
+        });

lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
103-110: Use runtime-configured parser; update stale comments and centralize the default.
The logic is correct. The surrounding comments still say “hardcoded” and point to a TODO that is no longer accurate. Recommend refreshing the comment and extracting the "basic" fallback into a module-level constant for clarity.
Apply within this block:
-        // Reasoning parser type
-        // This is hardcoded for now, but can be made configurable later.
-        // TODO: Make parser type configurable once front-end integration is determined
-        // Change to GptOss to test GptOSS parser
-        // Reasoning parser wrapper
-        let reasoning_parser = ReasoningParserType::get_reasoning_parser_from_name(
+        // Select reasoning parser from runtime config; fall back to the Basic parser on missing/unknown values.
+        let reasoning_parser = ReasoningParserType::get_reasoning_parser_from_name(
             options
                 .runtime_config
                 .reasoning_parser
                 .as_deref()
                 .unwrap_or("basic"),
         );

Additionally, consider adding at the top of this module (outside the changed block):
const DEFAULT_REASONING_PARSER: &str = "basic";

and use unwrap_or(DEFAULT_REASONING_PARSER).
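A small standalone illustration of that fallback shape (the helper function below is hypothetical):

```rust
const DEFAULT_REASONING_PARSER: &str = "basic";

// Resolve the configured parser name, falling back to the module-level default.
fn effective_parser_name(configured: &Option<String>) -> &str {
    configured.as_deref().unwrap_or(DEFAULT_REASONING_PARSER)
}

fn main() {
    assert_eq!(effective_parser_name(&None), "basic");
    assert_eq!(effective_parser_name(&Some("gpt_oss".into())), "gpt_oss");
}
```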
335-352: Reasoning parse invoked only when text/token_ids present — good check; consider explicit disable knob.
The presence check prevents unnecessary parser work. A future nicety: support disabling reasoning parsing via a sentinel value (e.g., reasoning_parser="none") to fully bypass parsing when desired.
Happy to wire the "none" option through get_reasoning_parser_from_name and runtime_config if you want it in this PR or a follow-up.
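A standalone sketch of what that knob could look like; the NoOp variant and the select helper below are hypothetical and do not exist in the crate today:

```rust
#[derive(Debug, PartialEq)]
enum SelectedParser {
    Basic,
    DeepseekR1,
    GptOss,
    /// Hypothetical pass-through that leaves content unparsed.
    NoOp,
}

fn select(name: &str) -> SelectedParser {
    match name.trim().to_ascii_lowercase().replace('-', "_").as_str() {
        "none" => SelectedParser::NoOp, // explicit opt-out of reasoning parsing
        "deepseek_r1" => SelectedParser::DeepseekR1,
        "gpt_oss" => SelectedParser::GptOss,
        _ => SelectedParser::Basic, // unknown names still fall back to Basic
    }
}

fn main() {
    assert_eq!(select("none"), SelectedParser::NoOp);
    assert_eq!(select("unknown"), SelectedParser::Basic);
}
```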
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- components/backends/vllm/src/dynamo/vllm/args.py (1 hunks)
- lib/llm/src/engines.rs (2 hunks)
- lib/llm/src/local_model.rs (2 hunks)
- lib/llm/src/model_card.rs (4 hunks)
- lib/llm/src/preprocessor.rs (4 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (4 hunks)
- lib/llm/tests/http-service.rs (2 hunks)
- lib/parsers/src/reasoning/mod.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-22T19:55:41.592Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.592Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Applied to files:
- lib/llm/src/engines.rs
- lib/llm/src/protocols/openai/chat_completions/delta.rs
📚 Learning: 2025-08-22T19:55:41.592Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.592Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.
Applied to files:
- lib/llm/src/protocols/openai/chat_completions/delta.rs
🧬 Code graph analysis (3)
lib/llm/src/protocols/openai/chat_completions/delta.rs (4)
- lib/llm/src/protocols/openai/completions/delta.rs (1): response_generator (22-29)
- lib/bindings/python/src/dynamo/_core.pyi (1): ModelRuntimeConfig (463-467)
- lib/bindings/python/rust/llm/local_model.rs (1): reasoning_parser (76-78)
- lib/parsers/src/reasoning/mod.rs (1): get_reasoning_parser_from_name (119-133)
lib/llm/src/preprocessor.rs (1)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (2): new (80-123), response_generator (19-31)
lib/llm/tests/http-service.rs (1)
- lib/bindings/python/src/dynamo/_core.pyi (1): ModelRuntimeConfig (463-467)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (10)
lib/llm/src/model_card.rs (1)
22-22: Import looks correct
Brings ModelRuntimeConfig into scope for the new MDC field. No issues.
lib/llm/src/engines.rs (1)
17-18: Import of runtime_config module: LGTM
Keeps the call site explicit and avoids naming collisions.
lib/llm/tests/http-service.rs (1)
28-42: Importing runtime_config for tests is appropriate and minimal.
Bringing in local_model::runtime_config here is the right move to construct a default ModelRuntimeConfig in tests without reaching into production wiring.
lib/llm/src/preprocessor.rs (5)
25-25: Plumbing ModelRuntimeConfig into the preprocessor: good addition.
Importing the type here keeps ownership localized and avoids leaking runtime concerns into callers.
98-99: Stateful preprocessor now carries runtime_config — correct scope.
Storing ModelRuntimeConfig on OpenAIPreprocessor is appropriate since it’s part of deployment-time behavior rather than per-request user input.
133-134: Constructor wiring looks good.
Passing runtime_config into self ensures downstream operators can use it without additional lookups.
502-503: Chat path correctly forwards runtime_config to the generator.
This aligns with the intent of runtime-config-driven parser selection. Leaving the Completions path (non-chat) unchanged is also correct, since reasoning parsing is only relevant to chat completions (per prior learnings).
I’m referencing the retrieved learnings that reasoning parsing is only wired for chat completions and not for text completions; your changes match that design.
126-127: Review comment resolved: ModelDeploymentCard.runtime_config correctly defaults
Verified that in lib/llm/src/model_card.rs the ModelDeploymentCard struct is annotated with #[serde(default)] on the runtime_config field (around line 142) and its Default implementations initialize runtime_config via ModelRuntimeConfig::default() (around lines 448 and 490). Cloning mdc.runtime_config is therefore safe and older cards will deserialize cleanly.

lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
8-11: New import of runtime_config is appropriate.
Keeps the type fully qualified without polluting the broader namespace.
41-43: ModelRuntimeConfig implements Default – no action required
The ModelRuntimeConfig type already derives Default (see lib/bindings/python/rust/llm/local_model.rs, where it’s annotated with #[derive(Clone, Default)]), so the DeltaGeneratorOptions::default() implementation remains sound with the new non-Option runtime_config field. Everything compiles and behaves as expected.
paulhendricks
left a comment
looks good!
…vllm deployments (#2700) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
…vllm deployments (#2700) Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
…vllm deployments (#2700) Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
…vllm deployments (#2700) Signed-off-by: nnshah1 <neelays@nvidia.com>
…one deployment
Overview:
Enables the --dyn-reasoning-parser flag to set the reasoning parser for any dynamo.vllm deployment.
Changes the signature of one function and updates ModelDeploymentCard to carry these changes over etcd.
-- See the propagation of ModelRuntimeConfig
Summary by CodeRabbit
New Features
- Reasoning parser selection is now configurable at runtime via --dyn-reasoning-parser for dynamo.vllm deployments.
Documentation
- Expanded CLI help text for --dyn-reasoning-parser, listing the available parser options.