
Conversation

@grahamking (Contributor) commented Aug 11, 2025

The backend worker will need to know which model it is serving, at least for LoRA. Add it.

Summary by CodeRabbit

  • New Features
    • Ensured the selected model is explicitly carried through request processing for consistent behavior across chat and completion flows.
    • Improved compatibility for requests that specify a model, aligning preprocessing and execution with the chosen model.

@coderabbitai bot (Contributor) commented Aug 11, 2025

Walkthrough

Propagates a model identifier and configuration through the request/preprocessing pipeline and local model builder. Adds model to PreprocessedRequest, requires model() on OAIChatLikeRequest, implements it for OAI templates, sets model in preprocessing, and wires a new Flags.model_config into LocalModelBuilder.

Changes

  • Preprocessor model plumbing — lib/llm/src/preprocessor.rs, lib/llm/src/protocols/common/preprocessor.rs: PreprocessedRequest gains pub model: String; preprocess_request now sets builder.model(request.model()).
  • OAI trait and template implementations — lib/llm/src/preprocessor/prompt.rs, lib/llm/src/preprocessor/prompt/template/oai.rs: the OAIChatLikeRequest trait adds fn model(&self) -> String; NvCreateChatCompletionRequest and NvCreateCompletionRequest implement model() by returning inner.model.clone().
  • Launch flags and builder wiring — launch/dynamo-run/src/flags.rs, launch/dynamo-run/src/lib.rs: Flags adds pub model_config: Option<…>; the LocalModelBuilder chain is updated with .model_config(flags.model_config.clone()).
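
Taken together, the wiring looks roughly like the sketch below (signatures taken from the change summary above; struct bodies are trimmed and illustrative, not the full definitions):

  // lib/llm/src/preprocessor/prompt.rs — the trait now exposes the requested model
  pub trait OAIChatLikeRequest {
      fn model(&self) -> String;
      // ... existing prompt/token accessors elided ...
  }

  // lib/llm/src/protocols/common/preprocessor.rs — the backend-bound request carries it
  pub struct PreprocessedRequest {
      pub model: String,
      // ... token_ids, stop conditions, sampling options, annotations ...
  }

  // lib/llm/src/preprocessor.rs — preprocessing copies it across:
  //     builder.model(request.model());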

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant OAIRequest as OAIChatLikeRequest
  participant Preprocessor
  participant Builder as PreprocessedRequestBuilder
  participant Engine

  Client->>OAIRequest: Submit request (includes model)
  OAIRequest->>Preprocessor: preprocess_request(request)
  Preprocessor->>Builder: builder.model(request.model())
  Preprocessor->>Builder: set token_ids, stops, annotations, ...
  Builder->>Engine: PreprocessedRequest { model, ... }

sequenceDiagram
  participant CLI as Flags
  participant LMB as LocalModelBuilder
  participant Runtime

  CLI->>LMB: model_name(...), model_config(flags.model_config)
  LMB->>Runtime: build()

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Poem

A whisk of code, a hop so bold,
The model’s name now plainly told.
Flags in paw, configs in tow,
Preprocessors know where tokens go.
I thump with joy—no guess, no fuss—
The path of models runs through us. 🐇✨



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
launch/dynamo-run/src/lib.rs (1)

38-38: LGTM: Wire model_config into LocalModelBuilder

Passing flags.model_config.clone() keeps build configuration cohesive with the new model propagation. If ModelConfig is large, consider wrapping in Arc to reduce cloning, but as an Option this is typically fine.
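
For what it's worth, a minimal sketch of the Arc variant hinted at here, assuming a hypothetical ModelConfig type (field names invented for illustration):

  use std::sync::Arc;

  // Hypothetical large config; wrapping it in Arc makes clones cheap.
  struct ModelConfig {
      tokenizer_json: String, // stands in for potentially large data
  }

  struct Flags {
      model_config: Option<Arc<ModelConfig>>,
  }

  fn wire(flags: &Flags) -> Option<Arc<ModelConfig>> {
      // Cloning an Option<Arc<_>> bumps a refcount instead of deep-copying the config.
      flags.model_config.clone()
  }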

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4385473 and f1d386c.

📒 Files selected for processing (5)
  • launch/dynamo-run/src/lib.rs (1 hunks)
  • lib/llm/src/preprocessor.rs (1 hunks)
  • lib/llm/src/preprocessor/prompt.rs (1 hunks)
  • lib/llm/src/preprocessor/prompt/template/oai.rs (2 hunks)
  • lib/llm/src/protocols/common/preprocessor.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-09T06:10:00.214Z
Learnt from: tzulingk
PR: ai-dynamo/dynamo#2389
File: components/metrics/src/main.rs:63-66
Timestamp: 2025-08-09T06:10:00.214Z
Learning: In components/metrics/src/main.rs, the Args struct's model_name field should be a required String (not Option<String>) because the metrics aggregator must be tied to a specific model when collecting metrics for a component/endpoint.

Applied to files:

  • lib/llm/src/preprocessor/prompt.rs
📚 Learning: 2025-06-24T20:59:35.725Z
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.

Applied to files:

  • lib/llm/src/preprocessor.rs
📚 Learning: 2025-06-16T20:02:54.935Z
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/protocols.rs:85-112
Timestamp: 2025-06-16T20:02:54.935Z
Learning: When using derive_builder::Builder macro, the macro generates the builder struct and its methods, but does NOT generate a `builder()` method on the original struct. A manual `impl StructName { pub fn builder() -> StructNameBuilder { StructNameBuilder::default() } }` is required to provide the convenient `StructName::builder()` API pattern.

Applied to files:

  • lib/llm/src/protocols/common/preprocessor.rs
🔇 Additional comments (6)
lib/llm/src/preprocessor/prompt.rs (1)

61-61: Model accessor is consistent across implementors; further ergonomic improvements are optional

All current implementors of OAIChatLikeRequest (in
lib/llm/src/preprocessor/prompt/template/oai.rs) define:

fn model(&self) -> String { … }

so the API is consistent and correct. Returning a String is fine given its usage today. If, down the road, this method becomes a performance hotspot, you can introduce a non-breaking follow-up to return &str or Cow<'_, str>, at the cost of adding lifetimes to the trait.
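
A sketch of that possible follow-up, assuming a borrowed accessor would be enough (trait and struct names here are illustrative, not the crate's):

  use std::borrow::Cow;

  trait ModelAccessor {
      // Borrows when the model name is already stored on the request,
      // allocating only if it ever has to be computed.
      fn model(&self) -> Cow<'_, str>;
  }

  struct ChatRequest {
      model: String,
  }

  impl ModelAccessor for ChatRequest {
      fn model(&self) -> Cow<'_, str> {
          Cow::Borrowed(&self.model)
      }
  }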

lib/llm/src/preprocessor.rs (2)

156-157: Propagate model early — good placement

Setting builder.model(request.model()) before branching on input type ensures the model is always present. This aligns with the PR objective and avoids accidental omissions for token/text paths.
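
The shape of that placement, as a simplified sketch (types invented here; the real preprocess_request sets many more fields):

  enum PromptInput {
      Tokens(Vec<u32>),
      Text(String),
  }

  #[derive(Default)]
  struct BuilderSketch {
      model: Option<String>,
      token_ids: Vec<u32>,
  }

  fn preprocess_sketch(model: String, input: PromptInput) -> BuilderSketch {
      let mut builder = BuilderSketch::default();
      builder.model = Some(model); // set once, before branching on input type
      match input {
          PromptInput::Tokens(ids) => builder.token_ids = ids,
          PromptInput::Text(text) => builder.token_ids = tokenize(&text),
      }
      builder
  }

  fn tokenize(_text: &str) -> Vec<u32> {
      Vec::new() // stand-in for the real tokenizer
  }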


156-157: All PreprocessedRequest builders include .model(...)

I searched for every PreprocessedRequest::builder() and PreprocessedRequestBuilder::default() invocation and confirmed:

  • The only PreprocessedRequest::builder() call lives in lib/llm/src/preprocessor.rs and immediately invokes builder.model(request.model()) (line 156).
  • The other builder site at line 308 also calls builder.model(request.inner.model.clone()).
  • No other construction paths of PreprocessedRequest were found without a .model(…) call.

Any missing .model(...) would surface as a builder error when the request is built, so all required paths are covered.

— Consider adding unit tests to lock this in:
• For chat/completions, assert preprocess_request(...).model == request.inner.model
• For token/text inputs (both branches), assert model is populated.
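
A rough sketch of such a test, using placeholder types rather than the crate's real request builders (the actual test would go through preprocess_request):

  #[cfg(test)]
  mod model_propagation {
      struct FakeChatRequest {
          model: String,
      }

      struct Preprocessed {
          model: String,
      }

      // Stand-in for the real preprocessing entry point.
      fn preprocess(req: &FakeChatRequest) -> Preprocessed {
          Preprocessed { model: req.model.clone() }
      }

      #[test]
      fn model_is_propagated() {
          let req = FakeChatRequest { model: "base-model+my-lora".to_string() };
          assert_eq!(preprocess(&req).model, "base-model+my-lora");
      }
  }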

lib/llm/src/protocols/common/preprocessor.rs (1)

14-16: Required model on PreprocessedRequest — good and consistent

Adding pub model: String enforces that every backend-bound request carries the model identifier. This aligns with prior learning that model fields should be required Strings to avoid ambiguity.

Note: Since derive_builder won't supply a default for this field, every construction path must set it (a missing model shows up as a builder error from build()). Good choice.
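
Assuming derive_builder's usual behavior, a small sketch of what "no default" buys (struct trimmed to just the new field; the error is returned from build(), it is not a compile failure):

  use derive_builder::Builder;

  #[derive(Builder, Clone, Debug)]
  struct RequestSketch {
      // Required: no default, so build() errors if it was never set.
      model: String,
  }

  fn demo() {
      let missing = RequestSketchBuilder::default().build();
      assert!(missing.is_err()); // UninitializedField("model")

      let ok = RequestSketchBuilder::default()
          .model("my-model".to_string())
          .build()
          .expect("model was set");
      assert_eq!(ok.model, "my-model");
  }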

lib/llm/src/preprocessor/prompt/template/oai.rs (2)

28-31: LGTM: model() for NvCreateChatCompletionRequest

Simple, correct pass-through via self.inner.model.clone(). Matches the trait expectation.


69-72: LGTM: model() for NvCreateCompletionRequest

Same correct pass-through. Consistent with chat implementation.

@rmccorm4 (Contributor) left a comment

LGTM. Will the LoRA model name variants be visible under /v1/models? Will backends call register_llm multiple times with each model name variant for each LoRA?

@grahamking grahamking merged commit c443528 into main Aug 11, 2025
12 of 13 checks passed
@grahamking grahamking deleted the gk-model-name-in-preproc branch August 11, 2025 19:07
@grahamking (Contributor, Author) commented
LGTM. Will the LoRA model name variants be visible under /v1/models? Will backends call register_llm multiple times with each model name variant for each LoRA?

Yes they will appear as multiple models. From the front-end's point of view, they are different models served by the same endpoint.

I'm thinking of implementing it to allow …, or register_llm(['model-id-1', 'model-id-2', 'model-id-3'], …), or python -m dynamo.frontend --model-name model-id-1,model-id-2,model-id-3 --static-endpoint x.y.z. NIM does not use etcd.

Dynamo will turn that into multiple model registrations in etcd, multiple model cards. There will be an optimization that the model cards share the NATS URL to the tokenizer, because that can be quite big so we don't want to upload it to NATS multiple times.
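
Purely as a hypothetical sketch of that plan (none of these names are Dynamo's real API): each requested name becomes its own model card, and all cards point at the same uploaded tokenizer object so it is stored only once.

  // Hypothetical illustration only.
  struct ModelCard {
      name: String,
      endpoint: String,
      tokenizer_url: String, // shared NATS object URL, uploaded a single time
  }

  fn register_variants(names: &[&str], endpoint: &str, tokenizer_url: &str) -> Vec<ModelCard> {
      names
          .iter()
          .map(|name| ModelCard {
              name: (*name).to_string(),
              endpoint: endpoint.to_string(),
              tokenizer_url: tokenizer_url.to_string(),
          })
          .collect()
  }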

@rmccorm4 rmccorm4 changed the title from "fix(preprocessor): Populate model ID in PreprocessedRequest" to "fix(preprocessor): Populate model ID in PreprocessedRequest (for LoRA support)" on Aug 15, 2025
