-
Notifications
You must be signed in to change notification settings - Fork 690
chore: Remove service_name from ModelDeploymentCard #2349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis change consolidates and restructures the Model Deployment Card (MDC) system for LLMs by replacing the previous modular implementation with a comprehensive, single-file approach in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ModelDeploymentCard
participant FileSystem
participant NATS
participant Tokenizer
User->>ModelDeploymentCard: load(config_path)
ModelDeploymentCard->>FileSystem: Read config, tokenizer, prompt files
ModelDeploymentCard->>Tokenizer: Initialize tokenizer (if present)
User->>ModelDeploymentCard: move_to_nats(nats_client)
ModelDeploymentCard->>NATS: Upload model files, update URIs
User->>ModelDeploymentCard: move_from_nats(nats_client)
ModelDeploymentCard->>NATS: Download model files
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
lib/runtime/src/slug.rs (1)
24-25: DerivingDefaultpermitsSlug("")– confirm that an empty slug is acceptable
#[derive(Default)]automatically createsimpl Default for Slug { fn default() -> Self { Slug(String::default()) } }, i.e. an empty string.
Nothing else in this type currently prevents""from propagating;TryFromvalidates characters but not length, andslugify_*always return a non-empty string. If downstream code relies onSlugbeing non-empty (e.g. for NATS subjects, filenames, URLs) this silent default could surface subtle runtime errors.If a non-empty value is required, consider a manual impl:
-#[derive(Serialize, Clone, Debug, Eq, PartialEq, Default)] +#[derive(Serialize, Clone, Debug, Eq, PartialEq)] + +impl Default for Slug { + fn default() -> Self { + // use a clearly-invalid token so problems explode early + Slug("_unset".into()) + } +}Otherwise, add explicit documentation that
Slug::default()produces an empty string and is valid in all contexts.lib/llm/src/model_card.rs (1)
748-759: Consider using standard library methods for capitalization.The current implementation works but could be simplified using Rust's built-in string manipulation methods.
fn capitalize(s: &str) -> String { - s.chars() - .enumerate() - .map(|(i, c)| { - if i == 0 { - c.to_uppercase().to_string() - } else { - c.to_lowercase().to_string() - } - }) - .collect() + let mut chars = s.chars(); + match chars.next() { + None => String::new(), + Some(first) => first.to_uppercase().collect::<String>() + &chars.as_str().to_lowercase(), + } }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
lib/llm/src/backend.rs(1 hunks)lib/llm/src/local_model.rs(1 hunks)lib/llm/src/migration.rs(1 hunks)lib/llm/src/model_card.rs(1 hunks)lib/llm/src/model_card/create.rs(0 hunks)lib/llm/src/model_card/model.rs(0 hunks)lib/llm/src/preprocessor.rs(1 hunks)lib/llm/src/preprocessor/prompt/template.rs(1 hunks)lib/llm/tests/backend.rs(1 hunks)lib/llm/tests/model_card.rs(1 hunks)lib/llm/tests/preprocessor.rs(1 hunks)lib/runtime/src/slug.rs(1 hunks)
💤 Files with no reviewable changes (2)
- lib/llm/src/model_card/create.rs
- lib/llm/src/model_card/model.rs
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: in rust async code, when an arc> is used solely to transfer ownership of a resource (like a...
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
Applied to files:
lib/llm/src/migration.rs
📚 Learning: in lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_so...
Learnt from: grahamking
PR: ai-dynamo/dynamo#1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
Applied to files:
lib/llm/src/migration.rs
📚 Learning: in lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister ...
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Applied to files:
lib/llm/src/migration.rs
📚 Learning: in lib/llm/src/kv_router/scoring.rs, peabrane prefers panic-based early failure over result-based er...
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Applied to files:
lib/llm/src/migration.rs
📚 Learning: the codebase uses async-nats version 0.40, not the older nats crate. error handling should use async...
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Applied to files:
lib/llm/src/migration.rs
📚 Learning: in the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `...
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Applied to files:
lib/llm/src/migration.rs
📚 Learning: in lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating lo...
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
Applied to files:
lib/llm/src/migration.rs
📚 Learning: in lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is pl...
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
Applied to files:
lib/llm/src/migration.rslib/llm/tests/preprocessor.rslib/llm/src/preprocessor/prompt/template.rslib/llm/src/preprocessor.rs
📚 Learning: the asyncenginecontextprovider trait in lib/runtime/src/engine.rs was intentionally changed from `se...
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Applied to files:
lib/runtime/src/slug.rs
📚 Learning: when using derive_builder::builder macro, the macro generates the builder struct and its methods, bu...
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/protocols.rs:85-112
Timestamp: 2025-06-16T20:02:54.935Z
Learning: When using derive_builder::Builder macro, the macro generates the builder struct and its methods, but does NOT generate a `builder()` method on the original struct. A manual `impl StructName { pub fn builder() -> StructNameBuilder { StructNameBuilder::default() } }` is required to provide the convenient `StructName::builder()` API pattern.
Applied to files:
lib/runtime/src/slug.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (9)
lib/llm/src/migration.rs (1)
14-15: Updated import path aligns with new module structure – looks good
No further action needed.lib/llm/tests/model_card.rs (1)
4-4: Import simplification correct and compiles against refactored modulelib/llm/tests/preprocessor.rs (1)
6-7: Import path adjusted correctly aftermodel_cardflatteninglib/llm/src/preprocessor/prompt/template.rs (1)
9-10: Import update matches new public API – no issues spottedlib/llm/src/preprocessor.rs (1)
25-25: LGTM!The import path simplification correctly reflects the module consolidation, removing the intermediate
modelsegment as intended by the PR.lib/llm/src/backend.rs (1)
24-24: LGTM!Import path correctly updated to reflect the consolidated module structure.
lib/llm/tests/backend.rs (1)
5-5: LGTM!Test import path correctly updated to match the new module structure.
lib/llm/src/local_model.rs (2)
254-257: Good documentation addition!The doc comment clearly describes the method's purpose.
259-262: Implementation correctly updated to use slug from ModelDeploymentCard.The change aligns with the PR objective of removing
service_namefield and using a slugified version ofdisplay_nameinstead. The documentation clearly explains this is for use in NATS, etcd, etc.
fc5ac75 to
a7cbbe7
Compare
We had two names in the ModelDeploymentCard: display_name and service_name. They were either always identical, or service_name was a [slug](https://sentry.io/answers/slug-in-django/)-ified version of the display_name. Having two names is confusing, especially for #2267 . Now we only have `display_name`. The HTTP server gets to decide how to translate that to the `model` HTTP field (if at all), it's not in the card. In practice it still uses a slug. Most of the code change in here is because I merged `model_card/create.rs` and `model_card/model.rs`. It always confused me why the object and it's implementation should be in separate files (modules). It confused Rust too, we had to make fields `pub` just to access them. I suspect it was a pattern from a different language mis-applied to Rust. Is fixed.
Thank you Code Rabbit for noticing, and Gemini Pro 2.5 for the detailed perf comparison. What a beautiful future.
a7cbbe7 to
44f7889
Compare
| /// A slugified version of the model's name, for use in NATS, etcd, etc. | ||
| pub fn service_name(&self) -> &str { | ||
| &self.card.service_name | ||
| self.card.slug().as_ref() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for now ... would like this to be an entity descriptor in the future.
We had two names in the ModelDeploymentCard: display_name and service_name. They were either always identical, or service_name was a slug-ified version of the display_name.
Having two names is confusing, especially for #2267 .
Now we only have
display_name. The HTTP server gets to decide how to translate that to themodelHTTP field (if at all), it's not in the card. In practice it still uses a slug.Most of the code change in here is because I merged
model_card/create.rsandmodel_card/model.rs. It always confused me why the object and it's implementation should be in separate files (modules). It confused Rust too, we had to make fieldspubjust to access them. I suspect it was a pattern from a different language mis-applied to Rust. Is fixed.Summary by CodeRabbit
Documentation
Refactor
Slugtype to support default initialization.Style