-
Notifications
You must be signed in to change notification settings - Fork 688
refactor: Switch ModelManager locks from std::sync::Mutex to parking_lot::Mutex
#2696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Switch ModelManager locks from std::sync::Mutex to parking_lot::Mutex
#2696
Conversation
WalkthroughReplaced std::sync::Mutex with parking_lot::Mutex in ModelManager for entries and kv_choosers, updating lock calls accordingly. Added parking_lot = 0.12.4 to lib/llm/Cargo.toml. No new features; synchronization primitive and imports adjusted. No method signatures changed; internal field types updated. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/llm/Cargo.toml (1)
67-67: Confirmed parking_lot v0.12.4 — no known advisories; optional workspace‐management & deadlock detectionVerified that
parking_lot0.12.4 is the latest published version on crates.io and there are no OSV advisories against it.Optional refinements to consider:
- File
lib/llm/Cargo.toml(line 67):
• Moveparking_lot = "0.12.4"into the workspace’s[workspace.dependencies]block, then reference it here asfor consistent versioning across crates.parking_lot = { workspace = true }- In this crate’s
Cargo.toml, enable thedeadlock_detectionfeature in debug builds—e.g.,to catch locking issues during development without impacting release‐build performance.[profile.dev.package.parking_lot] features = ["deadlock_detection"]lib/llm/src/discovery/model_manager.rs (2)
4-10: Mutex swap to parking_lot applied correctly; consider consistency with RwLockGood move to
parking_lot::Mutexand the simplified.lock()paths. For consistency and to avoid poisoning panics still present onstd::sync::RwLock, consider migrating the threeRwLocks toparking_lot::RwLockin a follow-up. This would also remove the remaining.unwrap()calls on.read()/.write().
250-257: Remove extra String clone in get_model_tool_call_parser
clone()on theOption<String>followed byto_string()performs two allocations. Use anas_ref().cloned()chain to clone only once.Apply this diff:
- self.entries - .lock() - .values() - .find(|entry| entry.name == model) - .and_then(|entry| entry.runtime_config.as_ref()) - .and_then(|config| config.tool_call_parser.clone()) - .map(|parser| parser.to_string()) + self.entries + .lock() + .values() + .find(|entry| entry.name == model) + .and_then(|entry| entry.runtime_config.as_ref()) + .and_then(|config| config.tool_call_parser.as_ref()) + .cloned()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
lib/llm/Cargo.toml(1 hunks)lib/llm/src/discovery/model_manager.rs(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (5)
lib/llm/src/discovery/model_manager.rs (5)
41-44: Behavioral change: no lock poisoning with parking_lot::MutexWith
parking_lot::Mutex, locks won’t be poisoned on panic. If a panic can occur while mutatingentries/kv_choosers, state invariants won’t be guarded by poisoning. If you rely on that signal, document the trade-off or add targeted validation on next lock acquisition.
63-65: LGTM — concise, lock held for minimal scope
get_model_entriesclones while holding the mutex only for the iteration. Reasonable for the expected access pattern.
173-174: LGTM — lock usage updated cleanlyThe switch to
parking_lot::Mutexon save/remove/get paths is applied correctly and keeps the critical section tight.Also applies to: 178-179, 205-207
4-44: No std::sync::Mutex usage detected; migration complete
The scan confirms there are no lingeringstd::sync::Mutexreferences and theuse parking_lot::Mutex;import is present inlib/llm/src/discovery/model_manager.rs. Everything looks correctly migrated.
181-203: Prevent duplicate KV routers under concurrent callsI noticed a race condition: two or more tasks can call
kv_chooser_for(…)at the same time, both observe that no router exists formodel_name, and each invokecreate_kv_chooser, resulting in multipleKvRouterinstances (and etcd keys) for the same model. The current insertion increate_kv_chooserhappens unconditionally after creation, so it doesn’t guard against races.Minimal double-checked locking fix in
create_kv_chooser:--- a/lib/llm/src/discovery/model_manager.rs +++ b/lib/llm/src/discovery/model_manager.rs @@ pub async fn create_kv_chooser( - let new_kv_chooser = Arc::new(chooser); - self.kv_choosers - .lock() - .insert(model_name.to_string(), new_kv_chooser.clone()); - Ok(new_kv_chooser) + let new_kv_chooser = Arc::new(chooser); + // Double-check under lock to avoid racing inserts + let mut map = self.kv_choosers.lock(); + if let Some(existing) = map.get(model_name) { + // Another task won the race – use that instance + return Ok(existing.clone()); + } + map.insert(model_name.to_string(), new_kv_chooser.clone()); + Ok(new_kv_chooser)Longer-term, for stronger once-only initialization, consider switching from
Mutex<HashMap<…>>toDashMap<String, OnceCell<Arc<KvRouter>>>and usingget_or_try_initfor each key.To detect any duplicates in a live system, grep your logs for multiple
kv_create(or similar identifiers) for the samemodel_namewithin a short window.
…ng_lot::Mutex` (#2696) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
…ng_lot::Mutex` (#2696) Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
…ng_lot::Mutex` (#2696) Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
…ng_lot::Mutex` (#2696) Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview:
This PR replaces all usage of
std::sync::Mutexwithin ModelManager withparking_lot::Mutex.Details:
Using this in favor of explicit error handling in #2678.
Motivation
std::sync::Mutex.std::sync::Mutex,parking_lot::Mutexdoes not require.unwrap()when acquiring a lock, removing the possibility of poisoned lock panics and simplifying code paths.Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Relates to:
unwrap()usage and add explicit error handling in ModelManager #2678Summary by CodeRabbit
No changes to public APIs or user workflows. Users may notice smoother performance and improved stability during concurrent operations.