Contributor

@jthomson04 jthomson04 commented May 15, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a global registry for managing block registrations across pools, enabling shared and reference-counted block lifetimes.
    • Added support for asynchronous block unregistration and automatic cleanup using an async runtime.
  • Enhancements

    • Block pools and related components now accept and propagate a global registry and async runtime handle for improved resource management.
    • Improved thread safety and concurrency in block registration and unregistration processes.
    • Updated block state handling to accommodate multi-field registration states across components.
    • Unified async runtime handle usage across block manager components for consistent initialization.
  • Bug Fixes

    • Updated internal state handling to support new multi-field registration states and ensure compatibility with the new registry system.
  • Documentation

    • Added detailed documentation explaining the new dual-layer block registry system and its operation.

Contributor

@ryanolson ryanolson left a comment

Thanks @jthomson04

Please evaluate my concerns about the potential for multiple registrations under the None condition of existing handles across pools.

Also, since it took me a bit to grok the new Arc<()> usage, let's tick off one more documentation task and give registry.rs a proper top-level (//!) set of docstrings.

@jthomson04 jthomson04 force-pushed the jthomson04/restructure-registration-v2 branch from 48ab610 to d77ef79 Compare May 16, 2025 21:20
@jthomson04 jthomson04 requested a review from ryanolson May 16, 2025 21:20
@jthomson04 jthomson04 force-pushed the jthomson04/restructure-registration-v2 branch from d77ef79 to fb1a6c3 Compare May 20, 2025 00:06
@jthomson04
Contributor Author

Blocked on #1134 for async runtime fixes

@jthomson04 jthomson04 force-pushed the jthomson04/restructure-registration-v2 branch from 4972eba to 4f2ab80 Compare May 22, 2025 17:30
@oandreeva-nv oandreeva-nv requested a review from ryanolson May 29, 2025 16:42
@jthomson04 jthomson04 force-pushed the jthomson04/restructure-registration-v2 branch from b90cee4 to fcedaa3 Compare May 29, 2025 18:39
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/llm/src/block_manager/block/registry.rs (1)

161-199: Consider documenting lock acquisition order for thread safety.

The nested lock acquisition pattern acquires blocks first (line 161), then global_registry (line 177). While this appears safe in the current implementation, consider adding a comment documenting this ordering to prevent future deadlock issues if other code paths acquire these locks in different orders.

+        // Lock acquisition order: blocks -> global_registry (maintain this order to prevent deadlocks)
         let mut blocks = self.blocks.lock().unwrap();
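To make the ordering concern concrete, here is a minimal sketch of a per-pool registry in which every method takes the two locks in the same order. The `PoolRegistry` type, its methods, and the `u64` key are hypothetical stand-ins, not the code in registry.rs; only the `blocks` and `global_registry` field names come from the comment above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical shared map type: sequence hash -> weak registration handle.
pub type SharedMap = Arc<Mutex<HashMap<u64, Weak<()>>>>;

/// Hypothetical per-pool registry holding a local map and the shared global map.
pub struct PoolRegistry {
    blocks: Mutex<HashMap<u64, Weak<()>>>,
    global_registry: SharedMap,
}

impl PoolRegistry {
    pub fn new(global_registry: SharedMap) -> Self {
        Self { blocks: Mutex::new(HashMap::new()), global_registry }
    }

    pub fn register(&self, hash: u64) -> Arc<()> {
        // Lock acquisition order: blocks -> global_registry.
        let mut blocks = self.blocks.lock().unwrap();
        let mut global = self.global_registry.lock().unwrap();
        let handle = Arc::new(());
        blocks.insert(hash, Arc::downgrade(&handle));
        global.insert(hash, Arc::downgrade(&handle));
        handle
    }

    pub fn unregister(&self, hash: u64) {
        // Same order as `register`; taking the locks in the opposite order
        // here is what would make a deadlock possible.
        let mut blocks = self.blocks.lock().unwrap();
        let mut global = self.global_registry.lock().unwrap();
        blocks.remove(&hash);
        global.remove(&hash);
    }

    pub fn len(&self) -> usize {
        self.blocks.lock().unwrap().len()
    }
}
```

Because both methods acquire `blocks` before `global_registry`, no thread can hold the second lock while waiting on the first, so concurrent callers cannot block each other in a cycle.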
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b90cee4 and fcedaa3.

📒 Files selected for processing (9)
  • lib/llm/src/block_manager/block.rs (3 hunks)
  • lib/llm/src/block_manager/block/registry.rs (5 hunks)
  • lib/llm/src/block_manager/block/state.rs (5 hunks)
  • lib/llm/src/block_manager/offload.rs (4 hunks)
  • lib/llm/src/block_manager/offload/pending.rs (1 hunks)
  • lib/llm/src/block_manager/pool.rs (7 hunks)
  • lib/llm/src/block_manager/pool/inactive.rs (9 hunks)
  • lib/llm/src/block_manager/pool/state.rs (6 hunks)
  • lib/llm/src/block_manager/state.rs (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • lib/llm/src/block_manager/offload/pending.rs
  • lib/llm/src/block_manager/block/state.rs
  • lib/llm/src/block_manager/offload.rs
  • lib/llm/src/block_manager/state.rs
  • lib/llm/src/block_manager/pool/state.rs
  • lib/llm/src/block_manager/pool/inactive.rs
  • lib/llm/src/block_manager/pool.rs
  • lib/llm/src/block_manager/block.rs
🧰 Additional context used
🧠 Learnings (1)
lib/llm/src/block_manager/block/registry.rs (1)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.894Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (5)
lib/llm/src/block_manager/block/registry.rs (5)

16-33: Excellent module documentation addressing previous feedback.

The comprehensive module-level documentation clearly explains the dual-layer registry design with global and per-pool components, the workflow, and the relationship between block handles and registration handles. This directly addresses the previous review comment requesting better documentation.


58-83: Well-designed BlockHandle for asynchronous cleanup.

The BlockHandle struct correctly implements the drop-based unregistration pattern. The use of mpsc::UnboundedSender for async notification ensures that the drop operation is non-blocking, and the let _ = pattern appropriately handles potential send failures when the receiver is dropped.
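The drop-based pattern described here can be sketched as follows. This is an illustration under assumptions, not the PR's code: it substitutes the standard library's `std::sync::mpsc::Sender` for the `mpsc::UnboundedSender` named above so the example has no external dependencies, and the field names and `u64` hash type are hypothetical.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical handle whose drop notifies a cleanup task.
pub struct BlockHandle {
    sequence_hash: u64,
    unregister_tx: Sender<u64>,
}

impl BlockHandle {
    pub fn new(sequence_hash: u64, unregister_tx: Sender<u64>) -> Self {
        Self { sequence_hash, unregister_tx }
    }

    pub fn sequence_hash(&self) -> u64 {
        self.sequence_hash
    }
}

impl Drop for BlockHandle {
    fn drop(&mut self) {
        // Sending on an unbounded channel does not block. The result is
        // deliberately discarded: if the receiver is already gone, there is
        // nobody left to clean up after this block.
        let _ = self.unregister_tx.send(self.sequence_hash);
    }
}
```

Discarding the send result with `let _ =` matters because `drop` has no way to report an error to its caller, and it can run during shutdown after the receiving side has been torn down.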


105-127: Background task implementation follows established pattern.

The detached async task for cleanup is correctly implemented and follows the established architectural pattern for this codebase. The task properly handles cleanup of expired weak references from both registries and will naturally terminate when the channel is closed.
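A rough sketch of such a cleanup loop is shown below. It assumes a plain OS thread and a standard-library channel in place of the async runtime task used in the PR, and the `Registry` alias, `remove_if_expired`, and `spawn_cleanup` are hypothetical names. The sketch returns the `JoinHandle` only so a caller can wait for the loop to finish; the review notes that the real task is detached on purpose.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex, Weak};
use std::thread::{self, JoinHandle};

/// Hypothetical registry type: sequence hash -> weak handle.
pub type Registry = Arc<Mutex<HashMap<u64, Weak<()>>>>;

/// Remove the entry for `hash` only if its weak reference can no longer be upgraded.
fn remove_if_expired(map: &mut HashMap<u64, Weak<()>>, hash: u64) {
    let expired = map.get(&hash).map_or(false, |weak| weak.upgrade().is_none());
    if expired {
        map.remove(&hash);
    }
}

/// Spawn a worker that prunes expired entries from both registries.
pub fn spawn_cleanup(blocks: Registry, global: Registry) -> (Sender<u64>, JoinHandle<()>) {
    let (tx, rx) = channel::<u64>();
    let worker = thread::spawn(move || {
        // `recv` returns Err once every sender has been dropped, which ends
        // the loop and lets the worker terminate on its own.
        while let Ok(hash) = rx.recv() {
            // Same lock order as registration: per-pool map, then global map.
            let mut blocks = blocks.lock().unwrap();
            let mut global = global.lock().unwrap();
            remove_if_expired(&mut blocks, hash);
            remove_if_expired(&mut global, hash);
        }
    });
    (tx, worker)
}
```

The upgrade check before removal is what keeps the loop safe when a hash is re-registered between the drop notification and the cleanup: an entry that points at a live handle is left in place.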


175-197: Robust global registry logic with proper handle sharing.

The global registry logic correctly handles sharing of registration handles across pools. The use of weak references in the global registry and the upgrade check ensures that expired handles are properly detected. The creation of new registration handles only when needed optimizes memory usage.
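The handle-sharing behaviour can be illustrated with a small get-or-create function over a map of weak references. This is a sketch under assumptions: `get_or_create_handle` and the `u64` key are hypothetical, and only the `Arc<()>` handle type and the weak-reference-plus-upgrade idea come from the discussion above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

/// Return the live registration handle for `sequence_hash`, or create one.
/// The map holds only weak references, so it never keeps a handle alive.
pub fn get_or_create_handle(
    global: &mut HashMap<u64, Weak<()>>,
    sequence_hash: u64,
) -> Arc<()> {
    // `upgrade` yields None when every strong reference has been dropped,
    // which is how an expired entry is detected.
    if let Some(existing) = global.get(&sequence_hash).and_then(Weak::upgrade) {
        return existing;
    }
    // No entry, or an expired one: create a fresh handle and overwrite the slot.
    let handle = Arc::new(());
    global.insert(sequence_hash, Arc::downgrade(&handle));
    handle
}
```

Two pools that register the same hash while a handle is alive receive clones of one `Arc<()>`, so the registration lasts until the last pool releases it. A caller in a multi-threaded setting would hold the registry lock across the whole call so the lookup and the insert happen atomically.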


202-205: Clean state transition with proper handle ownership.

The block state transition correctly packages both the registration handle and block handle into the Registered variant, maintaining proper ownership semantics for the dual-layer registry system.
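As a loose illustration of a state variant that owns both handles, consider the sketch below. Only the `Registered` variant name comes from the comment above; the other variants, the `register` method, and the stripped-down `BlockHandle` are hypothetical and much simpler than the types in block/state.rs.

```rust
use std::sync::Arc;

/// Hypothetical, minimal stand-in for the per-pool block handle.
pub struct BlockHandle {
    pub sequence_hash: u64,
}

/// Hypothetical block states; only `Registered` is named in the review.
pub enum BlockState {
    Reset,
    Complete { sequence_hash: u64 },
    /// Owns the shared registration handle and the per-pool block handle.
    Registered(Arc<()>, Arc<BlockHandle>),
}

impl BlockState {
    /// Move a complete block into the registered state. On failure the
    /// original state is handed back unchanged.
    pub fn register(
        self,
        registration: Arc<()>,
        block: Arc<BlockHandle>,
    ) -> Result<Self, Self> {
        match self {
            BlockState::Complete { sequence_hash }
                if sequence_hash == block.sequence_hash =>
            {
                Ok(BlockState::Registered(registration, block))
            }
            other => Err(other),
        }
    }
}
```

Because the variant owns both `Arc`s, dropping or resetting the state releases the pool's share of the registration and the block handle together, with no separate bookkeeping.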

@jthomson04 jthomson04 merged commit 3d40a69 into main May 29, 2025
10 checks passed
@jthomson04 jthomson04 deleted the jthomson04/restructure-registration-v2 branch May 29, 2025 19:38
@coderabbitai coderabbitai bot mentioned this pull request Aug 1, 2025
3 participants