-
Notifications
You must be signed in to change notification settings - Fork 676
feat: KVBM <-> vLLM Integration #1735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ts if a slot does not exist for the request
Signed-off-by: Ziqi Fan <ziqif@nvidia.com> Co-authored-by: Ryan Olson <rolson@nvidia.com>
WalkthroughThis update introduces a major overhaul of the block manager system, adding distributed leader-worker block management, locality-aware block abstractions, new block layouts (notably Changes
Sequence Diagram(s)sequenceDiagram
participant Python as Python Client
participant PyBinding as Python Rust Binding
participant Rust as Rust Block Manager
participant Leader as Distributed Leader
participant Worker as Distributed Worker
Python->>PyBinding: Create KvbmCacheManager(request)
PyBinding->>Rust: Initialize BlockManager(worker_id, leader, ...)
Rust->>Leader: KvbmLeader.new(bytes_per_block, world_size)
Leader-->>Rust: Leader instance (with host/disk block info)
Rust->>Worker: KvbmWorker.new(config)
Worker-->>Rust: Worker instance (with device/host/disk blocks)
Rust-->>PyBinding: BlockManager ready
Python->>PyBinding: Allocate slots / Get computed blocks
PyBinding->>Rust: Request block allocation / retrieval
Rust->>Worker: (If needed) Transfer blocks via leader-worker protocol
Worker-->>Rust: Block(s) transferred/allocated
Rust-->>PyBinding: Return block info
PyBinding-->>Python: Return blocks / status
Possibly related PRs
Poem
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (3.3.7)lib/bindings/python/src/dynamo/llm/vllm_integration/__init__.pylib/bindings/python/src/dynamo/llm/__init__.pylib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_manager.py
🔧 Clippy (1.86.0)error: failed to get Caused by: Caused by: Caused by: Caused by: Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 66
🔭 Outside diff range comments (1)
lib/llm/src/block_manager/block/collections.rs (1)
1-2: Missing licence header & unused placeholder file – breaks CIThe file is completely empty and the copyright-check job is failing.
Either add the standard NVIDIA/Apache-2.0 header and some minimal stub (e.g. a//!module–level doc comment), or delete the file until it is needed.+// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// SPDX-License-Identifier: Apache-2.0 + +//! Collection helpers for the block-manager. + +// (Intentionally left empty – serves as a future extension point.)
🧹 Nitpick comments (58)
lib/bindings/python/rust/llm/block_manager/distributed/utils.rs (1)
4-6: LGTM with minor optimization suggestion.The function correctly implements environment variable retrieval with a sensible fallback. Consider a small optimization to avoid unnecessary string allocation:
- std::env::var("DYNAMO_KVBM_BARRIER_ID").unwrap_or("kvbm".to_string()) + std::env::var("DYNAMO_KVBM_BARRIER_ID").unwrap_or_else(|| "kvbm".to_string())lib/bindings/python/rust/llm/block_manager/layer.rs (1)
91-94: Consider moving PrivateToken import to reduce duplication.The scoped imports of
PrivateTokenwithin each match arm work correctly but create duplication. Consider moving the import to the function level:+ use dynamo_llm::block_manager::block::private::PrivateToken; { let mut mutable_block = self.inner.lock().unwrap(); ptr = match &mut *mutable_block { block::BlockType::Pinned(block) => { - use dynamo_llm::block_manager::block::private::PrivateToken; let block_data = block.block_data_mut(PrivateToken); // ... } block::BlockType::Device(block) => { - use dynamo_llm::block_manager::block::private::PrivateToken; let block_data = block.block_data_mut(PrivateToken); // ... }Also applies to: 98-101
lib/llm/src/block_manager/layout/distributed.rs (1)
9-24: Consider the purpose of entirely commented-out code.The entire
DistributedConfigstruct is commented out, making this file non-functional. If this is work-in-progress, consider adding a TODO comment or removing the commented code until it's ready for implementation.Do you want me to help implement the
DistributedConfigstruct or should this commented code be removed for now?lib/bindings/python/rust/llm/block_manager/vllm/request.rs (1)
37-42: Consider reducing debug logging verbosity for production.The debug logging on lines 37 and 42 logs potentially sensitive salt data and computed hashes. Consider using trace-level logging instead or adding conditional compilation for debug builds.
- tracing::debug!("salt: {:?}", salt); + tracing::trace!("salt: {:?}", salt); let salt_bytes = serde_json::to_vec(&salt).unwrap(); let salt_hash = compute_hash_v2(&salt_bytes, 0); - tracing::debug!("salt_hash: {:?}", salt_hash); + tracing::trace!("salt_hash: {:?}", salt_hash);lib/llm/src/block_manager/block/data/logical/lw_sharded.rs (1)
97-97: Complete the implementation or document the incomplete state.The function ends with
unimplemented!(), making it non-functional. Either complete the implementation to create and return the sharded factories, or add documentation explaining this is a work-in-progress.Do you want me to help implement the missing factory creation logic, or should this function be marked as a TODO/WIP until ready for completion?
lib/llm/src/block_manager/block/transfer_next/nixl.rs (1)
150-154: Consider propagating transfer status errors.Currently, errors during transfer status polling are only logged and the loop exits silently. Consider propagating these errors to the caller for better error handling.
- Err(e) => { - tracing::error!("Error getting transfer status: {}", e); - break; - } + Err(e) => { + return Err(e); + }Note: This would require changing the Future's Output type from
()toResult<()>.lib/bindings/python/tests/test_kvbm.py (1)
101-130: Consider making block ID assertions more flexible.The hardcoded block IDs assume a specific deterministic allocation order. If the allocation strategy changes in the future, these tests might break unnecessarily.
Consider verifying the structure and uniqueness of block IDs rather than exact values:
# Instead of: assert block_ids[0] == [0, 1] # Consider: assert len(block_ids[0]) == 2 assert all(isinstance(id, int) for id in block_ids[0]) assert len(set(block_ids[0])) == 2 # All IDs are uniquelib/bindings/python/rust/llm/block_manager/distributed/worker.rs (1)
55-75: Consider performance optimization for trait method implementations.The
TorchTensortrait implementations clone vectors on every call, which could be expensive for frequently accessed metadata like shape and stride.Consider returning references instead of cloning:
impl TorchTensor for VllmTensor { fn device(&self) -> TorchDevice { self.device.clone() } - fn shape(&self) -> Vec<usize> { - self.shape.clone() - } + fn shape(&self) -> &[usize] { + &self.shape + } - fn stride(&self) -> Vec<usize> { - self.stride.clone() - } + fn stride(&self) -> &[usize] { + &self.stride + } }lib/llm/src/block_manager/distributed/utils.rs (1)
10-22: Enhance documentation for tuple field semantics.The
blocksfield usesVec<(usize, usize)>but the tuple field meanings aren't documented.#[derive(Serialize, Deserialize, Debug, Getters, Clone)] pub struct BlockTransferRequest { from_pool: BlockTransferPool, to_pool: BlockTransferPool, + /// Vector of (block_index, size_bytes) tuples representing blocks to transfer blocks: Vec<(usize, usize)>, }lib/llm/src/block_manager/block/factory/logical.rs (1)
7-16: Update deprecation comment for consistency.The comment mentions "LocalBlockData" but this is a
LogicalBlockFactory. The deprecation notice should be clarified or removed if this is the current implementation.-/// Factory for creating LocalBlockData (DEPRECATED - use LocalBlockFactory instead) +/// Factory for creating LogicalBlockData with specified resources #[derive(Debug)] pub struct LogicalBlockFactory<S: Storage, R: LogicalResources> {lib/llm/src/block_manager/block/transfer_v2/executors/cuda.rs (1)
34-68: Track TODO implementation completion.The CUDA transfer implementation is currently placeholder code. This needs to be completed before the distributed block manager can handle GPU transfers effectively.
Key areas that need implementation:
- CUDA stream management from TransferContext
- Memory view acquisition from blocks
- Appropriate cuDnnMemcpy/cuDnnMemcpyAsync calls
- Synchronization handling for async operations
Would you like me to create an issue to track the completion of this CUDA implementation?
lib/llm/src/block_manager/block/transfer_next/memcpy.rs (1)
27-28: Document the private token pattern used for access control.The use of
private::PrivateTokenfor accessing block data is an interesting pattern but lacks documentation. Consider adding a comment explaining why this pattern is used and what security/encapsulation benefits it provides..devcontainer/devcontainer.json (1)
19-27: Clean up commented configuration.Remove the commented image references and build args if they're no longer needed. Keeping commented code creates confusion about which configuration is actually used.
Apply this diff to clean up:
- //"image": "dynamo:latest-vllm-local-dev", // Use the latest VLLM local dev image - // "image": "nixl:v0.1.0.dev.e0b34a3", "build": { - "dockerfile": "Dockerfile", - // "args": { - // "BUILDKIT_INLINE_CACHE": "1", - // "SOME_ARG": "value" - // } + "dockerfile": "Dockerfile" },lib/llm/src/block_manager/distributed/README.md (1)
164-164: Add newline at end of file.Add a newline character at the end of the file to follow Unix text file conventions.
lib/llm/src/block_manager/block/factory/local.rs (2)
7-7: Document the purpose of the Dissolve derive macro.The
Dissolvederive macro is used without explanation. Consider adding documentation about what this macro does and why it's needed.
53-53: Add newline at end of file.Add a newline character at the end of the file to follow Rust conventions.
lib/llm/src/block_manager/layout/utils.rs (2)
102-102: Add newline at end of file.Add a newline character at the end of the file to follow Rust conventions.
22-31: Add must_use attribute to validation function.The
validate_power_of_2function returns a Result that should not be ignored. Add the#[must_use]attribute to ensure callers handle the validation result.Apply this diff:
/// Validation function for Option<usize> to check if it's Some(power_of_2). +#[must_use] pub fn validate_power_of_2(alignment: usize) -> Result<(), ValidationError> {lib/llm/src/block_manager/distributed/transfer.rs (1)
35-35: Consider architectural improvements for block pool management.The TODO comment indicates uncertainty about the current approach. The workaround pattern of extracting and cloning block data might introduce unnecessary overhead.
Would you like me to suggest alternative architectures that could avoid the need for downcasting and cloning?
lib/llm/src/block_manager/block/transfer_next/context.rs (1)
28-28: Simplify the nixl_agent field type.The current type
Arc<Option<NixlAgent>>creates unnecessary double indirection. Since the Arc is cloned when needed,Option<Arc<NixlAgent>>would be more idiomatic.- nixl_agent: Arc<Option<NixlAgent>>, + nixl_agent: Option<Arc<NixlAgent>>,And update the constructor:
- nixl_agent: Arc<Option<NixlAgent>>, + nixl_agent: Option<Arc<NixlAgent>>,Also applies to: 38-39
lib/llm/src/block_manager/distributed/worker_test.rs (1)
71-71: Handle task cleanup more gracefully.Using
abort()on tasks that might have already completed is harmless but not ideal. The timeout approach in line 71 is better.Consider using a consistent pattern across all tests:
// Instead of response_task.abort(); let _ = tokio::time::timeout(Duration::from_millis(100), response_task).await;Also applies to: 235-235, 303-303
lib/llm/src/block_manager/block/data/logical/distributed_leader_worker.rs (2)
33-33: Improve error context for debugging.The error message is generic and doesn't provide enough context for debugging failures.
- .map_err(|e| anyhow::anyhow!("Failed to create DistributedLeaderWorkerResources: {}", e))?.detach(); + .map_err(|e| anyhow::anyhow!("Failed to spawn DistributedLeaderWorkerResources worker task: {}", e))?.detach();
80-81: Address the TODO about the unused transfer context parameter.The
_ctxparameter is marked as unused with a TODO comment, indicating incomplete implementation.Would you like me to create an issue to track the proper usage of the transfer context in the distributed locality implementation?
lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py (2)
15-16: Remove commented out code.The logger import is commented out without explanation. If logging is not needed, remove these lines entirely.
-# from vllm.logger import init_logger -# logger = init_logger(__name__)
75-75: Simplify list creation.The list comprehension is unnecessary when converting an iterable to a list.
- block_id=block.block_id, tokens=[t for t in block_hash.tokens_ids] + block_id=block.block_id, tokens=list(block_hash.tokens_ids)lib/llm/src/block_manager/block/transfer_v2/executors/nixl.rs (1)
194-209: Complete the NIXL transfer implementation.Both read and write operations are placeholders that return errors. The TODO comments provide good guidance on what needs to be implemented.
Would you like me to:
- Create issues to track the implementation of NIXL read and write operations?
- Generate a basic implementation skeleton based on the NIXL API patterns used elsewhere in the codebase?
The implementation would involve:
- For reads: Creating active message requests to pull data from remote memory
- For writes: Creating active message requests to push data to remote memory
- Proper error handling and retry logic
- Integration with the transfer context for async completion notification
Also applies to: 213-228
lib/llm/src/block_manager/block/transfer_v2/executors.rs (1)
83-88: Consider compile-time type checking for better performance.While the runtime
TypeIdcomparison works correctly, consider using trait-based compile-time type checking when possible, as it would eliminate runtime overhead.For example, you could add associated constants or marker traits to
LocalityProviderto enable compile-time locality detection in some cases.lib/llm/src/block_manager/distributed/leader.rs (1)
61-61: Address the TODO comment for worker data usage.The
_worker_datafield is marked as unused with an underscore prefix and has a TODO comment indicating it should storeKvbmLeaderDatainstead of unit type. This suggests incomplete implementation that should be addressed.Would you like me to help implement the proper worker data storage and usage, or open an issue to track this task?
lib/llm/src/block_manager/block/transfer_v2/context.rs (1)
94-98: Consider propagating CUDA synchronization errors to the callerWhen CUDA event synchronization fails, the error is only logged and the oneshot sender is notified as if the operation succeeded. This could mask underlying CUDA issues from the caller.
Consider sending an error result through the channel:
- mpsc::UnboundedSender<(CudaEvent, oneshot::Sender<()>)>, - mpsc::UnboundedReceiver<(CudaEvent, oneshot::Sender<()>)>, + mpsc::UnboundedSender<(CudaEvent, oneshot::Sender<Result<(), String>>)>, + mpsc::UnboundedReceiver<(CudaEvent, oneshot::Sender<Result<(), String>>)>,And then:
- if let Err(e) = event.synchronize() { - tracing::error!("Failed to synchronize CUDA event: {:?}", e); - } - let _ = tx.send(()); + let result = event.synchronize() + .map_err(|e| format!("Failed to synchronize CUDA event: {:?}", e)); + let _ = tx.send(result);lib/llm/src/block_manager/block/locality.rs (2)
46-60: Consider makinghandle_transfera required method without default implementationThe default implementation panics, which could lead to runtime failures if implementers forget to override this method. Since this appears to be a core method of the trait, consider removing the default implementation to enforce compile-time implementation requirements.
fn handle_transfer<RB, WB>( - _sources: &[RB], - _targets: &mut [WB], - _notify: bool, - _ctx: Arc<TransferContext>, - ) -> Result<Option<oneshot::Receiver<()>>, TransferError> + sources: &[RB], + targets: &mut [WB], + notify: bool, + ctx: Arc<TransferContext>, + ) -> Result<Option<oneshot::Receiver<()>>, TransferError> where RB: ReadableBlock + WriteToStrategy<WB> + storage::Local, <RB as StorageTypeProvider>::StorageType: NixlDescriptor, <WB as StorageTypeProvider>::StorageType: NixlDescriptor, RB: BlockDataProvider<Locality = Self>, - WB: WritableBlock + BlockDataProviderMut<Locality = Self>, - { - panic!("Transfers are not supported for this locality provider"); - } + WB: WritableBlock + BlockDataProviderMut<Locality = Self>;
104-379: Consider removing large sections of commented codeThe file contains extensive commented-out code (lines 104-379) which appears to be either work-in-progress or deprecated implementations. This reduces code readability and maintainability.
Consider either:
- Removing the commented code if it's no longer needed
- Moving it to a separate file or documentation if it's reference material
- Using version control history to preserve old implementations instead of comments
lib/llm/src/block_manager/block_v2.rs (2)
86-89: Fix typo in field nameThere's a typo in the field name: "paralleism" should be "parallelism".
pub struct LogicalData<S: StorageKind, P: ParallelKind> { block_id: BlockId, - paralleism: P, + parallelism: P, }
97-110: Remove or complete the commented WriteTo implementationThe commented-out
WriteToimplementation appears to be a stub that creates a channel but doesn't perform actual transfer logic.Consider either:
- Removing this commented code if it's not immediately needed
- Adding a TODO comment explaining what needs to be implemented
- Completing the implementation
Would you like me to help implement the transfer logic for
LogicalData?lib/llm/src/block_manager/block/data.rs (1)
113-115: Remove duplicateLocalityassociated typeThe
BlockDataProviderMuttrait already inheritsLocalityfromBlockDataProvider, so redefining it here is redundant and could cause confusion.pub trait BlockDataProviderMut: BlockDataProvider { - type Locality: LocalityProvider; - fn block_data_mut(&mut self) -> &mut impl BlockDataExt<Self::StorageType>; }lib/bindings/python/rust/llm/block_manager.rs (1)
24-28: Remove commented-out code.These commented module imports should be removed if they're no longer needed. Keeping dead code can cause confusion.
-// mod block; -// mod block_list; -// mod dlpack; -// mod layer; -lib/bindings/python/rust/llm/block_manager/vllm.rs (1)
178-188: Track unimplemented methods.The
reset_prefix_cacheandget_num_common_prefix_blocksmethods are not implemented. Consider adding TODO comments with implementation plans or tracking these in issues.Would you like me to create tracking issues for these unimplemented methods?
lib/bindings/python/rust/llm/block_manager/vllm/block_list.rs (1)
31-86: Consider reducing enum variants using generics.The
BlockListTypeenum has 6 variants that follow a pattern. Consider using a generic approach to reduce code duplication.You could potentially reduce this to a more generic structure, though the current approach is explicit and clear for Python bindings.
lib/bindings/python/rust/llm/block_manager/vllm/slot.rs (1)
11-11: Fix typo in comment.- /// The number of tokens that were ini + /// The number of tokens that were initiallylib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_manager.py (2)
58-59: Address the FIXME for conditional prefix cache stats.The current implementation always creates PrefixCacheStats when log_stats is True, but the FIXME suggests this should be conditional.
Would you like me to implement the conditional logic for prefix cache stats initialization?
407-466: Implement the KV connector protocol methods.Several methods are currently no-op stubs that need implementation:
start_load_kv: Should start async loading of KV cachewait_for_layer_load: Should block until layer is loadedsave_kv_layer: Should start saving KV layerwait_for_save: Should block until save completesThese methods are part of the KV connector protocol and their current no-op implementation may cause issues.
Would you like me to help implement these KV connector protocol methods or create tracking issues for them?
lib/llm/src/block_manager.rs (1)
315-316: Track the ignored test for NIXL partial metadata support.This test is ignored because NIXL doesn't support partial metadata in the Rust bindings. This limitation should be tracked and the test enabled once support is added.
Would you like me to create a tracking issue for enabling this test once NIXL supports partial metadata in Rust bindings?
lib/llm/src/block_manager/distributed/zmq.rs (1)
396-399: Consider implementing dynamic handler registration.The TODO comment suggests making handler registration dynamic. This would allow adding/removing handlers at runtime, which could be useful for plugin-like architectures.
Would you like me to implement dynamic handler registration with thread-safe add/remove operations?
lib/llm/src/block_manager/block/transfer_v2.rs (1)
186-194: Optimize notification handling to avoid unnecessary channel creation.Creating a oneshot channel just to immediately send on it is wasteful when notify is true.
- // Handle notification - if notify { - let (tx, rx) = oneshot::channel(); - let _ = tx.send(()); - Ok(Some(rx)) - } else { - Ok(None) - } + // Handle notification + Ok(if notify { + let (tx, rx) = oneshot::channel(); + // Schedule the notification to be sent after the transfer completes + // This is a placeholder - actual implementation would depend on UniversalCoordinator + tokio::spawn(async move { + // Wait for transfer completion + let _ = tx.send(()); + }); + Some(rx) + } else { + None + })Note: The current implementation appears to send the notification immediately rather than after the transfer completes, which seems incorrect.
lib/llm/src/block_manager/block/transfer_next.rs (1)
201-211: Clarify the behavior of dropping futures when notify is false.The code drops the transfer future when notification is not needed, which relies on the specific behavior documented in the learnings. Consider adding a comment to explain this.
if notify { ctx.async_rt_handle().spawn(async move { transfer_fut.await; tx.send(()).unwrap(); }); Ok(Some(rx)) } else { + // Drop the future - for NIXL transfers, the operation has already started + // and will complete independently (see PR #1363 learning) Ok(None) }lib/llm/src/block_manager/distributed/active_message.rs (1)
212-215: Replace dummy receiver hack with Option type.Creating a dummy receiver is a code smell. Consider using Option instead.
- // Move the message receiver out of self - let mut message_receiver = std::mem::replace( - &mut self.message_receiver, - mpsc::unbounded_channel().1, // Dummy receiver - ); + // Use Option to properly handle receiver ownership + let mut message_receiver = self.message_receiver.take() + .ok_or_else(|| anyhow::anyhow!("Driver already started"))?;This would require changing the field type to
Option<mpsc::UnboundedReceiver<IncomingActiveMessage>>.lib/llm/src/block_manager/layout.rs (2)
789-796: Consider returning slices instead of collecting into vectors.Creating new vectors on each call to
storage()andstorage_mut()is inefficient for read operations.- fn storage(&self) -> Vec<&Self::StorageType> { - self.storages.iter().collect() - } - - fn storage_mut(&mut self) -> Vec<&mut Self::StorageType> { - self.storages.iter_mut().collect() - } + fn storage(&self) -> &[Self::StorageType] { + &self.storages + } + + fn storage_mut(&mut self) -> &mut [Self::StorageType] { + &mut self.storages + }Note: This would require changing the trait definition to return slices instead of Vecs.
231-235: Cache layout config to avoid repeated cloning.The layout_config() method clones the entire config on every call, which could be expensive for frequently accessed properties.
Consider either:
- Returning a reference:
fn layout_config(&self) -> &LayoutConfig- Caching individual values that are accessed frequently
- Using
Arc<LayoutConfig>to make cloning cheap- fn layout_config(&self) -> LayoutConfig { - self.inner.clone() - } + fn layout_config(&self) -> &LayoutConfig { + &self.inner + }Also applies to: 245-258, 371-373, 549-552, 627-630, 799-801
lib/llm/src/block_manager/block.rs (2)
59-62: Unused private moduleThe
privatemodule withPrivateTokenis defined but doesn't appear to be used in this file. Consider removing it if unused, or document its intended purpose if it's used in other modules.
349-354: Address TODO: validate num_blocks() removalThe
num_blocks()method always returns 1 and has a TODO comment suggesting it might be removable. This should be resolved before merging.Would you like me to help analyze the usage of this method across the codebase to determine if it can be safely removed?
lib/llm/src/block_manager/block_next.rs (4)
197-215: Consider refactoring similar error handling patterns.The
sequence_hashandparent_sequence_hashmethods have nearly identical match patterns and error handling. Consider extracting a helper method to reduce duplication.-pub fn sequence_hash(&self) -> Result<SequenceHash, BlockError> { - match self.state() { - BlockState::Complete(state) => Ok(state.token_block().sequence_hash()), - BlockState::Registered(state, _) => Ok(state.sequence_hash()), - _ => Err(BlockError::InvalidState( - "Block is not complete nor registered.".to_string(), - )), - } -} - -pub fn parent_sequence_hash(&self) -> Result<Option<SequenceHash>, BlockError> { - match self.state() { - BlockState::Complete(state) => Ok(state.token_block().parent_sequence_hash()), - BlockState::Registered(state, _) => Ok(state.parent_sequence_hash()), - _ => Err(BlockError::InvalidState( - "Block is not complete nor registered.".to_string(), - )), - } -} +fn ensure_complete_or_registered(&self) -> Result<(), BlockError> { + match self.state() { + BlockState::Complete(_) | BlockState::Registered(_, _) => Ok(()), + _ => Err(BlockError::InvalidState( + "Block is not complete nor registered.".to_string(), + )), + } +} + +pub fn sequence_hash(&self) -> Result<SequenceHash, BlockError> { + self.ensure_complete_or_registered()?; + match self.state() { + BlockState::Complete(state) => Ok(state.token_block().sequence_hash()), + BlockState::Registered(state, _) => Ok(state.sequence_hash()), + _ => unreachable!(), + } +} + +pub fn parent_sequence_hash(&self) -> Result<Option<SequenceHash>, BlockError> { + self.ensure_complete_or_registered()?; + match self.state() { + BlockState::Complete(state) => Ok(state.token_block().parent_sequence_hash()), + BlockState::Registered(state, _) => Ok(state.parent_sequence_hash()), + _ => unreachable!(), + } +}
946-952: Improve error messages for failed mutable operations on immutable blocks.The error messages could be more descriptive to help users understand why the operation failed.
-fn layer_view_mut(&mut self, _: usize, _: usize) -> BlockResult<view::LayerViewMut<S>> { - // This should never be called since ImmutableBlock is immutable, - // but we need to implement the full trait - Err(BlockError::InvalidState( - "Cannot get mutable layer view from immutable block".to_string(), - )) -} +fn layer_view_mut(&mut self, layer_idx: usize, outer_idx: usize) -> BlockResult<view::LayerViewMut<S>> { + // This should never be called since ImmutableBlock is immutable, + // but we need to implement the full trait + Err(BlockError::InvalidState( + format!("Cannot get mutable layer view for layer {} outer {} from immutable block - use a MutableBlock instead", layer_idx, outer_idx), + )) +} -fn block_view_mut(&mut self) -> BlockResult<view::BlockViewMut<S>> { - // This should never be called since ImmutableBlock is immutable, - // but we need to implement the full trait - Err(BlockError::InvalidState( - "Cannot get mutable block view from immutable block".to_string(), - )) -} +fn block_view_mut(&mut self) -> BlockResult<view::BlockViewMut<S>> { + // This should never be called since ImmutableBlock is immutable, + // but we need to implement the full trait + Err(BlockError::InvalidState( + "Cannot get mutable block view from immutable block - use a MutableBlock instead".to_string(), + )) +}Also applies to: 958-964
1507-1510: Address TODO: Consider storing MemType explicitly.The TODO comment suggests that MemType might need to be stored explicitly if it cannot be reliably derived from block_set_idx. This could be important for correctness.
Would you like me to help implement explicit MemType storage in BlockDescriptorList or create an issue to track this enhancement?
1-2071: Consider splitting this large file into smaller modules.At 2071 lines, this file contains several distinct components that could be organized into separate modules for better maintainability:
- Core block abstractions and traits
- Mutable/Immutable block wrappers
- NIXL integration (already in a module, but could be a separate file)
- Test utilities and fixtures
This would make the codebase more navigable and easier to maintain.
Suggested structure:
block_next/ ├── mod.rs // Re-exports and core traits ├── core.rs // Block, BlockData, BlockExt ├── metadata.rs // BlockMetadata, BasicMetadata ├── mutable.rs // MutableBlock implementation ├── immutable.rs // ImmutableBlock implementation ├── collection.rs // Blocks collection └── nixl.rs // NIXL integration (move from nested module)lib/llm/src/block_manager/distributed.rs (1)
129-176: Consider adding transfer validation.The test verifies transfers complete without errors but doesn't validate the actual transfer results or block states.
Consider adding assertions to verify blocks are in the expected pools after transfers:
// After device->host transfer // Verify blocks are accessible in host pool // After host->disk transfer // Verify blocks are accessible in disk pool // After disk->device transfer // Verify blocks are back in device poollib/llm/src/block_manager/block/transfer_v2/coordinators.rs (2)
109-120: CUDA transfer implementation is pending.The CUDA transfer strategies are currently unimplemented. This limits the coordinator's functionality for GPU transfers.
Would you like me to help implement the CUDA transfer executors or create an issue to track this work?
238-241: LogicalCoordinator implementation is incomplete.The coordinator currently delegates to UniversalCoordinator instead of implementing proper logical locality handling with RPC.
Would you like me to help implement the RPC-based transfer logic for the LogicalCoordinator or create an issue to track this TODO?
lib/llm/src/block_manager/distributed/worker.rs (2)
139-155: Improve error messages for unsupported layouts.The error messages could be more descriptive about what tensor shapes are expected.
- return Err(anyhow::anyhow!(format!( - "Unsupported kv cache layout. Got shape: {:?}", - shape - ))); + return Err(anyhow::anyhow!(format!( + "Unsupported kv cache layout. Got shape: {:?}. Expected shape with at least 3 dimensions where shape[0] or shape[1] >= num_device_blocks ({})", + shape, config.num_device_blocks + )));
326-328: Main worker loop is not yet implemented.The worker currently only waits for cancellation. The TODO indicates additional functionality is planned.
Would you like me to help implement the main worker loop logic or create an issue to track this TODO?
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.locklib/bindings/python/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (82)
.devcontainer/devcontainer.json(3 hunks)container/build.sh(1 hunks)dynamo.code-workspace(1 hunks)lib/bindings/python/Cargo.toml(3 hunks)lib/bindings/python/rust/lib.rs(1 hunks)lib/bindings/python/rust/llm.rs(1 hunks)lib/bindings/python/rust/llm/block_manager.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/block.rs(8 hunks)lib/bindings/python/rust/llm/block_manager/block_list.rs(4 hunks)lib/bindings/python/rust/llm/block_manager/distributed.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/distributed/leader.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/distributed/utils.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/distributed/worker.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/dlpack.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/layer.rs(3 hunks)lib/bindings/python/rust/llm/block_manager/vllm.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/vllm/block_list.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/vllm/request.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/vllm/slot.rs(1 hunks)lib/bindings/python/rust/llm/block_manager/vllm/slot_manager_test_plan.md(1 hunks)lib/bindings/python/rust/llm/block_manager/vllm/slot_test_plan.md(1 hunks)lib/bindings/python/src/dynamo/_core.pyi(1 hunks)lib/bindings/python/src/dynamo/llm/__init__.py(1 hunks)lib/bindings/python/src/dynamo/llm/vllm_integration/__init__.py(1 hunks)lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_manager.py(1 hunks)lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py(1 hunks)lib/bindings/python/src/dynamo/llm/vllm_integration/rust.py(1 hunks)lib/bindings/python/tests/test_kvbm.py(1 hunks)lib/llm/Cargo.toml(4 hunks)lib/llm/src/block_manager.md(1 hunks)lib/llm/src/block_manager.rs(11 hunks)lib/llm/src/block_manager/block.rs(22 hunks)lib/llm/src/block_manager/block/collections.rs(1 hunks)lib/llm/src/block_manager/block/data.rs(1 hunks)lib/llm/src/block_manager/block/data/local.rs(1 hunks)lib/llm/src/block_manager/block/data/logical.rs(1 hunks)lib/llm/src/block_manager/block/data/logical/distributed_leader_worker.rs(1 hunks)lib/llm/src/block_manager/block/data/logical/lw_sharded.rs(1 hunks)lib/llm/src/block_manager/block/data/logical/null.rs(1 hunks)lib/llm/src/block_manager/block/data/view.rs(10 hunks)lib/llm/src/block_manager/block/factory.rs(1 hunks)lib/llm/src/block_manager/block/factory/local.rs(1 hunks)lib/llm/src/block_manager/block/factory/logical.rs(1 hunks)lib/llm/src/block_manager/block/locality.rs(1 hunks)lib/llm/src/block_manager/block/state.rs(1 hunks)lib/llm/src/block_manager/block/transfer.rs(7 hunks)lib/llm/src/block_manager/block/transfer/cuda.rs(2 hunks)lib/llm/src/block_manager/block/transfer/memcpy.rs(2 hunks)lib/llm/src/block_manager/block/transfer/nixl.rs(3 hunks)lib/llm/src/block_manager/block/transfer_next.rs(1 hunks)lib/llm/src/block_manager/block/transfer_next/context.rs(1 hunks)lib/llm/src/block_manager/block/transfer_next/cuda.rs(1 hunks)lib/llm/src/block_manager/block/transfer_next/memcpy.rs(1 hunks)lib/llm/src/block_manager/block/transfer_next/nixl.rs(1 hunks)lib/llm/src/block_manager/block/transfer_next/strategy.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2/context.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2/coordinators.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2/error.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2/executors.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2/executors/cuda.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2/executors/memcpy.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2/executors/nixl.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2/macros.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v2/strategy.rs(1 hunks)lib/llm/src/block_manager/block/transfer_v3.rs(1 hunks)lib/llm/src/block_manager/block_next.rs(1 hunks)lib/llm/src/block_manager/block_v2.rs(1 hunks)lib/llm/src/block_manager/config.rs(4 hunks)lib/llm/src/block_manager/distributed.rs(1 hunks)lib/llm/src/block_manager/distributed/README.md(1 hunks)lib/llm/src/block_manager/distributed/active_message.rs(1 hunks)lib/llm/src/block_manager/distributed/leader.rs(1 hunks)lib/llm/src/block_manager/distributed/transfer.rs(1 hunks)lib/llm/src/block_manager/distributed/utils.rs(1 hunks)lib/llm/src/block_manager/distributed/worker.rs(1 hunks)lib/llm/src/block_manager/distributed/worker_test.rs(1 hunks)lib/llm/src/block_manager/distributed/zmq.rs(1 hunks)lib/llm/src/block_manager/layout.rs(20 hunks)lib/llm/src/block_manager/layout/distributed.rs(1 hunks)lib/llm/src/block_manager/layout/nixl.rs(9 hunks)lib/llm/src/block_manager/layout/utils.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (72)
📓 Common learnings
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/collections.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/bindings/python/rust/lib.rs (3)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/bindings/python/src/dynamo/llm/__init__.py (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
lib/bindings/python/Cargo.toml (1)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
dynamo.code-workspace (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
lib/bindings/python/src/dynamo/llm/vllm_integration/__init__.py (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
lib/bindings/python/rust/llm.rs (2)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/block/transfer/memcpy.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/bindings/python/rust/llm/block_manager/block_list.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/bindings/python/rust/llm/block_manager/layer.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/bindings/python/rust/llm/block_manager/distributed/utils.rs (2)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/bindings/python/rust/llm/block_manager/distributed.rs (5)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scheduler.rs:260-266
Timestamp: 2025-05-30T06:34:12.785Z
Learning: In the KV router scheduler code, PeaBrane prefers fail-fast behavior over silent failure handling. When accessing worker metrics data that could be out-of-bounds (like dp_rank indexing), explicit panics are preferred over graceful degradation with continue statements to ensure data integrity issues are caught early.
lib/llm/src/block_manager/block/state.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/block/transfer/cuda.rs (3)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
lib/llm/Cargo.toml (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
lib/llm/src/block_manager/block/data/logical/lw_sharded.rs (3)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
lib/bindings/python/rust/llm/block_manager/distributed/leader.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
lib/llm/src/block_manager/block/data/logical/null.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager.md (2)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/block/transfer_next/nixl.rs (3)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/transfer_v3.rs (4)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/src/block_manager/block/transfer_v2/executors/memcpy.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/block/transfer_v2/executors/cuda.rs (3)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
lib/llm/src/block_manager/config.rs (2)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/distributed/transfer.rs (4)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
lib/llm/src/block_manager/block/data/logical/distributed_leader_worker.rs (6)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
lib/llm/src/block_manager/distributed/worker_test.rs (4)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
lib/llm/src/block_manager/layout/utils.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/layout/distributed.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/distributed/utils.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/factory/logical.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/block/factory/local.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/distributed.rs (5)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
lib/llm/src/block_manager/block/transfer_next/cuda.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/bindings/python/rust/llm/block_manager/vllm/slot_test_plan.md (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/block/transfer_v2/executors/nixl.rs (4)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
.devcontainer/devcontainer.json (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.
lib/llm/src/block_manager/block/transfer_v2/coordinators.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/block/transfer/nixl.rs (4)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/transfer_v2/macros.rs (3)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/protocols.rs:85-112
Timestamp: 2025-06-16T20:02:54.935Z
Learning: When using derive_builder::Builder macro, the macro generates the builder struct and its methods, but does NOT generate a `builder()` method on the original struct. A manual `impl StructName { pub fn builder() -> StructNameBuilder { StructNameBuilder::default() } }` is required to provide the convenient `StructName::builder()` API pattern.
lib/llm/src/block_manager/block/data/local.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/factory.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/data/view.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/transfer_v2/strategy.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
lib/llm/src/block_manager/block/transfer_next/context.rs (4)
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
lib/bindings/python/rust/llm/block_manager/block.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/transfer_v2/context.rs (3)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
lib/llm/src/block_manager/block/transfer_next/memcpy.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/distributed/leader.rs (5)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
lib/llm/src/block_manager/block/transfer_v2/executors.rs (3)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/bindings/python/rust/llm/block_manager/vllm/block_list.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block_v2.rs (3)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
lib/llm/src/block_manager/block/data.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/distributed/README.md (4)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
lib/llm/src/block_manager/distributed/worker.rs (6)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
lib/llm/src/block_manager/layout/nixl.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/locality.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager.rs (6)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
lib/bindings/python/rust/llm/block_manager.rs (2)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/block/transfer_v2.rs (3)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/bindings/python/rust/llm/block_manager/vllm.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/transfer.rs (3)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/bindings/python/rust/llm/block_manager/vllm/slot.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/distributed/zmq.rs (6)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
lib/llm/src/block_manager/block.rs (4)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
lib/llm/src/block_manager/block/transfer_next.rs (3)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block/transfer_next/strategy.rs (1)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/llm/src/block_manager/distributed/active_message.rs (4)
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/src/block_manager/block/data/logical.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/layout.rs (3)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/block_manager/block_next.rs (2)
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
🧬 Code Graph Analysis (15)
lib/bindings/python/src/dynamo/llm/__init__.py (1)
lib/bindings/python/rust/lib.rs (1)
_core(39-89)
lib/bindings/python/rust/llm.rs (1)
lib/bindings/python/tests/test_block_manager.py (1)
block_manager(53-54)
lib/bindings/python/rust/llm/block_manager/layer.rs (2)
lib/llm/src/block_manager/block.rs (5)
block(1183-1197)block_data(415-417)block_data(635-637)block_data(816-818)block_data(1283-1285)lib/bindings/python/rust/lib.rs (1)
to_pyerr(91-96)
lib/bindings/python/src/dynamo/llm/vllm_integration/rust.py (2)
lib/bindings/python/rust/llm/block_manager/vllm.rs (1)
_vllm_integration(38-47)lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_manager.py (1)
KvbmCacheManager(33-466)
lib/bindings/python/rust/llm/block_manager/vllm/request.rs (2)
lib/llm/src/tokens.rs (5)
tokens(350-352)tokens(448-450)compute_hash_v2(46-48)salt_hash(453-455)salt_hash(810-812)lib/bindings/python/src/dynamo/llm/vllm_integration/rust.py (1)
KvbmRequest(17-18)
lib/llm/src/block_manager/distributed/worker_test.rs (1)
lib/llm/src/block_manager/distributed/active_message.rs (6)
create(378-383)create_example_handlers(450-468)handle_message(410-410)handle_message(431-446)create_object_handler(395-404)create_handler(386-392)
lib/llm/src/block_manager/block/factory/local.rs (4)
lib/llm/src/block_manager/block/factory/logical.rs (4)
new(19-34)create_block_data(38-52)num_blocks(54-56)layout_config(58-60)lib/llm/src/block_manager/state.rs (3)
new(117-212)new(220-321)worker_id(91-93)lib/llm/src/block_manager/block/data/logical.rs (2)
new(52-69)worker_id(85-87)lib/llm/src/block_manager/block/factory.rs (3)
create_block_data(22-22)num_blocks(46-46)layout_config(49-49)
lib/llm/src/block_manager/distributed.rs (8)
lib/llm/src/block_manager/block/data/logical/distributed_leader_worker.rs (1)
worker(47-71)lib/llm/src/block_manager/storage.rs (4)
size(177-177)size(394-396)size(474-476)size(515-517)lib/llm/src/block_manager/distributed/leader.rs (2)
new(67-118)builder(49-51)lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (9)
new(25-52)new(87-127)shape(68-70)device(28-28)device(31-31)device(56-58)data_ptr(60-62)size_bytes(64-66)stride(72-74)lib/llm/src/block_manager/distributed/worker.rs (6)
new(125-200)shape(157-157)Self(269-269)Self(283-283)Self(300-300)builder(103-105)lib/llm/src/block_manager/state.rs (3)
new(117-212)new(220-321)device(87-89)lib/llm/src/block_manager/distributed/utils.rs (1)
new(26-36)lib/llm/src/block_manager/pool/api.rs (1)
match_sequence_hashes(6-6)
lib/llm/src/block_manager/block/transfer_v2/executors/nixl.rs (4)
lib/llm/src/block_manager/block/transfer_v2/strategy.rs (13)
strategy(37-37)strategy(60-62)strategy(67-69)strategy(74-76)strategy(81-83)strategy(87-89)strategy(93-95)strategy(100-102)strategy(106-108)strategy(112-114)strategy(119-121)strategy(125-127)strategy(131-133)lib/llm/src/block_manager/block/transfer_v2/context.rs (1)
agent(139-141)lib/llm/src/block_manager/block.rs (3)
size(972-974)num_layers(361-363)num_outer_dims(377-379)lib/llm/src/block_manager/storage.rs (4)
size(177-177)size(394-396)size(474-476)size(515-517)
lib/llm/src/block_manager/block/transfer_v2/error.rs (1)
deploy/sdk/src/dynamo/sdk/cli/build.py (1)
InvalidArgument(69-72)
lib/llm/src/block_manager/block/data/view.rs (6)
lib/llm/src/block_manager/storage/disk.rs (2)
storage_type(90-92)fd(76-78)lib/llm/src/block_manager/storage.rs (4)
storage_type(171-171)storage_type(386-388)storage_type(466-468)storage_type(507-509)lib/llm/src/block_manager/storage/nixl.rs (6)
storage_type(266-268)device_id(302-304)device_id(326-328)device_id(351-353)device_id(376-378)device_id(400-402)lib/llm/src/block_manager/block/data.rs (1)
storage_type(26-26)lib/llm/src/block_manager/block/data/logical.rs (1)
storage_type(89-91)lib/llm/src/block_manager/block/data/local.rs (2)
storage_type(45-47)storage_type(74-76)
lib/llm/src/block_manager/block/transfer_v2/executors.rs (3)
lib/llm/src/block_manager/block/transfer_v2/coordinators.rs (4)
execute_local_transfer(91-131)write_to(143-151)write_to(163-209)write_to(265-273)lib/llm/src/block_manager/block/transfer_v2/strategy.rs (13)
strategy(37-37)strategy(60-62)strategy(67-69)strategy(74-76)strategy(81-83)strategy(87-89)strategy(93-95)strategy(100-102)strategy(106-108)strategy(112-114)strategy(119-121)strategy(125-127)strategy(131-133)lib/llm/src/block_manager/block/transfer_v2.rs (4)
write_to(50-56)write_to(120-132)write_to(162-194)write_to(223-255)
lib/llm/src/block_manager/block/data.rs (6)
lib/llm/src/block_manager/state.rs (1)
worker_id(91-93)lib/llm/src/block_manager/block.rs (12)
block_id(356-358)worker_id(1125-1127)num_layers(361-363)page_size(366-368)num_outer_dims(377-379)block_data(415-417)block_data(635-637)block_data(816-818)block_data(1283-1285)block_data_mut(423-425)block_data_mut(645-647)block_data_mut(1307-1309)lib/llm/src/block_manager/block/data/logical.rs (11)
block_id(77-79)block_set_id(81-83)worker_id(85-87)storage_type(89-91)is_fully_contiguous(93-95)num_layers(97-99)page_size(102-104)num_outer_dims(106-108)num_inner_dims(110-112)is_local(114-116)is_local_mut(118-120)lib/llm/src/block_manager/block/data/local.rs (19)
block_id(59-61)block_set_id(64-66)worker_id(69-71)storage_type(45-47)storage_type(74-76)is_fully_contiguous(49-51)is_fully_contiguous(78-80)num_layers(82-84)page_size(94-96)num_outer_dims(86-88)num_inner_dims(90-92)is_local(98-100)is_local_mut(102-104)local_layer_view(108-118)local_layer_view_mut(120-129)local_block_view(131-143)local_block_view_mut(145-157)block_data(167-169)block_data_mut(175-177)lib/llm/src/block_manager/layout.rs (5)
storage_type(205-205)storage_type(516-518)storage_type(747-749)num_layers(237-239)page_size(250-252)lib/llm/src/block_manager/distributed/transfer.rs (1)
block_data(47-48)
lib/llm/src/block_manager/layout/nixl.rs (6)
lib/llm/src/block_manager/layout.rs (20)
storage(196-196)storage(506-508)storage(789-791)layout_type(193-193)layout_type(502-504)layout_type(783-787)new(332-360)new(400-430)new(582-619)new(656-692)allocate(471-496)allocate(711-743)storage_type(205-205)storage_type(516-518)storage_type(747-749)config(208-208)config(520-522)config(751-753)new_internal(435-449)new_internal(694-706)lib/llm/src/block_manager/state/local.rs (1)
create_layout(92-122)lib/llm/src/block_manager/block.rs (9)
new(189-196)new(581-590)new(654-663)new(788-794)new(954-964)new(1117-1123)new(1162-1172)new(1224-1235)layout(1200-1202)lib/llm/src/block_manager/block/data.rs (1)
storage_type(26-26)lib/llm/src/block_manager/block/data/local.rs (2)
storage_type(45-47)storage_type(74-76)lib/llm/src/block_manager/storage/nixl.rs (1)
storage_type(266-268)
lib/llm/src/block_manager/block/transfer_next/strategy.rs (2)
lib/llm/src/block_manager/block/transfer.rs (5)
write_to_strategy(113-115)write_to_strategy(133-137)write_to_strategy(642-717)read_from_strategy(122-124)read_from_strategy(146-148)lib/llm/src/block_manager/block/transfer_next.rs (5)
write_to_strategy(112-114)write_to_strategy(131-135)write_to_strategy(621-696)read_from_strategy(121-123)read_from_strategy(144-146)
🪛 GitHub Actions: Copyright Checks
lib/llm/src/block_manager/block/collections.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/data/logical/null.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v3.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v2/executors/memcpy.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v2/executors/cuda.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/data/logical/distributed_leader_worker.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/layout/distributed.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v2/executors/nixl.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v2/coordinators.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v2/error.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v2/macros.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v2/strategy.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v2/executors.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/block/transfer_v2.rs
[error] 1-1: Invalid or missing copyright header detected.
lib/llm/src/block_manager/distributed/active_message.rs
[error] 1-1: Invalid or missing copyright header detected.
🪛 Flake8 (7.2.0)
lib/bindings/python/src/dynamo/llm/__init__.py
[error] 29-29: 'dynamo._core.KvbmLeader' imported but unused
(F401)
[error] 30-30: 'dynamo._core.KvbmWorker' imported but unused
(F401)
lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_manager.py
[error] 24-24: redefinition of unused 'KVCacheBlocks' from line 16
(F811)
[error] 25-25: redefinition of unused 'Request' from line 19
(F811)
🪛 LanguageTool
lib/bindings/python/rust/llm/block_manager/vllm/slot_manager_test_plan.md
[uncategorized] ~195-~195: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...he hits significantly faster than cache miss - Block reuse reduces memory allocation...
(AI_EN_LECTOR_REPLACEMENT_VERB_AGREEMENT)
🪛 GitHub Actions: Rust pre-merge checks
lib/bindings/python/rust/llm/block_manager/vllm.rs
[error] 563-570: cargo fmt formatting check failed. Code style differences detected in lines 563-570. Run 'cargo fmt' to fix formatting issues.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (95)
lib/bindings/python/src/dynamo/llm/vllm_integration/__init__.py (1)
1-2: LGTM – header only stub is fineNothing to change here; the file just establishes the namespace and already contains the correct SPDX lines.
lib/bindings/python/rust/lib.rs (1)
82-86: Commenting out helper modules may break Python imports – double-check feature gatingBy commenting out
dynamo_block_manager/kvbm_utils, the symbols that Python now tries to import (KvbmLeader,KvbmWorker) might no longer be registered unlessllm::block_manager::add_to_module()already adds them.Please verify that:
KvbmLeaderandKvbmWorkerare still added to the Python module when theblock-managerfeature is on.- The CI build enabling these symbols actually compiles with that feature.
If not, either re-enable the lines or register the two classes explicitly in
_core.lib/bindings/python/rust/llm/block_manager/dlpack.rs (1)
98-104: Nice cleanup – uses struct-field shorthandThe refactor to shorthand field initialisation is correct and behaviour-preserving.
dynamo.code-workspace (1)
8-11: LGTM! Workspace configuration properly updated for vLLM integration.The added Python analysis paths correctly support the new Python bindings and vLLM integration modules introduced in this PR.
lib/bindings/python/Cargo.toml (3)
37-37: LGTM! Block manager feature enabled by default.Enabling the "block-manager" feature by default aligns with the KVBM integration objectives and makes the functionality accessible without explicit feature flags.
81-82: Good addition of testing framework.Adding
rstestas a dev dependency will enhance testing capabilities with parameterized and fixture-based tests.
48-48: Dependency version currency verifiedThe
derive-getters = "0.5"requirement inlib/bindings/python/Cargo.tomlmatches the latest published crate version 0.5.0 (released August 7, 2024). No update is necessary.lib/llm/src/block_manager/block/transfer/memcpy.rs (2)
27-28: LGTM! API simplification improves usability.Removing the
PrivateTokenargument simplifies the block data access pattern while preserving the existing memcpy logic. This aligns with the broader locality-aware block abstractions refactoring.
56-57: Consistent API simplification.The same token removal applied consistently in
copy_layersmaintains API coherence across the transfer functions.lib/bindings/python/rust/llm/block_manager/block_list.rs (4)
43-44: Excellent use of shorthand struct initialization.Using shorthand syntax for field initialization is more idiomatic Rust and improves readability.
57-57: Good optimization: removed unnecessary clone.Since
dtypeappears to be aCopytype, the explicit.clone()is unnecessary and removing it improves performance.
74-74: Consistent optimization applied.The same unnecessary
.clone()removal applied consistently across the__getitem__method.
93-93: Clone removal maintains consistency.Consistent application of the optimization in the
__next__method completes the cleanup across all usage sites.lib/bindings/python/rust/llm.rs (1)
48-49: LGTM! Proper feature gating implementation.The conditional compilation using
#[cfg(feature = "block-manager")]correctly follows Rust conventions for optional modules and supports the modular architecture goals mentioned in the PR objectives.lib/bindings/python/rust/llm/block_manager/distributed.rs (1)
6-11: LGTM! Clean module organization.The module structure correctly follows Rust conventions with clear separation of concerns and proper re-exports of the main distributed components.
lib/llm/src/block_manager/block/state.rs (1)
96-108: LGTM! Well-designed state transition method.The
apply_token_blockmethod correctly implements the Reset → Complete state transition with proper validation and error handling. The documentation clearly specifies the preconditions and the implementation follows existing patterns in the codebase.lib/bindings/python/rust/llm/block_manager/layer.rs (2)
20-20: LGTM! Proper trait-based access pattern.The addition of
BlockDataProviderMutimport supports the new trait-based block data access pattern, which improves encapsulation and consistency.
125-125: Good optimization removing unnecessary clone.The change from
self.dtype.clone()toself.dtypeeliminates an unnecessary clone sinceDTypelikely implementsCopy, improving performance.lib/llm/Cargo.toml (4)
28-31: LGTM: Default features updated to enable new functionality.The change from no default features to enabling "block-manager" and "testing-full" aligns with the PR objectives of introducing KVBM integration and distributed block management capabilities.
36-36: LGTM: New testing feature added.The "testing-etcd" feature addition supports testing of distributed coordination components that rely on etcd.
56-56: LGTM: New dependencies support distributed functionality.The addition of
futures-utilandtmqdependencies aligns with the asynchronous programming patterns and messaging infrastructure needed for the distributed block manager implementation.Also applies to: 66-66
87-87: LGTM: nixl-sys dependency updated.The updated git revision likely includes fixes or features required by the new distributed transfer and block registration logic.
lib/llm/src/block_manager/block/transfer/cuda.rs (2)
53-54: LGTM: Simplified block data access.The removal of private token arguments simplifies the API and aligns with the broader refactoring to use locality-aware block data handling. This change is consistent with similar simplifications mentioned in other transfer modules.
103-104: LGTM: Consistent API simplification.The same simplification pattern applied in
copy_layersmaintains consistency with thecopy_blockfunction changes.lib/bindings/python/src/dynamo/_core.pyi (2)
1018-1024: LGTM: Type stub for vLLM cache manager.The
KvbmCacheManagertype stub correctly exposes the Rust-backed class with appropriate initialization parameters.
1027-1033: LGTM: Type stub for KV cache request.The
KvbmRequesttype stub provides proper type information for the request class with sensible parameters for KV cache operations.lib/bindings/python/src/dynamo/llm/vllm_integration/rust.py (1)
1-50: LGTM: Well-structured Rust extension loader.This module follows Python best practices for handling Rust extensions:
- Proper use of
TYPE_CHECKINGto separate static analysis from runtime behavior- Dynamic loading via
getattris appropriate for extension modules- Complete
__all__list for clear public interface- Class exports match the Rust extension registration in
vllm.rsThe approach cleanly separates type information for static analysis while enabling dynamic loading of the actual Rust-backed classes at runtime.
lib/llm/src/block_manager/block/data/logical/null.rs (1)
3-23: LGTM: Appropriate null object implementation.The
NullResourcesstruct correctly implements the null object pattern for logical resources, explicitly disabling transfer operations by panicking. This serves as a safe default that prevents unintended transfers when no proper logical resource is configured.lib/llm/src/block_manager/layout/distributed.rs (1)
23-23: Object safety confirmed forBlockLayouttrait usage.I’ve reviewed
BlockLayoutalong with its super-traits (GenericBlockLayoutandBlockLayoutConfig) and their methods—none declare generic type parameters or returnSelf. All inherited traits (Debug,Send,Sync) are object-safe. UsingArc<dyn BlockLayout<StorageType = NixlStorage>>is therefore valid.lib/llm/src/block_manager/block/transfer_v2/error.rs (1)
4-30: Well-designed error enum with comprehensive coverage.The
TransferErrorenum provides excellent coverage of transfer operation failure modes with descriptive error messages. The use ofthiserrorfor derive macros and transparent forwarding fromanyhow::Errorfollows Rust best practices.lib/bindings/python/rust/llm/block_manager/vllm/request.rs (1)
22-49: Well-structured implementation with correct hash computation.The implementation correctly uses the
compute_hash_v2function with serialized salt data to generate deterministic hashes. The salt structure properly handles optional fields with serde skip serialization attributes.lib/llm/src/block_manager/config.rs (3)
88-90: Good change from DType to byte width for clearer semantics.Replacing the
dtype: DTypefield withdtype_width_bytes: usizeprovides clearer semantics for byte-level operations and eliminates the need to compute element sizes from data types at runtime.
98-104: Well-defined parallelism strategy for distributed block management.The
BlockParallelismStrategyenum clearly defines the leader-worker sharded approach with good documentation explaining the memory/computation tradeoffs versus communication overhead.
151-163: Improved validation logic correctly implements XOR constraint.The updated validation logic properly enforces that exactly one of
storage,allocator, orlogicalmust be provided, with clear error messages for each failure case. This is a significant improvement over the previous validation.lib/llm/src/block_manager/block/data/logical/lw_sharded.rs (1)
100-124: Well-implemented validation logic for block compatibility.The
validate_block_compatibilityhelper function correctly handles the various cases (both None, one None, layout comparison) and provides clear error messages for validation failures.lib/llm/src/block_manager/block/transfer_next/nixl.rs (1)
22-83: Implementation looks correct.The function properly handles both contiguous and layered block layouts with appropriate assertions and error handling.
lib/bindings/python/rust/llm/block_manager/distributed/leader.rs (2)
10-16: Helper function correctly computes block counts.The function properly handles missing or invalid environment variables and performs the correct calculation.
27-55: Well-structured leader initialization.The constructor properly initializes all components with appropriate error handling and thread-safe wrapping.
lib/llm/src/block_manager/block/transfer/nixl.rs (3)
22-33: Simplified function signature looks good.The removal of
Arcwrapper and addition of trait bounds aligns with the broader refactoring. The lifetime of the reference is managed by the caller.
34-35: Removal of private token improves API ergonomics.The removal of
private::PrivateTokenfromblock_data()calls simplifies the API without compromising encapsulation, as access control is now handled at a different level.
88-99: Function signature updates are consistent.The changes to
write_blocks_tomirror those inappend_xfer_request, maintaining consistency across the API.lib/bindings/python/rust/llm/block_manager/distributed/worker.rs (1)
24-53: LGTM! Well-designed PyTorch tensor wrapper.The
VllmTensorimplementation properly extracts tensor metadata while holding a reference to prevent garbage collection. The error handling withPython::with_gilis correct for PyO3 integration.lib/llm/src/block_manager/distributed/utils.rs (1)
7-8: LGTM! Clear and well-named message constants.The ZMQ message type constants follow a consistent naming pattern and are self-documenting.
lib/llm/src/block_manager/block/transfer_v2/executors/memcpy.rs (1)
105-108: Unsafe memory copy is correctly implemented.The use of
std::ptr::copy_nonoverlappingis appropriate here as:
- Size validation ensures src and dst have equal lengths
- Different memory regions are guaranteed by the block system design
- Layer-wise copying prevents overlapping issues
lib/llm/src/block_manager/block/factory/logical.rs (2)
37-52: LGTM! Proper factory implementation with validation.The
create_block_datamethod correctly validates block indices and constructs block data with all necessary parameters. Error handling is appropriate.
75-108: Excellent test coverage demonstrating full factory lifecycle.The test covers construction, block creation, property validation, and integration with
BlockPool. This provides good confidence in the implementation.lib/llm/src/block_manager/block/transfer_v2/executors/cuda.rs (1)
79-94: LGTM! Strategy validation tests provide good coverage.The tests properly validate that storage type combinations map to expected CUDA transfer strategies. This ensures the strategy selection system works correctly.
lib/llm/src/block_manager/block/transfer_next/memcpy.rs (1)
73-82: Well-implemented unsafe memory copy with proper safety checks.The
memcpyfunction correctly uses debug assertions to prevent undefined behavior from overlapping memory regions. Theinline(always)attribute is appropriate for this performance-critical operation..devcontainer/devcontainer.json (1)
94-103: Good security practice for user configuration.The dynamic UID/GID configuration from environment variables with fallback values is a good practice for avoiding permission issues in development containers.
lib/llm/src/block_manager.md (1)
1-143: Excellent documentation with clear diagrams!The documentation provides a comprehensive overview of the block lifecycle and OffloadManager architecture. The Mermaid diagrams effectively visualize the state transitions and component interactions.
lib/llm/src/block_manager/block/transfer_v2/strategy.rs (1)
4-191: Well-designed transfer strategy system!The implementation provides a clean abstraction for different transfer strategies with appropriate trait implementations for all storage type combinations. The comprehensive unit tests ensure correctness.
lib/bindings/python/rust/llm/block_manager/vllm/slot_manager_test_plan.md (1)
1-216: Comprehensive and well-structured test plan!The test plan provides excellent coverage of the SlotManager functionality with clear validation points, test phases, and success criteria. The sequence diagrams effectively illustrate the cache hit and miss workflows.
lib/llm/src/block_manager/block/data/view.rs (2)
44-48: Good refactoring to use trait objects for flexibility.The change from concrete
BlockData<S>to trait objectdyn BlockDataExt<S>improves extensibility by allowing different block data implementations. Adding the explicitstorage_typefield eliminates the need to query through the trait object for this frequently accessed property.Also applies to: 63-67, 96-100, 112-116
171-177: Consider returning Result instead of panicking.The panic for invalid storage types could crash the application. Since this is in a trait implementation, it would be better to handle this gracefully.
Consider changing the
NixlDescriptortrait to returnResult<u64>fordevice_id(), or at minimum, use a more descriptive panic message:fn device_id(&self) -> u64 { match self.storage_type { StorageType::System | StorageType::Pinned => 0, StorageType::Device(device_id) => device_id as u64, StorageType::Disk(fd) => fd, - _ => panic!("Invalid storage type"), + _ => panic!("Storage type {:?} does not support NIXL operations", self.storage_type), } }The same change should be applied to the second occurrence at lines 200-206.
Also applies to: 200-206
⛔ Skipped due to learnings
Learnt from: PeaBrane PR: ai-dynamo/dynamo#1392 File: lib/llm/src/kv_router/scoring.rs:35-46 Timestamp: 2025-06-05T01:02:15.318Z Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.Learnt from: PeaBrane PR: ai-dynamo/dynamo#1285 File: lib/llm/src/kv_router/scoring.rs:58-63 Timestamp: 2025-05-30T06:38:09.630Z Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.lib/llm/src/block_manager/block/transfer_next/context.rs (1)
107-116: LGTM!The Drop implementation correctly handles cleanup by canceling the worker and joining the thread with appropriate error logging.
lib/llm/src/block_manager/block/transfer_v2/executors/nixl.rs (1)
233-299: LGTM!The tests appropriately cover the implemented functionality (strategy validation and conversions). Once the actual transfer operations are implemented, additional integration tests will be needed.
lib/llm/src/block_manager/block/transfer_v2/executors.rs (1)
183-206: LGTM!The test coverage appropriately validates the core functionality of the UniversalCoordinator.
lib/llm/src/block_manager/block/data/local.rs (1)
107-158: LGTM!The
BlockDataViewsimplementation properly handles unsafe view creation with appropriate error handling for non-contiguous blocks. The unsafe blocks are justified as they operate on memory regions guaranteed by the layout.lib/llm/src/block_manager/block/factory.rs (1)
15-78: LGTM!Well-designed factory trait hierarchy with appropriate default implementations and a useful extension trait for batch operations. The separation between
BlockFactoryandIntoBlocksprovides good flexibility.lib/bindings/python/rust/llm/block_manager/block.rs (1)
31-31: LGTM!The locality parameter additions and block data access updates are consistent with the new locality-aware block management design. The use of
PrivateTokenproperly enforces access control.Also applies to: 38-38, 170-179
lib/llm/src/block_manager/block/transfer_next/cuda.rs (1)
229-294: LGTM!Comprehensive test coverage that properly verifies memset operations and bidirectional CUDA transfers with appropriate stream synchronization.
lib/llm/src/block_manager/distributed/leader.rs (1)
66-137: Well-structured distributed leader implementation.The implementation demonstrates good practices:
- Proper async/await usage with cancellation tokens
- Comprehensive error handling with context
- Effective use of tracing for debugging
- Clean separation of concerns between barrier sync and ZMQ setup
lib/bindings/python/rust/llm/block_manager/vllm/slot_test_plan.md (1)
1-267: Comprehensive and well-structured test plan!This test plan provides excellent coverage of the Slot block management system with:
- Clear workflow diagrams using Mermaid
- Systematic organization into 4 phases
- Detailed validation points for each test scenario
- Good coverage of both happy paths and error conditions
- Helpful code examples and patterns
The plan effectively documents the testing strategy and will serve as a valuable reference for implementation and validation.
lib/bindings/python/rust/llm/block_manager.rs (2)
123-134: Consider the implications of blocking on async operations in constructor.The constructor blocks on an async operation using
rt.block_on(). This could cause issues if called from an async Python context or if the initialization takes a long time. Consider documenting this behavior or providing an async factory method.
144-149: Clean and minimal API design.Good design choice to keep the Python API surface minimal with only the essential
block_size()method exposed. The internalget_block_manager()method provides clean access for Rust code.lib/bindings/python/rust/llm/block_manager/vllm.rs (1)
346-358: Well-designed generic slot manager.The
SlotManagerimplementation is well-structured with:
- Generic design over
RequestKeyfor flexibility- Proper error handling with custom
SlotErrortype- Clear separation of concerns
lib/llm/src/block_manager/layout/nixl.rs (2)
120-131: Good trait simplification.Removing the
BlockLayoutNixlStoragetrait and simplifying the bounds to just requireBlockLayout + ToSerializedNixlBlockLayoutimproves the API clarity.
152-183: Improved layout creation API.The changes to return
Box<dyn NixlLayout>instead of opaque types provide better flexibility and cleaner API. The support forLayerSeparatelayout alongsideFullyContiguousis well-integrated.lib/bindings/python/rust/llm/block_manager/vllm/block_list.rs (1)
112-115: Good use of Option for one-time consumption.The
take_blocksmethod correctly usesOption::take()to ensure blocks can only be consumed once, preventing use-after-move bugs.lib/llm/src/block_manager/block/data/logical.rs (1)
39-49: Well-designed block data structure.Good design choices:
- Using
PhantomDatato associate storage type without runtime cost- Storing resources as
Arcfor efficient sharing- Explicit comment about not implementing Clone to ensure uniqueness
lib/llm/src/block_manager/block/transfer.rs (3)
127-138: LGTM! Clean abstraction for transfer strategy.The updated trait bounds and delegation to storage type's
write_to_strategy()method provides better separation of concerns.
180-194: LGTM! Proper CUDA async transfer handling.The implementation correctly handles CUDA async transfers with proper stream usage and event-based notification.
224-240: Excellent architectural improvement with locality abstraction.Delegating transfer handling to
LocalityProvider::handle_transferprovides better separation of concerns and allows different locality implementations to customize transfer behavior.lib/bindings/python/rust/llm/block_manager/vllm/slot.rs (1)
291-295: LGTM! Proper resource cleanup in Drop.The Drop implementation correctly frees all blocks when the slot is destroyed.
lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_manager.py (1)
83-103: LGTM! Well-implemented cache block retrieval.The method properly handles slot creation, block retrieval, and statistics tracking.
lib/llm/src/block_manager.rs (2)
90-98: LGTM! Clean cancellation token management.The
CancelOnLastDroppattern ensures proper cleanup by cancelling the token when the KvBlockManager is dropped.
115-148: LGTM! Well-designed accessor methods.The new accessor methods provide a clean API for accessing block pools and configuration while maintaining proper encapsulation.
lib/llm/src/block_manager/block/transfer_next.rs (1)
392-431: Well-implemented validation logic.The validation ensures data integrity by checking for duplicate destinations and disjoint source/destination sets. This prevents common transfer errors.
lib/llm/src/block_manager/distributed/active_message.rs (1)
414-606: Excellent documentation and examples!The comprehensive examples module with complete usage examples, integration patterns, and clear documentation makes this code very maintainable and easy to understand.
lib/llm/src/block_manager/layout.rs (2)
559-805: Excellent implementation of LayerSeparate layout!The LayerSeparate layout implementation is well-designed with:
- Clear separation between outer-contiguous and block-contiguous modes
- Proper validation and error handling
- Comprehensive stride calculations
- Good alignment support
This provides the flexibility needed for vLLM integration where each layer requires separate allocation.
1144-1498: Comprehensive test coverage for LayerSeparate layout.The tests thoroughly cover:
- Both outer-contiguous and block-contiguous modes
- Invalid configurations
- Memory region calculations
- Alignment handling
- Stride calculations
- All edge cases
Excellent test design!
lib/llm/src/block_manager/block.rs (3)
180-188: Clean locality abstraction implementationThe refactoring to make
Blockgeneric overLocalityProvideris well-designed. This allows for different block data implementations based on locality (Local vs Logical) while maintaining a consistent API.
987-1051: Commented Nixl trait implementations need resolutionThe Nixl-related trait implementations are commented out. This suggests incomplete integration with the new locality abstraction.
Is Nixl support temporarily disabled? If so, please add a comment explaining the timeline for re-enabling it. If it's permanently removed, these sections should be deleted.
874-876: Elegant marker trait designThe blanket implementations for
ReadableBlockandWritableBlockbased onBlockDataProvidertraits provide a clean and extensible design.lib/llm/src/block_manager/block/transfer_next/strategy.rs (3)
21-48: Well-structured transfer strategy implementationsThe
DiskStoragetransfer strategies are correctly implemented. The use ofNixl(NixlTransfer::Read)when DiskStorage is the source andNixl(NixlTransfer::Write)when it's the destination properly reflects the Nixl system's perspective.
133-176: Excellent use of generic implementationsThe generic implementations for
NixlStoragetransfers and the delegation pattern inReadFromStrategyimplementations reduce code duplication while maintaining type safety.
240-257: Good documentation of compile-time constraintsThe commented tests effectively document that
NixlStoragecannot be used as a source storage type, enforcing this constraint at compile time rather than runtime.lib/llm/src/block_manager/distributed.rs (2)
15-40: Module structure looks good!The distributed module is well-organized with clear separation of concerns between leader, worker, transfer, and communication components.
97-127: Well-structured test helper function.The function properly sets up distributed components with barrier synchronization, ensuring workers and leader coordinate correctly.
lib/llm/src/block_manager/block/transfer_v2/coordinators.rs (1)
316-381: Good test coverage for implemented functionality.The tests verify coordinator creation, strategy selection, locality compatibility, and transfer mechanism determination.
lib/llm/src/block_manager/block/transfer_v2/macros.rs (2)
5-131: Well-designed macro system for reducing boilerplate.The macros effectively reduce code duplication while maintaining type safety and proper error handling. The composition pattern with
impl_local_transfersusing other macros is particularly clean.
133-163: Basic test coverage is appropriate for macro validation.The test verifies macro expansion and compilation, which is the primary concern for macro code.
lib/llm/src/block_manager/distributed/worker.rs (3)
33-76: Robust tensor validation with clear error messages.The function thoroughly validates tensor properties and provides helpful error messages. The TODOs indicate planned enhancements for more complex tensor layouts.
78-106: Well-structured configuration with builder pattern.The configuration uses sensible defaults and correctly implements the builder pattern with the manual
builder()method as required by the derive_builder macro.
334-341: Proper cleanup in Drop implementation.The Drop trait correctly cancels and detaches the worker task, following the established pattern for long-running background tasks.
| from dynamo._core import KvbmLeader as KvbmLeader | ||
| from dynamo._core import KvbmWorker as KvbmWorker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
KvbmLeader / KvbmWorker import will raise at import-time & flake8 F401
dynamo._core currently does not expose these classes (see lib/bindings/python/rust/lib.rs::_core). Importing them unconditionally will cause ImportError, and flake8 flags them as unused.
Wrap the import in the same guarded pattern used for BlockManager, and add the names to __all__ (or suppress with # noqa: F401) so they are considered “exported”.
-from dynamo._core import KvbmLeader as KvbmLeader
-from dynamo._core import KvbmWorker as KvbmWorker
+try:
+ from dynamo._core import KvbmLeader as KvbmLeader # noqa: F401
+ from dynamo._core import KvbmWorker as KvbmWorker # noqa: F401
+except ImportError:
+ pass # Distributed block-manager not enabled at build-timeDon’t forget to append 'KvbmLeader', 'KvbmWorker' to the module’s __all__ if one exists further down the file.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from dynamo._core import KvbmLeader as KvbmLeader | |
| from dynamo._core import KvbmWorker as KvbmWorker | |
| try: | |
| from dynamo._core import KvbmLeader as KvbmLeader # noqa: F401 | |
| from dynamo._core import KvbmWorker as KvbmWorker # noqa: F401 | |
| except ImportError: | |
| pass # Distributed block-manager not enabled at build-time |
🧰 Tools
🪛 Flake8 (7.2.0)
[error] 29-29: 'dynamo._core.KvbmLeader' imported but unused
(F401)
[error] 30-30: 'dynamo._core.KvbmWorker' imported but unused
(F401)
🤖 Prompt for AI Agents
In lib/bindings/python/src/dynamo/llm/__init__.py around lines 29-30, the
imports of KvbmLeader and KvbmWorker from dynamo._core will raise ImportError
because these classes are not always exposed, and flake8 flags them as unused.
To fix this, wrap these imports in the same conditional import guard used for
BlockManager to avoid import-time errors, then add 'KvbmLeader' and 'KvbmWorker'
to the module's __all__ list (or suppress the unused import warning with # noqa:
F401) so they are recognized as exported symbols.
| VLLM_V1_BASE_IMAGE_TAG="25.01-cuda12.8-devel-ubuntu24.04" | ||
|
|
||
| NIXL_COMMIT=16348080f5bdeb9fe6058a23be140cec020ef3f3 | ||
| NIXL_COMMIT=fa800bcfe3814b08df9cda9c30443de8c19665e5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify the NIXL commit hash exists and is compatible.
The commit hash update ensures compatibility with the new block manager features. Please verify this commit exists in the nixl repository and contains the required functionality.
🏁 Script executed:
#!/bin/bash
# Description: Verify the NIXL commit exists in the repository
# Expected: The commit should exist and be accessible
NIXL_COMMIT="fa800bcfe3814b08df9cda9c30443de8c19665e5"
# Check if we can access the commit via GitHub API
curl -s "https://api.github.com/repos/ai-dynamo/nixl/commits/${NIXL_COMMIT}" | jq '.sha, .commit.message' || echo "Failed to fetch commit info"Length of output: 1639
🏁 Script executed:
#!/bin/bash
# Description: List files modified in the specified NIXL commit to verify block manager changes
NIXL_COMMIT="fa800bcfe3814b08df9cda9c30443de8c19665e5"
curl -s -H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/repos/ai-dynamo/nixl/commits/${NIXL_COMMIT}" | jq -r '.files[]?.filename'Length of output: 367
Verify NIXL_COMMIT includes required feature code
The commit fa800bcfe3814b08df9cda9c30443de8c19665e5 exists, but it only updates CI/Jenkins configuration—no block manager or KVBM integration code is present.
• container/build.sh (line 117): NIXL_COMMIT set to fa800…
• Files changed in that commit:
- .ci/assets/nixlbench-version-info.json.template
- .ci/jenkins/lib/nixlbench-container-build-matrix.yaml
- .ci/jenkins/pipeline/proj-jjb.yaml
Please update the hash to the NIXL commit that actually contains the block manager/KVBM feature code, or include those code changes in this PR.
🤖 Prompt for AI Agents
In container/build.sh at line 117, the NIXL_COMMIT is set to a commit that only
updates CI/Jenkins configuration and does not include the block manager or KVBM
integration code. Update the NIXL_COMMIT variable to point to the commit hash
that contains the actual block manager/KVBM feature code, ensuring the build
uses the correct code changes.
| @@ -0,0 +1,23 @@ | |||
| use super::*; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing copyright header.
The pipeline failure indicates this file is missing the required copyright header. All source files in this codebase should include the standard NVIDIA copyright header.
Add the standard copyright header at the beginning of the file:
+// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
use super::*;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| use super::*; | |
| // SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| // SPDX-License-Identifier: Apache-2.0 | |
| // | |
| // Licensed under the Apache License, Version 2.0 (the "License"); | |
| // you may not use this file except in compliance with the License. | |
| // You may obtain a copy of the License at | |
| // | |
| // http://www.apache.org/licenses/LICENSE-2.0 | |
| // | |
| // Unless required by applicable law or agreed to in writing, software | |
| // distributed under the License is distributed on an "AS IS" BASIS, | |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
| // See the License for the specific language governing permissions and | |
| // limitations under the License. | |
| use super::*; |
🧰 Tools
🪛 GitHub Actions: Copyright Checks
[error] 1-1: Invalid or missing copyright header detected.
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/data/logical/null.rs at line 1, the file is
missing the required NVIDIA copyright header. Add the standard NVIDIA copyright
header at the very top of the file before any code or imports to comply with the
project's licensing requirements.
| @@ -0,0 +1,24 @@ | |||
| //! Distributed Layout | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing copyright header to resolve pipeline failure.
The pipeline reports a missing copyright header. Please add the standard NVIDIA copyright header at the beginning of the file.
+// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+
//! Distributed Layout📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| //! Distributed Layout | |
| // SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| // SPDX-License-Identifier: Apache-2.0 | |
| //! Distributed Layout |
🧰 Tools
🪛 GitHub Actions: Copyright Checks
[error] 1-1: Invalid or missing copyright header detected.
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/layout/distributed.rs at line 1, add the standard
NVIDIA copyright header at the very top of the file before any code or comments
to resolve the pipeline failure caused by the missing header.
| @@ -0,0 +1,30 @@ | |||
| use super::super::BlockError; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing copyright header to resolve pipeline failure.
The pipeline reports a missing copyright header. Please add the standard NVIDIA copyright header at the beginning of the file.
+// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+
use super::super::BlockError;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| use super::super::BlockError; | |
| // SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| // SPDX-License-Identifier: Apache-2.0 | |
| use super::super::BlockError; |
🧰 Tools
🪛 GitHub Actions: Copyright Checks
[error] 1-1: Invalid or missing copyright header detected.
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/transfer_v2/error.rs at line 1, the file is
missing the standard NVIDIA copyright header, causing pipeline failure. Add the
required NVIDIA copyright header comment block at the very top of the file
before any code or use statements to resolve the issue.
| fn is_fully_contiguous(&self) -> bool { | ||
| self.layout.layout_type() == LayoutType::FullyContiguous | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Remove duplicate is_fully_contiguous method.
This private method duplicates the trait implementation and is never used. The trait method at lines 78-80 is sufficient.
- fn is_fully_contiguous(&self) -> bool {
- self.layout.layout_type() == LayoutType::FullyContiguous
- }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn is_fully_contiguous(&self) -> bool { | |
| self.layout.layout_type() == LayoutType::FullyContiguous | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/data/local.rs around lines 49 to 52, remove
the private method is_fully_contiguous as it duplicates the trait implementation
found at lines 78-80 and is never used. Keeping only the trait method will avoid
redundancy and maintain clarity.
| (src_ptr as usize + size <= dst_ptr as usize) | ||
| || (dst_ptr as usize + size <= src_ptr as usize), | ||
| "Source and destination device memory regions must not overlap for D2D copy" | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect assertion messages.
The debug assertion messages incorrectly mention "D2D copy" for H2D and D2H transfers.
- "Source and destination device memory regions must not overlap for D2D copy"
+ "Source and destination memory regions must not overlap for H2D copy"And for D2H:
- "Source and destination device memory regions must not overlap for D2D copy"
+ "Source and destination memory regions must not overlap for D2H copy"Also applies to: 197-200
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/transfer_next/cuda.rs around lines 175 to 178
and 197 to 200, the debug assertion messages incorrectly refer to "D2D copy"
when the transfers are actually H2D or D2H. Update the assertion messages to
accurately describe the type of transfer being checked, replacing "D2D copy"
with the correct transfer direction such as "H2D copy" or "D2H copy" to reflect
the actual operation.
| config.barrier_id | ||
| ); | ||
|
|
||
| let leader_sockets = new_leader_sockets("tcp://127.0.0.1")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider making the ZMQ bind address configurable.
The hardcoded "tcp://127.0.0.1" limits the leader to localhost connections only. For true distributed deployment, this should be configurable through KvbmLeaderConfig.
+ /// The bind address for ZMQ sockets
+ #[builder(default = "String::from(\"tcp://127.0.0.1\")")]
+ bind_address: String,And update line 78:
- let leader_sockets = new_leader_sockets("tcp://127.0.0.1")?;
+ let leader_sockets = new_leader_sockets(&config.bind_address)?;Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/distributed/leader.rs at line 78, the ZMQ bind
address is hardcoded to "tcp://127.0.0.1", restricting connections to localhost.
Modify the code to retrieve the bind address from the KvbmLeaderConfig
configuration instead of using a fixed string. Update the call to
new_leader_sockets to use this configurable address, allowing for flexible
deployment in distributed environments.
| impl CudaTransferContext { | ||
| pub fn new(stream: Arc<CudaStream>) -> Self { | ||
| let (cuda_event_tx, mut cuda_event_rx): ( | ||
| mpsc::UnboundedSender<(CudaEvent, oneshot::Sender<()>)>, | ||
| mpsc::UnboundedReceiver<(CudaEvent, oneshot::Sender<()>)>, | ||
| ) = mpsc::unbounded_channel(); | ||
| let cancel_token = CancellationToken::new(); | ||
| let inner_cancel_token = cancel_token.clone(); | ||
|
|
||
| let cuda_event_worker = task::spawn(async move { | ||
| loop { | ||
| tokio::select! { | ||
| Some((event, tx)) = cuda_event_rx.recv() => { | ||
| if let Err(e) = event.synchronize() { | ||
| tracing::error!("Failed to synchronize CUDA event: {:?}", e); | ||
| } | ||
| let _ = tx.send(()); | ||
| } | ||
| _ = inner_cancel_token.cancelled() => { | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| }); | ||
|
|
||
| Self { | ||
| stream, | ||
| cuda_event_tx, | ||
| cuda_event_worker: Some(cuda_event_worker), | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Cancellation token is not stored or connected to external cancellation
The CudaTransferContext creates a local cancellation token that is never stored or connected to any external cancellation mechanism. This means the worker task cannot be gracefully cancelled - it can only be aborted in Drop.
Consider either:
- Accepting a cancellation token as a parameter from the parent
TransferContext - Storing the cancellation token in the struct for external cancellation
Apply this diff to accept a cancellation token from the parent context:
- pub fn new(stream: Arc<CudaStream>) -> Self {
+ pub fn new(stream: Arc<CudaStream>, cancel_token: CancellationToken) -> Self {
let (cuda_event_tx, mut cuda_event_rx): (
mpsc::UnboundedSender<(CudaEvent, oneshot::Sender<()>)>,
mpsc::UnboundedReceiver<(CudaEvent, oneshot::Sender<()>)>,
) = mpsc::unbounded_channel();
- let cancel_token = CancellationToken::new();
let inner_cancel_token = cancel_token.clone();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| impl CudaTransferContext { | |
| pub fn new(stream: Arc<CudaStream>) -> Self { | |
| let (cuda_event_tx, mut cuda_event_rx): ( | |
| mpsc::UnboundedSender<(CudaEvent, oneshot::Sender<()>)>, | |
| mpsc::UnboundedReceiver<(CudaEvent, oneshot::Sender<()>)>, | |
| ) = mpsc::unbounded_channel(); | |
| let cancel_token = CancellationToken::new(); | |
| let inner_cancel_token = cancel_token.clone(); | |
| let cuda_event_worker = task::spawn(async move { | |
| loop { | |
| tokio::select! { | |
| Some((event, tx)) = cuda_event_rx.recv() => { | |
| if let Err(e) = event.synchronize() { | |
| tracing::error!("Failed to synchronize CUDA event: {:?}", e); | |
| } | |
| let _ = tx.send(()); | |
| } | |
| _ = inner_cancel_token.cancelled() => { | |
| break; | |
| } | |
| } | |
| } | |
| }); | |
| Self { | |
| stream, | |
| cuda_event_tx, | |
| cuda_event_worker: Some(cuda_event_worker), | |
| } | |
| } | |
| impl CudaTransferContext { | |
| pub fn new(stream: Arc<CudaStream>, cancel_token: CancellationToken) -> Self { | |
| let (cuda_event_tx, mut cuda_event_rx): ( | |
| mpsc::UnboundedSender<(CudaEvent, oneshot::Sender<()>)>, | |
| mpsc::UnboundedReceiver<(CudaEvent, oneshot::Sender<()>)>, | |
| ) = mpsc::unbounded_channel(); | |
| let inner_cancel_token = cancel_token.clone(); | |
| let cuda_event_worker = task::spawn(async move { | |
| loop { | |
| tokio::select! { | |
| Some((event, tx)) = cuda_event_rx.recv() => { | |
| if let Err(e) = event.synchronize() { | |
| tracing::error!("Failed to synchronize CUDA event: {:?}", e); | |
| } | |
| let _ = tx.send(()); | |
| } | |
| _ = inner_cancel_token.cancelled() => { | |
| break; | |
| } | |
| } | |
| } | |
| }); | |
| Self { | |
| stream, | |
| cuda_event_tx, | |
| cuda_event_worker: Some(cuda_event_worker), | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/transfer_v2/context.rs around lines 81 to
111, the CudaTransferContext creates a local cancellation token but does not
store it or connect it to any external cancellation, preventing graceful
cancellation of the worker task. To fix this, modify the constructor to accept a
cancellation token parameter from the parent context and store it in the struct,
then use this token in the async worker loop to allow external cancellation
control.
| fn load_resources<B: BlockDataProvider<Locality = Logical<R>>>(blocks: &[B]) -> Vec<Arc<R>> { | ||
| blocks | ||
| .iter() | ||
| .map(|block| { | ||
| let any_block = block.block_data() as &dyn Any; | ||
|
|
||
| // TODO: Downcasting and unwrapping like this is atrocious... | ||
| let logical_block = any_block | ||
| .downcast_ref::<LogicalBlockData<<B as StorageTypeProvider>::StorageType, R>>() | ||
| .unwrap(); | ||
|
|
||
| logical_block.resources() | ||
| }) | ||
| .collect() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Unsafe downcasting with unwrap() can cause runtime panics
The downcasting logic uses unwrap() which will panic if the type doesn't match expectations. This is acknowledged in the TODO comment but represents a significant runtime safety issue.
Consider returning a proper error instead of panicking:
- let logical_block = any_block
- .downcast_ref::<LogicalBlockData<<B as StorageTypeProvider>::StorageType, R>>()
- .unwrap();
+ let logical_block = any_block
+ .downcast_ref::<LogicalBlockData<<B as StorageTypeProvider>::StorageType, R>>()
+ .ok_or_else(|| anyhow::anyhow!("Failed to downcast block data to LogicalBlockData"))?;This would require changing the return type to Result<Vec<Arc<R>>, anyhow::Error> and propagating the error in handle_transfer.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/locality.rs lines 138 to 152, the current
downcasting uses unwrap() which can panic at runtime if the type does not match.
To fix this, change the function signature to return Result<Vec<Arc<R>>,
anyhow::Error> instead of Vec<Arc<R>>. Replace unwrap() with proper error
handling by returning an error if downcast_ref returns None. Then propagate this
Result error upwards to handle_transfer to ensure safe error handling without
panics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review continued from previous batch...
| } | ||
|
|
||
| pub fn free_blocks(&mut self, request_id: &R) -> Result<(), SlotError> { | ||
| let slot = self.slots.get_mut(request_id).ok_or(SlotError::NotFound)?; | ||
| slot.free_blocks(); | ||
| Ok(()) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix formatting issues.
The pipeline indicates formatting issues in this range. Please run cargo fmt to fix the formatting.
#!/bin/bash
# Verify the formatting issues
cd lib/bindings/python/rust/llm/block_manager
cargo fmt --check vllm.rs🧰 Tools
🪛 GitHub Actions: Rust pre-merge checks
[error] 563-570: cargo fmt formatting check failed. Code style differences detected in lines 563-570. Run 'cargo fmt' to fix formatting issues.
🤖 Prompt for AI Agents
In lib/bindings/python/rust/llm/block_manager/vllm.rs around lines 563 to 570,
there are formatting issues detected by the pipeline. Run `cargo fmt` in the
lib/bindings/python/rust/llm/block_manager directory to automatically fix the
formatting problems in vllm.rs and ensure the code adheres to Rust style
guidelines.
| let blocks = matched_blocks.take_blocks(); | ||
| match blocks { | ||
| Some(BlockListType::ImmutableDevice(blocks)) => { | ||
| tracing::debug!( | ||
| request_id, | ||
| "applying {} cache-hit tokens", | ||
| blocks.len() * self.block_size | ||
| ); | ||
| slot.apply_computed_blocks(blocks)?; | ||
| } | ||
| Some(BlockListType::MutableDevice(_blocks)) => { | ||
| panic!( | ||
| "impossibility: mutable blocks were provided instead of immutable blocks" | ||
| ); | ||
| } | ||
| Some(BlockListType::ImmutableHost(_blocks)) => { | ||
| panic!("ImmutableHost should not be provided"); | ||
| } | ||
| Some(BlockListType::MutableHost(_blocks)) => { | ||
| panic!("MutableHost should not be provided"); | ||
| } | ||
| Some(BlockListType::ImmutableDisk(_blocks)) => { | ||
| panic!("ImmutableDisk should not be provided"); | ||
| } | ||
| Some(BlockListType::MutableDisk(_blocks)) => { | ||
| panic!("MutableDisk should not be provided"); | ||
| } | ||
| None => { | ||
| panic!("impossibility: block list was none; possible taken previously"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace panic! with proper error handling.
Using panic! in Python bindings will crash the entire Python process. Convert these to proper error returns.
- Some(BlockListType::MutableDevice(_blocks)) => {
- panic!(
- "impossibility: mutable blocks were provided instead of immutable blocks"
- );
- }
+ Some(BlockListType::MutableDevice(_blocks)) => {
+ return Err(SlotError::from_str(
+ "mutable blocks were provided instead of immutable blocks"
+ ));
+ }Apply similar changes to all the panic! calls in this match statement.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In lib/bindings/python/rust/llm/block_manager/vllm.rs lines 435 to 465, replace
all panic! calls inside the match arms with proper error handling by returning
an appropriate error type instead of panicking. This will prevent crashing the
Python process and allow the error to be handled gracefully. Use a Result return
type and convert each panic! message into an Err variant with a descriptive
error message.
| fn serialize_storages<S: NixlRegisterableStorage>( | ||
| storages: Vec<&S>, | ||
| ) -> Result<Vec<NixlStorage>, LayoutError> { | ||
| let mut storage_descriptors = Vec::new(); | ||
|
|
||
| for storage in storages { | ||
| let descriptor = unsafe { storage.as_nixl_descriptor() }.ok_or_else(|| { | ||
| LayoutError::OperationFailed( | ||
| "Storage does not provide NIXL descriptors for serialization".to_string(), | ||
| ) | ||
| })?; | ||
| storage_descriptors.push(descriptor); | ||
| } | ||
|
|
||
| Ok(storage_descriptors) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Document the safety requirements for unsafe code.
The unsafe block calls as_nixl_descriptor(). Please add a safety comment explaining why this is safe and what invariants must be upheld.
fn serialize_storages<S: NixlRegisterableStorage>(
storages: Vec<&S>,
) -> Result<Vec<NixlStorage>, LayoutError> {
let mut storage_descriptors = Vec::new();
for storage in storages {
+ // Safety: The storage must be properly registered with NIXL before calling this.
+ // The returned descriptor is valid as long as the storage remains registered.
let descriptor = unsafe { storage.as_nixl_descriptor() }.ok_or_else(|| {
LayoutError::OperationFailed(
"Storage does not provide NIXL descriptors for serialization".to_string(),
)
})?;
storage_descriptors.push(descriptor);
}
Ok(storage_descriptors)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn serialize_storages<S: NixlRegisterableStorage>( | |
| storages: Vec<&S>, | |
| ) -> Result<Vec<NixlStorage>, LayoutError> { | |
| let mut storage_descriptors = Vec::new(); | |
| for storage in storages { | |
| let descriptor = unsafe { storage.as_nixl_descriptor() }.ok_or_else(|| { | |
| LayoutError::OperationFailed( | |
| "Storage does not provide NIXL descriptors for serialization".to_string(), | |
| ) | |
| })?; | |
| storage_descriptors.push(descriptor); | |
| } | |
| Ok(storage_descriptors) | |
| } | |
| fn serialize_storages<S: NixlRegisterableStorage>( | |
| storages: Vec<&S>, | |
| ) -> Result<Vec<NixlStorage>, LayoutError> { | |
| let mut storage_descriptors = Vec::new(); | |
| for storage in storages { | |
| // Safety: The storage must be properly registered with NIXL before calling this. | |
| // The returned descriptor is valid as long as the storage remains registered. | |
| let descriptor = unsafe { storage.as_nixl_descriptor() }.ok_or_else(|| { | |
| LayoutError::OperationFailed( | |
| "Storage does not provide NIXL descriptors for serialization".to_string(), | |
| ) | |
| })?; | |
| storage_descriptors.push(descriptor); | |
| } | |
| Ok(storage_descriptors) | |
| } |
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/layout/nixl.rs around lines 233 to 248, the unsafe
block calling as_nixl_descriptor() lacks a safety comment. Add a comment above
the unsafe block explaining why this call is safe, specifying the invariants or
conditions that guarantee safety, such as assumptions about the storage type or
internal state that ensure as_nixl_descriptor() will not cause undefined
behavior.
| fn is_fully_contiguous(&self) -> bool { | ||
| unimplemented!() | ||
| } | ||
|
|
||
| fn num_layers(&self) -> usize { | ||
| unimplemented!() | ||
| } | ||
|
|
||
| /// Even though the block is logical, we still need to know this for the token block stuff. | ||
| fn page_size(&self) -> usize { | ||
| self.page_size | ||
| } | ||
|
|
||
| fn num_outer_dims(&self) -> usize { | ||
| unimplemented!() | ||
| } | ||
|
|
||
| fn num_inner_dims(&self) -> usize { | ||
| unimplemented!() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid unimplemented!() in trait implementations.
Multiple methods return unimplemented!() which will panic at runtime. This is problematic for production code. Consider:
- Providing proper implementations
- Returning Result with an error
- Refactoring the trait to not require these methods for logical blocks
fn is_fully_contiguous(&self) -> bool {
- unimplemented!()
+ false // Logical blocks are not contiguous in physical memory
}
fn num_layers(&self) -> usize {
- unimplemented!()
+ 1 // Or return the actual number if available
}🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/data/logical.rs between lines 93 and 112,
several trait methods currently use unimplemented!(), which causes runtime
panics. To fix this, replace unimplemented!() with proper method implementations
that return meaningful values for logical blocks, or modify the methods to
return a Result type with an appropriate error if the operation is unsupported.
Alternatively, consider refactoring the trait to make these methods optional or
not required for logical blocks, ensuring no panics occur in production.
| TransferStrategy::Nixl(transfer_type) => { | ||
| let transfer_fut = nixl::write_blocks_to(sources, targets, &ctx, transfer_type)?; | ||
|
|
||
| if notify { | ||
| ctx.async_rt_handle().spawn(async move { | ||
| transfer_fut.await; | ||
| tx.send(()).unwrap(); | ||
| }); | ||
| Ok(Some(rx)) | ||
| } else { | ||
| Ok(None) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Handle potential send error in NIXL transfer notification.
The unwrap() on line 201 could panic if the receiver is dropped before the transfer completes. Consider handling this gracefully.
ctx.async_rt_handle().spawn(async move {
transfer_fut.await;
- tx.send(()).unwrap();
+ let _ = tx.send(()); // Receiver may have been dropped
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| TransferStrategy::Nixl(transfer_type) => { | |
| let transfer_fut = nixl::write_blocks_to(sources, targets, &ctx, transfer_type)?; | |
| if notify { | |
| ctx.async_rt_handle().spawn(async move { | |
| transfer_fut.await; | |
| tx.send(()).unwrap(); | |
| }); | |
| Ok(Some(rx)) | |
| } else { | |
| Ok(None) | |
| } | |
| } | |
| TransferStrategy::Nixl(transfer_type) => { | |
| let transfer_fut = nixl::write_blocks_to(sources, targets, &ctx, transfer_type)?; | |
| if notify { | |
| ctx.async_rt_handle().spawn(async move { | |
| transfer_fut.await; | |
| let _ = tx.send(()); // Receiver may have been dropped | |
| }); | |
| Ok(Some(rx)) | |
| } else { | |
| Ok(None) | |
| } | |
| } |
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/transfer.rs around lines 195 to 207, the call
to tx.send(()).unwrap() can panic if the receiver is dropped early. Replace
unwrap() with proper error handling by matching on the result of tx.send(()),
logging or ignoring the error gracefully to avoid panics during notification
sending.
| impl MockTensor { | ||
| fn new(shape: Vec<usize>) -> Self { | ||
| let allocator = DeviceAllocator::new(0).unwrap(); | ||
|
|
||
| // Multiply by 2 for fp16. | ||
| let size = shape.iter().product::<usize>() * 2; | ||
|
|
||
| let device_storage = std::mem::ManuallyDrop::new(allocator.allocate(size).unwrap()); | ||
|
|
||
| let ptr = device_storage.addr(); | ||
| Self { ptr, size, shape } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix memory leak in MockTensor implementation.
The ManuallyDrop wrapper prevents the DeviceStorage from being dropped, causing a memory leak. Each test creates multiple MockTensor instances without cleanup.
Consider implementing a proper Drop trait or using a different approach:
#[derive(Clone, Debug)]
struct MockTensor {
ptr: u64,
size: usize,
shape: Vec<usize>,
+ _storage: Arc<DeviceStorage>,
}
impl MockTensor {
fn new(shape: Vec<usize>) -> Self {
let allocator = DeviceAllocator::new(0).unwrap();
// Multiply by 2 for fp16.
let size = shape.iter().product::<usize>() * 2;
- let device_storage = std::mem::ManuallyDrop::new(allocator.allocate(size).unwrap());
+ let device_storage = Arc::new(allocator.allocate(size).unwrap());
let ptr = device_storage.addr();
- Self { ptr, size, shape }
+ Self { ptr, size, shape, _storage: device_storage }
}
}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/distributed.rs around lines 49 to 61, the use of
ManuallyDrop for device_storage in MockTensor prevents automatic cleanup,
causing a memory leak. To fix this, implement the Drop trait for MockTensor to
explicitly free the allocated device_storage when the MockTensor instance is
dropped, or avoid using ManuallyDrop and manage the device_storage ownership
properly so it is dropped automatically.
| pub fn get_transfer_mechanism<SrcL: LocalityProvider, DstL: LocalityProvider>( | ||
| ) -> locality::TransferMechanism { | ||
| use crate::block_manager::block::locality; | ||
|
|
||
| let src_name = std::any::type_name::<SrcL>(); | ||
| let dst_name = std::any::type_name::<DstL>(); | ||
|
|
||
| // Same locality transfers | ||
| if src_name == dst_name { | ||
| if src_name.contains("Local") { | ||
| return locality::TransferMechanism::DirectMemory; | ||
| } else if src_name.contains("MockLogical") { | ||
| return locality::TransferMechanism::MockCounting; | ||
| } else if src_name.contains("Logical") { | ||
| return locality::TransferMechanism::RemoteRpc; | ||
| } | ||
| } | ||
|
|
||
| // Future: Cross-locality transfers | ||
| // Local -> Logical: RemoteRpc | ||
| // Logical -> Local: RemoteRpc | ||
| // etc. | ||
|
|
||
| locality::TransferMechanism::Unsupported | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
String matching on type names is fragile.
The function relies on type names containing specific substrings, which could break if types are renamed or reorganized.
Consider using a trait-based approach for more robust type detection:
trait LocalityType {
const TRANSFER_MECHANISM: TransferMechanism;
}
impl LocalityType for locality::Local {
const TRANSFER_MECHANISM: TransferMechanism = TransferMechanism::DirectMemory;
}
impl LocalityType for locality::MockLogical {
const TRANSFER_MECHANISM: TransferMechanism = TransferMechanism::MockCounting;
}
// Then use:
pub fn get_transfer_mechanism<SrcL: LocalityProvider + LocalityType, DstL: LocalityProvider + LocalityType>() -> TransferMechanism {
if TypeId::of::<SrcL>() == TypeId::of::<DstL>() {
return SrcL::TRANSFER_MECHANISM;
}
TransferMechanism::Unsupported
}🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/transfer_v2/coordinators.rs around lines 290
to 314, the current function uses fragile string matching on type names to
determine the transfer mechanism. Replace this with a trait-based approach by
defining a LocalityType trait with an associated constant TRANSFER_MECHANISM
implemented for each locality type. Then update get_transfer_mechanism to
require SrcL and DstL to implement LocalityType and LocalityProvider, compare
their TypeIds for equality, and return the corresponding TRANSFER_MECHANISM
constant from SrcL. Return Unsupported otherwise. This removes reliance on
string matching and makes the code more robust to type renames.
| @@ -0,0 +1,381 @@ | |||
| use super::*; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing copyright header.
The file is missing the required SPDX copyright header.
Add the copyright header:
+// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+
use super::*;🧰 Tools
🪛 GitHub Actions: Copyright Checks
[error] 1-1: Invalid or missing copyright header detected.
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/transfer_v2/coordinators.rs at line 1, the
file is missing the required SPDX copyright header. Add the appropriate
copyright header comment at the very top of the file before any code or use
statements to comply with licensing requirements.
| @@ -0,0 +1,163 @@ | |||
| /// Macro to generate WriteTo implementations for specific storage type combinations | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing copyright header.
The file is missing the required SPDX copyright header.
Add the copyright header:
+// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+
/// Macro to generate WriteTo implementations for specific storage type combinations📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Macro to generate WriteTo implementations for specific storage type combinations | |
| // SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| // SPDX-License-Identifier: Apache-2.0 | |
| /// Macro to generate WriteTo implementations for specific storage type combinations |
🧰 Tools
🪛 GitHub Actions: Copyright Checks
[error] 1-1: Invalid or missing copyright header detected.
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/block/transfer_v2/macros.rs at line 1, the file is
missing the required SPDX copyright header. Add the appropriate copyright header
at the very top of the file before any code or comments to comply with licensing
requirements.
| // SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| // SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| // SPDX-License-Identifier: Apache-2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove duplicate copyright header.
Lines 1-2 contain a duplicate copyright header.
-// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| // SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| // SPDX-License-Identifier: Apache-2.0 | |
| // SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |
| // SPDX-License-Identifier: Apache-2.0 |
🤖 Prompt for AI Agents
In lib/llm/src/block_manager/distributed/worker.rs at lines 1 to 3, there is a
duplicate copyright header on lines 1 and 2. Remove the redundant copyright
header on line 2 so that only one copyright header remains at the top of the
file.
|
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (3.3.7)lib/bindings/python/src/dynamo/llm/__init__.pylib/bindings/python/src/dynamo/llm/vllm_integration/__init__.pylib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_manager.py
🔧 Clippy (1.86.0)error: failed to get Caused by: Caused by: Caused by: Caused by: Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
#1771) Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
|
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
|
This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information. |
Summary by CodeRabbit
New Features
Enhancements
Bug Fixes
Documentation
Tests
End-users benefit from improved distributed KV cache management, enhanced performance and scalability, and expanded integration capabilities with vLLM and Python.