refactor: move lib/llm/src/tokens.rs to lib/tokens #2050
Conversation
👋 Hi da-x! Thank you for contributing to ai-dynamo/dynamo.
FYI: @ryanolson |
Walkthrough

This set of changes migrates the token handling logic from `lib/llm/src/tokens.rs` into a new `lib/tokens` crate (`dynamo-tokens`), keeping the old module as a wildcard re-export for backward compatibility.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant LLM
    participant DynamoTokens
    Client->>LLM: Use token-related functionality
    LLM->>DynamoTokens: Forward all token operations (via re-export)
    DynamoTokens-->>LLM: Provide token block/sequence handling, hashing, errors
    LLM-->>Client: Return token operation results
```
Estimated code review effort: 4 (~90 minutes)
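The backward-compatibility redirect the walkthrough describes can be sketched as a module-level `pub use`. This is a minimal, self-contained illustration with stand-in types — the real crates are `dynamo-tokens` and `dynamo-llm`, and their actual items differ:

```rust
// Stand-in for the new `dynamo-tokens` crate.
mod dynamo_tokens {
    /// A single token id.
    pub type Token = u32;

    /// An owned sequence of tokens.
    #[derive(Debug, Clone, PartialEq, Eq, Default)]
    pub struct Tokens(pub Vec<Token>);
}

// Stand-in for `lib/llm/src/tokens.rs`: the old path keeps working
// because it wildcard re-exports everything from the new crate.
mod llm {
    pub mod tokens {
        pub use crate::dynamo_tokens::*;
    }
}

fn main() {
    // Existing callers importing through the old path are unaffected.
    let t: llm::tokens::Tokens = llm::tokens::Tokens(vec![1, 2, 3]);
    assert_eq!(t.0.len(), 3);
    println!("old import path still resolves: {:?}", t);
}
```

The same pattern scales to the crate boundary: `pub use dynamo_tokens::*;` in the old module keeps every downstream `use dynamo_llm::tokens::...` import compiling without changes.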
Actionable comments posted: 2
🔭 Outside diff range comments (1)
lib/tokens/src/lib.rs (1)
90-102: Add safety documentation for i32 to u32 conversion.

The conversion from `i32` to `Token` (`u32`) using an `as` cast can produce unexpected results for negative values (e.g., -1 becomes `u32::MAX`). Consider documenting this behavior or using a safer conversion method.

Add documentation:

```diff
 impl From<Vec<i32>> for Tokens {
-    /// Converts `Vec<i32>` to `Tokens`, casting each `i32` to `u32`.
+    /// Converts `Vec<i32>` to `Tokens`, casting each `i32` to `u32`.
+    ///
+    /// # Safety
+    /// Negative values will wrap around (e.g., -1 becomes u32::MAX).
+    /// Ensure input values are non-negative if this behavior is not desired.
     fn from(tokens: Vec<i32>) -> Self {
         Tokens(tokens.into_iter().map(|t| t as u32).collect())
     }
 }

 impl From<&[i32]> for Tokens {
-    /// Converts `&[i32]` to `Tokens`, casting each `i32` to `u32`.
+    /// Converts `&[i32]` to `Tokens`, casting each `i32` to `u32`.
+    ///
+    /// # Safety
+    /// Negative values will wrap around (e.g., -1 becomes u32::MAX).
+    /// Ensure input values are non-negative if this behavior is not desired.
     fn from(tokens: &[i32]) -> Self {
         Tokens(tokens.iter().map(|&t| t as u32).collect())
     }
 }
```
🧹 Nitpick comments (3)
lib/llm/src/tokens.rs (1)
15-15: Consider using explicit re-exports instead of wildcard.

While the wildcard re-export maintains backward compatibility, it might expose more items than intended. Consider explicitly listing the public API items to maintain better control over what's exposed.

For example:

```diff
-pub use dynamo_tokens::*;
+pub use dynamo_tokens::{
+    Token, Salt, SaltHash, BlockHash, SequenceHash,
+    compute_hash_v2, Tokens, TokenBlockError,
+    PartialTokenBlock, TokenBlock, TokenBlockSequence,
+    blocks,
+};
```

lib/tokens/src/lib.rs (2)
110-141: Trim the redundant PartialEq implementation.

`Tokens` derives `Eq` (line 55), and the explicit `impl PartialEq for Tokens` mirrors what `#[derive(PartialEq)]` would generate, so it can be replaced by the derive. Note that the cross-type comparisons with `Vec<Token>`, `[Token]`, and `&[Token]` are not derivable and must stay as explicit impls if callers rely on them:

```diff
-impl PartialEq for Tokens {
-    fn eq(&self, other: &Self) -> bool {
-        self.0 == other.0
-    }
-}
```
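A quick check of what the derive does and does not give you (stand-in types, not the crate's actual definitions): deriving `PartialEq` only covers `Tokens == Tokens`; comparing against `Vec<Token>` compiles only because of an explicit cross-type impl.

```rust
type Token = u32;

#[derive(Debug, PartialEq, Eq)]
struct Tokens(Vec<Token>);

// Cross-type comparison is NOT derivable; it needs an explicit impl.
impl PartialEq<Vec<Token>> for Tokens {
    fn eq(&self, other: &Vec<Token>) -> bool {
        self.0 == *other
    }
}

fn main() {
    let a = Tokens(vec![1, 2, 3]);
    let b = Tokens(vec![1, 2, 3]);
    assert_eq!(a, b); // derived Tokens == Tokens
    assert_eq!(a, vec![1, 2, 3]); // works only via the explicit impl above
}
```

Deleting the cross-type impls would make the second assertion a compile error, so only the `Tokens`-to-`Tokens` impl is safe to drop in favor of a derive.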
711-760: Consider simplifying the pop() implementation.

The `pop()` method has complex logic for determining the last token. Consider extracting helper methods for better readability.

```diff
+    /// Returns the last token in the sequence without removing it.
+    fn peek_last(&self) -> Option<Token> {
+        if !self.current_block.tokens.is_empty() {
+            self.current_block.tokens.last().copied()
+        } else {
+            self.blocks.last()?.tokens().last().copied()
+        }
+    }
+
     pub fn pop(&mut self) -> Option<Token> {
         let current_total_len = self.total_tokens();
         if current_total_len == 0 {
             return None;
         }

-        // Determine the last token. It must be in the current_block if current_block is not empty.
-        // If current_block is empty, it must be the last token of the last full block.
-        let last_token = if !self.current_block.tokens.is_empty() {
-            // Last token is in the partial block
-            *self
-                .current_block
-                .tokens
-                .last()
-                .expect("Current block checked for non-empty")
-        } else {
-            // Current block is empty, sequence is not. Must be in the last full block.
-            let last_block = self
-                .blocks
-                .last()
-                .expect("Sequence is not empty but has no blocks and empty current block?");
-            *last_block
-                .tokens()
-                .last()
-                .expect("Last block cannot be empty")
-        };
+        let last_token = self
+            .peek_last()
+            .expect("Sequence has tokens but peek_last returned None");

         // Truncate the sequence by one element.
         // We expect this to succeed since we know the length > 0.
         match self.truncate(current_total_len - 1) {
             Ok(_) => Some(last_token),
             Err(_) => {
                 // This should be logically impossible if total_tokens() and truncate() are correct.
                 // Panic in debug, return None in release as a fallback, though it indicates a bug.
                 debug_assert!(
                     false,
                     "truncate failed unexpectedly after checking length in pop"
                 );
                 None
             }
         }
     }
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
- `lib/llm/Cargo.toml` (1 hunks)
- `lib/llm/src/tokens.rs` (1 hunks)
- `lib/tokens/Cargo.toml` (1 hunks)
- `lib/tokens/src/lib.rs` (5 hunks)
🧠 Learnings (3)
📓 Common learnings
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/tokens/Cargo.toml (2)
Learnt from: PeaBrane
PR: #1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/llm/src/tokens.rs (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (8)
lib/llm/Cargo.toml (1)
46-46: LGTM!

The addition of `dynamo-tokens` as a workspace dependency is appropriate for the refactoring that moves token handling logic to a separate crate.

lib/tokens/Cargo.toml (1)
30-32: No changes needed: `uuid` and `serde` are in use.

Both `uuid` and `serde` are actively used in lib/tokens/src/blocks.rs:

- `use uuid::Uuid;`
- `use serde::{Deserialize, Serialize};`
- `#[derive(..., Serialize, Deserialize)]`

These dependencies are required and should remain in Cargo.toml.
lib/tokens/src/lib.rs (6)
27-50: Well-structured type definitions and hashing implementation.

The type aliases are clear and well-documented. The use of xxhash for hashing is a good choice for performance.
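The block/sequence hashing pattern praised here can be illustrated generically: each block hash is mixed with the previous sequence hash, so a block's identity depends on its full prefix. This sketch uses the standard library's `DefaultHasher` purely for illustration; the crate itself uses xxhash, and the field and function names below are hypothetical:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash one block of tokens, chained to the sequence hash of the prefix.
fn hash_block(prev_sequence_hash: Option<u64>, block: &[u32]) -> u64 {
    let mut h = DefaultHasher::new();
    // Mixing in the previous sequence hash means identical blocks at
    // different positions get different sequence hashes.
    prev_sequence_hash.hash(&mut h);
    block.hash(&mut h);
    h.finish()
}

fn main() {
    let b1 = hash_block(None, &[1, 2, 3, 4]);
    let b2 = hash_block(Some(b1), &[1, 2, 3, 4]);
    // Same tokens, different prefix -> different sequence hash.
    assert_ne!(b1, b2);
    println!("block1: {b1:#x}, block2: {b2:#x}");
}
```

This chaining is what makes a `SequenceHash` usable as a prefix-cache key: matching sequence hashes imply matching token prefixes, not just matching final blocks.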
165-184: Well-designed error handling.

The `TokenBlockError` enum provides clear, descriptive error variants with good error messages using thiserror.
185-354: Excellent implementation of PartialTokenBlock.

The implementation is well-designed with:
- Clear API with proper error handling
- Good defensive programming (e.g., saturating_sub in remaining())
- Proper state management in commit()
- Appropriate trait implementations
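The defensive `saturating_sub` pattern mentioned above can be shown in isolation (hypothetical field names; a sketch of the idea rather than the crate's code):

```rust
// A partial block with a fixed capacity; `remaining` must never underflow
// even if `tokens.len()` somehow exceeds `block_size`.
struct PartialBlock {
    block_size: usize,
    tokens: Vec<u32>,
}

impl PartialBlock {
    fn remaining(&self) -> usize {
        // A plain `block_size - len` would panic in debug builds (and wrap
        // in release) if len > block_size; saturating_sub clamps to 0.
        self.block_size.saturating_sub(self.tokens.len())
    }
}

fn main() {
    let ok = PartialBlock { block_size: 4, tokens: vec![1, 2] };
    assert_eq!(ok.remaining(), 2);

    let overfull = PartialBlock { block_size: 4, tokens: vec![0; 6] };
    assert_eq!(overfull.remaining(), 0); // clamped, no underflow
}
```

The clamp turns an invariant violation into a benign "block is full" answer, which is the right failure mode for capacity bookkeeping.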
356-463: Well-architected block structures.

The separation between `TokenBlockChunk` (for parallel hash computation) and `TokenBlock` (final immutable block) is a good design choice that enables efficient parallel processing while maintaining sequence integrity.
864-1517: Excellent test coverage!

The test suite is comprehensive with:
- Validation of hash constants
- Edge case testing for all major operations
- Clear test organization with helper functions
- Good coverage of error conditions
25-25: Confirmed `blocks` module presence and implementation.

The file lib/tokens/src/blocks.rs exists and defines:

- a `GlobalHash` type alias
- a `UniqueBlock` enum with both `PartialBlock(Uuid)` and `FullBlock(GlobalHash)` variants
- a `Default` impl generating a random UUID

No further changes required.
```rust
// limitations under the License.

#![deny(missing_docs)]
#![allow(dead_code)]
```
🛠️ Refactor suggestion
Remove #![allow(dead_code)] attribute.
This attribute suppresses warnings about unused code. It's better to remove it and address any actual dead code warnings to keep the codebase clean.
```diff
-#![allow(dead_code)]
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
In lib/tokens/src/lib.rs at line 16, remove the attribute `#![allow(dead_code)]`
to stop suppressing warnings about unused code. After removing it, review and
fix any dead code warnings that appear to maintain a clean codebase.
```diff
     /// Extends the sequence with the given tokens, potentially completing multiple blocks.
     ///
-    /// If the block is full, it will be converted into a [TokenBlock]
-    /// and added to the sequence.
+    /// This method processes all tokens from the input [`Tokens`] object.
+    /// If adding tokens causes one or more blocks to become full, they are committed
+    /// and added to the internal list of completed blocks.
     ///
-    /// ## Example
+    /// # Arguments
     ///
-    /// ```rust
-    /// use dynamo_tokens::{Tokens, TokenBlockSequence};
-    /// let mut sequence = TokenBlockSequence::new(Tokens::default(), 4, Some(1337 as u64));
+    /// * `tokens` - The [`Tokens`] object containing the tokens to extend the sequence with.
     ///
-    /// sequence.push_token(1);
-    /// sequence.push_token(2);
-    /// sequence.push_token(3);
-    /// sequence.push_token(4);
-    /// sequence.push_token(5);
+    /// # Returns
     ///
-    /// assert_eq!(sequence.blocks().len(), 1);
-    /// assert_eq!(sequence.current_block().tokens().len(), 1);
-    /// assert_eq!(sequence.blocks()[0].sequence_hash(), 14643705804678351452);
-    /// ```
-    pub fn push_token(&mut self, token: Token) -> Option<&TokenBlock> {
-        if let Some(block) = self.current_block.push_token(token) {
-            self.blocks.push(block);
-            self.blocks.last()
-        } else {
-            None
-        }
-    }
+    /// * `Ok(Some(Range<usize>))` - The range of indices in the `blocks` vector corresponding
+    ///   to the blocks completed during this `extend` operation.
+    /// * `Ok(None)` - If no blocks were completed.
+    /// * `Err(TokenBlockError)` - If an internal error occurs during commit.
+    pub fn extend(&mut self, tokens: Tokens) -> Result<Option<Range<usize>>, TokenBlockError> {
+        let start_block_index = self.blocks.len();
+        let mut tokens_to_append = tokens;
+
+        while !tokens_to_append.is_empty() {
+            let remaining_in_current = self.current_block.remaining();
+
+            if remaining_in_current == 0 {
+                // Current block is full, commit it first
+                let new_block = self.current_block.commit()?;
+                self.blocks.push(new_block);
+                // Continue loop to add tokens to the *new* current_block
+            }
+
+            // Push as many tokens as possible into the current (potentially new) block
+            let available_tokens = tokens_to_append;
+            tokens_to_append = self.current_block.push_tokens(available_tokens);
+
+            // Check if the current block *became* full after pushing tokens
+            if self.current_block.remaining() == 0 && !tokens_to_append.is_empty() {
+                // If it became full AND there are still more tokens to append,
+                // commit it now so the next loop iteration starts with a fresh block.
+                let new_block = self.current_block.commit()?;
+                self.blocks.push(new_block);
+            }
+            // If it became full and there are NO more tokens, the loop will exit,
+            // and the block remains partial but full, ready for the next append/commit.
+        }
+
+        let end_block_index = self.blocks.len();
+        if start_block_index == end_block_index {
+            Ok(None) // No blocks were completed
+        } else {
+            Ok(Some(start_block_index..end_block_index))
+        }
+    }
```
🛠️ Refactor suggestion
Document the lazy commit behavior in extend().
The extend method has a lazy commit behavior where a full block is not immediately committed to the blocks vector. This is shown in the tests (lines 1146-1148) and could be confusing for users.
Add documentation about this behavior:

```diff
 /// Extends the sequence with the given tokens, potentially completing multiple blocks.
 ///
 /// This method processes all tokens from the input [`Tokens`] object.
 /// If adding tokens causes one or more blocks to become full, they are committed
 /// and added to the internal list of completed blocks.
+///
+/// Note: If the current block becomes exactly full after adding tokens but there are
+/// no more tokens to add, the block remains in the current_block (not committed)
+/// until the next operation that requires a new block.
 ///
 /// # Arguments
```

📝 Committable suggestion
```rust
    /// Extends the sequence with the given tokens, potentially completing multiple blocks.
    ///
    /// This method processes all tokens from the input [`Tokens`] object.
    /// If adding tokens causes one or more blocks to become full, they are committed
    /// and added to the internal list of completed blocks.
    ///
    /// Note: If the current block becomes exactly full after adding tokens but there are
    /// no more tokens to add, the block remains in the current_block (not committed)
    /// until the next operation that requires a new block.
    ///
    /// # Arguments
    ///
    /// * `tokens` - The [`Tokens`] object containing the tokens to extend the sequence with.
    ///
    /// # Returns
    ///
    /// * `Ok(Some(Range<usize>))` - The range of indices in the `blocks` vector corresponding
    ///   to the blocks completed during this `extend` operation.
    /// * `Ok(None)` - If no blocks were completed.
    /// * `Err(TokenBlockError)` - If an internal error occurs during commit.
    pub fn extend(&mut self, tokens: Tokens) -> Result<Option<Range<usize>>, TokenBlockError> {
        let start_block_index = self.blocks.len();
        let mut tokens_to_append = tokens;
        // ... rest of method unchanged ...
    }
```
🤖 Prompt for AI Agents
In lib/tokens/src/lib.rs around lines 513 to 564, the extend() method has a lazy
commit behavior where a full block is not immediately committed to the blocks
vector, which can be confusing. Update the method's documentation to explicitly
state that when a block becomes full after pushing tokens, it is not always
immediately committed; instead, the commit may be deferred until more tokens are
appended or the next operation. Clarify this lazy commit behavior so users
understand when blocks are actually committed during the extend process.
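The lazy-commit behavior flagged in this comment is easiest to see in a stripped-down model: a sequence that chunks incoming tokens into fixed-size blocks and defers committing an exactly-full current block until more tokens arrive. This is a behavioral sketch with simplified types (no hashing or error handling), not the crate's implementation:

```rust
struct Seq {
    block_size: usize,
    blocks: Vec<Vec<u32>>, // completed (committed) blocks
    current: Vec<u32>,     // partial, possibly exactly-full, block
}

impl Seq {
    fn extend(&mut self, mut tokens: Vec<u32>) {
        while !tokens.is_empty() {
            // Lazily commit a block left exactly full by a previous call.
            if self.current.len() == self.block_size {
                self.blocks.push(std::mem::take(&mut self.current));
            }
            let room = self.block_size - self.current.len();
            let take = room.min(tokens.len());
            self.current.extend(tokens.drain(..take));
            // If current became full but no tokens remain, the loop exits
            // here and the block stays uncommitted (lazy commit).
        }
    }
}

fn main() {
    let mut s = Seq { block_size: 4, blocks: vec![], current: vec![] };

    s.extend((1..=4).collect()); // exactly fills one block...
    assert_eq!(s.blocks.len(), 0); // ...but it is NOT committed yet
    assert_eq!(s.current.len(), 4);

    s.extend(vec![5]); // the next call commits the full block first
    assert_eq!(s.blocks.len(), 1);
    assert_eq!(s.current, vec![5]);
}
```

The two assertions after the first `extend` are exactly the surprise the reviewer wants documented: a caller counting committed blocks right after filling one sees zero until the next operation forces the commit.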
The crate `lib/tokens` was not really used and contained old duplicate code. Looks like the original intention was to split tokens functionality out of `dynamo-runtime`, so let's complete it. To avoid too many changes, keep a redirect with `pub use dynamo_tokens::*;`.

Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information. |
Summary by CodeRabbit
- Refactor
- New Features
- Bug Fixes
- Chores