-
Notifications
You must be signed in to change notification settings - Fork 680
feat: vllm mock workers, Rusty skeleton #1033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
77 commits
Select commit
Hold shift + click to select a range
ee0ce47
generic LRU evictor
PeaBrane 6cb8430
sequence hash with depth (needed for eviction)
PeaBrane 997404e
small note about derived traits (Ord and PartialOrd)
PeaBrane 7c52927
skeleton for mock workers
PeaBrane 2864578
rename to mocker.rs
PeaBrane b340b8c
rm useless comments
PeaBrane bf98163
compute seq hashes from block hashes
PeaBrane 1670658
test for seq hash compute
PeaBrane dbf0b53
multi mock workers
PeaBrane ea59a13
using indexmap
PeaBrane 6fcee91
license
PeaBrane a064672
active sequence refactor
PeaBrane 31add93
async move blocks
PeaBrane a76fd70
fat refactor
PeaBrane ca220db
revert kv_router back
PeaBrane 4f14f1a
token blocks processing into tokens.rs
PeaBrane 1d375a5
no need indexmap
PeaBrane 2a71637
just use Unref in place of Free
PeaBrane 78b7596
Unref -> Deref
PeaBrane 7000656
evictor bug fixes and more stringent tests
PeaBrane 60d7df6
unify lock ordering (todo fix evictor cleanup)
PeaBrane 2d226a4
rust todo macro
PeaBrane 2c9aeb1
actual commit to fix deadlock
PeaBrane 40dbbb4
clippy + fmt
PeaBrane fab73c1
fix cleanup bug in evictor
PeaBrane 1ce222f
move_block_tx
PeaBrane 530bd7c
More stringent move management test
PeaBrane 32223a9
watermark and eviction panic
PeaBrane 3af946b
batch hashes in single event
PeaBrane 4eead00
allow promotion of block to a full block
PeaBrane be4d6f3
more stringent partial and full block testing
PeaBrane 8c7ea0c
assume block_size > 1 and simplify logic
PeaBrane 083b3fd
ActiveSequence sends block events
PeaBrane 9af97f9
refactor core as kv_manager
PeaBrane a8dae90
initial scheduler impl
PeaBrane 385657d
clippy + fmt
PeaBrane f3c0229
cancellation token
PeaBrane 0c1fbb1
right number of output tokens in test
PeaBrane db2b1f2
idiomatic guard for empty waiting queue
PeaBrane ffab0b6
make kv manager synchronous, and remove message passing
PeaBrane a2298a4
denest with guard
PeaBrane e25748f
refactor out free method
PeaBrane ca39c18
only success and failure response
PeaBrane f90a73c
allow kv manager to probe how many new blocks would be needed
PeaBrane 2f0b920
prefill cost simulation
PeaBrane 56523a4
fix logic of request to sequence conv
PeaBrane b001f1d
fwd pass metric
PeaBrane 422f94c
allow reset (preemption)
PeaBrane e8e8175
restructure scheduler state
PeaBrane f733a68
make fields private
PeaBrane e410fe3
reset the partial block
PeaBrane 7f8f341
get move block response
PeaBrane 2ceb8a5
let SchedulerState handle some logic
PeaBrane bff9057
let evictor handle timestamp
PeaBrane e9226a2
avoid unnecessary clones
PeaBrane 74c2885
pre-emption
PeaBrane 2b39ea7
fmt
PeaBrane 3764afa
no need to expose request sender
PeaBrane 05ad9c0
move the prefill cost logic into scheduler state
PeaBrane 89f774f
decoding counts for 1 batched token
PeaBrane 869c261
more comments
PeaBrane 81e1999
Merge branch 'main' into rupei/vllm-evictor
PeaBrane 728e1fd
small comment on test
PeaBrane ec842d9
unnecessary cast
PeaBrane 2ed8a06
chore: i love let then
PeaBrane 5608d84
a test with caching, and more stringent asserts in kv manager
PeaBrane e52030a
chore: more denesting
PeaBrane 85340e3
derive getters
PeaBrane 7f621ac
remove dummy MoveBlockResponse protocol, just use bool
PeaBrane 0c89c86
no need for default evictor
PeaBrane 0efcd6d
reorganized defaults
PeaBrane bd98cbf
debloat, no task handle and derive clone
PeaBrane 968f989
improved docs + kv cached based decoding time estimation
PeaBrane f38e51a
integrate with tokens
PeaBrane 8faa5fc
better but still bugged after interfacing with tokens (cannot promote…
PeaBrane b1749ce
fix: reset creation signal as well
PeaBrane fa280e1
don't flood the logs in unit tests
PeaBrane File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| // SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| // | ||
| // Licensed under the Apache License, Version 2.0 (the "License"); | ||
| // you may not use this file except in compliance with the License. | ||
| // You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, software | ||
| // distributed under the License is distributed on an "AS IS" BASIS, | ||
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| // See the License for the specific language governing permissions and | ||
| // limitations under the License. | ||
|
|
||
| pub mod evictor; | ||
| pub mod kv_manager; | ||
| pub mod protocols; | ||
| pub mod scheduler; | ||
| pub mod sequence; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,191 @@ | ||
| // SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
| // | ||
| // Licensed under the Apache License, Version 2.0 (the "License"); | ||
| // you may not use this file except in compliance with the License. | ||
| // You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, software | ||
| // distributed under the License is distributed on an "AS IS" BASIS, | ||
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| // See the License for the specific language governing permissions and | ||
| // limitations under the License. | ||
|
|
||
| use std::cmp::Eq; | ||
| use std::collections::{HashMap, VecDeque}; | ||
| use std::hash::Hash; | ||
| use std::time::Instant; | ||
|
|
||
| /// An LRU evictor that maintains objects and evicts them based on their | ||
| /// last accessed time. Implements a "lazy" eviction mechanism where: | ||
| /// 1. The priority queue does not immediately reflect updates or removes | ||
| /// 2. Objects are pushed to the queue in order of increasing priority (older objects first) | ||
| /// 3. The user must ensure objects are added in correct priority (temporal order) | ||
| /// 4. Remove and update operations are lazy - entries remain in the queue until | ||
| /// they are either evicted or cleaned up during maintenance | ||
| #[derive(Debug)] | ||
| pub struct LRUEvictor<T: Clone + Eq + Hash> { | ||
| free_table: HashMap<T, f64>, | ||
| priority_queue: VecDeque<(T, f64)>, | ||
| cleanup_threshold: usize, | ||
| start_time: Instant, | ||
| } | ||
|
|
||
| impl<T: Clone + Eq + Hash> Default for LRUEvictor<T> { | ||
| fn default() -> Self { | ||
| Self { | ||
| free_table: HashMap::new(), | ||
| priority_queue: VecDeque::new(), | ||
| cleanup_threshold: 50, | ||
| start_time: Instant::now(), | ||
| } | ||
| } | ||
| } | ||
|
|
||
| impl<T: Clone + Eq + Hash> LRUEvictor<T> { | ||
| /// Create a new LRUEvictor with the default cleanup threshold | ||
| pub fn new(cleanup_threshold: usize) -> Self { | ||
| Self { | ||
| cleanup_threshold, | ||
| ..Default::default() | ||
| } | ||
| } | ||
|
|
||
| /// Get the current timestamp as seconds since initialization | ||
| pub fn current_timestamp(&self) -> f64 { | ||
| self.start_time.elapsed().as_secs_f64() | ||
| } | ||
|
|
||
| /// Get an iterator over the keys in the evictor | ||
| pub fn keys(&self) -> std::collections::hash_map::Keys<'_, T, f64> { | ||
| self.free_table.keys() | ||
| } | ||
|
|
||
| /// Insert or update an object in the evictor with current timestamp | ||
| pub fn insert(&mut self, object: T) { | ||
| let timestamp = self.current_timestamp(); | ||
| self._insert(object, timestamp); | ||
| } | ||
|
|
||
| /// Check if the evictor contains the given object | ||
| pub fn contains(&self, object: &T) -> bool { | ||
| self.free_table.contains_key(object) | ||
| } | ||
|
|
||
| /// Evict an object based on LRU policy | ||
| /// Returns the evicted object or None if no objects are available | ||
| pub fn evict(&mut self) -> Option<T> { | ||
| if self.free_table.is_empty() { | ||
| return None; | ||
| } | ||
|
|
||
| while let Some((object, last_accessed)) = self.priority_queue.pop_front() { | ||
| let Some(¤t_last_accessed) = self.free_table.get(&object) else { | ||
| continue; // entry is already removed | ||
| }; | ||
|
|
||
| if current_last_accessed == last_accessed { | ||
| self.free_table.remove(&object); | ||
| return Some(object); | ||
| } // otherwise entry is stale | ||
| } | ||
|
|
||
| None | ||
| } | ||
|
|
||
| /// Insert or update an object in the evictor | ||
| fn _insert(&mut self, object: T, last_accessed: f64) { | ||
| self.free_table.insert(object.clone(), last_accessed); | ||
| self.priority_queue.push_back((object, last_accessed)); | ||
| self.cleanup_if_necessary(); | ||
| } | ||
|
|
||
| /// Remove an object from the evictor | ||
| /// We don't remove from the priority queue immediately, as that would be inefficient | ||
| /// Outdated entries will be filtered out during eviction or cleanup | ||
| pub fn remove(&mut self, object: &T) -> bool { | ||
| self.free_table.remove(object).is_some() | ||
| } | ||
|
|
||
| /// Get the number of objects in the evictor | ||
| pub fn len(&self) -> usize { | ||
| self.free_table.len() | ||
| } | ||
|
|
||
| /// Check if the evictor is empty | ||
| pub fn is_empty(&self) -> bool { | ||
| self.free_table.is_empty() | ||
| } | ||
|
|
||
| /// Check if cleanup is necessary and perform it if needed | ||
| fn cleanup_if_necessary(&mut self) { | ||
| if self.priority_queue.len() > self.cleanup_threshold * self.free_table.len() { | ||
| self.cleanup(); | ||
| } | ||
| } | ||
|
|
||
| /// Clean up the priority queue by removing outdated entries | ||
| fn cleanup(&mut self) { | ||
| let mut new_priority_queue = VecDeque::new(); | ||
| for (object, timestamp) in self.priority_queue.drain(..) { | ||
| let Some(¤t_timestamp) = self.free_table.get(&object) else { | ||
| continue; | ||
| }; | ||
|
|
||
| if current_timestamp == timestamp { | ||
| new_priority_queue.push_back((object, timestamp)); | ||
| } | ||
| } | ||
| self.priority_queue = new_priority_queue; | ||
| } | ||
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use super::*; | ||
| use rstest::rstest; | ||
|
|
||
| #[rstest] | ||
| #[case(1)] | ||
| #[case(2)] | ||
| #[case(3)] | ||
| fn test_lru_evictor_eviction_order(#[case] threshold: usize) { | ||
| // Create a new LRUEvictor with the given cleanup threshold | ||
| let mut evictor = LRUEvictor::<i32>::new(threshold); | ||
|
|
||
| // Add items in the specified order with small delays between each | ||
| evictor.insert(4); | ||
| std::thread::sleep(std::time::Duration::from_millis(1)); | ||
| evictor.insert(3); | ||
| std::thread::sleep(std::time::Duration::from_millis(1)); | ||
| evictor.insert(2); | ||
| std::thread::sleep(std::time::Duration::from_millis(1)); | ||
| evictor.insert(1); | ||
| std::thread::sleep(std::time::Duration::from_millis(1)); | ||
| evictor.insert(5); | ||
| std::thread::sleep(std::time::Duration::from_millis(1)); | ||
| evictor.insert(1); // Updates timestamp for 1 | ||
| std::thread::sleep(std::time::Duration::from_millis(1)); | ||
| evictor.insert(4); // Updates timestamp for 4 | ||
| std::thread::sleep(std::time::Duration::from_millis(1)); | ||
| evictor.insert(2); // Updates timestamp for 2 | ||
|
|
||
| // Verify the eviction order | ||
| println!("Testing with threshold {}", threshold); | ||
| let evicted = evictor.evict().unwrap(); | ||
| assert_eq!(evicted, 3); | ||
| let evicted = evictor.evict().unwrap(); | ||
| assert_eq!(evicted, 5); | ||
| let evicted = evictor.evict().unwrap(); | ||
| assert_eq!(evicted, 1); | ||
| let evicted = evictor.evict().unwrap(); | ||
| assert_eq!(evicted, 4); | ||
| let evicted = evictor.evict().unwrap(); | ||
| assert_eq!(evicted, 2); | ||
| let evicted = evictor.evict(); | ||
| assert_eq!(evicted, None); | ||
| assert_eq!(evictor.len(), 0); | ||
| } | ||
| } | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.