Merged
77 commits
ee0ce47
generic LRU evictor
PeaBrane May 11, 2025
6cb8430
sequence hash with depth (needed for eviction)
PeaBrane May 12, 2025
997404e
small note about derived traits (Ord and PartialOrd)
PeaBrane May 12, 2025
7c52927
skeleton for mock workers
PeaBrane May 12, 2025
2864578
rename to mocker.rs
PeaBrane May 12, 2025
b340b8c
rm useless comments
PeaBrane May 12, 2025
bf98163
compute seq hashes from block hashes
PeaBrane May 12, 2025
1670658
test for seq hash compute
PeaBrane May 12, 2025
dbf0b53
multi mock workers
PeaBrane May 12, 2025
ea59a13
using indexmap
PeaBrane May 13, 2025
6fcee91
license
PeaBrane May 13, 2025
a064672
active sequence refactor
PeaBrane May 13, 2025
31add93
async move blocks
PeaBrane May 13, 2025
a76fd70
fat refactor
PeaBrane May 13, 2025
ca220db
revert kv_router back
PeaBrane May 13, 2025
4f14f1a
token blocks processing into tokens.rs
PeaBrane May 13, 2025
1d375a5
no need indexmap
PeaBrane May 13, 2025
2a71637
just use Unref in place of Free
PeaBrane May 13, 2025
78b7596
Unref -> Deref
PeaBrane May 13, 2025
7000656
evictor bug fixes and more stringent tests
PeaBrane May 13, 2025
60d7df6
unify lock ordering (todo fix evictor cleanup)
PeaBrane May 13, 2025
2d226a4
rust todo macro
PeaBrane May 13, 2025
2c9aeb1
actual commit to fix deadlock
PeaBrane May 13, 2025
40dbbb4
clippy + fmt
PeaBrane May 13, 2025
fab73c1
fix cleanup bug in evictor
PeaBrane May 13, 2025
1ce222f
move_block_tx
PeaBrane May 13, 2025
530bd7c
More stringent move management test
PeaBrane May 14, 2025
32223a9
watermark and eviction panic
PeaBrane May 14, 2025
3af946b
batch hashes in single event
PeaBrane May 14, 2025
4eead00
allow promotion of block to a full block
PeaBrane May 14, 2025
be4d6f3
more stringent partial and full block testing
PeaBrane May 14, 2025
8c7ea0c
assume block_size > 1 and simplify logic
PeaBrane May 14, 2025
083b3fd
ActiveSequence sends block events
PeaBrane May 14, 2025
9af97f9
refactor core as kv_manager
PeaBrane May 14, 2025
a8dae90
initial scheduler impl
PeaBrane May 14, 2025
385657d
clippy + fmt
PeaBrane May 14, 2025
f3c0229
cancellation token
PeaBrane May 14, 2025
0c1fbb1
right number of output tokens in test
PeaBrane May 15, 2025
db2b1f2
idiomatic guard for empty waiting queue
PeaBrane May 15, 2025
ffab0b6
make kv manager synchronous, and remove message passing
PeaBrane May 16, 2025
a2298a4
denest with guard
PeaBrane May 16, 2025
e25748f
refactor out free method
PeaBrane May 16, 2025
ca39c18
only success and failure response
PeaBrane May 16, 2025
f90a73c
allow kv manager to probe how many new blocks would be needed
PeaBrane May 16, 2025
2f0b920
prefill cost simulation
PeaBrane May 17, 2025
56523a4
fix logic of request to sequence conv
PeaBrane May 17, 2025
b001f1d
fwd pass metric
PeaBrane May 17, 2025
422f94c
allow reset (preemption)
PeaBrane May 17, 2025
e8e8175
restructure scheduler state
PeaBrane May 17, 2025
f733a68
make fields private
PeaBrane May 18, 2025
e410fe3
reset the partial block
PeaBrane May 18, 2025
7f8f341
get move block response
PeaBrane May 18, 2025
2ceb8a5
let SchedulerState handle some logic
PeaBrane May 19, 2025
bff9057
let evictor handle timestamp
PeaBrane May 19, 2025
e9226a2
avoid unnecessary clones
PeaBrane May 19, 2025
74c2885
pre-emption
PeaBrane May 19, 2025
2b39ea7
fmt
PeaBrane May 19, 2025
3764afa
no need to expose request sender
PeaBrane May 19, 2025
05ad9c0
move the prefill cost logic into scheduler state
PeaBrane May 19, 2025
89f774f
decoding counts for 1 batched token
PeaBrane May 19, 2025
869c261
more comments
PeaBrane May 19, 2025
81e1999
Merge branch 'main' into rupei/vllm-evictor
PeaBrane May 19, 2025
728e1fd
small comment on test
PeaBrane May 19, 2025
ec842d9
unnecessary cast
PeaBrane May 19, 2025
2ed8a06
chore: i love let then
PeaBrane May 19, 2025
5608d84
a test with caching, and more stringent asserts in kv manager
PeaBrane May 19, 2025
e52030a
chore: more denesting
PeaBrane May 19, 2025
85340e3
derive getters
PeaBrane May 19, 2025
7f621ac
remove dummy MoveBlockResponse protocol, just use bool
PeaBrane May 19, 2025
0c89c86
no need for default evictor
PeaBrane May 19, 2025
0efcd6d
reorganized defaults
PeaBrane May 19, 2025
bd98cbf
debloat, no task handle and derive clone
PeaBrane May 19, 2025
968f989
improved docs + kv cached based decoding time estimation
PeaBrane May 21, 2025
f38e51a
integrate with tokens
PeaBrane May 21, 2025
8faa5fc
better but still bugged after interfacing with tokens (cannot promote…
PeaBrane May 21, 2025
b1749ce
fix: reset creation signal as well
PeaBrane May 21, 2025
fa280e1
don't flood the logs in unit tests
PeaBrane May 21, 2025
1 change: 1 addition & 0 deletions lib/llm/src/lib.rs
@@ -28,6 +28,7 @@ pub mod hub;
pub mod key_value_store;
pub mod kv_router;
pub use kv_router::DEFAULT_KV_BLOCK_SIZE;
pub mod mocker;
pub mod model_card;
pub mod model_type;
pub mod preprocessor;
20 changes: 20 additions & 0 deletions lib/llm/src/mocker.rs
@@ -0,0 +1,20 @@
// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

pub mod evictor;
pub mod kv_manager;
pub mod protocols;
pub mod scheduler;
pub mod sequence;
191 changes: 191 additions & 0 deletions lib/llm/src/mocker/evictor.rs
@@ -0,0 +1,191 @@
// SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use std::cmp::Eq;
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;
use std::time::Instant;

/// An LRU evictor that maintains objects and evicts them based on their
/// last accessed time. Implements a "lazy" eviction mechanism where:
/// 1. The priority queue does not immediately reflect updates or removes
/// 2. Objects are pushed to the queue in order of increasing priority (older objects first)
/// 3. The user must ensure objects are added in correct priority (temporal order)
/// 4. Remove and update operations are lazy - entries remain in the queue until
/// they are either evicted or cleaned up during maintenance
#[derive(Debug)]
pub struct LRUEvictor<T: Clone + Eq + Hash> {
    free_table: HashMap<T, f64>,
    priority_queue: VecDeque<(T, f64)>,
    cleanup_threshold: usize,
    start_time: Instant,
}

impl<T: Clone + Eq + Hash> Default for LRUEvictor<T> {
    fn default() -> Self {
        Self {
            free_table: HashMap::new(),
            priority_queue: VecDeque::new(),
            cleanup_threshold: 50,
            start_time: Instant::now(),
        }
    }
}

impl<T: Clone + Eq + Hash> LRUEvictor<T> {
    /// Create a new LRUEvictor with the given cleanup threshold
    pub fn new(cleanup_threshold: usize) -> Self {
        Self {
            cleanup_threshold,
            ..Default::default()
        }
    }

    /// Get the current timestamp as seconds since initialization
    pub fn current_timestamp(&self) -> f64 {
        self.start_time.elapsed().as_secs_f64()
    }

    /// Get an iterator over the keys in the evictor
    pub fn keys(&self) -> std::collections::hash_map::Keys<'_, T, f64> {
        self.free_table.keys()
    }

    /// Insert or update an object in the evictor with current timestamp
    pub fn insert(&mut self, object: T) {
        let timestamp = self.current_timestamp();
        self._insert(object, timestamp);
    }

    /// Check if the evictor contains the given object
    pub fn contains(&self, object: &T) -> bool {
        self.free_table.contains_key(object)
    }

    /// Evict an object based on LRU policy
    /// Returns the evicted object or None if no objects are available
    pub fn evict(&mut self) -> Option<T> {
        if self.free_table.is_empty() {
            return None;
        }

        while let Some((object, last_accessed)) = self.priority_queue.pop_front() {
            let Some(&current_last_accessed) = self.free_table.get(&object) else {
                continue; // entry is already removed
            };

            if current_last_accessed == last_accessed {
                self.free_table.remove(&object);
                return Some(object);
            } // otherwise entry is stale
        }

        None
    }

    /// Insert or update an object in the evictor
    fn _insert(&mut self, object: T, last_accessed: f64) {
        self.free_table.insert(object.clone(), last_accessed);
        self.priority_queue.push_back((object, last_accessed));
        self.cleanup_if_necessary();
    }

    /// Remove an object from the evictor
    /// We don't remove from the priority queue immediately, as that would be inefficient
    /// Outdated entries will be filtered out during eviction or cleanup
    pub fn remove(&mut self, object: &T) -> bool {
        self.free_table.remove(object).is_some()
    }

    /// Get the number of objects in the evictor
    pub fn len(&self) -> usize {
        self.free_table.len()
    }

    /// Check if the evictor is empty
    pub fn is_empty(&self) -> bool {
        self.free_table.is_empty()
    }

    /// Check if cleanup is necessary and perform it if needed
    fn cleanup_if_necessary(&mut self) {
        if self.priority_queue.len() > self.cleanup_threshold * self.free_table.len() {
            self.cleanup();
        }
    }

    /// Clean up the priority queue by removing outdated entries
    fn cleanup(&mut self) {
        let mut new_priority_queue = VecDeque::new();
        for (object, timestamp) in self.priority_queue.drain(..) {
            let Some(&current_timestamp) = self.free_table.get(&object) else {
                continue;
            };

            if current_timestamp == timestamp {
                new_priority_queue.push_back((object, timestamp));
            }
        }
        self.priority_queue = new_priority_queue;
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use rstest::rstest;

    #[rstest]
    #[case(1)]
    #[case(2)]
    #[case(3)]
    fn test_lru_evictor_eviction_order(#[case] threshold: usize) {
        // Create a new LRUEvictor with the given cleanup threshold
        let mut evictor = LRUEvictor::<i32>::new(threshold);

        // Add items in the specified order with small delays between each
        evictor.insert(4);
        std::thread::sleep(std::time::Duration::from_millis(1));
        evictor.insert(3);
        std::thread::sleep(std::time::Duration::from_millis(1));
        evictor.insert(2);
        std::thread::sleep(std::time::Duration::from_millis(1));
        evictor.insert(1);
        std::thread::sleep(std::time::Duration::from_millis(1));
        evictor.insert(5);
        std::thread::sleep(std::time::Duration::from_millis(1));
        evictor.insert(1); // Updates timestamp for 1
        std::thread::sleep(std::time::Duration::from_millis(1));
        evictor.insert(4); // Updates timestamp for 4
        std::thread::sleep(std::time::Duration::from_millis(1));
        evictor.insert(2); // Updates timestamp for 2

        // Verify the eviction order
        println!("Testing with threshold {}", threshold);
        let evicted = evictor.evict().unwrap();
        assert_eq!(evicted, 3);
        let evicted = evictor.evict().unwrap();
        assert_eq!(evicted, 5);
        let evicted = evictor.evict().unwrap();
        assert_eq!(evicted, 1);
        let evicted = evictor.evict().unwrap();
        assert_eq!(evicted, 4);
        let evicted = evictor.evict().unwrap();
        assert_eq!(evicted, 2);
        let evicted = evictor.evict();
        assert_eq!(evicted, None);
        assert_eq!(evictor.len(), 0);
    }
}
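
The test in evictor.rs checks eviction order after timestamp updates, but it does not exercise `remove`. The sketch below illustrates the lazy removal and queue compaction described in the `LRUEvictor` doc comment. It is not the PR's code: `LazyLru`, `queue_len`, and `lazy_removal_demo` are hypothetical names, and a `u64` counter stands in for the `Instant`-derived `f64` timestamp (an assumption made only so the behavior is deterministic).

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical, simplified stand-in for the PR's `LRUEvictor`.
/// Assumption: a monotonically increasing counter replaces wall-clock time.
pub struct LazyLru<T: Clone + Eq + Hash> {
    live: HashMap<T, u64>,       // current stamp per live object
    queue: VecDeque<(T, u64)>,   // insertion-ordered, may hold stale entries
    cleanup_threshold: usize,
    tick: u64,
}

impl<T: Clone + Eq + Hash> LazyLru<T> {
    pub fn new(cleanup_threshold: usize) -> Self {
        Self {
            live: HashMap::new(),
            queue: VecDeque::new(),
            cleanup_threshold,
            tick: 0,
        }
    }

    /// Insert or refresh `object`; any older queue entry for it becomes stale.
    pub fn insert(&mut self, object: T) {
        self.tick += 1;
        self.live.insert(object.clone(), self.tick);
        self.queue.push_back((object, self.tick));
        // Compact once the queue outgrows `cleanup_threshold` times the live set.
        if self.queue.len() > self.cleanup_threshold * self.live.len() {
            let live = &self.live;
            self.queue.retain(|(obj, stamp)| live.get(obj) == Some(stamp));
        }
    }

    /// Lazy removal: only the map is touched; the queue entry goes stale.
    pub fn remove(&mut self, object: &T) -> bool {
        self.live.remove(object).is_some()
    }

    /// Pop from the front until an entry matches the map's current stamp.
    pub fn evict(&mut self) -> Option<T> {
        while let Some((object, stamp)) = self.queue.pop_front() {
            if self.live.get(&object) == Some(&stamp) {
                self.live.remove(&object);
                return Some(object);
            }
            // Otherwise the entry was removed or refreshed later: skip it.
        }
        None
    }

    /// Number of queue entries, including stale ones.
    pub fn queue_len(&self) -> usize {
        self.queue.len()
    }
}

/// Returns (queue length before any eviction, eviction order).
pub fn lazy_removal_demo() -> (usize, Vec<&'static str>) {
    let mut lru = LazyLru::new(50);
    lru.insert("a");
    lru.insert("b");
    lru.insert("c");
    lru.insert("a"); // refresh: the first ("a", 1) entry is now stale
    lru.remove(&"b"); // lazy: ("b", 2) stays in the queue
    let queued = lru.queue_len();
    let mut order = Vec::new();
    while let Some(x) = lru.evict() {
        order.push(x);
    }
    (queued, order)
}
```

Under these assumptions, `lazy_removal_demo()` reports four queued entries for only two live objects, and eviction skips the stale "a" and the removed "b" to yield "c" and then "a". With a small `cleanup_threshold` such as 1, refreshing the same key twice triggers compaction immediately, which is the trade-off the `threshold` cases in the PR's test vary.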