-
Notifications
You must be signed in to change notification settings - Fork 420
Add LRU in MasterService, complexity O(1) #287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution. Here are a few points that might warrant consideration: First, if eviction support were implemented, the garbage collection (GC) related functionalities could potentially be removed. Second, as reflected in the codebase, our current implementation utilizes lock sharding, wherein each object is safeguarded by the lock corresponding to its shard. Implementing a Least Recently Used (LRU) eviction strategy, however, would likely necessitate a global lock to manage the LRU queue, potentially introducing performance bottlenecks due to contention. My proposal involves repurposing the existing GC thread into a eviction thread. When eviction becomes necessary—perhaps triggered by monitoring specific watermarks or thresholds—this thread could select a target shard, acquire the lock exclusive to that shard, and subsequently perform eviction operations solely within its confines. This approach aims to circumvent the performance limitations associated with a global lock. |
|
I've conceived a potentially simpler approach that we could discuss. If we modify the garbage collection (GC) producer's operation from 'get' to 'put_end', would this effectively function as a First-In, First-Out (FIFO) eviction mechanism? |
Thanks for your constructive suggestions! |
Exactly yes, we have tested this method, while in real cases, FIFO is not always the best eviction method, which client will frequently get a subset of a large prefilled datasets. |
|
Can you ensure that the LRU implementation is thread-safe? |
Work in process, later will fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces an LRU mechanism into the MasterService to prevent discarding put() requests when the store reaches maximum capacity. Key changes include:
- Initializing and clearing LRU data structures in the MasterService constructor and destructor.
- Updating the Get() and PutStart() methods to incorporate LRU update and eviction logic.
- Adding preprocessor definitions and corresponding LRU member variables to the header file.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| mooncake-store/src/master_service.cpp | Implements LRU update and eviction in various service methods. |
| mooncake-store/include/master_service.h | Adds preprocessor flags and LRU data structures. |
Comments suppressed due to low confidence (2)
mooncake-store/include/master_service.h:18
- [nitpick] Consider defining USE_LRU_MASTER without a string value (e.g. using '#define USE_LRU_MASTER') to better align with standard preprocessor flag practices.
#define USE_LRU_MASTER "ON"
mooncake-store/src/master_service.cpp:143
- [nitpick] Consider refactoring the duplicate LRU update logic found in Get() and PutStart() into a helper function to improve code maintainability and reduce redundancy.
all_key_list_.push_front(key);
| } | ||
| all_key_list_.push_front(key); | ||
| all_key_idx_map_[key] = all_key_list_.begin(); | ||
| if(all_key_list_.size() >= LRU_MAX_CAPACITY) |
Copilot
AI
Apr 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verify that the eviction condition correctly reflects the intended capacity constraint; if LRU_MAX_CAPACITY is the max allowed entries, eviction should occur only when adding a new key would exceed that capacity.
| if(all_key_list_.size() >= LRU_MAX_CAPACITY) | |
| if(all_key_list_.size() > LRU_MAX_CAPACITY) |
|
Hi, @zhaoyongke , Thank you for your contributions, https://github.com/kvcache-ai/Mooncake/actions/runs/14609309219/job/40984273235?pr=287. master_service_test has failed. Please check it. |
| #include "allocator.h" | ||
| #include "types.h" | ||
|
|
||
| #define USE_LRU_MASTER "ON" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define it in the CMake File?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine, I'll fix it
|
|
||
| #ifdef USE_LRU_MASTER | ||
| // LRU statistics | ||
| std::list <std::string> all_key_list_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a class (e.g., class LRUList {....}) to wrap these two LRU lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, and we'd better add some basic tests for the new class.
CMakeLists.txt
Outdated
|
|
||
| if (USE_LRU_MASTER) | ||
| add_compile_definitions(USE_LRU_MASTER) | ||
| add_compile_definitions(LRU_MAX_CAPACITY=1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'd better move it to command arguments, so that we can change it on demand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now fixed
| std::shared_ptr<BufferAllocatorManager> buffer_allocator_manager_; | ||
| std::shared_ptr<AllocationStrategy> allocation_strategy_; | ||
|
|
||
| #ifdef USE_LRU_MASTER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding MACRO here and there. Introducing an EvictStrategy class just like the AllocationStrategy is a good choice. By default is FIFO, LRU is the other option. In this strategy class, wrap all the stuffs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll think about it, thanks for your advice ~
CMakeLists.txt
Outdated
| option(USE_REDIS "option for enable redis as metadata server" OFF) | ||
| option(USE_HTTP "option for enable http as metadata server" ON) | ||
| option(USE_LRU_MASTER "option for using LRU in master service" OFF) | ||
| set(LRU_MAX_CAPACITY 1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'd better configure it at run time, similar to enable_gc
Mooncake/mooncake-store/src/master.cpp
Line 14 in 14af70c
| DEFINE_bool(enable_gc, false, "Enable garbage collection"); |
| } | ||
| all_key_list_.push_front(key); | ||
| all_key_idx_map_[key] = all_key_list_.begin(); | ||
| if(all_key_list_.size() >= LRU_MAX_CAPACITY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A global lru size limitation may not good for production, we'd better limit the size at per client/node level.
Since each client could have limited cache size, but the client number could be dynamic scale up/down in production.
|
|
||
| LOG(INFO) << "### LRU Update in Put() ###"; | ||
| eviction_strategy_->AddKey(key); | ||
| if(eviction_strategy_ -> GetSize() >= LRU_MAX_CAPACITY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if(eviction_strategy_ -> GetSize() >= LRU_MAX_CAPACITY) | |
| if(eviction_strategy_->GetSize() >= LRU_MAX_CAPACITY) |
… if more than 80% used, trigger evict
|
Looks good to me! Thanks a lot for your contribution — this is an important feature for |
* Fix nvmeof build issue * Add LRU in Master Service, complexity O(1) * Move Macros to CMake Options * fix lru build options * Now we can setup LRU_MAX_CAPACITY with cmake commands * Refactoring LRU to EvictionStrategy Class * Add tests of eviction strategy * fix build issues * Fix eviction strategy test issue * Resolve conflicts with master * resolve conflicts, second part * Change MasterMetrics to get ratio of used storage and total capacity, if more than 80% used, trigger evict (cherry picked from commit d085d86)
We found mooncake store will discard all the put() requests after its capacity reaches the max. This case is not applicable in our online service. We solved this via LRU( Least Recently Used) algorithm.