-
Notifications
You must be signed in to change notification settings - Fork 420
[store] feat: add secondary storage usage monitor #976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @yejj710, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request extends Mooncake's storage monitoring capabilities beyond memory to include SSD persistent storage. It introduces a new configuration option for defining the total SSD capacity and implements the necessary logic within the MasterService and MasterMetricManager to accurately track allocated SSD space and report its usage. This enhancement provides a more comprehensive view of resource utilization in tiered caching environments. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces monitoring for secondary storage (SSD), which is a great addition for tiered caching. The changes for configuration and metric collection are mostly well-implemented. However, I've found a critical issue in PutRevoke that could lead to a service crash, and a significant logic omission where allocated_file_size is not decremented upon object removal, which will cause the metric to be inaccurate over time. I've also included a few medium-severity suggestions to improve code readability and correctness. Please review the comments for details.
mooncake-store/include/types.h
Outdated
| static constexpr int64_t DEFAULT_CLIENT_LIVE_TTL_SEC = 10; // in seconds | ||
| static const std::string DEFAULT_CLUSTER_ID = "mooncake_cluster"; | ||
| static const std::string DEFAULT_ROOT_FS_DIR = ""; | ||
| static const uint64_t DEFAULT_GLOBAL_FILE_SEGMENT_SIZE = 536870912000; // 500 GiB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would confuse the user. How about setting it to u64::MAX and showing "infinite"? Since we don't limit DFS usage right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea! i will fix it later.
| }; | ||
|
|
||
| // Serialize Gauges | ||
| serialize_metric(allocated_size_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename as mem_allocated_size? On the other hand, should we have a total_allocated_size metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for question 1: i have renamd memory related metrics already;
for question 2: I don't see concrete scenario for total_allocated_size. It would be more appropriate to extend the functionality when specific scenarios emerge in the future.
|
The current PR fails the Build CI Test. Please resolve the issues. @yejj710 |
| // --- Get current values --- | ||
| int64_t allocated = allocated_size_.value(); | ||
| int64_t capacity = total_capacity_.value(); | ||
| int64_t allocated = mem_allocated_size_.value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename it as mem_allocated to keep it readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This actually helped me identify a bug in the percentage utilization alerts
| std::string file_path = ResolvePath(key); | ||
| replicas.emplace_back(file_path, total_length, | ||
| ReplicaStatus::PROCESSING); | ||
| MasterMetricManager::instance().inc_allocated_file_size(total_length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure this is the only path of file allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I've double-checked the code. no problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have some time, you could also update the docs that show up on our website.
If you don’t have much time, just let me know—the PR is ready to merge either way.
| Note: When enabling this feature, the user must ensure that the DFS-mounted directory (`root_fs_dir=/path/to/dir`) is valid and consistent across all client hosts. If some clients have invalid or incorrect mount paths, it may cause abnormal behavior in Mooncake Store. | ||
|
|
||
| #### Persistent Storage Space Configuration | ||
| Mooncake provides configurable DFS available space. Users can specify `--global_file_segment_size=1048576` when starting the master, indicating a maximum usable space of 1MB on DFS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just let users know it's just a metric or something—we didn't evict anything on DFS right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will add more descriptive information
| return tl::make_unexpected(ErrorCode::INVALID_WRITE); | ||
| } | ||
| // When disk replica is enabled, update allocated_file_size | ||
| if (use_disk_replica_ && replica_type == ReplicaType::DISK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we do it in ObjectMetadata's constructor and destructor? That's more RAII and the code looks cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move it to destructor sounds better, i will slove it today
|
@yejj710 Please confirm these issues are resolved so this PR can be merged before today's release. |
I have resolved these issues. |
|
Cant pass CI, plz fix it. @yejj710 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the build success.
done. please review it again @stmatengss |
|
@yejj710 Is it ready for merging? |
yes ~ @stmatengss |
background
Mooncake has already supported tiered caching, but currently only memory usage is tracked by mooncake_master
new feature