
Conversation

@SgtPepperr
Contributor

@SgtPepperr SgtPepperr commented Jul 29, 2025

Background

Based on the iteration roadmap discussed in the previous PR #610 and issue #578, this PR migrates metadata management of the SSD KV-cache in the hierarchical caching feature from the client to the master service, preserving Mooncake's control/data separation design philosophy.
The main benefits are:

  • Improved BatchGet read performance on 3FS.
  • Stronger data-consistency guarantees.

Design & Implementation

The key idea is to extend the notion of replica so it can be either memory or disk.

  • PutStart now returns both a memory replica and a disk replica.
    The client writes data to each independently.
    Disk-replica file writes are asynchronous (handled by a dedicated file-writer thread pool) and therefore do not block the synchronous path.
  • The PutEnd RPC is extended:
    – The synchronous path issues PutEnd for the memory replica.
    – The asynchronous path issues PutEnd for the disk replica.
  • Get chooses the appropriate read path depending on replica type.
  • Evict is modified: instead of deleting the entire metadata entry, only the memory-replica portion is removed.
    In the original design, the persistence switch was controlled by the client:
    – If a storage path was specified → disk replicas were enabled (Put/Get).
    – If not specified → all disk-related operations were skipped.

The persistence switch has since been moved from the client to the master side: persistence is now enabled by specifying --root_fs_dir=/path/to/dir at master startup. All client hosts must mount their DFS directories under this path; otherwise Mooncake Store may behave abnormally.
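To make the dual-replica Put flow described above concrete, here is a minimal C++ sketch. It is not the actual Mooncake client API: `Replica`, `WriteReplica`, and `PutEnd` are hypothetical stand-ins, `std::async` stands in for the dedicated file-writer thread pool, and the example locations are made up.

```cpp
// Hypothetical sketch of the dual-replica Put path (not the real Mooncake API).
// The memory replica is written and finalized on the synchronous path; the
// disk replica is written and finalized asynchronously.
#include <future>
#include <iostream>
#include <string>
#include <vector>

enum class ReplicaType { kMemory, kDisk };

struct Replica {
    ReplicaType type;
    std::string location;  // memory segment address or DFS file path
};

// Stubs standing in for the real transfer write and master RPC.
bool WriteReplica(const Replica& r, const std::string& value) {
    std::cout << "write " << value.size() << " bytes to " << r.location << "\n";
    return true;
}
void PutEnd(const std::string& key, ReplicaType type) {
    std::cout << "PutEnd(" << key << ", "
              << (type == ReplicaType::kMemory ? "memory" : "disk") << ")\n";
}

// Returns once the memory replica is COMPLETED; the returned future tracks the
// disk replica, which reaches COMPLETED whenever its asynchronous write finishes.
std::future<void> Put(const std::string& key, const std::string& value,
                      const std::vector<Replica>& replicas /* from PutStart */) {
    std::future<void> disk_done;
    for (const auto& r : replicas) {
        if (r.type == ReplicaType::kMemory) {
            WriteReplica(r, value);            // synchronous path
            PutEnd(key, ReplicaType::kMemory);
        } else {
            disk_done = std::async(std::launch::async, [=] {
                WriteReplica(r, value);        // asynchronous path
                PutEnd(key, ReplicaType::kDisk);
            });
        }
    }
    return disk_done;
}

int main() {
    // Placeholder locations for illustration only.
    auto disk_done = Put("k1", "hello",
                         {{ReplicaType::kMemory, "segment://node0"},
                          {ReplicaType::kDisk, "/mnt/3fs/kvcache/k1"}});
    if (disk_done.valid()) disk_done.wait();  // only so the demo exits cleanly
}
```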

State Transitions

Moving metadata from the client to the master increases the complexity of state management:

| Old model   | New model           |
|-------------|---------------------|
| memory only | mem, mem+disk, disk |

Further, each replica has its own status (e.g., PROCESSING, COMPLETED), so the combined state space explodes.
Explicit tests and eventually state-machine diagrams will be added to prevent bugs.
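As a rough illustration (hypothetical types, not the actual master-service data structures), the enlarged state space can be pictured as one metadata entry holding up to two replicas, each with its own status:

```cpp
#include <optional>

enum class ReplicaStatus { PROCESSING, COMPLETED };

// One metadata entry, holding at most one memory replica and one disk replica,
// each with its own status.
struct ObjectState {
    std::optional<ReplicaStatus> memory;  // nullopt => no memory replica
    std::optional<ReplicaStatus> disk;    // nullopt => no disk replica
};
// "empty"    : both nullopt
// "mem"      : memory set, disk nullopt
// "disk"     : disk set,   memory nullopt
// "mem+disk" : both set
```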

Basic transition rules

```mermaid
stateDiagram-v2
    state "mem+disk" as mem_disk

    [*] --> empty

    empty --> mem      : Put
    empty --> disk     : Put
    empty --> mem_disk : Put

    mem  --> empty : Remove
    mem  --> empty : Evict

    disk --> empty : Remove

    mem_disk --> disk  : Evict
    mem_disk --> empty : Remove
```

Corner Cases

  1. Mixed persistence settings
    Three clients: two with a valid mount directory, one with an invalid one.
    – Mooncake Store may behave abnormally in this configuration.

  2. Memory write fails, disk write succeeds (mem+disk path)
    – Memory failure triggers PutRevoke, removing the memory replica.
    – Disk success triggers PutEnd, leaving a COMPLETED disk replica usable by later reads.

  3. New Put on an existing pure-disk replica
    – Currently rejected with “object exists”.
    – Future plan: allow re-adding a memory replica, transitioning disk → mem+disk.

  4. Get while the disk replica is still processing
    – If a memory replica exists and is COMPLETED, the master returns all completed replicas and ignores the still-processing disk replica (see the sketch after this list).
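A minimal sketch of that selection rule, using hypothetical `ReplicaInfo`/`ReplicaStatus` types rather than the actual master-service ones:

```cpp
#include <vector>

enum class MediaType { MEMORY, DISK };
enum class ReplicaStatus { PROCESSING, COMPLETED };

struct ReplicaInfo {
    MediaType media;
    ReplicaStatus status;
};

// Get-side filtering: only COMPLETED replicas are returned to the client, so a
// disk replica that is still PROCESSING is simply skipped.
std::vector<ReplicaInfo> SelectReadableReplicas(
        const std::vector<ReplicaInfo>& all) {
    std::vector<ReplicaInfo> ready;
    for (const auto& r : all) {
        if (r.status == ReplicaStatus::COMPLETED) {
            ready.push_back(r);
        }
    }
    return ready;  // empty => the object is not yet readable
}
```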

@SgtPepperr SgtPepperr marked this pull request as draft July 29, 2025 12:12
@SgtPepperr SgtPepperr marked this pull request as ready for review July 30, 2025 07:39
@xiaguan xiaguan self-requested a review July 31, 2025 03:26
Collaborator

@xiaguan xiaguan left a comment


I don't think the master should handle filesystem allocation this way.

We should provide the master with a parameter indicating it can perform filesystem allocation, where the root path is xxx. Then it can simply append the object key to the root path?
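A minimal illustration of the suggested scheme (`PathForKey` is a hypothetical helper, not existing Mooncake code):

```cpp
#include <filesystem>
#include <string>

// The master is configured with a single root path and derives the file
// location of every object by appending its key.
std::filesystem::path PathForKey(const std::filesystem::path& root_fs_dir,
                                 const std::string& object_key) {
    // Real code would need to escape or hash keys that are not valid file
    // names; that is omitted here.
    return root_fs_dir / object_key;
}
```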

```cpp
value.append(static_cast<char*>(slice.ptr), slice.size);
}

write_thread_pool_.enqueue(
```
Collaborator


I believe we could move the write_thread_pool down to the storage backend, as it's not directly related to the Client class.

Contributor Author


Since the write_thread_pool has to call PutEnd and PutRevoke on the master, and, from an abstraction perspective, the storage layer should only handle file read/write operations while PutEnd and similar calls belong to the client->master interaction logic, I think moving the write_thread_pool down to the storage layer needs further discussion.

@SgtPepperr
Contributor Author

> I don't think the master should handle filesystem allocation this way.
>
> We should provide the master with a parameter indicating it can perform filesystem allocation, where the root path is xxx. Then it can simply append the object key to the root path?

Yes, I also believe that after migrating the metadata to be managed by the master, the persistence switch and corresponding paths should also be controlled by the master. The latest commit has already made the corresponding changes.

Additionally, since the master now manages the metadata, file read/write concurrency is controlled by the master and read/write conflicts can no longer occur, so the file read/write locks have also been removed.

@SgtPepperr SgtPepperr requested a review from xiaguan August 1, 2025 06:34
@SpecterCipher

Hello, I'd like to inquire about a question regarding high availability mode: When a master switchover occurs, will the new master delete both the in-memory data and disk files from the previous master?

Collaborator

@xiaguan xiaguan left a comment


Just some grammar suggestions. No major changes needed, but we should add some coverage in master_service_test.cpp for the modifications. Others LGTM.

@SgtPepperr
Contributor Author

> Hello, I'd like to inquire about a question regarding high availability mode: When a master switchover occurs, will the new master delete both the in-memory data and disk files from the previous master?

In the current high-availability implementation, when a master switchover occurs, the client automatically reconnects to the new master and mounts the corresponding segment. However, all mem-kv data is cleared, and the new master's metadata is also empty, so it cannot query any previously saved kv pairs, requiring a re-Put operation.

As for disk-kv, an automatic file deletion mechanism has not yet been introduced. Although disk-kv cannot be indexed by the new master, the kv files remain in their original paths.

In the future, we plan to introduce eviction or related mechanisms to ensure the deletion of disk-kv files.

@SpecterCipher

> Hello, I'd like to inquire about a question regarding high availability mode: When a master switchover occurs, will the new master delete both the in-memory data and disk files from the previous master?
>
> In the current high-availability implementation, when a master switchover occurs, the client automatically reconnects to the new master and mounts the corresponding segment. However, all mem-kv data is cleared, and the new master's metadata is also empty, so it cannot query any previously saved kv pairs, requiring a re-Put operation.
>
> As for disk-kv, an automatic file deletion mechanism has not yet been introduced. Although disk-kv cannot be indexed by the new master, the kv files remain in their original paths.
>
> In the future, we plan to introduce eviction or related mechanisms to ensure the deletion of disk-kv files.

Thank you for your response. May I ask if there will be similar persistent operations for KV metadata in the future to ensure the cache remains available after a master switch?

@SgtPepperr
Contributor Author

> Hello, I'd like to inquire about a question regarding high availability mode: When a master switchover occurs, will the new master delete both the in-memory data and disk files from the previous master?
>
> In the current high-availability implementation, when a master switchover occurs, the client automatically reconnects to the new master and mounts the corresponding segment. However, all mem-kv data is cleared, and the new master's metadata is also empty, so it cannot query any previously saved kv pairs, requiring a re-Put operation.
> As for disk-kv, an automatic file deletion mechanism has not yet been introduced. Although disk-kv cannot be indexed by the new master, the kv files remain in their original paths.
> In the future, we plan to introduce eviction or related mechanisms to ensure the deletion of disk-kv files.
>
> Thank you for your response. May I ask if there will be similar persistent operations for KV metadata in the future to ensure the cache remains available after a master switch?

Yes, according to my understanding, the high-availability mode's TODO list does include the recovery of master metadata after a failure. You can refer to the description in the previous PR #451 regarding this matter.

@SgtPepperr
Contributor Author

> Just some grammar suggestions. No major changes needed, but we should add some coverage in master_service_test.cpp for the modifications. Others LGTM.

Thanks! I have added master_service_ssd_test.cpp to cover correctness testing of MasterService behavior when the SSD offload feature is enabled.

@SgtPepperr SgtPepperr requested a review from xiaguan August 4, 2025 08:11
@SpecterCipher

> Hello, I'd like to inquire about a question regarding high availability mode: When a master switchover occurs, will the new master delete both the in-memory data and disk files from the previous master?
>
> In the current high-availability implementation, when a master switchover occurs, the client automatically reconnects to the new master and mounts the corresponding segment. However, all mem-kv data is cleared, and the new master's metadata is also empty, so it cannot query any previously saved kv pairs, requiring a re-Put operation.
> As for disk-kv, an automatic file deletion mechanism has not yet been introduced. Although disk-kv cannot be indexed by the new master, the kv files remain in their original paths.
> In the future, we plan to introduce eviction or related mechanisms to ensure the deletion of disk-kv files.
>
> Thank you for your response. May I ask if there will be similar persistent operations for KV metadata in the future to ensure the cache remains available after a master switch?
>
> Yes, according to my understanding, the high-availability mode's TODO list does include the recovery of master metadata after a failure. You can refer to the description in the previous PR #451 regarding this matter.

ok thx

Collaborator

@xiaguan xiaguan left a comment


LGTM

@stmatengss
Collaborator

One more thing: Could you check if it has conflicts with #710?

@SgtPepperr
Contributor Author

> One more thing: Could you check if it has conflicts with #710?

At the moment, there’s no direct conflict with the existing code; it’s just that the VRAM-side implementation will likely need to add file-handling logic inside the putToVram function later on.

@SgtPepperr
Contributor Author

Given that another PR #710 will require modifying the replica configuration to specify VRAM-based PUT operations, and the SSD feature also needs changes to the replica config to let users control SSD writes and to validate the client mount, we plan to submit a new PR later that introduces a unified resource type for replica configuration. This will consolidate replica metadata for DRAM, VRAM, and SSD into a single, consistent schema.

@stmatengss stmatengss mentioned this pull request Aug 12, 2025
29 tasks
@stmatengss stmatengss merged commit 81f492c into kvcache-ai:main Aug 14, 2025
11 checks passed
@SgtPepperr SgtPepperr deleted the ssd_master branch August 14, 2025 06:44
XucSh pushed a commit to XucSh/Mooncake that referenced this pull request Aug 14, 2025
…ce (kvcache-ai#690)

* initial commit

* fix client::query return fault

* fix isexist return fault

* fix test bug

* fix clearinvalidhandles problem

* add file description for 3fs

* change ssd function start from client to master

* fix naming error

* edit doc description

* edit doc

* clang format

* fix as the review comment

* fix formmat

* add master service test for ssd

* fix format

* add log and cli

* fix putend test