Implementation of Baseline Resync #248
base: main
Conversation
* Add snapshot-related message protocol/structs and the framework for snapshot resync: pg_blob_iterator is the snapshot resync context on the leader (read path); SnapshotReceiveHandler is a newly added structure serving as the snapshot context on the follower (write path); previous implementation code is commented out.
Implemented ReplicationStateMachine::write_snapshot_data() method and SnapshotReceiveHandler logic. Additionally, extracted local_create_pg, local_create_shard, and local_add_blob_info functions for follower data creation.
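For illustration only, the sketch below shows the division of labor described above: a leader-side iterator that yields snapshot objects in order, and a follower-side handler that applies each one. None of these types mirror the actual HomeObject classes; all names and fields are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class ObjType { PG_META, SHARD_META, BLOB_BATCH };

struct SnapshotObj {
    ObjType type;
    std::string payload; // stands in for the serialized snapshot message
};

// Leader-side read path: walks local PG state and emits snapshot objects in order.
class PgBlobIteratorSketch {
public:
    explicit PgBlobIteratorSketch(std::vector< SnapshotObj > objs) : objs_{std::move(objs)} {}
    std::optional< SnapshotObj > next() {
        if (idx_ >= objs_.size()) return std::nullopt;
        return objs_[idx_++];
    }

private:
    std::vector< SnapshotObj > objs_;
    std::size_t idx_{0};
};

// Follower-side write path: recreates pg/shard/blob state from each received object.
class SnapshotReceiveHandlerSketch {
public:
    void process(SnapshotObj const& obj) {
        switch (obj.type) {
        case ObjType::PG_META: std::cout << "create pg: " << obj.payload << '\n'; break;
        case ObjType::SHARD_META: std::cout << "create shard: " << obj.payload << '\n'; break;
        case ObjType::BLOB_BATCH: std::cout << "write blob batch: " << obj.payload << '\n'; break;
        }
    }
};

int main() {
    PgBlobIteratorSketch leader{{{ObjType::PG_META, "pg 1"},
                                 {ObjType::SHARD_META, "shard 100"},
                                 {ObjType::BLOB_BATCH, "blobs 0-63"}}};
    SnapshotReceiveHandlerSketch follower;
    while (auto obj = leader.next()) {
        follower.process(*obj);
    }
}
```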
* Add UT for PGBlobIterator
* Fix comments
Extracted SnapshotContext from SnapshotReceiveHandler and added UT for SnapshotReceiveHandler. Additionally, fixed several bugs in the SnapshotReceiveHandler logic.
Update baseline_resync branch with latest commits
Fix some bugs in baseline resync
* Enhance SnapshotReceiveHandler UTs
* Add test and flip for snapshot resync
* Blob duplication handling: this commit addresses blob duplication caused by resync, which mainly happens when a snapshot message is resent during baseline resync or when logs are applied after snapshot completion. Skipping the duplicates avoids unnecessary GC of duplicated data. The mechanism is effective only for duplicated blobs: since we do not record the blks of the `shardInfo` stored in the data service, we cannot skip data writes for duplicated shards. (A simplified sketch of the duplicate-blob guard follows after this list.)
* Reset PG after failures
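The sketch below illustrates the duplicate-blob guard mentioned in the blob duplication commit. It is a simplified, hypothetical model (the real check lives in the snapshot/blob write path and consults the index table); it shows only the idea of skipping the data write when the blob id is already present.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Stand-in for the physical block address (pbas) recorded per blob.
struct BlkId {
    uint64_t start_blk;
    uint32_t nblks;
};

using IndexTable = std::unordered_map< uint64_t /*blob_id*/, BlkId >;

// Returns true if the blob data was written, false if the blob already existed
// (e.g. a resent snapshot object or a log entry replayed after snapshot completion).
bool put_blob_if_absent(IndexTable& index, uint64_t blob_id, BlkId const& pbas) {
    if (index.find(blob_id) != index.end()) { return false; } // skip duplicate, avoid future GC
    // ... write blob data to the allocated blocks here ...
    index.emplace(blob_id, pbas);
    return true;
}

int main() {
    IndexTable index;
    std::cout << put_blob_if_absent(index, 42, {100, 4}) << '\n'; // 1: written
    std::cout << put_blob_if_absent(index, 42, {100, 4}) << '\n'; // 0: duplicate skipped
}
```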
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##             main     #248      +/-   ##
==========================================
+ Coverage   63.15%   64.14%   +0.98%
==========================================
  Files          32       33       +1
  Lines        1900     2418     +518
  Branches      204      281      +77
==========================================
+ Hits         1200     1551     +351
- Misses        600      720     +120
- Partials      100      147      +47
☔ View full report in Codecov by Sentry.
LGTM
bool success = local_add_blob_info(pg_id, blob_info);

if (ctx) {
    ctx->promise_.setValue(success ? BlobManager::Result< BlobInfo >(blob_info)
Could you please help me understand why the condition was changed from 'status != success && !exist_already' to 'status != success'? If add_to_index_table fails because the entry already exists, what is expected to happen? (Or will that happen at all?)
Previously, if the blob already existed in the index table, add_to_index_table failed but on_blob_put_commit still returned success. I think that was intended to handle the log replay scenario. But there are actually two cases: the pbas is the same as the existing one, or the pbas is different (perhaps due to some mistake). Currently, in the add_to_index_table function, if the pbas is the same as the old one it returns success, so we no longer need the && !exist_already condition.
And actually, it's not quite clear at this point in which cases we may encounter a conflicting value in the index table. Let's wait for further tests to uncover more corner cases and optimize our handling accordingly.
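For illustration, here is a minimal sketch of the idempotent behavior described above, using hypothetical simplified types (the real add_to_index_table operates on the homestore index table): inserting the same blob with the same pbas succeeds, while a differing pbas for an existing blob surfaces as a conflict.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct BlkId {
    uint64_t start_blk;
    uint32_t nblks;
    bool operator==(BlkId const& o) const { return start_blk == o.start_blk && nblks == o.nblks; }
};

enum class PutStatus { SUCCESS, CONFLICT };

// Idempotent insert: the same blob with the same pbas is a no-op success,
// a different pbas for an existing blob is reported as a conflict.
PutStatus add_to_index_table_sketch(std::unordered_map< uint64_t, BlkId >& index,
                                    uint64_t blob_id, BlkId const& pbas) {
    auto [it, inserted] = index.try_emplace(blob_id, pbas);
    if (inserted) return PutStatus::SUCCESS;           // fresh insert
    if (it->second == pbas) return PutStatus::SUCCESS; // replay/resync wrote the same pbas
    return PutStatus::CONFLICT;                        // same blob id, different pbas
}

int main() {
    std::unordered_map< uint64_t, BlkId > index;
    std::cout << static_cast< int >(add_to_index_table_sketch(index, 7, {10, 2})) << '\n'; // 0: success
    std::cout << static_cast< int >(add_to_index_table_sketch(index, 7, {10, 2})) << '\n'; // 0: same pbas
    std::cout << static_cast< int >(add_to_index_table_sketch(index, 7, {99, 2})) << '\n'; // 1: conflict
}
```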
}
// sort shard list by <v_chunk_num, lsn> to ensure open shards are positioned after sealed shards within each chunk
std::ranges::sort(shard_list_, [](ShardEntry& a, ShardEntry& b) {
    return a.v_chunk_num != b.v_chunk_num ? a.v_chunk_num < b.v_chunk_num : a.info.lsn < b.info.lsn;
If I understood correctly, this assumes that shards with a smaller v_chunk_num are more likely to be sealed. Could you please explain the reason? IIRC, the shard is selected by available size. (Besides, v_chunk_id may be easier to understand than v_chunk_num, as they are the same.)
We don't care about the shard ordering across different chunks. This is just to ensure that within each chunk the open shard is positioned last.
Let me edit the comment for clarity in the last push.
To keep the commit log history clean, all additional changes in this PR will be placed in the latest commit.
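A small standalone demo of the comparator's effect, using a hypothetical flattened ShardEntry: within each chunk the entry with the largest lsn (the open shard) ends up last, while the ordering across chunks carries no particular meaning.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical, flattened version of the shard entry used only for this demo.
struct ShardEntrySketch {
    uint16_t v_chunk_num;
    int64_t lsn; // creation lsn; the open shard has the largest lsn within its chunk
    bool open;
};

int main() {
    std::vector< ShardEntrySketch > shards{
        {2, 50, true}, {1, 30, false}, {2, 10, false}, {1, 40, true}};

    // Same ordering rule as the comparator under review: group by v_chunk_num,
    // then order by lsn so each chunk's open shard comes after its sealed shards.
    std::ranges::sort(shards, [](ShardEntrySketch const& a, ShardEntrySketch const& b) {
        return a.v_chunk_num != b.v_chunk_num ? a.v_chunk_num < b.v_chunk_num : a.lsn < b.lsn;
    });

    for (auto const& s : shards) {
        std::cout << "chunk " << s.v_chunk_num << " lsn " << s.lsn
                  << (s.open ? " (open)" : " (sealed)") << '\n';
    }
    // Prints chunk 1 sealed(30) then open(40), followed by chunk 2 sealed(10) then open(50).
}
```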
// Check if the pg exists; if yes, clean the stale pg resources (possibly left by a previous snapshot failure) so we resync on a pristine base
if (home_object_->pg_exists(pg_data->pg_id())) {
    LOGI("pg already exists, clean pg resources before snapshot, pg_id:{} {}", pg_data->pg_id(), log_suffix);
    home_object_->pg_destroy(pg_data->pg_id());
Is this the reason for src/lib/homestore_backend/hs_blob_manager.cpp:L363-369?
src/lib/homestore_backend/hs_blob_manager.cpp:L363-369 is more of a general defensive safety check: if the pg does not exist, it cannot proceed, similar to the existing shard check. In my understanding, resetting and creating the pg is the first message before any blob is created in the baseline resync (and if it fails, the same message is retried), so this line should not trip that check.
I have an additional question regarding the baseline resync: if the leader hits a timeout during the baseline resync, what happens on the leader and follower side? If the leader resends the snapshot, will the follower recreate a new snapshot? Or will the leader not resend, with raft re-reading and re-saving the snapshot obj until the obj is successfully written or fails?
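To illustrate the "pristine base" behavior discussed in this thread, here is a minimal sketch with a hypothetical HomeObject stand-in (not the real interface): when the PG metadata object is received, including a resend after an earlier failed attempt, any existing PG is destroyed and recreated so the resync restarts cleanly.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_set>

// Hypothetical stand-in for the HomeObject interface used by the receive handler.
struct HomeObjectSketch {
    std::unordered_set< uint16_t > pgs;
    bool pg_exists(uint16_t pg_id) const { return pgs.count(pg_id) > 0; }
    void pg_destroy(uint16_t pg_id) { pgs.erase(pg_id); }
    void pg_create(uint16_t pg_id) { pgs.insert(pg_id); }
};

// Called whenever the snapshot's PG metadata object is received (including resends):
// wipe any stale PG left by an earlier failed attempt, then recreate it.
void apply_pg_meta(HomeObjectSketch& ho, uint16_t pg_id) {
    if (ho.pg_exists(pg_id)) {
        std::cout << "pg " << pg_id << " already exists, cleaning stale resources before resync\n";
        ho.pg_destroy(pg_id);
    }
    ho.pg_create(pg_id);
}

int main() {
    HomeObjectSketch ho;
    apply_pg_meta(ho, 1); // first attempt
    apply_pg_meta(ho, 1); // snapshot resent after a failure: start again from a pristine base
}
```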
This PR contains all changes from the branch baseline_resync. The commit history has been reworked for simplicity and clarity.