
Implementation of Baseline Resync #248

Open
wants to merge 15 commits into base: main
Conversation

@koujl koujl commented Dec 26, 2024

This PR contains all changes from the branch baseline_resync.

The commit history has been reworked for simplicity and clarity.

yuwmao and others added 14 commits December 26, 2024 16:26
* Add snapshot-related msg protocol / structs and framework for snapshot resync

- pg_blob_iterator is the snapshot resync context for the leader (read path)
- SnapshotReceiveHandler is a newly added structure serving as the snapshot context for the follower (write path)
- Comment out the previous implementation code
Implemented ReplicationStateMachine::write_snapshot_data() method and 
SnapshotReceiveHandler logic. Additionally, extracted local_create_pg, 
local_create_shard, and local_add_blob_info functions for follower data 
creation.
* Add UT for PGBlobIterator

* fix comments
Extracted SnapshotContext from SnapshotReceiveHandler and added UT for
SnapshotReceiveHandler. Additionally, fixed several bugs in the
SnapshotReceiveHandler logic.
Update baseline_resync branch with latest commits
Fix some bugs in baseline resync
* Enhance SnapshotReceiveHandler UTs
* Add test and flip for snapshot resync
* Blob duplication handling

This commit addresses blob duplication issues caused by resync. Duplication mainly happens when a snapshot message is resent during baseline resync, or when logs are applied after the snapshot completes; skipping the duplicate write avoids unnecessary GC of duplicated data. (A minimal sketch of this check follows the commit list.)
This mechanism is effective only for duplicated blobs: since we do not record the blks of the `shardInfo` stored in the data service, we cannot skip data writes for duplicated shards.

* Reset PG after failures
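
For orientation, below is a minimal sketch of the duplicate-blob skip described in the "Blob duplication handling" commit above. The class, type, and helper names (SnapshotReceiveSketch, BlobSnapshotEntry, blob_exists_in_index, write_data_and_add_blob_info) are illustrative stand-ins under stated assumptions, not the PR's actual API:

```cpp
// Sketch only: follower-side handling of a blob received via snapshot resync.
// The type and helper names below are illustrative stand-ins, not the PR's API.
#include <cstdint>

enum class WriteResult { Written, SkippedDuplicate, Error };

struct BlobSnapshotEntry {
    std::uint64_t pg_id;
    std::uint64_t shard_id;
    std::uint64_t blob_id;
    // payload omitted
};

class SnapshotReceiveSketch {
public:
    WriteResult apply_blob(const BlobSnapshotEntry& entry) {
        // If the blob is already present (e.g. a resent snapshot message, or a
        // log entry applied after the snapshot completed), skip the data write
        // so no duplicated blocks are left behind for GC to clean up.
        if (blob_exists_in_index(entry)) { return WriteResult::SkippedDuplicate; }
        // Otherwise write the data and record it in the index table.
        return write_data_and_add_blob_info(entry) ? WriteResult::Written : WriteResult::Error;
    }

private:
    bool blob_exists_in_index(const BlobSnapshotEntry&) const { return false; }  // stand-in lookup
    bool write_data_and_add_blob_info(const BlobSnapshotEntry&) { return true; } // stand-in write path
};
```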
@codecov-commenter
⚠️ Please install the Codecov GitHub App to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 59.30902% with 212 lines in your changes missing coverage. Please review.

Project coverage is 64.14%. Comparing base (1746bcc) to head (9ecdab4).
Report is 13 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ib/homestore_backend/replication_state_machine.cpp | 16.85% | 68 Missing and 6 partials ⚠️ |
| ...lib/homestore_backend/snapshot_receive_handler.cpp | 60.62% | 47 Missing and 16 partials ⚠️ |
| src/lib/homestore_backend/pg_blob_iterator.cpp | 80.14% | 17 Missing and 10 partials ⚠️ |
| src/lib/homestore_backend/hs_pg_manager.cpp | 59.37% | 8 Missing and 5 partials ⚠️ |
| src/lib/homestore_backend/replication_message.hpp | 59.37% | 10 Missing and 3 partials ⚠️ |
| src/lib/homestore_backend/hs_blob_manager.cpp | 67.64% | 8 Missing and 3 partials ⚠️ |
| src/lib/homestore_backend/heap_chunk_selector.cpp | 57.14% | 4 Missing and 2 partials ⚠️ |
| src/lib/homestore_backend/index_kv.cpp | 25.00% | 2 Missing and 1 partial ⚠️ |
| src/lib/homestore_backend/hs_shard_manager.cpp | 89.47% | 2 Missing ⚠️ |

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #248      +/-   ##
==========================================
+ Coverage   63.15%   64.14%   +0.98%     
==========================================
  Files          32       33       +1     
  Lines        1900     2418     +518     
  Branches      204      281      +77     
==========================================
+ Hits         1200     1551     +351     
- Misses        600      720     +120     
- Partials      100      147      +47     


yuwmao previously approved these changes Dec 27, 2024

@yuwmao yuwmao (Contributor) left a comment

LGTM

bool success = local_add_blob_info(pg_id, blob_info);

if (ctx) {
    ctx->promise_.setValue(success ? BlobManager::Result< BlobInfo >(blob_info)
@Besroy Besroy (Contributor) commented Dec 27, 2024

Could you please help me understand why the condition was changed from 'status != success && !exist_already' to 'status != success'? If add_to_index_table fails because the entry already exists, what is expected to happen? (Or will that case happen at all?)

Contributor

Previously, if the blob already exists in the index table, add_to_index_table fails but on_blob_put_commit still returns success. I think that was intended to handle the log-replay scenario. But there are actually two cases: the pbas is the same as the existing pbas, or the pbas is a different one (possibly due to some mistake). Currently, in add_to_index_table, if the pbas is the same as the old one it returns success, so the '&& !exist_already' condition is no longer needed.
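
A small sketch of the idempotent-insert behavior described here, assuming a simplified in-memory table; the key/value types and the function name are illustrative, not the repo's actual add_to_index_table signature:

```cpp
// Illustrative only: an insert that treats "same key, same pbas" as success
// (covers log replay / resent snapshot messages) and "same key, different
// pbas" as a genuine conflict. Types are simplified stand-ins.
#include <cstdint>
#include <map>
#include <tuple>

enum class PutStatus { Success, Conflict };

struct BlobKey {
    std::uint64_t shard_id;
    std::uint64_t blob_id;
    bool operator<(const BlobKey& o) const {
        return std::tie(shard_id, blob_id) < std::tie(o.shard_id, o.blob_id);
    }
};

using Pbas = std::uint64_t;  // stand-in for the on-disk physical blk address

PutStatus add_to_index_table_sketch(std::map<BlobKey, Pbas>& table, const BlobKey& key, Pbas pbas) {
    auto [it, inserted] = table.try_emplace(key, pbas);
    if (inserted || it->second == pbas) { return PutStatus::Success; }  // new entry, or same pbas
    return PutStatus::Conflict;  // existing entry with a different pbas
}
```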

Author

Also, it is not entirely clear at this point in which cases we may encounter a conflicting value in the index table. Let's wait for further tests to uncover more corner cases and optimize our handling accordingly.

}
// sort shard list by <vchunkid, lsn> to ensure open shards positioned after sealed shards
std::ranges::sort(shard_list_, [](ShardEntry& a, ShardEntry& b) {
    return a.v_chunk_num != b.v_chunk_num ? a.v_chunk_num < b.v_chunk_num : a.info.lsn < b.info.lsn;
@Besroy Besroy (Contributor) commented Dec 27, 2024

If I understood correctly, this assumes that shards with a smaller v_chunk_num are more likely to be sealed. Could you please explain the reason? IIRC, the chunk for a shard is selected by available size. (Besides, v_chunk_id may be easier to understand than v_chunk_num, as they are the same.)

Author

We don't care about the shard ordering across different chunks. This is just to ensure that within each chunk the open shard is positioned last.

Let me edit the comment for clarity in the last push.
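
To make the intended ordering concrete, here is a small self-contained example of the comparator shown above. The struct is flattened for brevity, the values are made up, and it assumes (per the discussion) that the open shard carries the highest lsn within its chunk:

```cpp
// Demonstrates the <v_chunk_num, lsn> ordering: entries are grouped by chunk,
// and within each chunk the entry with the highest lsn sorts last, so the
// still-open shard ends up after the sealed ones. Values are made up.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct ShardEntrySketch {
    std::uint16_t v_chunk_num;
    std::int64_t lsn;
};

int main() {
    std::vector<ShardEntrySketch> shard_list{{2, 40}, {1, 30}, {2, 10}, {1, 55}};

    std::ranges::sort(shard_list, [](const ShardEntrySketch& a, const ShardEntrySketch& b) {
        return a.v_chunk_num != b.v_chunk_num ? a.v_chunk_num < b.v_chunk_num : a.lsn < b.lsn;
    });

    for (const auto& s : shard_list) {
        std::printf("chunk=%u lsn=%lld\n", (unsigned)s.v_chunk_num, (long long)s.lsn);
    }
    // Prints: chunk=1 lsn=30, chunk=1 lsn=55, chunk=2 lsn=10, chunk=2 lsn=40.
    return 0;
}
```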

@koujl koujl (Author) commented Dec 27, 2024

To keep the commit history clean, all additional changes in this PR will be placed in the latest commit, "Preparation for merging into main".

// Check if the pg exists; if so, clean up the stale pg resources (possibly left by a previous snapshot failure) so we resync on a pristine base
if (home_object_->pg_exists(pg_data->pg_id())) {
    LOGI("pg already exists, clean pg resources before snapshot, pg_id:{} {}", pg_data->pg_id(), log_suffix);
    home_object_->pg_destroy(pg_data->pg_id());
Contributor

Is this the reason for src/lib/homestore_backend/hs_blob_manager.cpp:L363-369?

@yuwmao yuwmao (Contributor) commented Dec 27, 2024

src/lib/homestore_backend/hs_blob_manager.cpp:L363-369 is more of a general defensive safety check: if the pg does not exist, the commit cannot proceed. It is similar to the existing shard check.
In my understanding, resetting and re-creating the pg is the first msg before any create-blob msg in the baseline resync (and if it fails, the same msg will be retried), so this line should not trip that check.
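
The referenced hs_blob_manager.cpp lines are not shown in this thread, so the following is only a hedged sketch of the kind of defensive check being described; the type, function, and error names are illustrative assumptions:

```cpp
// Sketch of a defensive guard before committing a blob put: if the PG (or the
// shard) is not known locally, bail out instead of proceeding. This mirrors
// the "general defensive safety check" described above; it is not the
// repository's actual code.
#include <cstdint>
#include <optional>

enum class GuardError { UNKNOWN_PG, UNKNOWN_SHARD };

struct HomeObjectView {
    bool pg_exists(std::uint64_t) const { return true; }     // stand-in lookups
    bool shard_exists(std::uint64_t) const { return true; }
};

std::optional<GuardError> guard_blob_put(const HomeObjectView& ho, std::uint64_t pg_id,
                                         std::uint64_t shard_id) {
    if (!ho.pg_exists(pg_id)) { return GuardError::UNKNOWN_PG; }
    if (!ho.shard_exists(shard_id)) { return GuardError::UNKNOWN_SHARD; }
    return std::nullopt;  // safe to proceed with the commit
}
```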

@Besroy Besroy (Contributor) commented Dec 27, 2024

I have an additional question regarding the baseline resync: if the leader hits a timeout during the baseline resync, what happens on the leader side and on the follower side? If the leader resends the snapshot, will the follower recreate a new snapshot? Or will the leader not resend, with raft re-reading and re-saving the snapshot obj until the obj is successfully written or the operation fails?
