[Fix](cloud) calc_sync_versions should consider full compaction
#55630
run buildall

TPC-H: Total hot run time: 34425 ms
TPC-DS: Total hot run time: 187810 ms
ClickBench: Total hot run time: 33.71 s

run buildall

TPC-H: Total hot run time: 34423 ms
TPC-DS: Total hot run time: 187117 ms
ClickBench: Total hot run time: 33.53 s
dataroaring left a comment:

LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…ache#55630)

### What problem does this PR solve?

Currently, `MetaServiceImpl::get_rowset` uses `calc_sync_versions` to eliminate unnecessary version ranges when a BE syncs rowset metas. One of the optimizations is the following:

```cpp
std::vector<std::pair<int64_t, int64_t>> calc_sync_versions(int64_t req_bc_cnt, int64_t bc_cnt,
                                                            int64_t req_cc_cnt, int64_t cc_cnt,
                                                            int64_t req_cp, int64_t cp,
                                                            int64_t req_start, int64_t req_end) {
    // ...
    if (req_cc_cnt < cc_cnt) {
        Version cc_version;
        if (req_cp < cp && req_cc_cnt + 1 == cc_cnt) {
            // * only one CC happened and CP changed
            // BE [=][=][=][=][=====][=][=]
            //                 ^~~~~ req_cp
            // MS [=][=][=][=][xxxxxxxxxxxxxx][=======][=][=]
            //                                 ^~~~~~~ ms_cp
            //                 ^____________^ related_versions: [req_cp, ms_cp - 1]
            //
            cc_version = {req_cp, cp - 1};
        } else {
            // ...
        }
        // ...
    }
    // ...
}
```

This optimization relies on the assumption that only cumulative compaction changes the cumulative point. However, full compaction can also change the cumulative point, which breaks that assumption. This causes a data correctness problem in a multi-cluster environment because the tablet can permanently fail to sync some rowset metas.

A data correctness problem has been observed in the following situation:

1. For a certain tablet, base_compaction_cnt=14, cumulative_compaction_cnt=804, cumu_point=7458. On node A of the write cluster (cluster 0), a full compaction of [2-7464] and a cumulative compaction of [7465-7486] were performed. The stats then became base_compaction_cnt=15, cumulative_compaction_cnt=805, cumu_point=7465.
2. On node B of the read cluster (cluster 1), during sync_rowset, we have:
   req_base_compaction_cnt=14, base_compaction_cnt=15,
   req_cumulative_compaction_cnt=804, cumulative_compaction_cnt=805,
   req_cp=7458, cp=7465,
   req_start=7487, req_end=int_max.
3. `calc_sync_versions` computes that the rowsets to be pulled are [0-7464] and [7487-int_max], but it misses the rowset [7465-7486] produced by cumulative compaction.
4. Moreover, since the max_version of the tablet on cluster 1 node B has been updated, subsequent sync_rowset operations will also not pull the rowset [7465-7486].
5. This causes a duplicate key problem on MOW tables because new rowsets will generate delete bitmap marks on [7465-7486].

---

This PR forbids the above optimization when the full compaction count has changed.

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
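To make the observed failure concrete, here is a minimal sketch. It is not code from Doris. `uncovered_ranges` is a hypothetical helper that reports which versions a set of pulled ranges misses. `single_cc_shortcut_is_safe` is a hypothetical restatement of the guard this PR describes; the parameter names `req_fc_cnt` and `fc_cnt` (the full compaction counts seen by the BE and by the meta service) are assumptions and may not match the actual patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Inclusive version range [first, second].
using VersionRange = std::pair<int64_t, int64_t>;

// Hypothetical helper (not part of Doris): returns the sub-ranges of
// [first, last] that none of the `pulled` ranges cover.
std::vector<VersionRange> uncovered_ranges(std::vector<VersionRange> pulled,
                                           int64_t first, int64_t last) {
    assert(first <= last);
    std::sort(pulled.begin(), pulled.end());
    std::vector<VersionRange> gaps;
    int64_t next = first; // smallest version not yet known to be covered
    for (const VersionRange& r : pulled) {
        if (r.second < next) continue; // range ends before `next`
        if (r.first > next) {
            gaps.emplace_back(next, std::min(r.first - 1, last));
        }
        if (r.second >= last) return gaps; // covered through `last`
        next = r.second + 1;               // safe: r.second < last
    }
    gaps.emplace_back(next, last); // tail after the final pulled range
    return gaps;
}

// Sketch of the guard described in this PR, under assumed names: the
// "only one CC happened and CP changed" shortcut is taken only when the
// full compaction count is unchanged between the BE request and MS.
bool single_cc_shortcut_is_safe(int64_t req_fc_cnt, int64_t fc_cnt,
                                int64_t req_cc_cnt, int64_t cc_cnt,
                                int64_t req_cp, int64_t cp) {
    return req_fc_cnt == fc_cnt && req_cp < cp && req_cc_cnt + 1 == cc_cnt;
}
```

With the ranges from step 3, `uncovered_ranges({{0, 7464}, {7487, INT64_MAX}}, 0, INT64_MAX)` yields the single gap [7465-7486], which is the rowset node B never pulls. Under the assumed guard, the shortcut is rejected whenever the two full compaction counts differ, regardless of the cumulative compaction counts and cumulative points.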