[Fix](merge-on-write) cloud mow table should sync rowsets in publish phase if compaction on other BE finished during this load #37670
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.

run buildall

clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 40373 ms
TPC-DS: Total hot run time: 175142 ms
ClickBench: Total hot run time: 30.44 s

run buildall

clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 39991 ms
TPC-DS: Total hot run time: 175580 ms
ClickBench: Total hot run time: 31.06 s
zhannngchen
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
dataroaring
left a comment
LGTM
…phase if compaction on other BE finished during this load (#37670)

## Proposed changes

Due to #35838, when executing a load job, the BE will not `sync_rowsets()` in the publish phase if a compaction job finished on another BE on the same tablet between the commit phase and the publish phase of the current load job. This PR lets the meta service return the tablet compaction stats along with the `getDeleteBitmapUpdateLockResponse` to the FE, and the FE sends them to the BE so that the BE knows whether it should `sync_rowsets()` due to compaction on other BEs.
…publish phase if compaction on other BE finished during this load (apache#37670)" This reverts commit 5f51a85.
## Proposed changes

Fixes a typo in apache#37670.
…ad in calcDeleteBitmapForMow (#39791)

## Proposed changes

Issue Number: close #xxx

#37670 let the FE call `get_delete_bitmap_update_lock` in `calcDeleteBitmapForMow` to get the latest compaction stats of the corresponding partition at the same time, so that in the downstream calc delete bitmap task the BE can determine whether there is a concurrent compaction conflict. However, that PR uses a snapshot read when fetching the partition stats, which makes it possible to fetch outdated stats, allowing the BE to miss the conflict handling for a concurrent compaction and generate duplicate keys.
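The hazard above can be illustrated with a minimal sketch. All names here are hypothetical, not the actual Doris API: the BE decides to sync only when the compaction count it observes exceeds its local count, so a stale snapshot read that lags the committed count silently suppresses a real conflict.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical conflict check: sync is triggered only when the observed
// compaction count is ahead of the BE's local count.
bool need_sync(int64_t observed_compaction_cnt, int64_t local_compaction_cnt) {
    return observed_compaction_cnt > local_compaction_cnt;
}

// Two views of the same tablet's stats: the value actually committed in the
// meta service, and the possibly stale value a snapshot read returns.
struct StatsRead {
    int64_t committed_cnt;  // value after the concurrent compaction committed
    int64_t snapshot_cnt;   // stale value seen by the snapshot read
};
```

With `committed_cnt = 6`, `snapshot_cnt = 5`, and a local count of 5, `need_sync(snapshot_cnt, local)` is false while `need_sync(committed_cnt, local)` is true: the stale read is exactly the missed conflict that produces duplicate keys.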
…ck be able to be in different fdb txns (#45206)

#37670 let the meta service return the tablet compaction stats along with the `getDeleteBitmapUpdateLockResponse` to the FE, to let the BE know whether it should `sync_rowsets()` due to a successful compaction on another BE on the same tablet. That PR put the reading of tablets' stats and the writing of the delete bitmap update lock KV into one fdb txn to achieve atomic semantics. However, when a load involves a large number of tablets, reading the tablets' stats may take longer than fdb's 5-second txn limit and cause a `TXN_TOO_OLD` error. This PR re-arranges the process so that the read of tablet stats need not be in the same fdb txn as the txn that updates `lock_info.lock_id`. In detail:

1. gain the delete bitmap update lock in MS (write the delete bitmap update lock KV)
2. read the tablets' stats to get the compaction counts
3. check whether the delete bitmap update lock is still held by the current load
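The three steps above can be sketched with an in-memory stand-in for the meta service; all names and structures are illustrative, not the real Doris meta-service interface. The key point is that step 2, whose duration grows with the number of tablets, runs between two short lock txns instead of inside one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// In-memory stand-in for the meta service (hypothetical sketch).
struct MetaServiceStub {
    std::optional<int64_t> lock_owner;                 // lock_info.lock_id
    std::map<int64_t, int64_t> tablet_compaction_cnt;  // tablet_id -> count

    // Step 1: write the delete bitmap update lock KV (one short txn).
    bool acquire_lock(int64_t lock_id) {
        if (lock_owner && *lock_owner != lock_id) return false;
        lock_owner = lock_id;
        return true;
    }
    // Step 3: verify the lock is still held by this load (another short txn).
    bool lock_still_held(int64_t lock_id) const {
        return lock_owner && *lock_owner == lock_id;
    }
};

// Step 2 runs between the two lock txns: reading many tablets' stats may take
// arbitrarily long, so it no longer shares a txn with step 1.
bool get_lock_and_stats(MetaServiceStub& ms, int64_t lock_id,
                        const std::vector<int64_t>& tablets,
                        std::map<int64_t, int64_t>* out_stats) {
    if (!ms.acquire_lock(lock_id)) return false;            // step 1
    for (int64_t t : tablets) {                             // step 2
        (*out_stats)[t] = ms.tablet_compaction_cnt[t];
    }
    return ms.lock_still_held(lock_id);                     // step 3
}
```

If another load steals the lock during step 2, the re-check in step 3 fails and the stats are discarded, which is what restores the atomic semantics without a single long txn.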
…c_rowsets` in publish phase (#48400)

### What problem does this PR solve?

Consider the following situation:

1. a heavy schema change begins
2. the alter task on tablet X (to tablet Y) is sent to be1
3. be1 shuts down for some reason
4. new loads on the new tablet Y are routed to be2 (which skips calculating delete bitmaps in the commit phase and publish phase because the tablet's state is `NOT_READY`)
5. be1 restarts and resumes the alter task
6. the alter task on be1 finishes and changes the tablet's state to `RUNNING` in MS
7. some load on tablet Y on be2 skips calculating the delete bitmap because it doesn't know the tablet's state has changed, which causes a duplicate key problem

Like #37670, this PR lets the meta service return the tablet states along with the `getDeleteBitmapUpdateLockResponse` to the FE, and the FE sends them to the BE so that the BE knows whether it should `sync_rowsets()` due to a tablet state change on other BEs.
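The extra check this adds on top of the compaction-stats comparison can be sketched as follows; the enum and function names are illustrative, not the actual Doris types. A tablet the BE believes is `NOT_READY` (mid schema change) skips delete-bitmap calculation, so if the meta service now reports `RUNNING`, the alter job finished elsewhere and the local view must be refreshed.

```cpp
#include <cassert>

// Hypothetical tablet states relevant to the scenario above.
enum class TabletState { NOT_READY, RUNNING };

// Sync is required when the locally cached state says the tablet is still
// mid schema change, but the meta service reports it has become RUNNING
// (i.e. the alter task completed on another BE).
bool need_sync_for_state(TabletState ms_state, TabletState local_state) {
    return local_state == TabletState::NOT_READY &&
           ms_state == TabletState::RUNNING;
}
```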
Proposed changes

Due to #35838, when executing a load job, the BE will not `sync_rowsets()` in the publish phase if a compaction job finished on another BE on the same tablet between the commit phase and the publish phase of the current load job. This PR lets the meta service return the tablet compaction stats along with the `getDeleteBitmapUpdateLockResponse` to the FE, and the FE sends them to the BE so that the BE knows whether it should `sync_rowsets()` due to compaction on other BEs.
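The decision the BE makes with the forwarded stats can be sketched as a simple comparison; the struct and function names below are hypothetical, not the actual Doris API. If any compaction counter reported by the meta service is ahead of the counter on the BE's local tablet copy, a compaction finished elsewhere between commit and publish, and the local rowset view is stale.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative compaction stats a tablet might carry (hypothetical fields).
struct CompactionStats {
    int64_t base_compaction_cnt = 0;
    int64_t cumulative_compaction_cnt = 0;
    int64_t cumulative_point = 0;
};

// Returns true if the BE should call sync_rowsets() before calculating the
// delete bitmap in the publish phase: any counter from the meta service that
// is ahead of the local copy means another BE compacted this tablet.
bool need_sync_rowsets(const CompactionStats& ms_stats,
                       const CompactionStats& local_stats) {
    return ms_stats.base_compaction_cnt > local_stats.base_compaction_cnt ||
           ms_stats.cumulative_compaction_cnt >
               local_stats.cumulative_compaction_cnt ||
           ms_stats.cumulative_point > local_stats.cumulative_point;
}
```

When the counters match, the BE can safely skip the sync and reuse its local rowset view, which is the fast path this PR preserves.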