[fix](filecache) fix warm up cancel failure when BE is down #58035
Fixed an issue where the cancel flow would exit if a BE was offline, preventing subsequent BEs from receiving the clear_job RPC. The flow now skips failed BEs and continues sending RPCs to the others. Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
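The skip-and-continue behavior described above can be illustrated with a minimal sketch. This is not the code from CloudWarmUpJob.java: the `ClearJobRpc` interface and the `cancelOnAllBackends` method are hypothetical names introduced only for this example, and the real RPC client is replaced by a lambda.

```java
import java.util.ArrayList;
import java.util.List;

public class CancelFlowSketch {
    /** Hypothetical stand-in for the per-BE clear_job RPC; not the Doris API. */
    public interface ClearJobRpc {
        void clearJob(long beId) throws Exception;
    }

    /**
     * Sends clear_job to every BE in order. A failure on one BE is recorded
     * and the loop moves on, so later BEs still receive the RPC.
     * Returns the ids of the BEs whose RPC failed.
     */
    public static List<Long> cancelOnAllBackends(List<Long> beIds, ClearJobRpc rpc) {
        List<Long> failed = new ArrayList<>();
        for (long beId : beIds) {
            try {
                rpc.clearJob(beId);
            } catch (Exception e) {
                // Skip this BE instead of aborting the whole cancel flow.
                failed.add(beId);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Long> reached = new ArrayList<>();
        // Simulate BE 2 being offline.
        ClearJobRpc rpc = beId -> {
            if (beId == 2L) {
                throw new java.io.IOException("BE 2 is down");
            }
            reached.add(beId);
        };
        List<Long> failed = cancelOnAllBackends(List.of(1L, 2L, 3L), rpc);
        System.out.println("reached=" + reached + " failed=" + failed);
        // prints: reached=[1, 3] failed=[2]
    }
}
```

Keeping the try/catch inside the loop body, rather than around the whole loop, is what lets BE 3 receive the RPC even though BE 2 failed.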
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
TPC-H: Total hot run time: 34254 ms
TPC-DS: Total hot run time: 187417 ms
ClickBench: Total hot run time: 27.66 s
FE Regression Coverage Report: Increment line coverage
fe/fe-core/src/main/java/org/apache/doris/cloud/CloudWarmUpJob.java
    try {
        TNetworkAddress addr = beToAddr == null ? null : beToAddr.get(beId);
        if (addr != null) {
            ClientPool.backendPool.invalidateObject(addr, client);
We cannot invalidate the pool unless we first check the exception to ensure that it actually needs invalidation.
The pool is used locally, so invalidating one item won't affect other RPCs or jobs. The invalidation is there so that the failed BE is skipped and the flow continues with the BEs that follow.
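The reviewer's suggestion to inspect the exception before invalidating could look roughly like the sketch below. It is illustrative only: `shouldInvalidate` is a hypothetical helper, and the rule that any `IOException` in the cause chain marks the connection as unusable is an assumption of this example, not something stated in the PR.

```java
import java.io.IOException;
import java.net.ConnectException;

public class InvalidateDecisionSketch {
    /**
     * Returns true when the failure looks like a broken connection.
     * Assumption for this sketch: an IOException anywhere in the cause
     * chain means the pooled client should be invalidated; any other
     * failure leaves the client reusable.
     */
    public static boolean shouldInvalidate(Throwable error) {
        int depth = 0;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (Throwable t = error; t != null && depth < 16; t = t.getCause(), depth++) {
            if (t instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable transportFailure =
                new RuntimeException("rpc failed", new ConnectException("connection refused"));
        Throwable logicFailure = new IllegalStateException("job not found");
        System.out.println(shouldInvalidate(transportFailure)); // prints: true
        System.out.println(shouldInvalidate(logicFailure));     // prints: false
    }
}
```

Under this approach the caller would invalidate the pooled client only when the check returns true and return it to the pool otherwise; in both cases the loop would still continue with the next BE.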
Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
run buildall
run buildall
TPC-H: Total hot run time: 34679 ms
TPC-DS: Total hot run time: 181379 ms
ClickBench: Total hot run time: 27.36 s
FE Regression Coverage Report: Increment line coverage
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
dataroaring left a comment
LGTM
Fixed an issue where the cancel flow would exit if a BE was offline, preventing subsequent BEs from receiving the clear_job RPC. The flow now skips failed BEs and continues sending RPCs to the others.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)