[fix](gc tablet) Fix getting a shutdown tablet taking a long time #27693

Merged (4 commits) on Nov 29, 2023

@yujun777 (Collaborator) commented Nov 28, 2023

If garbage-collecting shutdown tablets takes a long time, any TabletManager call that gets a tablet can be forced to wait a long time as well.

PR #26151 fixed the long wait when getting a running tablet. It also fixed dropping a tablet holding the tablet meta lock for too long.

However, it did not fix the long wait when getting a deleted tablet. This PR fixes that.

Proposed changes

Issue Number: close #xxx

@yujun777 (Collaborator Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@yujun777 yujun777 force-pushed the fix-get-shutdown-tablet-cost-time branch from 48b82ea to 3a45836 Compare November 28, 2023 08:42
@yujun777 (Collaborator Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

-std::lock_guard<std::shared_mutex> wrlock(_shutdown_deleting_tablets_lock);
-auto it = _shutdown_deleting_tablets.begin();
-while (it != _shutdown_deleting_tablets.end()) {
+auto it = _shutdown_tablets.begin();
Member

Is _shutdown_tablets thread-safe to iterate?

@yujun777 (Collaborator Author)

Updated.

@yujun777 (Collaborator Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@yujun777 (Collaborator Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 45.89 seconds
stream load tsv: 563 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 28 seconds loaded 2358488459 Bytes, about 80 MB/s
stream load orc: 70 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17100699466 Bytes
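The throughput figures in these ClickBench reports are consistent with a simple formula. Load throughput matches bytes / seconds / 2^20, truncated to an integer, so "MB/s" behaves like MiB/s. The insert rate matches rows / seconds / 1000, truncated. This is inferred from the published numbers only and is not taken from the pipeline's source. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers that reproduce the robot's rounded figures; the
// formulas are inferred from the reported numbers, not from the pipeline.

// Load throughput: bytes / seconds / 2^20, truncated by integer division.
int64_t load_mib_per_sec(int64_t bytes, int64_t seconds) {
    assert(seconds > 0);
    return bytes / seconds / (1024 * 1024);
}

// Insert rate in thousands of rows per second, truncated toward zero.
int64_t insert_krows_per_sec(int64_t rows, double seconds) {
    assert(seconds > 0.0);
    return static_cast<int64_t>(static_cast<double>(rows) / seconds / 1000.0);
}
```

For example, load_mib_per_sec(74807831229, 563) gives 126 and insert_krows_per_sec(10000000, 28.9) gives 346, matching the report above. Truncation rather than rounding also explains "about 25 MB/s" for parquet, where the exact value is roughly 25.7.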

@yujun777 yujun777 force-pushed the fix-get-shutdown-tablet-cost-time branch from cc74117 to 83e4574 Compare November 28, 2023 13:37
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@yujun777 (Collaborator Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@yujun777 (Collaborator Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@yujun777 yujun777 force-pushed the fix-get-shutdown-tablet-cost-time branch from 361e56e to 03159f5 Compare November 28, 2023 14:22
@yujun777 (Collaborator Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

std::lock_guard<std::shared_mutex> wrdlock(_shutdown_tablets_lock);
while (last_it != _shutdown_tablets.end() && batch_tablets.size() < limit) {
    // use_count() > 1 means the current tablet is referenced by another thread
    if (last_it->use_count() > 1) {
Member

What if the tablet is referenced during _move_tablet_to_trash? Will queries still be correct?

@yujun777 (Collaborator Author) Nov 29, 2023

If a tablet is held by another thread, it will not be reclaimed.
If a tablet is in the middle of _move_tablet_to_trash, other threads cannot get it, because a thread must acquire the read lock of shutdown_tablets_lock in order to get a tablet.
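The author's two points can be modelled with a minimal C++17 sketch. All names here (Tablet, ShutdownTablets, get, take_unreferenced) are hypothetical. The structure is a simplification and not the code merged in this PR.

- Lookups take a shared lock and hand out a std::shared_ptr copy.
- The GC takes the exclusive lock and skips any tablet whose use_count() is above 1.
- The GC also removes up to limit tablets from the list before releasing the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; not the Doris Tablet/TabletManager code.
struct Tablet {
    int64_t id;
};
using TabletSPtr = std::shared_ptr<Tablet>;

class ShutdownTablets {
public:
    void add(TabletSPtr tablet) {
        std::lock_guard<std::shared_mutex> wrlock(_lock);
        _tablets.push_back(std::move(tablet));
    }

    // Reader path: needs only the shared lock. The returned copy raises
    // use_count(), which tells the GC that the tablet is still in use.
    TabletSPtr get(int64_t id) const {
        std::shared_lock<std::shared_mutex> rdlock(_lock);
        for (const TabletSPtr& t : _tablets) {
            if (t->id == id) {
                return t;
            }
        }
        return nullptr;
    }

    // GC path: under the exclusive lock, detach at most `limit` tablets that
    // no other thread references. A detached tablet can no longer be found
    // by get(), because get() has to wait until this lock is released.
    std::vector<TabletSPtr> take_unreferenced(std::size_t limit) {
        assert(limit > 0);
        std::lock_guard<std::shared_mutex> wrlock(_lock);
        std::vector<TabletSPtr> batch;
        auto it = _tablets.begin();
        while (it != _tablets.end() && batch.size() < limit) {
            if (it->use_count() > 1) {
                ++it;  // still held elsewhere: leave it for a later GC round
                continue;
            }
            batch.push_back(std::move(*it));
            it = _tablets.erase(it);
        }
        return batch;  // slow cleanup of `batch` can run after the lock is released
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> rdlock(_lock);
        return _tablets.size();
    }

private:
    mutable std::shared_mutex _lock;
    std::list<TabletSPtr> _tablets;
};
```

std::shared_ptr::use_count() is only a snapshot in multithreaded code. In this sketch that is harmless. A stale value above 1 merely postpones reclamation. A count of 1 cannot rise while the exclusive lock is held, because the list owns the only copy.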

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 03159f56fc1dc6442e82fef88de8457338efa3c1, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4918	4645	4657	4645
q2	352	150	158	150
q3	1522	1307	1302	1302
q4	1165	1036	940	940
q5	3250	3254	3282	3254
q6	257	127	134	127
q7	995	511	557	511
q8	2268	2251	2231	2231
q9	6964	6935	6917	6917
q10	3280	3386	3364	3364
q11	347	208	201	201
q12	353	218	219	218
q13	4666	4820	3877	3877
q14	249	219	222	219
q15	599	537	533	533
q16	423	365	419	365
q17	1014	641	602	602
q18	7976	8151	7464	7464
q19	1557	1514	1568	1514
q20	593	344	331	331
q21	3393	2943	2992	2943
q22	373	299	308	299
Total cold run time: 46514 ms
Total hot run time: 42007 ms
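Each TPC-H row lists three timings followed by a best hot time, all in milliseconds. The reported totals are consistent with the following reading, which is inferred from the numbers and not from the robot's source:

- The first timing is the cold run.
- The best hot time is the smaller of the second and third timings.
- The cold total is the sum of the first column.
- The hot total is the sum of those minima.

The minimal sketch below (hypothetical names) checks this reading against the first table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reading of one TPC-H row: the first run is treated as cold,
// the next two as hot runs; the robot's last column matches their minimum.
struct QueryTimes {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

struct Totals {
    int64_t cold_ms;
    int64_t hot_ms;
};

// Sums cold times and best (minimum) hot times across all queries.
Totals sum_times(const std::vector<QueryTimes>& rows) {
    Totals t{0, 0};
    for (const QueryTimes& r : rows) {
        assert(r.cold_ms >= 0 && r.hot1_ms >= 0 && r.hot2_ms >= 0);
        t.cold_ms += r.cold_ms;
        t.hot_ms += std::min(r.hot1_ms, r.hot2_ms);
    }
    return t;
}
```

Feeding the 22 rows of the default-conf table into sum_times yields 46514 ms cold and 42007 ms hot, which matches the reported totals.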

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4571	4610	4576	4576
q2	318	187	240	187
q3	3730	3731	3725	3725
q4	2535	2516	2523	2516
q5	6158	6143	6194	6143
q6	246	122	125	122
q7	2572	1978	1960	1960
q8	3775	3715	3739	3715
q9	9379	9297	9346	9297
q10	4076	4138	4157	4138
q11	633	514	504	504
q12	798	638	628	628
q13	4346	3655	3651	3651
q14	271	243	252	243
q15	595	525	531	525
q16	524	496	502	496
q17	2109	2097	2090	2090
q18	9605	9256	8984	8984
q19	1801	1776	1763	1763
q20	2315	1999	1966	1966
q21	7371	6819	6976	6819
q22	642	560	566	560
Total cold run time: 68370 ms
Total hot run time: 64608 ms

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot time: 43.24 seconds
stream load tsv: 580 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17100467949 Bytes

@dataroaring (Contributor) left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 29, 2023
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@eldenmoon (Member) left a comment

LGTM

@xiaokang xiaokang merged commit e208072 into apache:master Nov 29, 2023
30 of 33 checks passed
@wm1581066 wm1581066 removed the p0_b label Dec 7, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
Labels: approved (Indicates a PR has been approved by one committer.), dev/2.0.4-merged, reviewed

6 participants