@sollhui
Contributor

@sollhui sollhui commented Dec 3, 2024

What problem does this PR solve?

In production, we encountered an issue where the librdkafka consumer got stuck during destruction, saturating the heavy work pool and making every feature that depends on that pool, such as querying, unusable. To mitigate this impact, we replaced the heavy work pool with the routine load thread pool for metadata fetching.
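
To illustrate the isolation idea described above, here is a minimal, self-contained sketch. It is not the Doris implementation: `FixedPool`, `heavy_work_pool`, `routine_load_pool`, and `query_survives_stuck_metadata_tasks` are hypothetical names, and the stuck librdkafka consumer teardown is simulated by a task that blocks on a future. The point it shows is that when blocking metadata work runs in its own pool, saturating that pool leaves the pool used by queries free.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal fixed-size pool for illustration; not Doris's pool class.
class FixedPool {
public:
    explicit FixedPool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { work_loop(); });
        }
    }
    ~FixedPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // waits for running tasks to finish
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void work_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};

// Returns true if a query-like task completes while every worker of the
// metadata pool is blocked by a simulated stuck consumer teardown.
bool query_survives_stuck_metadata_tasks() {
    // Declared before the pools so they outlive the worker threads.
    std::promise<void> release;  // unblocks the simulated stuck tasks
    std::shared_future<void> gate = release.get_future().share();
    std::promise<int> query_result;
    std::future<int> result = query_result.get_future();

    FixedPool heavy_work_pool(2);    // stands in for the shared heavy work pool
    FixedPool routine_load_pool(2);  // stands in for the routine load thread pool

    // Saturate the routine load pool with tasks that do not finish on their own.
    for (int i = 0; i < 2; ++i) {
        routine_load_pool.submit([gate] { gate.wait(); });
    }
    // The query-like task runs in the other pool and is not queued behind them.
    heavy_work_pool.submit([&query_result] { query_result.set_value(42); });

    bool ok = result.wait_for(std::chrono::seconds(5)) == std::future_status::ready;
    int value = ok ? result.get() : -1;
    release.set_value();  // let the blocked tasks exit so the pools can be joined
    return ok && value == 42;
}
```

Calling `query_survives_stuck_metadata_tasks()` from a `main` function exercises the sketch. If the blocking tasks were submitted to `heavy_work_pool` instead, both of its workers would be occupied and the query-like task would wait in the queue until the timeout, which mirrors the saturation described in this PR.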

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Dec 3, 2024

run buildall

@sollhui sollhui changed the title from "[fix](routine load) replace heavy work pool with routine load threads for metadata fetching" to "[fix](routine load) replace heavy work pool with routine load thread pool for metadata fetching" Dec 3, 2024
@sollhui sollhui force-pushed the fetch_meta_thread_pool branch from 460175e to 5052ed7 December 3, 2024 07:24
@sollhui
Contributor Author

sollhui commented Dec 3, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 40022 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5052ed7c41bb7f9ac48be09bdb7244cb30290ab6, data reload: false

------ Round 1 ----------------------------------
q1	17614	7512	7270	7270
q2	2041	172	173	172
q3	10542	1149	1259	1149
q4	10220	767	717	717
q5	7591	2761	2738	2738
q6	237	147	149	147
q7	1016	607	625	607
q8	9243	1834	1894	1834
q9	6852	6536	6542	6536
q10	7095	2324	2346	2324
q11	459	260	261	260
q12	520	222	223	222
q13	17792	3005	3019	3005
q14	244	210	211	210
q15	572	537	539	537
q16	676	603	586	586
q17	978	483	498	483
q18	7331	6662	6754	6662
q19	1334	945	1032	945
q20	477	186	178	178
q21	4040	3124	3122	3122
q22	379	318	323	318
Total cold run time: 107253 ms
Total hot run time: 40022 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7216	7213	7212	7212
q2	326	235	229	229
q3	2934	2801	2814	2801
q4	1984	1725	1741	1725
q5	5410	5382	5419	5382
q6	221	135	135	135
q7	2128	1731	1744	1731
q8	3247	3380	3418	3380
q9	8658	8662	8649	8649
q10	3518	3454	3426	3426
q11	583	492	492	492
q12	790	579	589	579
q13	14716	3063	3023	3023
q14	309	275	266	266
q15	573	522	518	518
q16	666	631	627	627
q17	1795	1604	1532	1532
q18	7737	7629	7351	7351
q19	1644	1585	1552	1552
q20	2031	1801	1793	1793
q21	5519	5243	5141	5141
q22	641	557	565	557
Total cold run time: 72646 ms
Total hot run time: 58101 ms

@doris-robot

TPC-DS: Total hot run time: 190848 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5052ed7c41bb7f9ac48be09bdb7244cb30290ab6, data reload: false

query1	920	403	371	371
query2	4779	2084	2072	2072
query3	6063	213	208	208
query4	33538	23458	23570	23458
query5	3557	462	460	460
query6	267	192	182	182
query7	4306	302	309	302
query8	303	240	240	240
query9	9116	2683	2691	2683
query10	459	256	269	256
query11	17985	15119	15227	15119
query12	160	102	103	102
query13	1526	412	397	397
query14	9725	7298	7073	7073
query15	265	172	177	172
query16	7596	450	472	450
query17	1208	552	526	526
query18	2028	290	288	288
query19	364	146	145	145
query20	115	111	114	111
query21	203	101	101	101
query22	4682	4398	4640	4398
query23	35230	34527	34411	34411
query24	11014	2419	2432	2419
query25	673	369	379	369
query26	1822	149	156	149
query27	2824	270	290	270
query28	8259	2414	2417	2414
query29	1027	399	402	399
query30	304	150	144	144
query31	1017	820	834	820
query32	95	57	55	55
query33	765	297	281	281
query34	975	510	521	510
query35	869	772	714	714
query36	1084	936	964	936
query37	175	77	77	77
query38	4285	4210	4203	4203
query39	1450	1430	1411	1411
query40	283	98	99	98
query41	46	45	43	43
query42	106	97	95	95
query43	536	488	490	488
query44	1221	798	791	791
query45	184	170	165	165
query46	1157	705	726	705
query47	1983	1904	1877	1877
query48	407	308	309	308
query49	1110	385	376	376
query50	802	386	389	386
query51	7157	7171	7046	7046
query52	99	92	89	89
query53	259	193	185	185
query54	1084	401	404	401
query55	89	77	123	77
query56	264	231	235	231
query57	1302	1115	1105	1105
query58	219	201	212	201
query59	3106	2976	2993	2976
query60	278	239	247	239
query61	111	106	109	106
query62	864	687	677	677
query63	209	180	180	180
query64	4689	698	643	643
query65	3307	3180	3230	3180
query66	1288	301	311	301
query67	16238	15582	15693	15582
query68	4980	540	558	540
query69	409	248	248	248
query70	1185	1093	1140	1093
query71	321	256	259	256
query72	6130	3738	4006	3738
query73	760	357	444	357
query74	10421	8891	9022	8891
query75	3430	2629	2657	2629
query76	2942	1075	1032	1032
query77	441	277	268	268
query78	10417	9469	9507	9469
query79	1529	602	604	602
query80	1143	437	465	437
query81	514	236	223	223
query82	911	120	119	119
query83	233	148	146	146
query84	242	72	72	72
query85	1252	297	294	294
query86	354	300	300	300
query87	4658	4536	4749	4536
query88	3444	2190	2174	2174
query89	394	295	298	295
query90	2119	188	192	188
query91	137	102	105	102
query92	60	51	52	51
query93	1089	530	541	530
query94	1061	291	280	280
query95	353	253	258	253
query96	605	284	291	284
query97	2901	2680	2712	2680
query98	218	199	209	199
query99	1539	1338	1315	1315
Total cold run time: 295623 ms
Total hot run time: 190848 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.50% (10007/25989)
Line Coverage: 29.50% (83781/284029)
Region Coverage: 28.62% (43095/150573)
Branch Coverage: 25.23% (21902/86826)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5052ed7c41bb7f9ac48be09bdb7244cb30290ab6_5052ed7c41bb7f9ac48be09bdb7244cb30290ab6/report/index.html

@doris-robot

ClickBench: Total hot run time: 33.5 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5052ed7c41bb7f9ac48be09bdb7244cb30290ab6, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.07
query4	1.62	0.09	0.11
query5	0.42	0.38	0.40
query6	1.16	0.66	0.67
query7	0.02	0.02	0.01
query8	0.04	0.04	0.03
query9	0.59	0.51	0.50
query10	0.55	0.56	0.57
query11	0.14	0.11	0.11
query12	0.13	0.11	0.12
query13	0.61	0.61	0.59
query14	2.81	2.76	2.72
query15	0.90	0.85	0.82
query16	0.39	0.39	0.39
query17	0.96	1.02	1.08
query18	0.21	0.20	0.20
query19	1.96	1.89	2.00
query20	0.01	0.01	0.02
query21	15.37	0.60	0.58
query22	2.90	2.95	1.92
query23	17.04	0.87	0.80
query24	2.59	1.88	1.76
query25	0.24	0.16	0.09
query26	0.59	0.15	0.14
query27	0.04	0.04	0.05
query28	9.84	1.08	1.06
query29	12.56	3.28	3.25
query30	0.25	0.06	0.06
query31	2.87	0.39	0.38
query32	3.28	0.47	0.48
query33	3.00	3.00	3.00
query34	16.86	4.48	4.47
query35	4.52	4.48	4.48
query36	0.65	0.49	0.47
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.02
query40	0.16	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.92 s
Total hot run time: 33.5 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Dec 4, 2024
@github-actions
Contributor

github-actions bot commented Dec 4, 2024

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Dec 4, 2024

PR approved by anyone and no changes requested.

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@liaoxin01 liaoxin01 merged commit 0f48cfd into apache:master Dec 5, 2024
16 of 18 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 5, 2024
…pool for metadata fetching (#44907)

In production, we encountered an issue where the librdkafka consumer
got stuck during destruction, saturating the heavy work pool and making
every feature that depends on that pool, such as querying, unusable. To
mitigate this impact, we replaced the heavy work pool with the routine
load thread pool for metadata fetching.
dataroaring pushed a commit that referenced this pull request Dec 5, 2024
…load thread pool for metadata fetching #44907 (#45039)

Cherry-picked from #44907

Co-authored-by: hui lai <laihui@selectdb.com>
sollhui added a commit to sollhui/doris that referenced this pull request Dec 31, 2024
…pool for metadata fetching (apache#44907)
yiguolei pushed a commit that referenced this pull request Dec 31, 2024
…pool for metadata fetching (#44907) (#46186)

pick #44907
hubgeter pushed a commit to hubgeter/doris that referenced this pull request Mar 12, 2025
…pool for metadata fetching (apache#44907) (apache#3705)

pick [apache#44907](apache#44907)
deardeng pushed a commit to deardeng/incubator-doris that referenced this pull request Dec 19, 2025
…pool for metadata fetching (apache#44907) (apache#46186)

pick apache#44907
Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.8-merged, dev/3.0.4-merged, reviewed

5 participants