@Hastyshell
Collaborator

What problem does this PR solve?

Problem Summary:

Prevent a tablet that is actually running from staying in the not-running state on a single BE that receives no queries. Even if such a tablet has a high compaction score, compaction fails on that BE because the local tablet state is not running.

Before this PR, the scheduled tablet meta sync skipped tablets that were not running. This PR includes those tablets in the meta sync procedure so that the tablet state does not stay inaccurate for a long time.
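
The selection change described above can be illustrated with a minimal C++ sketch. It is not the actual Doris code: `Tablet`, `TabletState`, and `select_tablets_for_sync` are hypothetical names used only to show the difference between skipping and including not-running tablets.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not the actual Doris types.
enum class TabletState { kNotReady, kRunning };

struct Tablet {
    int64_t id;
    TabletState local_state; // state as cached on this BE, possibly stale
};

// Pick the tablets that a scheduled meta sync pass should visit.
// include_not_running = false models the behavior before this PR;
// include_not_running = true models the behavior after it.
std::vector<int64_t> select_tablets_for_sync(const std::vector<Tablet>& tablets,
                                             bool include_not_running) {
    std::vector<int64_t> selected;
    for (const Tablet& t : tablets) {
        if (!include_not_running && t.local_state != TabletState::kRunning) {
            continue; // old behavior: a stale not-running state is never refreshed
        }
        selected.push_back(t.id);
    }
    return selected;
}
```

With the old filter, a tablet whose cached state is stale is only corrected when something else (for example a query) triggers a sync, which is the situation the problem summary describes.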

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Hastyshell
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40005 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b780dd58a3c98c53ccb22505490847c1641f0f2f, data reload: false

------ Round 1 ----------------------------------
q1	17743	7527	7216	7216
q2	2042	180	178	178
q3	10623	1168	1179	1168
q4	10253	746	685	685
q5	7596	2743	2717	2717
q6	245	151	151	151
q7	989	611	613	611
q8	9249	1892	1935	1892
q9	6636	6439	6447	6439
q10	7032	2326	2301	2301
q11	470	265	263	263
q12	429	223	229	223
q13	17780	2950	2964	2950
q14	256	215	211	211
q15	551	499	491	491
q16	675	618	597	597
q17	1004	530	544	530
q18	7515	6786	6737	6737
q19	1342	1008	1056	1008
q20	491	183	190	183
q21	4076	3215	3136	3136
q22	378	327	318	318
Total cold run time: 107375 ms
Total hot run time: 40005 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7200	7208	7214	7208
q2	326	236	232	232
q3	2916	2772	2762	2762
q4	1946	1813	1647	1647
q5	5362	5397	5440	5397
q6	222	138	136	136
q7	2132	1721	1694	1694
q8	3265	3369	3384	3369
q9	8640	8589	8552	8552
q10	3481	3409	3410	3409
q11	608	501	503	501
q12	761	611	602	602
q13	11987	3003	2974	2974
q14	286	262	254	254
q15	549	496	503	496
q16	678	632	650	632
q17	1818	1587	1570	1570
q18	8156	7588	7414	7414
q19	1693	1590	1526	1526
q20	2080	1867	1828	1828
q21	5443	5212	5215	5212
q22	668	600	593	593
Total cold run time: 70217 ms
Total hot run time: 58008 ms

@doris-robot

TPC-DS: Total hot run time: 191463 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b780dd58a3c98c53ccb22505490847c1641f0f2f, data reload: false

query1	1008	390	393	390
query2	6540	2463	2390	2390
query3	6721	209	210	209
query4	33872	23534	23434	23434
query5	4355	475	452	452
query6	288	185	190	185
query7	4624	318	311	311
query8	314	239	244	239
query9	9655	2754	2744	2744
query10	489	253	247	247
query11	18128	15247	15313	15247
query12	168	107	113	107
query13	1673	437	433	433
query14	11311	7470	7510	7470
query15	290	182	187	182
query16	8202	434	431	431
query17	1808	590	582	582
query18	2164	305	330	305
query19	363	154	145	145
query20	113	115	109	109
query21	213	100	102	100
query22	4560	4176	4340	4176
query23	34408	33763	34382	33763
query24	11329	2469	2459	2459
query25	668	381	393	381
query26	1753	153	153	153
query27	2761	336	327	327
query28	7869	2439	2469	2439
query29	1038	402	406	402
query30	304	148	160	148
query31	1068	830	830	830
query32	100	58	62	58
query33	781	303	281	281
query34	985	523	515	515
query35	879	735	722	722
query36	1095	926	948	926
query37	283	78	75	75
query38	4317	4175	4091	4091
query39	1519	1445	1448	1445
query40	279	106	100	100
query41	53	45	45	45
query42	113	102	101	101
query43	540	507	503	503
query44	1239	803	803	803
query45	191	160	170	160
query46	1168	691	688	688
query47	1953	1845	1824	1824
query48	410	313	325	313
query49	1231	385	378	378
query50	811	384	380	380
query51	7162	7127	6938	6938
query52	101	96	92	92
query53	257	182	195	182
query54	1107	402	417	402
query55	80	84	83	83
query56	251	234	243	234
query57	1258	1129	1106	1106
query58	233	219	224	219
query59	3289	3258	3206	3206
query60	295	244	236	236
query61	107	112	122	112
query62	821	676	680	676
query63	222	198	188	188
query64	4970	676	649	649
query65	3241	3178	3192	3178
query66	1205	318	316	316
query67	15697	15574	15485	15485
query68	5899	575	566	566
query69	431	260	252	252
query70	1249	1158	1192	1158
query71	428	263	253	253
query72	6517	4168	4091	4091
query73	791	366	362	362
query74	10292	8954	8852	8852
query75	3419	2692	2656	2656
query76	3520	1157	1151	1151
query77	548	280	275	275
query78	10314	9463	9377	9377
query79	2585	615	635	615
query80	1054	425	431	425
query81	530	227	243	227
query82	686	117	121	117
query83	246	152	145	145
query84	235	75	74	74
query85	1580	314	306	306
query86	497	312	280	280
query87	4677	4514	4356	4356
query88	4128	2236	2212	2212
query89	412	293	292	292
query90	2106	193	190	190
query91	138	109	106	106
query92	68	53	56	53
query93	1619	559	553	553
query94	854	285	292	285
query95	368	252	254	252
query96	617	284	286	284
query97	2885	2692	2701	2692
query98	225	196	201	196
query99	1542	1321	1310	1310
Total cold run time: 306760 ms
Total hot run time: 191463 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.80% (10094/26015)
Line Coverage: 29.78% (85116/285850)
Region Coverage: 28.90% (43471/150397)
Branch Coverage: 25.43% (22155/87112)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b780dd58a3c98c53ccb22505490847c1641f0f2f_b780dd58a3c98c53ccb22505490847c1641f0f2f/report/index.html

@doris-robot

ClickBench: Total hot run time: 32.57 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b780dd58a3c98c53ccb22505490847c1641f0f2f, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.23	0.08	0.07
query4	1.62	0.10	0.10
query5	0.42	0.38	0.40
query6	1.19	0.65	0.64
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.59	0.50	0.51
query10	0.56	0.59	0.56
query11	0.14	0.10	0.11
query12	0.15	0.12	0.12
query13	0.60	0.61	0.60
query14	2.75	2.74	2.76
query15	0.90	0.82	0.82
query16	0.39	0.38	0.37
query17	1.03	1.05	1.04
query18	0.22	0.22	0.21
query19	1.93	1.82	2.04
query20	0.01	0.01	0.01
query21	15.36	0.59	0.59
query22	2.98	2.73	2.15
query23	16.80	1.19	0.76
query24	2.94	1.01	0.69
query25	0.22	0.14	0.10
query26	0.41	0.15	0.14
query27	0.05	0.05	0.06
query28	11.15	1.09	1.07
query29	12.58	3.23	3.26
query30	0.25	0.07	0.06
query31	2.85	0.39	0.39
query32	3.23	0.48	0.46
query33	3.16	3.05	3.16
query34	16.64	4.39	4.46
query35	4.45	4.45	4.47
query36	0.68	0.49	0.50
query37	0.09	0.06	0.06
query38	0.05	0.03	0.03
query39	0.04	0.02	0.03
query40	0.17	0.13	0.14
query41	0.08	0.03	0.02
query42	0.04	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 107.15 s
Total hot run time: 32.57 s

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Dec 24, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@dataroaring left a comment
Contributor

LGTM

@dataroaring force-pushed the sync-tablet-include-not-running branch from b780dd5 to c3a5e9c on December 25, 2024 01:45
@dataroaring
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 32574 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c3a5e9ca807e7a72cf8a7f0318fcbd9375925e16, data reload: false

------ Round 1 ----------------------------------
q1	17577	6195	6053	6053
q2	2038	304	185	185
q3	10490	1240	733	733
q4	10218	870	435	435
q5	7589	2199	1980	1980
q6	206	185	145	145
q7	906	751	608	608
q8	9240	1359	1140	1140
q9	5267	4928	4982	4928
q10	6751	2306	1857	1857
q11	499	272	261	261
q12	352	362	219	219
q13	17802	3594	2980	2980
q14	225	231	212	212
q15	551	508	497	497
q16	651	602	586	586
q17	568	856	317	317
q18	7177	6468	6393	6393
q19	2516	965	569	569
q20	298	317	185	185
q21	2805	2235	1987	1987
q22	354	331	304	304
Total cold run time: 104080 ms
Total hot run time: 32574 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6390	6224	6211	6211
q2	234	318	236	236
q3	2234	2660	2329	2329
q4	1391	1821	1336	1336
q5	4356	4753	4832	4753
q6	192	175	142	142
q7	2061	2039	1832	1832
q8	2636	2802	2666	2666
q9	7279	7235	7227	7227
q10	3055	3342	2830	2830
q11	573	497	488	488
q12	631	719	587	587
q13	3381	3813	3102	3102
q14	291	306	277	277
q15	566	523	517	517
q16	642	695	636	636
q17	1202	1778	1271	1271
q18	7491	7410	7275	7275
q19	838	1095	1157	1095
q20	2006	2011	1918	1918
q21	5855	5355	4971	4971
q22	602	605	574	574
Total cold run time: 53906 ms
Total hot run time: 52273 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.80% (10094/26016)
Line Coverage: 29.80% (85218/285935)
Region Coverage: 28.92% (43507/150444)
Branch Coverage: 25.45% (22178/87134)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c3a5e9ca807e7a72cf8a7f0318fcbd9375925e16_c3a5e9ca807e7a72cf8a7f0318fcbd9375925e16/report/index.html

@doris-robot

TPC-DS: Total hot run time: 196456 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c3a5e9ca807e7a72cf8a7f0318fcbd9375925e16, data reload: false

query1	1350	973	944	944
query2	6339	2426	2361	2361
query3	10927	4732	4946	4732
query4	33002	24310	23509	23509
query5	4505	625	467	467
query6	285	197	181	181
query7	3994	500	303	303
query8	299	253	228	228
query9	9540	2722	2729	2722
query10	444	299	232	232
query11	18081	15422	15289	15289
query12	161	114	103	103
query13	1575	556	394	394
query14	11627	6956	7283	6956
query15	239	212	187	187
query16	7208	597	424	424
query17	1537	760	569	569
query18	1230	362	316	316
query19	204	196	156	156
query20	120	119	118	118
query21	214	125	110	110
query22	4653	4676	4583	4583
query23	34750	33239	34053	33239
query24	6930	2355	2315	2315
query25	515	464	392	392
query26	1063	283	157	157
query27	2664	459	347	347
query28	5551	2512	2477	2477
query29	605	544	428	428
query30	228	187	149	149
query31	987	933	873	873
query32	78	59	56	56
query33	494	359	307	307
query34	778	879	528	528
query35	820	837	764	764
query36	1027	1051	967	967
query37	122	102	81	81
query38	4227	4187	4088	4088
query39	1528	1518	1449	1449
query40	208	118	114	114
query41	49	49	46	46
query42	129	107	103	103
query43	520	549	500	500
query44	1386	835	831	831
query45	189	179	185	179
query46	893	1044	659	659
query47	2040	2006	1995	1995
query48	394	413	333	333
query49	745	483	373	373
query50	636	683	414	414
query51	7234	7315	7215	7215
query52	102	103	91	91
query53	224	264	182	182
query54	470	504	413	413
query55	80	79	81	79
query56	258	284	272	272
query57	1246	1243	1173	1173
query58	250	223	217	217
query59	3248	3266	3141	3141
query60	273	272	264	264
query61	108	106	103	103
query62	885	795	757	757
query63	224	193	186	186
query64	3813	1034	653	653
query65	3306	3341	3255	3255
query66	1013	407	312	312
query67	16499	15870	15539	15539
query68	9744	755	500	500
query69	474	304	244	244
query70	1196	1145	1126	1126
query71	429	282	253	253
query72	5799	3866	3877	3866
query73	656	749	357	357
query74	10545	9097	8998	8998
query75	4543	3203	2669	2669
query76	5404	1194	753	753
query77	1021	371	282	282
query78	10170	10567	9350	9350
query79	4885	871	586	586
query80	713	507	426	426
query81	478	278	237	237
query82	342	144	121	121
query83	192	163	157	157
query84	274	92	70	70
query85	746	412	307	307
query86	355	324	296	296
query87	4624	4433	4408	4408
query88	3666	2249	2212	2212
query89	444	326	298	298
query90	2089	183	183	183
query91	131	131	105	105
query92	62	55	51	51
query93	3135	861	518	518
query94	670	401	308	308
query95	332	259	287	259
query96	483	614	280	280
query97	2750	2807	2691	2691
query98	220	228	193	193
query99	1616	1571	1452	1452
Total cold run time: 304511 ms
Total hot run time: 196456 ms

@doris-robot

ClickBench: Total hot run time: 31.59 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c3a5e9ca807e7a72cf8a7f0318fcbd9375925e16, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.06
query4	1.63	0.10	0.11
query5	0.41	0.42	0.41
query6	1.14	0.66	0.65
query7	0.02	0.01	0.01
query8	0.04	0.04	0.03
query9	0.58	0.50	0.54
query10	0.55	0.60	0.55
query11	0.14	0.10	0.11
query12	0.14	0.11	0.10
query13	0.60	0.61	0.60
query14	2.72	2.75	2.73
query15	0.91	0.83	0.83
query16	0.39	0.38	0.38
query17	1.03	1.08	1.08
query18	0.22	0.20	0.20
query19	1.95	1.73	1.96
query20	0.01	0.01	0.02
query21	15.35	0.95	0.57
query22	0.75	0.70	0.81
query23	15.29	1.46	0.59
query24	3.36	1.32	1.49
query25	0.18	0.15	0.10
query26	0.26	0.14	0.12
query27	0.08	0.06	0.03
query28	14.34	1.48	1.05
query29	12.55	3.92	3.23
query30	0.25	0.09	0.06
query31	2.84	0.59	0.40
query32	3.22	0.54	0.46
query33	3.08	3.06	3.14
query34	16.65	5.08	4.50
query35	4.48	4.47	4.47
query36	0.67	0.50	0.48
query37	0.10	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.03
query40	0.17	0.13	0.13
query41	0.07	0.03	0.02
query42	0.04	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.67 s
Total hot run time: 31.59 s

@dataroaring left a comment
Contributor

LGTM

@dataroaring merged commit 141e6fe into apache:master on Dec 25, 2024
22 of 23 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 25, 2024
…ng (#45821)

dataroaring pushed a commit that referenced this pull request Dec 26, 2024
…is not running #45821 (#45962)

Cherry-picked from #45821

Co-authored-by: Siyang Tang <tangsiyang@selectdb.com>
@gavinchou mentioned this pull request on Feb 18, 2025
dataroaring pushed a commit that referenced this pull request Feb 27, 2025
…r committing sc job (#48219)

After the change in #45821, the tablet state could already have been updated
before the schema change job updates the BE-local tablet state.
github-actions bot pushed a commit that referenced this pull request Feb 27, 2025
…r committing sc job (#48219)

seawinde pushed a commit to seawinde/doris that referenced this pull request Feb 28, 2025
…r committing sc job (apache#48219)

mymeiyi pushed a commit to mymeiyi/doris that referenced this pull request Mar 4, 2025
…r committing sc job (apache#48219)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…r committing sc job (apache#48219)


Labels

approved Indicates a PR has been approved by one committer. dev/3.0.4-merged reviewed


5 participants