@liaoxin01
Contributor

@liaoxin01 liaoxin01 commented Sep 12, 2025

liaoxin01 and others added 6 commits September 12, 2025 09:47
…rowset_metadata (apache#55604)

Related PR: apache#54395

Problem Summary:

The _rs_metas and _rs_version_map information in the tmp_tablet meta is
inconsistent, so fetching a rowset by version fails and returns a null
pointer. The tmp_tablet meta is copied from the new tablet, and its
rowset information is not needed, because the real rowset data is
obtained later through sync rowset. The sync rowset operation failed to
remove the old rowsets, which produced the inconsistency. The fix is to
clean up the obsolete rowsets in the tmp_tablet meta before syncing.

*** SIGSEGV address not mapped to object (@0x38) received by PID 2824014 (TID 2824488 OR 0x7f59e8eff640) from PID 56; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:420
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F5B3AC5F520 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::cloud::CloudMetaMgr::fill_version_holes(doris::CloudTablet*, long, std::unique_lock<std::shared_mutex>&) at /home/zcp/repo_center/doris_master/doris/be/src/cloud/cloud_meta_mgr.cpp:1650
5# doris::cloud::CloudMetaMgr::sync_tablet_rowsets_unlocked(doris::CloudTablet*, std::unique_lock<bthread::Mutex>&, doris::SyncOptions const&, doris::SyncRowsetStats*) in /mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be
6# doris::cloud::CloudMetaMgr::sync_tablet_rowsets(doris::CloudTablet*, doris::SyncOptions const&, doris::SyncRowsetStats*) at /home/zcp/repo_center/doris_master/doris/be/src/cloud/cloud_meta_mgr.cpp:477
7# doris::CloudSchemaChangeJob::_process_delete_bitmap(long, long, long) at /home/zcp/repo_center/doris_master/doris/be/src/cloud/cloud_schema_change_job.cpp:519
8# doris::CloudSchemaChangeJob::_convert_historical_rowsets(doris::SchemaChangeParams const&, doris::cloud::TabletJobInfoPB&) at /home/zcp/repo_center/doris_master/doris/be/src/cloud/cloud_schema_change_job.cpp:424
9# doris::CloudSchemaChangeJob::process_alter_tablet(doris::TAlterTabletReqV2 const&) in /mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be
10# doris::alter_cloud_tablet_callback(doris::CloudStorageEngine&, doris::TAgentTaskRequest const&) at /home/zcp/repo_center/doris_master/doris/be/src/agent/task_worker_pool.cpp:2176
11# std::_Function_handler<void (), doris::TaskWorkerPool::submit_task(doris::TAgentTaskRequest const&)::$_0::operator()<doris::TAgentTaskRequest const&>(doris::TAgentTaskRequest const&) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:292
12# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_master/doris/be/src/util/threadpool.cpp:621
13# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_master/doris/be/src/util/thread.cpp:461
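
The failure mode described in the Problem Summary can be illustrated with a small, self-contained sketch. It is not the Doris implementation: `Version`, `RowsetMetaSketch`, `TmpTabletMetaSketch` and all of its member functions are hypothetical stand-ins, assumed here only to show why a rowset list that is out of step with a version map makes a lookup by version return null, and why clearing the copied rowsets before syncing avoids that.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the real Doris types.
struct Version {
    int64_t start;
    int64_t end;
    bool operator<(const Version& other) const {
        return std::tie(start, end) < std::tie(other.start, other.end);
    }
};

struct RowsetMetaSketch {
    Version version;
};

using RowsetMetaPtr = std::shared_ptr<RowsetMetaSketch>;

struct TmpTabletMetaSketch {
    std::vector<RowsetMetaPtr> rs_metas;              // models _rs_metas
    std::map<Version, RowsetMetaPtr> rs_version_map;  // models _rs_version_map

    // Returns nullptr when the version has no entry in the map.
    RowsetMetaPtr get_rowset_by_version(const Version& v) const {
        auto it = rs_version_map.find(v);
        if (it == rs_version_map.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Assumed cleanup step: drop everything copied from the new tablet.
    void clear_copied_rowsets() {
        rs_metas.clear();
        rs_version_map.clear();
    }

    // Assumed sync step: register a rowset in both containers together.
    void add_synced_rowset(RowsetMetaPtr rs) {
        assert(rs != nullptr);
        rs_version_map[rs->version] = rs;
        rs_metas.push_back(std::move(rs));
    }

    // True when every listed rowset can be found through the version map.
    bool is_consistent() const {
        if (rs_metas.size() != rs_version_map.size()) {
            return false;
        }
        for (const auto& rs : rs_metas) {
            if (get_rowset_by_version(rs->version) != rs) {
                return false;
            }
        }
        return true;
    }
};
```

In this sketch a caller that iterates `rs_metas` and dereferences the result of `get_rowset_by_version` without a null check would crash in the inconsistent state, which is the kind of access the stack trace above points at in `fill_version_holes`.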
… when enable skip_writing_empty_rowset_metadata (apache#55742)

Issue Number: close #xxx

Related PR: apache#54395
…mpty_rowset_metadata (apache#55837)

Rowsets generated by compacting multiple empty rowsets may also lack a
resource id, and the DCHECK did not account for this case.
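
A minimal sketch of what widening such a check could look like follows. Everything in it is an assumption made for illustration: the struct `RowsetInfoSketch`, its fields, and both predicate functions are hypothetical, and the exact condition used by the real DCHECK is not shown in this conversation. The sketch assumes that the earlier check only tolerated a missing resource id on single-version empty rowsets, and that the fix tolerates it on any empty rowset, including one that spans a version range after compaction.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of the rowset fields the check looks at (not a Doris type).
struct RowsetInfoSketch {
    int64_t start_version;
    int64_t end_version;
    int64_t num_segments;     // 0 means the rowset holds no data
    std::string resource_id;  // empty means "no resource id recorded"
};

// Assumed earlier rule: only a single-version empty rowset may lack a resource id.
inline bool narrow_rule_ok(const RowsetInfoSketch& rs) {
    const bool single_version_empty =
            rs.num_segments == 0 && rs.start_version == rs.end_version;
    return !rs.resource_id.empty() || single_version_empty;
}

// Assumed widened rule: any empty rowset may lack a resource id, including
// one produced by compacting several empty rowsets into a version range.
inline bool widened_rule_ok(const RowsetInfoSketch& rs) {
    const bool empty = rs.num_segments == 0;
    return !rs.resource_id.empty() || empty;
}

// Debug-style check in the spirit of a DCHECK, using plain assert here.
inline void check_resource_id(const RowsetInfoSketch& rs) {
    assert(widened_rule_ok(rs));
    (void)rs;  // keeps release builds (NDEBUG) free of unused warnings
}
```

Under these assumptions, an empty rowset covering versions 2 to 9 with no resource id fails the narrow rule but passes the widened one, while a rowset that holds segments and lacks a resource id is still rejected.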
@Thearas
Contributor

Thearas commented Sep 12, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01
Contributor Author

run buildall

@morrySnow morrySnow changed the title on Sep 12, 2025
from: branch-3.1: [opt](cloud) Reduce empty rowset pressure on meta service #54395
to: branch-3.1: [opt](cloud) Reduce empty rowset pressure on meta service #54395 #55171 #55604 #55742 #55837
@liaoxin01
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage `` 🎉
Increment coverage report
Complete coverage report

@liaoxin01
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 42.86% (3/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.20% (1233/1500)
Line Coverage 66.07% (22160/33538)
Region Coverage 67.60% (11135/16473)
Branch Coverage 57.21% (5885/10286)

@doris-robot

TPC-H: Total hot run time: 32635 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit be194910af65675c9e89b35a42460277b0f4d0f2, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17927	5548	5406	5406
q2	2027	394	287	287
q3	11798	1245	750	750
q4	10292	886	459	459
q5	8790	2386	2159	2159
q6	183	164	135	135
q7	898	739	613	613
q8	9351	1487	1160	1160
q9	5350	5022	4896	4896
q10	6774	2242	1808	1808
q11	463	280	263	263
q12	344	358	206	206
q13	17789	3637	3058	3058
q14	220	233	213	213
q15	526	482	460	460
q16	419	421	373	373
q17	578	886	370	370
q18	7061	6613	6254	6254
q19	2827	956	543	543
q20	313	330	199	199
q21	2756	2172	2032	2032
q22	1052	1046	991	991
Total cold run time: 107738 ms
Total hot run time: 32635 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	5641	5470	5554	5470
q2	235	328	233	233
q3	2201	2636	2340	2340
q4	1412	1837	1378	1378
q5	4402	5079	5047	5047
q6	170	165	127	127
q7	2060	1944	1870	1870
q8	2673	2824	2719	2719
q9	7347	7301	7252	7252
q10	3040	3200	2758	2758
q11	567	517	486	486
q12	668	774	569	569
q13	3419	3744	3155	3155
q14	275	291	296	291
q15	524	473	472	472
q16	446	484	433	433
q17	1241	1756	1275	1275
q18	7620	7477	7330	7330
q19	800	812	1093	812
q20	2017	2044	1930	1930
q21	5412	4968	4516	4516
q22	1112	1062	1042	1042
Total cold run time: 53282 ms
Total hot run time: 51505 ms
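
The totals in the TPC-H report above are consistent with a simple reading of the rows: the cold total is the sum of the first timing column, and the hot total is the sum of the last column, which in every row equals the smaller of the two hot runs. The sketch below reproduces that arithmetic. It is an interpretation of the reported numbers, not the benchmark tool's code, and `QueryTiming`, `total_cold_ms` and `total_hot_ms` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One row of the report: a cold run followed by two hot runs, in milliseconds.
struct QueryTiming {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

// Sum of the cold-run column.
inline int64_t total_cold_ms(const std::vector<QueryTiming>& rows) {
    int64_t total = 0;
    for (const auto& row : rows) {
        total += row.cold_ms;
    }
    return total;
}

// Sum of the faster of the two hot runs for each query.
inline int64_t total_hot_ms(const std::vector<QueryTiming>& rows) {
    int64_t total = 0;
    for (const auto& row : rows) {
        total += std::min(row.hot1_ms, row.hot2_ms);
    }
    return total;
}
```

Feeding all 22 Round 1 rows into these two functions gives 107738 ms and 32635 ms, matching the reported cold and hot totals.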

@doris-robot

TPC-DS: Total hot run time: 192316 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit be194910af65675c9e89b35a42460277b0f4d0f2, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	977	398	401	398
query2	6116	1921	1896	1896
query3	8682	208	205	205
query4	33684	23818	23391	23391
query5	3682	611	439	439
query6	296	199	205	199
query7	4222	479	307	307
query8	302	231	227	227
query9	9424	2641	2625	2625
query10	471	329	281	281
query11	18418	15729	15097	15097
query12	166	104	104	104
query13	1565	554	418	418
query14	9631	7124	6701	6701
query15	238	195	175	175
query16	8048	700	491	491
query17	1562	746	592	592
query18	2157	416	318	318
query19	215	194	186	186
query20	132	127	124	124
query21	204	130	107	107
query22	4630	4657	4451	4451
query23	35139	34471	34120	34120
query24	7473	2695	2733	2695
query25	539	515	432	432
query26	1307	300	177	177
query27	2001	504	375	375
query28	5237	2233	2206	2206
query29	803	597	486	486
query30	249	200	166	166
query31	999	955	869	869
query32	87	60	63	60
query33	530	366	344	344
query34	769	850	538	538
query35	775	809	735	735
query36	1028	1058	967	967
query37	105	95	67	67
query38	4076	4025	4110	4025
query39	1543	1481	1452	1452
query40	212	112	99	99
query41	48	45	47	45
query42	126	116	103	103
query43	519	534	485	485
query44	1392	842	827	827
query45	188	178	170	170
query46	896	1046	676	676
query47	1999	1968	1905	1905
query48	446	428	357	357
query49	760	482	393	393
query50	694	702	440	440
query51	7257	7388	7227	7227
query52	104	103	90	90
query53	238	255	195	195
query54	551	562	487	487
query55	81	84	84	84
query56	278	291	264	264
query57	1302	1278	1238	1238
query58	257	218	216	216
query59	2998	3256	3108	3108
query60	286	283	260	260
query61	114	115	121	115
query62	804	749	714	714
query63	229	197	194	194
query64	4446	992	649	649
query65	3361	3279	3309	3279
query66	949	405	306	306
query67	16090	15781	15653	15653
query68	7737	811	556	556
query69	493	308	267	267
query70	1209	1145	1138	1138
query71	370	291	263	263
query72	5722	3755	3864	3755
query73	655	737	352	352
query74	10183	9367	9042	9042
query75	3170	3109	2654	2654
query76	3041	1189	775	775
query77	558	361	274	274
query78	10371	10537	9714	9714
query79	3097	898	620	620
query80	647	525	467	467
query81	503	256	218	218
query82	567	123	89	89
query83	178	162	144	144
query84	288	101	80	80
query85	794	385	306	306
query86	351	321	291	291
query87	4402	4416	4213	4213
query88	4732	2403	2398	2398
query89	414	342	288	288
query90	1798	189	189	189
query91	142	132	109	109
query92	66	56	50	50
query93	1507	887	536	536
query94	692	406	314	314
query95	343	279	266	266
query96	491	615	284	284
query97	3205	3247	3140	3140
query98	236	206	198	198
query99	1552	1428	1301	1301
Total cold run time: 293621 ms
Total hot run time: 192316 ms

@doris-robot

ClickBench: Total hot run time: 29.15 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit be194910af65675c9e89b35a42460277b0f4d0f2, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.04	0.04
query3	0.24	0.06	0.06
query4	1.65	0.09	0.09
query5	0.51	0.54	0.52
query6	1.13	0.74	0.76
query7	0.02	0.01	0.01
query8	0.06	0.04	0.04
query9	0.57	0.49	0.51
query10	0.56	0.56	0.56
query11	0.16	0.12	0.12
query12	0.16	0.12	0.13
query13	0.61	0.60	0.59
query14	0.77	0.81	0.82
query15	0.86	0.83	0.82
query16	0.39	0.37	0.38
query17	1.07	1.05	1.07
query18	0.18	0.18	0.20
query19	1.93	1.85	1.86
query20	0.02	0.01	0.01
query21	15.37	0.96	0.64
query22	0.76	0.74	0.68
query23	14.85	1.51	0.68
query24	2.18	0.37	0.22
query25	0.14	0.08	0.08
query26	0.28	0.19	0.19
query27	0.08	0.08	0.08
query28	13.43	1.25	0.55
query29	12.64	4.06	3.38
query30	0.25	0.08	0.06
query31	2.85	0.61	0.39
query32	3.22	0.57	0.48
query33	3.03	2.99	3.07
query34	16.44	5.26	4.58
query35	4.64	4.60	4.57
query36	0.62	0.49	0.47
query37	0.20	0.16	0.17
query38	0.16	0.15	0.15
query39	0.05	0.04	0.04
query40	0.17	0.13	0.12
query41	0.09	0.05	0.05
query42	0.06	0.05	0.06
query43	0.05	0.04	0.05
Total cold run time: 102.55 s
Total hot run time: 29.15 s

@doris-robot

BE UT Coverage Report

Increment line coverage 68.79% (108/157) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 45.58% (12778/28033)
Line Coverage 36.41% (113959/312977)
Region Coverage 34.02% (65140/191478)
Branch Coverage 31.05% (34183/110104)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 85.00% (136/160) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 66.14% (18234/27570)
Line Coverage 57.67% (179936/312008)
Region Coverage 55.33% (106371/192255)
Branch Coverage 49.68% (54950/110618)

@liaoxin01
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 42.86% (3/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.20% (1233/1500)
Line Coverage 66.11% (22173/33538)
Region Coverage 67.52% (11123/16473)
Branch Coverage 57.15% (5878/10286)

@doris-robot

TPC-H: Total hot run time: 33130 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3ba8c4359376b9b79bfcaf679fd3c08eb9f03796, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17757	5539	5528	5528
q2	2022	415	290	290
q3	12223	1261	775	775
q4	10271	911	465	465
q5	8808	2451	2213	2213
q6	184	164	133	133
q7	901	765	620	620
q8	9350	1448	1192	1192
q9	5214	5022	4947	4947
q10	6792	2306	1810	1810
q11	467	285	260	260
q12	344	361	211	211
q13	17763	3614	3050	3050
q14	227	237	218	218
q15	520	472	466	466
q16	423	438	371	371
q17	620	893	366	366
q18	6939	6499	6412	6412
q19	1459	961	589	589
q20	340	361	208	208
q21	2973	2250	2046	2046
q22	1049	1006	960	960
Total cold run time: 106646 ms
Total hot run time: 33130 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	5629	5715	5501	5501
q2	241	336	236	236
q3	2295	2652	2344	2344
q4	1377	1794	1369	1369
q5	4451	5124	5035	5035
q6	172	166	130	130
q7	2103	1960	1804	1804
q8	2625	2844	2745	2745
q9	7341	7279	7227	7227
q10	3068	3307	2682	2682
q11	582	514	511	511
q12	693	740	607	607
q13	3449	3804	3105	3105
q14	278	310	276	276
q15	535	477	468	468
q16	444	490	446	446
q17	1233	1770	1275	1275
q18	7745	7554	7190	7190
q19	824	1159	1101	1101
q20	2037	2063	1905	1905
q21	5401	4839	4612	4612
q22	1086	1043	1015	1015
Total cold run time: 53609 ms
Total hot run time: 51584 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 68.79% (108/157) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 45.56% (12780/28048)
Line Coverage 36.39% (113964/313153)
Region Coverage 34.01% (65163/191574)
Branch Coverage 31.04% (34194/110160)

@doris-robot

TPC-DS: Total hot run time: 192611 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3ba8c4359376b9b79bfcaf679fd3c08eb9f03796, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	973	391	383	383
query2	6277	1926	1886	1886
query3	8687	210	202	202
query4	33567	23892	24108	23892
query5	4248	601	441	441
query6	283	188	181	181
query7	4200	500	325	325
query8	314	253	242	242
query9	9319	2640	2629	2629
query10	496	327	265	265
query11	18226	15682	15608	15608
query12	162	110	107	107
query13	1556	539	416	416
query14	9932	6647	7303	6647
query15	225	193	177	177
query16	7981	636	495	495
query17	1581	764	598	598
query18	2141	423	343	343
query19	237	197	173	173
query20	135	135	125	125
query21	216	126	104	104
query22	4618	4667	4404	4404
query23	35056	34533	34181	34181
query24	7216	2679	2698	2679
query25	560	509	442	442
query26	896	293	173	173
query27	2182	494	371	371
query28	5213	2242	2187	2187
query29	721	594	463	463
query30	253	193	162	162
query31	1038	980	861	861
query32	87	60	59	59
query33	497	378	329	329
query34	760	872	526	526
query35	794	802	722	722
query36	1011	1058	947	947
query37	102	96	67	67
query38	4032	4068	3980	3980
query39	1504	1440	1445	1440
query40	219	117	103	103
query41	49	48	45	45
query42	125	107	101	101
query43	520	505	492	492
query44	1347	841	842	841
query45	192	181	173	173
query46	895	1062	699	699
query47	1978	1984	1914	1914
query48	409	432	344	344
query49	755	500	418	418
query50	693	721	429	429
query51	7266	7351	7302	7302
query52	108	102	92	92
query53	237	268	197	197
query54	563	576	478	478
query55	85	79	81	79
query56	278	271	265	265
query57	1263	1270	1228	1228
query58	240	225	218	218
query59	2986	3213	3003	3003
query60	303	285	268	268
query61	123	120	110	110
query62	793	752	684	684
query63	225	188	194	188
query64	3712	1041	671	671
query65	3350	3320	3303	3303
query66	880	424	303	303
query67	16223	15770	15487	15487
query68	7222	820	544	544
query69	502	315	267	267
query70	1186	1142	1119	1119
query71	388	310	255	255
query72	5753	3773	3816	3773
query73	640	742	347	347
query74	10668	9321	8833	8833
query75	3286	3166	2659	2659
query76	3097	1171	783	783
query77	628	360	283	283
query78	10390	10557	9623	9623
query79	3411	914	586	586
query80	625	526	441	441
query81	499	258	225	225
query82	612	120	88	88
query83	162	162	143	143
query84	248	98	78	78
query85	771	378	302	302
query86	385	319	299	299
query87	4367	4327	4180	4180
query88	4718	2410	2381	2381
query89	403	328	297	297
query90	1771	196	186	186
query91	132	140	114	114
query92	66	57	53	53
query93	2085	900	544	544
query94	678	414	305	305
query95	336	287	269	269
query96	501	611	278	278
query97	3224	3346	3190	3190
query98	229	207	190	190
query99	2301	1406	1309	1309
Total cold run time: 294397 ms
Total hot run time: 192611 ms

@doris-robot

ClickBench: Total hot run time: 29.2 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3ba8c4359376b9b79bfcaf679fd3c08eb9f03796, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.03	0.03
query2	0.08	0.04	0.04
query3	0.24	0.05	0.06
query4	1.65	0.09	0.08
query5	0.53	0.50	0.50
query6	1.14	0.75	0.74
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.58	0.51	0.52
query10	0.55	0.56	0.55
query11	0.16	0.12	0.12
query12	0.16	0.12	0.13
query13	0.61	0.60	0.60
query14	0.77	0.80	0.78
query15	0.85	0.84	0.83
query16	0.38	0.37	0.37
query17	1.03	1.07	1.03
query18	0.19	0.19	0.20
query19	1.94	1.81	1.84
query20	0.02	0.01	0.02
query21	15.40	0.95	0.67
query22	0.75	0.76	0.69
query23	14.86	1.49	0.69
query24	2.22	0.36	0.23
query25	0.15	0.09	0.09
query26	0.29	0.19	0.18
query27	0.09	0.08	0.07
query28	13.39	1.22	0.55
query29	12.62	4.05	3.33
query30	0.26	0.08	0.06
query31	2.83	0.60	0.39
query32	3.22	0.56	0.48
query33	3.01	3.05	3.08
query34	16.31	5.23	4.60
query35	4.55	4.59	4.66
query36	0.63	0.49	0.48
query37	0.20	0.16	0.17
query38	0.17	0.16	0.16
query39	0.05	0.04	0.04
query40	0.17	0.14	0.13
query41	0.09	0.05	0.05
query42	0.06	0.05	0.06
query43	0.05	0.05	0.05
Total cold run time: 102.35 s
Total hot run time: 29.2 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 88.75% (142/160) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 76.57% (21123/27585)
Line Coverage 69.87% (218128/312187)
Region Coverage 67.83% (130470/192351)
Branch Coverage 61.36% (67915/110674)

@morrySnow morrySnow merged commit 96711ec into apache:branch-3.1 Sep 15, 2025
21 of 22 checks passed
@morrySnow morrySnow mentioned this pull request Sep 22, 2025