@liaoxin01 liaoxin01 commented Aug 6, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

This pull request introduces several improvements and new features to the cloud storage engine, focusing on handling empty rowsets and filling version holes in tablets. The changes enhance data consistency, optimize metadata operations, and improve compaction statistics by introducing configuration options and new logic for managing rowset versions.

Handling empty rowsets and version holes:

  • Added configuration options skip_writing_empty_rowset_metadata and enable_fill_version_holes to control whether empty rowset metadata is written to the meta service and whether missing version holes are automatically filled during tablet synchronization.
  • Implemented logic in CloudMetaMgr to detect and fill version holes by creating deterministic empty rowsets for missing versions, ensuring continuous versioning and tracking the number of holes filled.
  • Modified CloudDeltaWriter and CloudRowsetBuilder to support skipping metadata writes for empty rowsets, including batch initialization and commit logic based on the new configuration.
  • Updated Rowset and related classes to distinguish hole rowsets from normal rowsets, preventing them from affecting statistics and compaction counts.
  • Adjusted compaction logic to accurately count input rowsets by excluding hole rowsets, ensuring correct reporting and resource management during compaction jobs.

These changes collectively improve the robustness of version management in the cloud storage engine, reduce unnecessary metadata operations for empty rowsets, and maintain accurate tablet statistics.
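
The hole-filling and compaction-counting behavior described in the list above can be illustrated with a small, self-contained C++ sketch. It is not the code from this PR: `RowsetStub`, `FillResult`, `fill_holes`, and `count_input_rowsets` are hypothetical names, and the sketch assumes the input is sorted by start version, non-overlapping, and that trailing versions up to `max_version` are filled as well.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a rowset: a closed version range [start, end].
struct RowsetStub {
    int64_t start_version;
    int64_t end_version;
    bool is_hole;  // true for empty placeholder rowsets created to fill a gap
};

struct FillResult {
    std::vector<RowsetStub> rowsets;  // input rowsets plus placeholders, in version order
    int64_t holes_filled = 0;         // number of placeholder rowsets created
};

// Walks rowsets sorted by start_version and inserts one single-version
// placeholder for every missing version, including versions after the last
// rowset up to max_version (an assumption of this sketch).
inline FillResult fill_holes(const std::vector<RowsetStub>& sorted, int64_t max_version) {
    FillResult result;
    if (sorted.empty()) {
        return result;
    }
    int64_t last_version = sorted.front().start_version - 1;
    for (const RowsetStub& rs : sorted) {
        assert(rs.start_version > last_version);  // sorted and non-overlapping
        for (int64_t ver = last_version + 1; ver < rs.start_version; ++ver) {
            result.rowsets.push_back({ver, ver, true});
            ++result.holes_filled;
        }
        result.rowsets.push_back(rs);
        last_version = rs.end_version;
    }
    for (int64_t ver = last_version + 1; ver <= max_version; ++ver) {
        result.rowsets.push_back({ver, ver, true});
        ++result.holes_filled;
    }
    return result;
}

// Counts compaction input rowsets while ignoring placeholders, so holes do
// not inflate the reported count.
inline int64_t count_input_rowsets(const std::vector<RowsetStub>& rowsets) {
    int64_t count = 0;
    for (const RowsetStub& rs : rowsets) {
        if (!rs.is_hole) {
            ++count;
        }
    }
    return count;
}
```

For example, rowsets covering versions [0-1], [2-2], and [5-6] with `max_version = 8` would gain placeholders for versions 3, 4, 7, and 8, while `count_input_rowsets` would still report 3.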

Release note

None


@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01 liaoxin01 marked this pull request as draft August 6, 2025 07:50
@liaoxin01 liaoxin01 changed the title from "[opt](ms) Reduce empty rowset pressure on meta service" to "[opt](cloud) Reduce empty rowset pressure on meta service" Aug 6, 2025
@liaoxin01

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.79% (1406/1719)
Line Coverage 66.04% (23968/36291)
Region Coverage 67.29% (11899/17684)
Branch Coverage 56.86% (6222/10942)

// missing versions are those that are not in the existing_versions
if (version.first > last_version + 1) {
    // there is a hole between versions
    auto prev_rowset = tablet->get_rowset_by_version(version);

next_non_hole_rowset?

for (int64_t ver = last_version + 1; ver < version.first; ++ver) {
    RowsetSharedPtr hole_rowset;
    RETURN_IF_ERROR(create_empty_rowset_for_hole(
            tablet, ver, prev_rowset->rowset_meta(), &hole_rowset));

We should use the rowset meta from the previous rowset, since e.g. a column drop may have happened.

RETURN_IF_ERROR(_rowset_builder->init());
RETURN_IF_ERROR(_rowset_builder->build_rowset());
if (config::skip_writing_empty_rowset_metadata) {
    return Status::OK();

What about the prepare record?
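
The early return in the snippet under review can be read as a gate on whether rowset metadata is sent to the meta service. The sketch below is only an illustration of that gate under stated assumptions: `Config`, `CommitAction`, and `decide_commit_action` are hypothetical names, and the row-count check is an assumption about how "empty" is determined. It does not address the prepare-record question raised above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the config flag named in the PR description.
struct Config {
    bool skip_writing_empty_rowset_metadata = false;
};

enum class CommitAction { WriteMeta, SkipMeta };

// Decides whether a finished rowset's metadata should be written.
// Only rowsets with zero rows are skipped, and only when the flag is on.
inline CommitAction decide_commit_action(const Config& cfg, int64_t num_rows) {
    assert(num_rows >= 0);  // a negative row count is invalid in this sketch
    if (cfg.skip_writing_empty_rowset_metadata && num_rows == 0) {
        return CommitAction::SkipMeta;
    }
    return CommitAction::WriteMeta;
}
```

Keeping the decision in one small function makes it easier to test the flag-off path, which should behave exactly as before the change.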

@dataroaring

Testing advice:

  1. sc (schema change)
  2. compaction
  3. checker

@hello-stephen

FE UT Coverage Report

Increment line coverage `` 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 33914 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 30745b03251d1bf28a74044f7443690eaaba4c3c, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17588	5350	5389	5350
q2	1961	302	181	181
q3	10620	1302	704	704
q4	10316	995	512	512
q5	10000	2318	2303	2303
q6	232	168	137	137
q7	886	791	619	619
q8	9311	1288	1171	1171
q9	6961	5026	5151	5026
q10	6973	2388	2032	2032
q11	473	285	270	270
q12	354	358	217	217
q13	17777	3450	2984	2984
q14	237	239	223	223
q15	540	480	457	457
q16	431	458	382	382
q17	582	826	342	342
q18	7324	7154	7049	7049
q19	1240	963	539	539
q20	323	314	211	211
q21	3429	3085	2227	2227
q22	1074	1047	978	978
Total cold run time: 108632 ms
Total hot run time: 33914 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	5480	5389	5316	5316
q2	234	316	220	220
q3	2158	2572	2244	2244
q4	1315	1706	1314	1314
q5	4378	4576	4328	4328
q6	213	177	132	132
q7	1962	2009	1766	1766
q8	2552	2617	2531	2531
q9	7657	7185	7269	7185
q10	3140	3337	2925	2925
q11	567	515	505	505
q12	714	806	651	651
q13	3405	3734	3210	3210
q14	295	303	298	298
q15	506	475	450	450
q16	446	498	487	487
q17	1206	1335	1379	1335
q18	8164	7884	7594	7594
q19	5622	857	852	852
q20	1953	1926	1811	1811
q21	14968	4272	4301	4272
q22	1028	1013	974	974
Total cold run time: 67963 ms
Total hot run time: 50400 ms

@doris-robot

TPC-DS: Total hot run time: 161817 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 30745b03251d1bf28a74044f7443690eaaba4c3c, data reload: false

============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	996	430	408	408
query2	6531	2035	1719	1719
query3	6740	231	218	218
query4	27086	23685	23149	23149
query5	4378	629	521	521
query6	329	232	226	226
query7	4636	509	294	294
query8	263	224	235	224
query9	8554	2973	2919	2919
query10	474	324	287	287
query11	15361	14966	14824	14824
query12	186	140	138	138
query13	1647	547	423	423
query14	8673	5784	5873	5784
query15	211	191	168	168
query16	7816	658	469	469
query17	1616	795	674	674
query18	2055	444	329	329
query19	271	210	185	185
query20	162	149	142	142
query21	221	126	114	114
query22	3953	4116	3918	3918
query23	34974	34451	34439	34439
query24	7498	2423	2463	2423
query25	574	540	469	469
query26	736	309	166	166
query27	2366	524	373	373
query28	2998	2347	2368	2347
query29	683	640	558	558
query30	295	233	204	204
query31	900	780	748	748
query32	92	77	78	77
query33	539	419	379	379
query34	788	857	521	521
query35	809	822	774	774
query36	1040	1053	936	936
query37	140	114	96	96
query38	3980	3984	4020	3984
query39	1480	1367	1365	1365
query40	242	144	130	130
query41	77	58	57	57
query42	137	122	132	122
query43	521	510	482	482
query44	1404	860	882	860
query45	203	190	180	180
query46	951	1069	680	680
query47	1802	1878	1791	1791
query48	418	428	316	316
query49	675	522	423	423
query50	672	690	429	429
query51	4156	4163	4185	4163
query52	125	132	122	122
query53	257	299	210	210
query54	651	670	583	583
query55	96	87	92	87
query56	351	365	359	359
query57	1210	1247	1134	1134
query58	350	338	344	338
query59	2585	2665	2671	2665
query60	412	399	407	399
query61	124	120	127	120
query62	770	736	646	646
query63	253	215	212	212
query64	2785	1166	777	777
query65	4237	4095	4132	4095
query66	894	454	333	333
query67	(no timings recorded)
query68	18297	869	978	869
query69	1032	275	285	275
query70	1318	1108	1119	1108
query71	710	325	330	325
query72	9175	2282	2232	2232
query73	3559	694	355	355
query74	9043	8974	8899	8899
query75	7498	3145	2678	2678
query76	8793	1220	790	790
query77	1173	412	332	332
query78	(no timings recorded)
query79	17016	638	586	586
query80	2433	561	496	496
query81	554	268	241	241
query82	468	155	124	124
query83	419	292	286	286
query84	308	101	95	95
query85	1613	381	342	342
query86	367	301	290	290
query87	4254	4181	4128	4128
query88	5063	2193	2253	2193
query89	545	362	329	329
query90	2584	242	238	238
query91	144	143	114	114
query92	89	77	68	68
query93	6496	980	668	668
query94	1108	404	289	289
query95	405	325	310	310
query96	510	597	286	286
query97	2709	2694	2618	2618
query98	254	237	228	228
query99	1542	1434	1320	1320
Total cold run time: 297888 ms
Total hot run time: 161817 ms

@doris-robot

ClickBench: Total hot run time: 35.16 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 30745b03251d1bf28a74044f7443690eaaba4c3c, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.12	0.06	0.06
query3	0.30	0.06	0.07
query4	1.60	0.09	0.08
query5	0.41	0.42	0.40
query6	1.18	0.66	0.67
query7	0.02	0.02	0.01
query8	0.07	0.06	0.05
query9	0.59	0.48	0.50
query10	0.54	0.54	0.55
query11	0.25	0.14	0.13
query12	0.26	0.13	0.14
query13	0.66	0.67	0.66
query14	0.98	1.14	1.10
query15	1.04	0.93	0.93
query16	0.39	0.38	0.39
query17	1.10	1.10	1.10
query18	0.25	0.24	0.24
query19	1.94	1.80	2.00
query20	0.01	0.02	0.01
query21	15.37	0.97	0.72
query22	0.98	1.17	0.92
query23	14.70	1.56	0.91
query24	5.40	0.61	0.35
query25	0.18	0.11	0.11
query26	0.57	0.23	0.20
query27	0.11	0.11	0.11
query28	11.01	1.16	0.65
query29	12.77	3.99	3.53
query30	3.10	3.11	3.01
query31	2.81	0.64	0.46
query32	3.25	0.66	0.57
query33	3.11	3.29	3.31
query34	16.69	5.65	5.00
query35	4.89	5.12	5.06
query36	0.65	0.54	0.52
query37	0.26	0.23	0.23
query38	0.23	0.24	0.23
query39	0.07	0.06	0.07
query40	0.22	0.17	0.18
query41	0.12	0.07	0.08
query42	0.09	0.08	0.08
query43	0.08	0.06	0.06
Total cold run time: 108.41 s
Total hot run time: 35.16 s

@hello-stephen

BE UT Coverage Report

Increment line coverage 0.00% (0/122) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 58.28% (16382/28107)
Line Coverage 47.17% (148043/313837)
Region Coverage 36.10% (110702/306630)
Branch Coverage 39.01% (49179/126071)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/122) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 64.39% (17762/27584)
Line Coverage 53.85% (168954/313775)
Region Coverage 41.97% (132135/314833)
Branch Coverage 44.70% (57039/127599)

@liaoxin01

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.79% (1406/1719)
Line Coverage 66.09% (23985/36291)
Region Coverage 67.30% (11901/17684)
Branch Coverage 56.94% (6230/10942)

@liaoxin01 liaoxin01 force-pushed the opt_empty_rowset branch 2 times, most recently from d7a4356 to e64d79b Compare August 6, 2025 14:19
@liaoxin01

run buildall

@hello-stephen

Cloud UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.79% (1406/1719)
Line Coverage 66.10% (23987/36291)
Region Coverage 67.30% (11901/17684)
Branch Coverage 56.94% (6230/10942)

@doris-robot

TPC-H: Total hot run time: 33753 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e64d79b0afad10666a60d8a5b023da0b83443512, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17604	5262	5253	5253
q2	1926	314	186	186
q3	10216	1321	686	686
q4	10217	958	529	529
q5	7507	2300	2326	2300
q6	186	165	134	134
q7	881	739	614	614
q8	9295	1308	1074	1074
q9	6879	5105	5090	5090
q10	6889	2346	1965	1965
q11	457	289	262	262
q12	345	367	222	222
q13	17777	3488	2957	2957
q14	241	241	225	225
q15	523	473	455	455
q16	426	426	384	384
q17	555	822	349	349
q18	7311	7094	7187	7094
q19	1485	985	543	543
q20	322	307	221	221
q21	3450	3000	2216	2216
q22	1057	1057	994	994
Total cold run time: 105549 ms
Total hot run time: 33753 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	5403	5313	5384	5313
q2	237	311	213	213
q3	2103	2555	2182	2182
q4	1338	1717	1304	1304
q5	4122	4287	4470	4287
q6	217	180	137	137
q7	1955	1923	1811	1811
q8	2467	2506	2569	2506
q9	7405	7333	7299	7299
q10	3191	3294	2900	2900
q11	557	500	487	487
q12	777	778	588	588
q13	3731	3709	3110	3110
q14	319	339	300	300
q15	520	463	469	463
q16	737	540	440	440
q17	1188	1468	1427	1427
q18	8046	7620	7846	7620
q19	12437	987	993	987
q20	1933	2011	1867	1867
q21	14867	4208	4174	4174
q22	1051	1017	1016	1016
Total cold run time: 74601 ms
Total hot run time: 50431 ms

@hello-stephen

FE UT Coverage Report

Increment line coverage `` 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-DS: Total hot run time: 161206 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e64d79b0afad10666a60d8a5b023da0b83443512, data reload: false

============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	990	383	401	383
query2	6520	1730	1715	1715
query3	6740	228	222	222
query4	27605	23350	23362	23350
query5	4388	643	519	519
query6	329	231	247	231
query7	4637	525	288	288
query8	276	230	238	230
query9	8609	2973	2952	2952
query10	470	364	294	294
query11	15978	15035	14741	14741
query12	181	138	138	138
query13	1661	561	421	421
query14	8724	5889	5949	5889
query15	216	190	172	172
query16	7130	651	471	471
query17	986	768	644	644
query18	2008	445	339	339
query19	222	217	182	182
query20	159	159	145	145
query21	228	130	117	117
query22	3914	3942	3875	3875
query23	34474	34161	34234	34161
query24	5306	2433	2481	2433
query25	489	504	446	446
query26	716	288	163	163
query27	2274	512	352	352
query28	3042	2343	2331	2331
query29	612	607	539	539
query30	293	237	199	199
query31	868	792	735	735
query32	90	80	80	80
query33	495	427	383	383
query34	811	847	535	535
query35	816	857	764	764
query36	1031	1046	953	953
query37	139	110	97	97
query38	3890	3982	3891	3891
query39	1427	1397	1362	1362
query40	235	147	133	133
query41	59	58	54	54
query42	141	123	131	123
query43	522	529	480	480
query44	1395	872	856	856
query45	200	186	188	186
query46	940	1057	678	678
query47	1803	1791	1745	1745
query48	412	424	319	319
query49	685	502	421	421
query50	673	674	415	415
query51	4128	4190	4102	4102
query52	125	129	115	115
query53	259	298	221	221
query54	652	640	560	560
query55	92	91	86	86
query56	350	359	362	359
query57	1214	1217	1135	1135
query58	343	342	332	332
query59	2601	2748	2623	2623
query60	411	392	377	377
query61	133	136	136	136
query62	757	723	672	672
query63	250	216	208	208
query64	2425	1130	772	772
query65	4264	4112	4120	4112
query66	1076	462	330	330
query67	(no timings recorded)
query68	16525	634	593	593
query69	1021	316	297	297
query70	1441	1148	1165	1148
query71	748	342	323	323
query72	9198	2330	2406	2330
query73	3437	645	357	357
query74	9011	9034	8768	8768
query75	7538	3127	2661	2661
query76	8843	1223	799	799
query77	1160	412	340	340
query78	(no timings recorded)
query79	17788	763	801	763
query80	3006	512	481	481
query81	528	236	232	232
query82	494	147	113	113
query83	358	295	275	275
query84	300	107	87	87
query85	1721	376	347	347
query86	386	323	308	308
query87	4292	4143	4187	4143
query88	5492	2199	2219	2199
query89	530	364	322	322
query90	2598	246	230	230
query91	147	139	110	110
query92	91	71	68	68
query93	6836	963	655	655
query94	1162	391	277	277
query95	481	327	330	327
query96	504	590	279	279
query97	2730	2718	2598	2598
query98	247	226	220	220
query99	1560	1362	1264	1264
Total cold run time: 295062 ms
Total hot run time: 161206 ms

@doris-robot

ClickBench: Total hot run time: 35.12 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e64d79b0afad10666a60d8a5b023da0b83443512, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.11	0.05	0.06
query3	0.29	0.06	0.07
query4	1.59	0.08	0.09
query5	0.42	0.40	0.42
query6	1.17	0.66	0.66
query7	0.02	0.02	0.02
query8	0.07	0.05	0.05
query9	0.59	0.50	0.49
query10	0.54	0.53	0.54
query11	0.25	0.12	0.12
query12	0.26	0.14	0.13
query13	0.68	0.68	0.66
query14	0.98	1.09	1.12
query15	1.02	0.95	0.93
query16	0.38	0.40	0.38
query17	1.08	1.07	1.12
query18	0.24	0.24	0.23
query19	1.99	1.86	1.92
query20	0.01	0.01	0.02
query21	15.37	0.97	0.72
query22	0.97	1.15	1.05
query23	14.70	1.46	0.95
query24	5.21	0.61	0.39
query25	0.19	0.12	0.11
query26	0.58	0.24	0.19
query27	0.13	0.11	0.10
query28	11.07	1.15	0.65
query29	12.58	4.05	3.52
query30	3.08	3.02	3.04
query31	2.83	0.62	0.47
query32	3.27	0.64	0.57
query33	3.10	3.22	3.26
query34	16.92	5.43	4.90
query35	4.84	5.07	4.98
query36	0.71	0.55	0.53
query37	0.26	0.23	0.22
query38	0.24	0.25	0.24
query39	0.06	0.07	0.06
query40	0.22	0.18	0.17
query41	0.13	0.08	0.08
query42	0.08	0.07	0.07
query43	0.07	0.06	0.06
Total cold run time: 108.34 s
Total hot run time: 35.12 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/133) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 58.28% (16382/28109)
Line Coverage 47.18% (148059/313845)
Region Coverage 36.13% (110784/306639)
Branch Coverage 39.02% (49195/126077)

@doris-robot

BE UT Coverage Report

Increment line coverage 63.77% (88/138) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 59.61% (16793/28173)
Line Coverage 48.52% (152758/314810)
Region Coverage 37.49% (116151/309780)
Branch Coverage 40.39% (51256/126909)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 95.65% (132/138) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.09% (22661/27604)
Line Coverage 74.78% (235291/314646)
Region Coverage 61.96% (195338/315247)
Branch Coverage 66.16% (84603/127872)

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 22, 2025
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit f36e3da into apache:master Aug 22, 2025
26 of 28 checks passed
dataroaring pushed a commit that referenced this pull request Aug 24, 2025
dataroaring pushed a commit that referenced this pull request Sep 3, 2025
…rowset_metadata (#55604)

Related PR: #54395

Problem Summary:

The _rs_metas and _rs_version_map information in the tmp_tablet meta is
inconsistent, so fetching a rowset by version fails and returns a null
pointer. The tmp_tablet meta was copied from the new tablet, and its
rowset information is not actually needed, since the real rowset data
is obtained later through sync rowset. The sync rowset operation failed
to remove the old rowsets, which produced this inconsistency. The fix
is to first clean up the obsolete rowsets in the tmp_tablet meta.
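
The failure mode described in this summary can be pictured with a simplified C++ sketch. It is not the Doris code: `TmpTabletStub`, its two containers, and `clear_copied_rowsets` are hypothetical simplifications that put both pieces of state in one struct for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

struct RowsetStub {
    int64_t version;
};
using RowsetPtr = std::shared_ptr<RowsetStub>;

// Hypothetical tablet with two views of its rowsets that can drift apart.
struct TmpTabletStub {
    std::vector<int64_t> rs_metas;                 // versions listed in the copied meta
    std::map<int64_t, RowsetPtr> rs_version_map;   // versions with a loaded rowset

    // Returns nullptr when a version has no loaded rowset, even if it is
    // still listed in rs_metas. Dereferencing that result would crash.
    RowsetPtr get_rowset_by_version(int64_t version) const {
        auto it = rs_version_map.find(version);
        if (it == rs_version_map.end()) {
            return nullptr;
        }
        return it->second;
    }

    // True when every listed version has a loaded rowset and vice versa.
    bool consistent() const {
        if (rs_metas.size() != rs_version_map.size()) {
            return false;
        }
        for (int64_t v : rs_metas) {
            if (rs_version_map.count(v) == 0) {
                return false;
            }
        }
        return true;
    }

    // Drops rowset info copied from another tablet so that a later sync
    // starts from a consistent (empty) state.
    void clear_copied_rowsets() {
        rs_metas.clear();
        rs_version_map.clear();
        assert(consistent());
    }
};
```

In this picture, a caller that looks up a version listed only in `rs_metas` receives a null pointer, which matches the kind of crash shown in the stack trace; clearing the copied state first removes the mismatch.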


*** SIGSEGV address not mapped to object (@0x38) received by PID 2824014 (TID 2824488 OR 0x7f59e8eff640) from PID 56; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:420
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F5B3AC5F520 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::cloud::CloudMetaMgr::fill_version_holes(doris::CloudTablet*, long, std::unique_lock<std::shared_mutex>&) at /home/zcp/repo_center/doris_master/doris/be/src/cloud/cloud_meta_mgr.cpp:1650
5# doris::cloud::CloudMetaMgr::sync_tablet_rowsets_unlocked(doris::CloudTablet*, std::unique_lock<bthread::Mutex>&, doris::SyncOptions const&, doris::SyncRowsetStats*) in /mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be
6# doris::cloud::CloudMetaMgr::sync_tablet_rowsets(doris::CloudTablet*, doris::SyncOptions const&, doris::SyncRowsetStats*) at /home/zcp/repo_center/doris_master/doris/be/src/cloud/cloud_meta_mgr.cpp:477
7# doris::CloudSchemaChangeJob::_process_delete_bitmap(long, long, long) at /home/zcp/repo_center/doris_master/doris/be/src/cloud/cloud_schema_change_job.cpp:519
8# doris::CloudSchemaChangeJob::_convert_historical_rowsets(doris::SchemaChangeParams const&, doris::cloud::TabletJobInfoPB&) at /home/zcp/repo_center/doris_master/doris/be/src/cloud/cloud_schema_change_job.cpp:424
9# doris::CloudSchemaChangeJob::process_alter_tablet(doris::TAlterTabletReqV2 const&) in /mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be
10# doris::alter_cloud_tablet_callback(doris::CloudStorageEngine&, doris::TAgentTaskRequest const&) at /home/zcp/repo_center/doris_master/doris/be/src/agent/task_worker_pool.cpp:2176
11# std::_Function_handler<void (), doris::TaskWorkerPool::submit_task(doris::TAgentTaskRequest const&)::$_0::operator()<doris::TAgentTaskRequest const&>(doris::TAgentTaskRequest const&) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:292
12# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_master/doris/be/src/util/threadpool.cpp:621
13# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_master/doris/be/src/util/thread.cpp:461
uchenily pushed a commit to uchenily/doris that referenced this pull request Sep 5, 2025
…rowset_metadata (apache#55604)

dataroaring pushed a commit that referenced this pull request Sep 8, 2025
… when enable skip_writing_empty_rowset_metadata (#55742)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #54395
liaoxin01 added a commit to liaoxin01/doris that referenced this pull request Sep 12, 2025
liaoxin01 added a commit to liaoxin01/doris that referenced this pull request Sep 12, 2025
…rowset_metadata (apache#55604)

liaoxin01 added a commit to liaoxin01/doris that referenced this pull request Sep 12, 2025
… when enable skip_writing_empty_rowset_metadata (apache#55742)

@morrySnow morrySnow mentioned this pull request Sep 22, 2025
liaoxin01 added a commit that referenced this pull request Jan 13, 2026
…owsets (#59710)

### What problem does this PR solve?

Related PR: #54395

Problem Summary:
When skip_writing_empty_rowset_metadata is enabled, empty rowsets are not committed to meta-service. Previously, set_txn_related_delete_bitmap() would either:
1. Store full rowset info in cache (causing a memory leak, since sync_tablet_delete_bitmap_by_cache() never cleans it up), or
2. Skip storing entirely (causing CalcDeleteBitmapTask to fail with a NOT_FOUND error)

This fix introduces a lightweight marker mechanism:
- For empty rowsets, store only a TxnKey marker (~16 bytes) instead of full rowset info
- CalcDeleteBitmapTask checks for the marker via is_empty_rowset() and returns success if found (the marker is NOT removed, to support task retry)
- Cleanup is handled by expiration-based removal in remove_expired_tablet_txn_info()
- Expiration time is consistent with set_tablet_txn_info()
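
The marker mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the code from #59710: the `TxnKey` layout (two 64-bit ids), the `EmptyRowsetMarkers` class, and the method names `mark_empty_rowset` and `remove_expired` are assumptions made for the example; only `is_empty_rowset()` is named in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical 16-byte key: the real TxnKey layout is not shown in the source.
struct TxnKey {
    int64_t txn_id;
    int64_t tablet_id;
    bool operator<(const TxnKey& other) const {
        return std::tie(txn_id, tablet_id) < std::tie(other.txn_id, other.tablet_id);
    }
};

// Stores only a small marker per empty rowset instead of full rowset info.
class EmptyRowsetMarkers {
public:
    // Records that the rowset for this key was empty; expire_at is a timestamp.
    void mark_empty_rowset(const TxnKey& key, int64_t expire_at) {
        assert(expire_at >= 0);  // timestamps are non-negative in this sketch
        _markers[key] = expire_at;
    }

    // Lookup does NOT erase the marker, so a retried task still finds it.
    bool is_empty_rowset(const TxnKey& key) const { return _markers.count(key) > 0; }

    // Expiration-based cleanup: removes markers with expire_at <= now and
    // returns how many were removed.
    std::size_t remove_expired(int64_t now) {
        std::size_t removed = 0;
        for (auto it = _markers.begin(); it != _markers.end();) {
            if (it->second <= now) {
                it = _markers.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

private:
    std::map<TxnKey, int64_t> _markers;  // key -> expiration timestamp
};
```

Leaving removal to the expiration sweep, rather than to the first successful lookup, is what keeps repeated lookups for the same transaction idempotent in this sketch.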
github-actions bot pushed a commit that referenced this pull request Jan 13, 2026
…owsets (#59710)


Labels

approved, dev/3.1.1-merged, reviewed

6 participants