@liaoxin01
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #54395

Problem Summary:
When skip_writing_empty_rowset_metadata is enabled, empty rowsets are not committed to meta-service. Previously, set_txn_related_delete_bitmap() would either:

  1. Store the full rowset info in the cache, causing a memory leak because sync_tablet_delete_bitmap_by_cache() never cleans it up, or
  2. Skip storing entirely, causing CalcDeleteBitmapTask to fail with a NOT_FOUND error.

This fix introduces a lightweight marker mechanism:

  • For empty rowsets, store only a TxnKey marker (~16 bytes) instead of the full rowset info.
  • CalcDeleteBitmapTask checks for the marker via is_empty_rowset() and returns success if it is found. The marker is NOT removed, so a retried task still succeeds.
  • Cleanup is handled by expiration-based removal in remove_expired_tablet_txn_info().
  • The expiration time is consistent with the one used by set_tablet_txn_info().
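
For readers who want a concrete picture of the marker mechanism, here is a minimal, self-contained sketch. It is not the code from this PR: the class name EmptyRowsetMarkerCache, the remove_expired() helper, the expire_at_s parameter, and the assumption that a TxnKey is a (txn_id, tablet_id) pair of 64-bit integers are all illustrative. Only the names mark_empty_rowset() and is_empty_rowset() come from the PR itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <set>
#include <utility>

// Assumption: a TxnKey is a (txn_id, tablet_id) pair, i.e. two 64-bit
// integers, which matches the "~16 bytes" figure quoted above.
using TxnKey = std::pair<int64_t, int64_t>;

// Hypothetical stand-in for the marker part of the cache.
class EmptyRowsetMarkerCache {
public:
    // Record that (txn_id, tablet_id) produced an empty rowset and remember
    // when the marker may be dropped. Marking the same key twice is a no-op.
    void mark_empty_rowset(int64_t txn_id, int64_t tablet_id, int64_t expire_at_s) {
        std::lock_guard<std::mutex> lock(_mtx);
        const TxnKey key{txn_id, tablet_id};
        if (_markers.insert(key).second) {
            _expirations.emplace(expire_at_s, key);
        }
    }

    // Lookup only; the marker stays in place so a retried task sees it again.
    bool is_empty_rowset(int64_t txn_id, int64_t tablet_id) const {
        std::lock_guard<std::mutex> lock(_mtx);
        return _markers.count(TxnKey{txn_id, tablet_id}) > 0;
    }

    // Drop every marker whose expiration time is <= now_s and return how
    // many markers were removed.
    std::size_t remove_expired(int64_t now_s) {
        std::lock_guard<std::mutex> lock(_mtx);
        const auto end = _expirations.upper_bound(now_s);
        std::size_t removed = 0;
        for (auto it = _expirations.begin(); it != end; ++it) {
            removed += _markers.erase(it->second);
        }
        _expirations.erase(_expirations.begin(), end);
        return removed;
    }

private:
    mutable std::mutex _mtx;
    std::set<TxnKey> _markers;                    // one small key per empty rowset
    std::multimap<int64_t, TxnKey> _expirations;  // expiration time -> key
};
```

The point of the design is that lookups never mutate the set, so the only path that frees memory is the periodic expiration sweep.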

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings January 9, 2026 04:54
@Thearas
Contributor

Thearas commented Jan 9, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01 liaoxin01 force-pushed the fix-empty-rowset-delete-bitmap-cache-leak-master branch 2 times, most recently from 9d55040 to 1cd2f1c Compare January 9, 2026 04:56

Copilot AI left a comment

Pull request overview

This PR fixes a memory leak in CloudTxnDeleteBitmapCache when handling empty rowsets with skip_writing_empty_rowset_metadata enabled. Previously, empty rowsets either stored full rowset info (causing memory leaks) or were skipped entirely (causing NOT_FOUND errors). The fix introduces a lightweight marker mechanism using only ~16 bytes per empty rowset.

Key changes:

  • Introduced _empty_rowset_markers set to track empty rowsets with minimal memory overhead
  • Added mark_empty_rowset() and is_empty_rowset() methods for marker management
  • Modified cleanup logic to handle both regular txn entries and empty rowset markers

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
be/src/cloud/cloud_txn_delete_bitmap_cache.h	Added new methods and member variable for empty rowset marker tracking
be/src/cloud/cloud_txn_delete_bitmap_cache.cpp	Implemented marker methods and updated cleanup logic to handle markers
be/src/cloud/cloud_rowset_builder.cpp	Modified to call mark_empty_rowset for empty rowsets instead of storing full info
be/src/cloud/cloud_engine_calc_delete_bitmap_task.cpp	Added check for empty rowset markers to skip calculation gracefully
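
As a rough illustration of the check added on the task side, the sketch below shows how a marker lookup can turn what would otherwise be a NOT_FOUND into a successful no-op. Everything here is hypothetical: the Status enum, the TxnCache struct, get_txn_info(), and calc_delete_bitmap_for_txn() are stand-ins, and whether the real task checks the marker before or after the regular cache lookup is an assumption.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical, simplified status type; the real code base has its own.
enum class Status { OK, NOT_FOUND };

// Assumption: a TxnKey is a (txn_id, tablet_id) pair.
using TxnKey = std::pair<int64_t, int64_t>;

// Hypothetical stand-in for the delete bitmap cache.
struct TxnCache {
    std::set<TxnKey> full_entries;   // txns that stored full rowset info
    std::set<TxnKey> empty_markers;  // txns that only stored a marker

    bool is_empty_rowset(int64_t txn_id, int64_t tablet_id) const {
        return empty_markers.count(TxnKey{txn_id, tablet_id}) > 0;
    }

    Status get_txn_info(int64_t txn_id, int64_t tablet_id) const {
        return full_entries.count(TxnKey{txn_id, tablet_id}) > 0 ? Status::OK
                                                                 : Status::NOT_FOUND;
    }
};

// Sketch of the per-tablet step of a delete bitmap calculation task.
Status calc_delete_bitmap_for_txn(const TxnCache& cache, int64_t txn_id, int64_t tablet_id) {
    // An empty rowset has nothing to calculate. Return success and leave the
    // marker untouched so that a retry of the same task also succeeds.
    if (cache.is_empty_rowset(txn_id, tablet_id)) {
        return Status::OK;
    }
    // Without a marker, a missing cache entry is still an error.
    if (cache.get_txn_info(txn_id, tablet_id) == Status::NOT_FOUND) {
        return Status::NOT_FOUND;
    }
    // ... the actual delete bitmap calculation would run here ...
    return Status::OK;
}
```

Because the task takes the cache by const reference, the sketch makes the "marker is not removed" property visible in the signature.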

gavinchou previously approved these changes Jan 9, 2026
Contributor

@gavinchou gavinchou left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 9, 2026
@github-actions
Contributor

github-actions bot commented Jan 9, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Jan 9, 2026

PR approved by anyone and no changes requested.

…owsets

@liaoxin01 liaoxin01 force-pushed the fix-empty-rowset-delete-bitmap-cache-leak-master branch from 1cd2f1c to 2363e19 Compare January 9, 2026 05:12
@liaoxin01
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jan 9, 2026

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@doris-robot

TPC-H: Total hot run time: 31408 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2363e19a83d981de49503db0c72c85f398749074, data reload: false

------ Round 1 ----------------------------------
q1	17630	4289	4064	4064
q2	2009	356	238	238
q3	10229	1333	757	757
q4	10683	878	325	325
q5	7970	2128	1870	1870
q6	197	176	141	141
q7	969	811	667	667
q8	9380	1445	1090	1090
q9	5262	4626	4636	4626
q10	6851	1831	1428	1428
q11	547	288	281	281
q12	716	750	650	650
q13	17771	3826	3076	3076
q14	282	302	277	277
q15	588	513	494	494
q16	682	671	625	625
q17	685	779	493	493
q18	6625	6449	6285	6285
q19	1095	958	574	574
q20	383	353	252	252
q21	2935	2437	2237	2237
q22	1083	1002	958	958
Total cold run time: 104572 ms
Total hot run time: 31408 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4128	4087	4059	4059
q2	324	414	314	314
q3	2079	2578	2229	2229
q4	1350	1757	1295	1295
q5	4165	3964	4001	3964
q6	217	172	129	129
q7	1865	1813	1992	1813
q8	2631	2390	2416	2390
q9	7291	7145	7100	7100
q10	2532	2792	2284	2284
q11	565	499	464	464
q12	727	763	666	666
q13	3599	4052	3491	3491
q14	320	310	289	289
q15	546	516	526	516
q16	666	678	665	665
q17	1153	1419	1596	1419
q18	7935	7839	7925	7839
q19	855	899	861	861
q20	2057	2059	1947	1947
q21	4916	4427	4218	4218
q22	1091	1007	980	980
Total cold run time: 51012 ms
Total hot run time: 48932 ms

@doris-robot

TPC-DS: Total hot run time: 173119 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2363e19a83d981de49503db0c72c85f398749074, data reload: false

query5	4404	587	441	441
query6	338	242	224	224
query7	4212	475	264	264
query8	342	254	242	242
query9	8729	2629	2673	2629
query10	491	382	308	308
query11	15220	15175	14990	14990
query12	176	114	115	114
query13	1261	484	372	372
query14	6162	3031	2804	2804
query14_1	2704	2648	2663	2648
query15	201	193	175	175
query16	987	475	442	442
query17	1118	694	553	553
query18	2447	417	331	331
query19	217	215	189	189
query20	123	126	115	115
query21	214	150	115	115
query22	3891	3943	4100	3943
query23	15905	15514	15375	15375
query23_1	15350	15386	15420	15386
query24	7372	1546	1149	1149
query24_1	1190	1158	1162	1158
query25	529	435	382	382
query26	1233	263	156	156
query27	2772	460	284	284
query28	4564	2126	2111	2111
query29	744	555	429	429
query30	312	243	208	208
query31	813	631	551	551
query32	74	65	69	65
query33	533	330	277	277
query34	894	872	520	520
query35	735	768	662	662
query36	860	862	813	813
query37	133	97	78	78
query38	2733	2758	2643	2643
query39	781	752	741	741
query39_1	719	721	725	721
query40	211	130	116	116
query41	66	61	61	61
query42	112	108	105	105
query43	487	440	434	434
query44	1324	728	723	723
query45	193	186	175	175
query46	849	959	589	589
query47	1459	1378	1361	1361
query48	317	330	244	244
query49	607	419	331	331
query50	633	277	210	210
query51	3883	3796	3779	3779
query52	106	108	99	99
query53	299	331	274	274
query54	282	272	251	251
query55	78	73	69	69
query56	296	297	300	297
query57	1028	1051	918	918
query58	262	246	240	240
query59	1943	2188	2093	2093
query60	314	331	300	300
query61	161	156	159	156
query62	393	353	343	343
query63	300	270	275	270
query64	4941	1290	993	993
query65	3674	3786	3725	3725
query66	1435	435	299	299
query67	14808	15533	14874	14874
query68	2704	1024	754	754
query69	455	346	302	302
query70	949	922	849	849
query71	312	296	280	280
query72	6072	3689	3655	3655
query73	583	726	304	304
query74	8822	8730	8618	8618
query75	2773	2826	2494	2494
query76	2901	1067	637	637
query77	348	394	286	286
query78	9859	10063	9208	9208
query79	1117	831	592	592
query80	1316	574	465	465
query81	549	261	227	227
query82	1109	145	110	110
query83	364	246	233	233
query84	252	122	101	101
query85	933	518	454	454
query86	408	319	317	317
query87	2845	2886	2727	2727
query88	3205	2211	2194	2194
query89	394	352	319	319
query90	1948	170	141	141
query91	171	161	143	143
query92	68	66	63	63
query93	966	912	526	526
query94	651	312	308	308
query95	563	323	307	307
query96	590	482	216	216
query97	2339	2375	2294	2294
query98	226	200	201	200
query99	602	591	525	525
Total cold run time: 247287 ms
Total hot run time: 173119 ms

@liaoxin01
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32202 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 82a960d567e253369fed21a35d3c5ccc75e79b20, data reload: false

------ Round 1 ----------------------------------
q1	17645	4273	4065	4065
q2	2106	347	240	240
q3	10142	1261	750	750
q4	10213	834	320	320
q5	7540	2131	1866	1866
q6	193	175	147	147
q7	958	807	662	662
q8	9262	1450	1259	1259
q9	4917	4655	4591	4591
q10	6755	1807	1406	1406
q11	504	293	291	291
q12	676	747	567	567
q13	17839	3835	3073	3073
q14	304	293	287	287
q15	582	526	514	514
q16	721	709	643	643
q17	689	828	499	499
q18	6561	6455	6934	6455
q19	1184	1066	629	629
q20	451	388	265	265
q21	3366	2709	2631	2631
q22	1158	1087	1042	1042
Total cold run time: 103766 ms
Total hot run time: 32202 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4342	4419	4275	4275
q2	328	425	318	318
q3	2289	2816	2499	2499
q4	1397	1874	1384	1384
q5	4448	4296	4252	4252
q6	216	179	134	134
q7	1979	1906	1822	1822
q8	2518	2430	2404	2404
q9	7299	7114	7124	7114
q10	2424	2887	2316	2316
q11	556	505	468	468
q12	721	786	626	626
q13	3608	4103	3389	3389
q14	287	306	284	284
q15	537	496	501	496
q16	609	656	620	620
q17	1109	1260	1292	1260
q18	7416	7277	7452	7277
q19	851	804	828	804
q20	1882	1942	1805	1805
q21	4539	4323	4276	4276
q22	1092	1059	992	992
Total cold run time: 50447 ms
Total hot run time: 48815 ms

@doris-robot

TPC-DS: Total hot run time: 172709 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 82a960d567e253369fed21a35d3c5ccc75e79b20, data reload: false

query5	5583	575	426	426
query6	329	234	227	227
query7	4219	480	264	264
query8	362	248	231	231
query9	8778	2662	2651	2651
query10	553	369	317	317
query11	15378	15160	14870	14870
query12	184	115	114	114
query13	1274	499	377	377
query14	7810	3073	2896	2896
query14_1	2692	2658	2698	2658
query15	253	197	176	176
query16	982	483	465	465
query17	1293	683	578	578
query18	2710	440	359	359
query19	279	247	202	202
query20	130	120	118	118
query21	222	138	125	125
query22	3951	3983	3876	3876
query23	15988	15760	15267	15267
query23_1	15476	15375	15497	15375
query24	6993	1545	1199	1199
query24_1	1215	1191	1192	1191
query25	554	486	431	431
query26	1228	267	166	166
query27	2705	456	295	295
query28	4467	2154	2139	2139
query29	803	555	461	461
query30	321	246	213	213
query31	825	633	567	567
query32	80	76	71	71
query33	535	361	302	302
query34	872	886	582	582
query35	713	762	680	680
query36	870	880	836	836
query37	122	90	72	72
query38	2787	2717	2662	2662
query39	774	748	730	730
query39_1	722	706	733	706
query40	213	132	114	114
query41	69	62	61	61
query42	107	102	104	102
query43	451	466	442	442
query44	1342	727	719	719
query45	191	187	178	178
query46	844	953	605	605
query47	1359	1459	1375	1375
query48	314	324	234	234
query49	607	417	357	357
query50	633	268	200	200
query51	3813	3810	3779	3779
query52	109	107	99	99
query53	309	324	272	272
query54	293	259	243	243
query55	76	71	72	71
query56	284	285	324	285
query57	1045	1035	976	976
query58	266	257	261	257
query59	2037	2016	1885	1885
query60	314	326	286	286
query61	165	157	153	153
query62	395	361	306	306
query63	293	269	272	269
query64	4818	1305	981	981
query65	3785	3750	3710	3710
query66	1357	426	309	309
query67	15234	14777	14809	14777
query68	6708	989	730	730
query69	504	343	303	303
query70	1074	990	973	973
query71	363	304	278	278
query72	6087	3471	3473	3471
query73	769	730	305	305
query74	8727	8783	8575	8575
query75	2849	2801	2454	2454
query76	3872	1075	647	647
query77	512	375	287	287
query78	9804	9816	9131	9131
query79	1523	920	581	581
query80	667	561	478	478
query81	528	269	230	230
query82	223	148	112	112
query83	273	258	243	243
query84	259	116	103	103
query85	890	500	451	451
query86	366	326	310	310
query87	2899	2894	2812	2812
query88	3171	2219	2201	2201
query89	392	354	338	338
query90	2327	156	148	148
query91	173	159	143	143
query92	88	69	66	66
query93	1530	915	532	532
query94	563	329	280	280
query95	557	334	307	307
query96	592	466	205	205
query97	2340	2412	2313	2313
query98	237	206	200	200
query99	601	573	494	494
Total cold run time: 256433 ms
Total hot run time: 172709 ms

@liaoxin01
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32074 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cf4031baccf1a81a3ea0db0d909aa064ee22340d, data reload: false

------ Round 1 ----------------------------------
q1	17640	4253	4036	4036
q2	2019	352	247	247
q3	10160	1260	708	708
q4	10233	886	318	318
q5	7542	2074	1911	1911
q6	194	168	140	140
q7	941	796	658	658
q8	9287	1360	1198	1198
q9	4782	4657	4595	4595
q10	6825	1808	1389	1389
q11	504	300	271	271
q12	685	740	594	594
q13	17776	3789	3038	3038
q14	293	292	273	273
q15	598	525	502	502
q16	694	684	634	634
q17	668	827	492	492
q18	6545	6558	6808	6558
q19	1149	1055	647	647
q20	421	366	268	268
q21	3146	2603	2548	2548
q22	1139	1064	1049	1049
Total cold run time: 103241 ms
Total hot run time: 32074 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4330	4208	4325	4208
q2	348	428	322	322
q3	2248	2769	2446	2446
q4	1512	1835	1380	1380
q5	4484	4357	4204	4204
q6	216	167	127	127
q7	1989	1930	1721	1721
q8	2515	2407	2312	2312
q9	7043	7265	7248	7248
q10	2602	2712	2164	2164
q11	565	497	472	472
q12	699	742	633	633
q13	3463	3875	3040	3040
q14	286	284	263	263
q15	523	492	493	492
q16	605	652	613	613
q17	1051	1244	1285	1244
q18	7495	7577	7321	7321
q19	815	799	779	779
q20	1911	1968	1795	1795
q21	4550	4212	4042	4042
q22	1055	1004	968	968
Total cold run time: 50305 ms
Total hot run time: 47794 ms

@doris-robot

TPC-DS: Total hot run time: 171672 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cf4031baccf1a81a3ea0db0d909aa064ee22340d, data reload: false

query5	4467	567	460	460
query6	330	233	201	201
query7	4214	462	262	262
query8	324	241	258	241
query9	8758	2633	2629	2629
query10	515	371	314	314
query11	14999	15156	14686	14686
query12	178	119	118	118
query13	1247	466	368	368
query14	6321	2981	2756	2756
query14_1	2672	2633	2682	2633
query15	207	194	178	178
query16	991	506	456	456
query17	1112	680	581	581
query18	2644	430	348	348
query19	222	225	198	198
query20	127	121	117	117
query21	221	142	116	116
query22	4237	4186	3830	3830
query23	15813	15488	15331	15331
query23_1	15461	15373	15456	15373
query24	7461	1530	1170	1170
query24_1	1199	1170	1215	1170
query25	558	463	415	415
query26	1244	273	151	151
query27	2767	448	290	290
query28	4523	2121	2094	2094
query29	807	565	498	498
query30	312	235	209	209
query31	764	647	544	544
query32	73	68	67	67
query33	526	329	274	274
query34	877	886	517	517
query35	724	760	662	662
query36	843	873	763	763
query37	134	92	84	84
query38	2744	2677	2633	2633
query39	771	742	724	724
query39_1	707	729	720	720
query40	212	130	112	112
query41	65	60	63	60
query42	103	101	99	99
query43	478	482	430	430
query44	1304	704	713	704
query45	187	181	174	174
query46	830	951	580	580
query47	1456	1520	1369	1369
query48	311	312	233	233
query49	604	413	316	316
query50	628	271	197	197
query51	3708	3913	3745	3745
query52	100	106	94	94
query53	284	324	270	270
query54	281	259	242	242
query55	77	76	74	74
query56	286	289	274	274
query57	1012	1029	960	960
query58	264	240	243	240
query59	2043	2125	1943	1943
query60	310	315	291	291
query61	192	160	155	155
query62	376	348	327	327
query63	309	266	271	266
query64	4859	1305	988	988
query65	3821	3769	3754	3754
query66	1396	411	295	295
query67	15056	15638	14772	14772
query68	6504	994	704	704
query69	493	357	309	309
query70	1038	965	946	946
query71	361	300	273	273
query72	5947	3433	3494	3433
query73	770	721	292	292
query74	8764	8753	8590	8590
query75	2805	2790	2441	2441
query76	3347	1052	631	631
query77	520	360	277	277
query78	9622	9903	9099	9099
query79	1001	901	576	576
query80	645	544	464	464
query81	481	262	228	228
query82	206	147	113	113
query83	259	248	241	241
query84	257	112	104	104
query85	863	506	447	447
query86	323	309	317	309
query87	2864	2923	2732	2732
query88	3056	2205	2202	2202
query89	387	352	322	322
query90	1960	154	144	144
query91	167	162	136	136
query92	67	63	58	58
query93	938	896	529	529
query94	564	321	296	296
query95	557	328	357	328
query96	574	444	201	201
query97	2340	2367	2298	2298
query98	211	211	195	195
query99	585	626	515	515
Total cold run time: 250013 ms
Total hot run time: 171672 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/52) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.02% (18978/35791)
Line Coverage 39.09% (175882/449925)
Region Coverage 33.68% (136280/404603)
Branch Coverage 34.70% (58878/169696)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 90.38% (47/52) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.85% (25837/34986)
Line Coverage 61.28% (274963/448692)
Region Coverage 56.15% (229480/408719)
Branch Coverage 58.09% (98899/170245)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 90.91% (50/55) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.84% (25833/34986)
Line Coverage 61.27% (274921/448692)
Region Coverage 56.13% (229432/408719)
Branch Coverage 58.08% (98882/170245)

@yiguolei yiguolei added the p0_b label Jan 13, 2026
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 13, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@liaoxin01 liaoxin01 merged commit e751087 into apache:master Jan 13, 2026
28 of 29 checks passed
@liaoxin01 liaoxin01 deleted the fix-empty-rowset-delete-bitmap-cache-leak-master branch January 13, 2026 07:33
github-actions bot pushed a commit that referenced this pull request Jan 13, 2026
…owsets (#59710)

yiguolei pushed a commit that referenced this pull request Jan 14, 2026
… for empty rowsets #59710 (#59819)

Cherry-picked from #59710

Co-authored-by: Xin Liao <liaoxin@selectdb.com>