@github-actions

Cherry-picked from #46561

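Simplify the previous multi-dimensional deletion strategy (sync/async deletion of cache meta combined with sync/async deletion of cache data files): all cache meta is now deleted synchronously (except for entries currently in use; that case is described below), while data files are deleted synchronously in critical scenarios and asynchronously in GC scenarios.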

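Async cleanup scheduling improvements:
- The previous scheduling logic could break off early, which made cleanup inefficient
- The scheduler could even, with some probability, enter a state in which cleanup could no longer make progress
- Reduce CPU usage by avoiding extra, useless queue traversals
- Add a window algorithm to rate-limit (QPS) asynchronous data file deletion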
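
The window-based QPS limit mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name `SlidingWindowLimiter`, its parameters, and the choice of a sliding window of timestamps are all assumptions made for the example.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical sliding-window limiter; NOT the class used in Doris.
// It allows at most `max_ops` operations within any `window` of time.
class SlidingWindowLimiter {
public:
    using Clock = std::chrono::steady_clock;

    SlidingWindowLimiter(std::size_t max_ops, std::chrono::milliseconds window)
            : _max_ops(max_ops), _window(window) {}

    // Returns true if one more file deletion may run at time `now`.
    // `now` is a parameter so the logic can be exercised deterministically.
    bool try_acquire(Clock::time_point now = Clock::now()) {
        // Drop timestamps that have fallen out of the window.
        while (!_stamps.empty() && now - _stamps.front() >= _window) {
            _stamps.pop_front();
        }
        if (_stamps.size() >= _max_ops) {
            return false; // over budget: the caller should retry later
        }
        _stamps.push_back(now);
        return true;
    }

private:
    std::size_t _max_ops;
    std::chrono::milliseconds _window;
    std::deque<Clock::time_point> _stamps; // times of recently admitted operations
};
```

An async cleanup thread would call `try_acquire()` before each file removal and back off when it returns false, which spreads deletions out instead of issuing them in one IO burst.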

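Mark-for-deletion improvements:
- The previous mark-for-deletion mechanism leaked space for TTL data files in two ways
- Broader applicability: previously usable only for clear_cache and for shrinking via reset_capacity, it now covers any asynchronous deletion scenario
- Besides data that is currently referenced, the new mark-for-deletion mechanism also fixes the deletion leak for data in the DOWNLOADING state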

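Fix several leaks when deleting data that is currently referenced:
- Previously there was no mechanism to mark referenced data for deletion, so such data could only be exempted and left in place
- Now, together with the improved mark-for-deletion mechanism, a destructor deletes the data automatically once the reference is released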
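
The idea of deleting in a destructor once the last reference is released can be illustrated with a small sketch. All names here (`CacheBlock`, `mark_deleted`, the injected `remover` callback) are hypothetical and chosen for the example; the actual classes in the PR may look quite different.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a cached block; names are illustrative only.
class CacheBlock {
public:
    using Remover = std::function<void(const std::string&)>;

    CacheBlock(std::string path, Remover remover)
            : _path(std::move(path)), _remover(std::move(remover)) {}

    // Called by the evictor when the block is still referenced by a reader:
    // the block is only flagged, the file stays on disk for now.
    void mark_deleted() { _marked.store(true, std::memory_order_release); }

    // Runs when the last std::shared_ptr holder lets go of the block.
    ~CacheBlock() {
        if (_marked.load(std::memory_order_acquire)) {
            _remover(_path); // deferred deletion happens here
        }
    }

private:
    std::string _path;
    Remover _remover;
    std::atomic<bool> _marked{false};
};
```

With blocks held through `std::shared_ptr<CacheBlock>`, an evictor can mark a busy block and drop its own reference; the file is removed only when the last reader finishes, so referenced data no longer has to be exempted from deletion.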

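Found and fixed a latent memory-corruption hazard in queue operations:
- reset_capacity erased container entries from inside the iteration, which could leave dangling pointers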
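
For readers unfamiliar with this class of bug, a safe erase-while-iterating loop is sketched below. It is a generic illustration under assumed names (`Entry`, `shrink_to`), not the actual reset_capacity fix: the key point is to continue from the iterator that `erase` returns rather than from the one that was just invalidated.

```cpp
#include <cstddef>
#include <list>

// Hypothetical queue entry; fields are illustrative only.
struct Entry {
    std::size_t size;
    bool releasable;
};

// Hypothetical helper: evict releasable entries until `total` <= `target`.
// Returns the number of bytes freed.
std::size_t shrink_to(std::list<Entry>& queue, std::size_t& total, std::size_t target) {
    std::size_t freed = 0;
    for (auto it = queue.begin(); it != queue.end() && total > target;) {
        if (it->releasable) {
            freed += it->size;
            total -= it->size;
            // erase() invalidates `it` and returns the next valid iterator.
            it = queue.erase(it);
        } else {
            ++it; // keep the entry and move on
        }
    }
    return freed;
}
```

Advancing with `++it` after `queue.erase(it)` would read through an invalidated iterator, which is the kind of dangling access described above.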

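Other minor improvements:
- Use concurrentqueue instead of the previous static lock-free queue: this keeps performance while reducing the IO burst, and the accompanying cache lock overhead, caused by falling back to synchronous file deletion when the queue is full
- Remove the deprecated file_cache_ttl_valid_check_interval_second config: TTL now supports LRU, so no extra periodic cleanup is needed
- Split the work across multiple threads so that metrics, resource limit, data file cleanup, and TTL expiration cleanup do not interfere with each other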

Signed-off-by: zhengyu <zhangzhengyu@selectdb.com>
@github-actions github-actions bot requested a review from dataroaring as a code owner January 23, 2025 11:20
@Thearas

Thearas commented Jan 23, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Jan 23, 2025
@Thearas

Thearas commented Jan 23, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 41488 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 87023279e87a6fb53bf9cddb548e119030b735e1, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17611	7463	7278	7278
q2	2066	174	167	167
q3	10639	1159	1239	1159
q4	10921	767	713	713
q5	8056	2942	2876	2876
q6	243	149	155	149
q7	991	636	623	623
q8	9494	2041	2094	2041
q9	6869	6520	6495	6495
q10	7217	2335	2384	2335
q11	599	273	268	268
q12	416	219	220	219
q13	17925	3075	3055	3055
q14	254	213	218	213
q15	583	526	510	510
q16	704	618	633	618
q17	1013	590	576	576
q18	7489	6798	6780	6780
q19	1404	1079	990	990
q20	472	200	192	192
q21	4192	3308	3248	3248
q22	1112	983	1002	983
Total cold run time: 110270 ms
Total hot run time: 41488 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	7278	7253	7215	7215
q2	328	232	235	232
q3	3201	3035	2967	2967
q4	2052	1761	1822	1761
q5	5645	5765	5705	5705
q6	223	135	142	135
q7	2269	1781	1872	1781
q8	3359	3519	3494	3494
q9	8908	9012	8872	8872
q10	3570	3551	3565	3551
q11	606	506	498	498
q12	821	629	595	595
q13	9338	3123	3196	3123
q14	314	265	282	265
q15	565	514	508	508
q16	715	681	683	681
q17	1820	1622	1601	1601
q18	8300	7810	7601	7601
q19	1700	1600	1586	1586
q20	2039	1825	1801	1801
q21	5347	5162	5081	5081
q22	1098	1020	1004	1004
Total cold run time: 69496 ms
Total hot run time: 60057 ms

@doris-robot

TPC-DS: Total hot run time: 191915 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 87023279e87a6fb53bf9cddb548e119030b735e1, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	973	373	366	366
query2	6508	2066	2157	2066
query3	6704	226	220	220
query4	33849	23475	23511	23475
query5	4357	461	461	461
query6	284	189	197	189
query7	4628	318	316	316
query8	286	220	230	220
query9	9876	2716	2698	2698
query10	468	279	260	260
query11	18014	15273	15146	15146
query12	165	101	103	101
query13	1648	447	417	417
query14	8774	7244	7098	7098
query15	254	175	186	175
query16	8138	460	500	460
query17	1639	578	553	553
query18	2133	321	316	316
query19	383	161	156	156
query20	120	110	111	110
query21	215	104	106	104
query22	4598	4309	4341	4309
query23	34837	34876	33988	33988
query24	11822	2783	2874	2783
query25	559	384	381	381
query26	1103	167	167	167
query27	2881	340	341	340
query28	7972	2448	2441	2441
query29	659	439	440	439
query30	334	165	165	165
query31	1025	807	826	807
query32	103	67	58	58
query33	792	307	302	302
query34	931	490	519	490
query35	880	707	705	705
query36	1109	952	945	945
query37	203	73	73	73
query38	4067	3879	3927	3879
query39	1472	1402	1443	1402
query40	289	97	97	97
query41	51	48	48	48
query42	113	105	100	100
query43	547	508	491	491
query44	1271	803	795	795
query45	181	166	166	166
query46	1126	709	702	702
query47	1921	1813	1869	1813
query48	471	374	373	373
query49	1263	381	390	381
query50	822	406	410	406
query51	7241	6939	7234	6939
query52	106	94	96	94
query53	259	189	188	188
query54	1173	465	465	465
query55	78	78	81	78
query56	264	256	259	256
query57	1254	1078	1077	1077
query58	235	209	206	206
query59	3291	3317	2855	2855
query60	282	260	253	253
query61	111	109	111	109
query62	876	679	673	673
query63	228	191	186	186
query64	4970	660	653	653
query65	3269	3205	3216	3205
query66	1283	321	324	321
query67	15846	15754	15509	15509
query68	3465	607	601	601
query69	408	274	279	274
query70	1199	1125	1147	1125
query71	349	267	258	258
query72	5794	4235	4140	4140
query73	746	348	361	348
query74	9948	8976	9105	8976
query75	3375	2657	2638	2638
query76	2015	1063	987	987
query77	398	287	299	287
query78	10359	9695	9609	9609
query79	2166	602	610	602
query80	902	474	447	447
query81	555	248	250	248
query82	783	122	127	122
query83	272	153	153	153
query84	252	85	86	85
query85	1397	374	288	288
query86	416	311	311	311
query87	4423	4380	4281	4281
query88	4100	2370	2372	2370
query89	394	288	291	288
query90	2145	186	189	186
query91	181	147	150	147
query92	70	54	53	53
query93	1510	562	560	560
query94	925	305	297	297
query95	356	261	257	257
query96	611	275	279	275
query97	3261	3165	3225	3165
query98	212	205	197	197
query99	1535	1320	1352	1320
Total cold run time: 298106 ms
Total hot run time: 191915 ms

@doris-robot

ClickBench: Total hot run time: 32.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 87023279e87a6fb53bf9cddb548e119030b735e1, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.06	0.04	0.03
query3	0.23	0.06	0.06
query4	1.62	0.10	0.10
query5	0.53	0.52	0.52
query6	1.14	0.73	0.72
query7	0.02	0.01	0.01
query8	0.04	0.03	0.04
query9	0.56	0.51	0.49
query10	0.54	0.54	0.55
query11	0.15	0.10	0.10
query12	0.15	0.11	0.11
query13	0.61	0.60	0.59
query14	3.10	2.99	2.92
query15	0.90	0.82	0.82
query16	0.39	0.39	0.40
query17	0.98	1.02	1.06
query18	0.23	0.24	0.22
query19	1.95	1.88	2.02
query20	0.01	0.01	0.02
query21	15.36	0.59	0.58
query22	2.97	2.44	0.98
query23	16.89	1.38	0.79
query24	3.07	0.86	1.19
query25	0.28	0.15	0.14
query26	0.42	0.14	0.14
query27	0.04	0.04	0.04
query28	10.62	1.10	1.07
query29	12.60	3.30	3.27
query30	0.25	0.06	0.06
query31	2.85	0.39	0.39
query32	3.24	0.46	0.46
query33	3.04	2.98	3.04
query34	17.30	4.46	4.54
query35	4.56	4.52	4.45
query36	0.67	0.48	0.50
query37	0.09	0.06	0.06
query38	0.04	0.04	0.03
query39	0.04	0.02	0.02
query40	0.16	0.12	0.12
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 107.87 s
Total hot run time: 32.04 s

@dataroaring dataroaring merged commit bff7283 into branch-3.0 Jan 24, 2025
21 of 22 checks passed
@github-actions github-actions bot deleted the auto-pick-46561-branch-3.0 branch January 24, 2025 01:42