@zy-kkk
Member

@zy-kkk zy-kkk commented Sep 10, 2025

What problem does this PR solve?

Problem:

Paimon's CachedClientPool keeps a static cache of HMS client pools, keyed by clientClassName, metastore.uris, and the metastore type. For DLF
catalogs all three values are identical, so DLF catalogs configured with different dlf.catalog.id values incorrectly share the same HMS client pool.
As a result, the configuration of the most recently created catalog overrides that of the earlier ones.

Root Cause:

The cache key built by CachedClientPool.extractKey() does not include any DLF-specific configuration. Catalogs with different dlf.catalog.id
values therefore produce identical cache keys and end up sharing, and polluting, a single client pool.
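
The collision can be reproduced in isolation. The following is a minimal sketch, not Paimon's actual code: the class name `CacheKeyCollisionSketch`, the client name `ExampleDlfClient`, and the placeholder URI are hypothetical, and the key models only the three fields named above (the real `extractKey()` may build its key differently).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheKeyCollisionSketch {
    // Static cache shared by every catalog in the JVM, as described above.
    static final Map<List<String>, Object> POOLS = new HashMap<>();

    // Hypothetical key builder: models only the three fields the description names.
    static List<String> extractKey(String clientClassName, Map<String, String> conf, String metastoreType) {
        List<String> key = new ArrayList<>();
        key.add(clientClassName);
        key.add(conf.get("metastore.uris"));
        key.add(metastoreType);
        return key;
    }

    // Returns the pool cached under the key, creating one only on a cache miss.
    static Object poolFor(String clientClassName, Map<String, String> conf, String metastoreType) {
        return POOLS.computeIfAbsent(extractKey(clientClassName, conf, metastoreType), k -> new Object());
    }

    // Builds a catalog configuration that differs from others only in dlf.catalog.id.
    static Map<String, String> dlfConf(String catalogId) {
        Map<String, String> conf = new HashMap<>();
        conf.put("metastore.uris", "thrift://placeholder:9083"); // placeholder value
        conf.put("dlf.catalog.id", catalogId);
        return conf;
    }

    public static void main(String[] args) {
        Object poolA = poolFor("ExampleDlfClient", dlfConf("catalog_a"), "dlf");
        Object poolB = poolFor("ExampleDlfClient", dlfConf("catalog_b"), "dlf");
        // dlf.catalog.id is not part of the key, so both catalogs get the same pool.
        System.out.println("same pool: " + (poolA == poolB));
    }
}
```

Running `main` prints `same pool: true`: the second catalog hits the cache entry created for the first, even though their `dlf.catalog.id` values differ.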

Solution:

Add dlf.catalog.id to the cache key by setting client-pool-cache.keys = "conf:dlf.catalog.id" in
PaimonAliyunDLFMetaStoreProperties.appendCustomCatalogOptions(). Each DLF catalog with a distinct catalog ID then gets its own HMS client
pool.
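
To illustrate the effect of the extra key element, here is a minimal sketch under stated assumptions. It assumes that a `conf:<name>` entry in `client-pool-cache.keys` appends the value of configuration key `<name>` to the cache key. The class `ConfKeyedPoolSketch`, its helper methods, and the placeholder values are hypothetical and do not reproduce Paimon's or Doris's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConfKeyedPoolSketch {
    static final String CACHE_KEYS_OPTION = "client-pool-cache.keys";
    static final String CONF_PREFIX = "conf:";

    // Hypothetical key builder: the three base fields plus one element
    // for every "conf:<name>" entry listed in client-pool-cache.keys.
    static List<String> extractKey(String clientClassName, String metastoreType, Map<String, String> options) {
        List<String> key = new ArrayList<>();
        key.add(clientClassName);
        key.add(options.get("metastore.uris"));
        key.add(metastoreType);
        String cacheKeys = options.get(CACHE_KEYS_OPTION);
        if (cacheKeys != null && !cacheKeys.trim().isEmpty()) {
            for (String element : cacheKeys.split(",")) {
                String trimmed = element.trim();
                if (trimmed.startsWith(CONF_PREFIX)) {
                    String name = trimmed.substring(CONF_PREFIX.length());
                    key.add(name + "=" + options.get(name));
                }
            }
        }
        return key;
    }

    // Catalog options that differ only in dlf.catalog.id; withFix adds the option from this PR.
    static Map<String, String> dlfOptions(String catalogId, boolean withFix) {
        Map<String, String> options = new HashMap<>();
        options.put("metastore.uris", "thrift://placeholder:9083"); // placeholder value
        options.put("dlf.catalog.id", catalogId);
        if (withFix) {
            options.put(CACHE_KEYS_OPTION, "conf:dlf.catalog.id");
        }
        return options;
    }

    public static void main(String[] args) {
        List<String> keyA = extractKey("ExampleDlfClient", "dlf", dlfOptions("catalog_a", true));
        List<String> keyB = extractKey("ExampleDlfClient", "dlf", dlfOptions("catalog_b", true));
        System.out.println("keys equal with fix: " + keyA.equals(keyB));
    }
}
```

With the option set, the two keys differ in their last element, so a cache keyed this way would hold one pool per catalog ID. Without it, the keys are identical.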

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zy-kkk
Member Author

zy-kkk commented Sep 10, 2025

run buildall

@github-actions github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Sep 10, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 34765 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 601b0129493b567ad3b8dc626dd47480a78c73d4, data reload: false

------ Round 1 ----------------------------------
q1	17606	5222	5144	5144
q2	2010	311	213	213
q3	10269	1290	702	702
q4	10231	1009	540	540
q5	7576	2463	2336	2336
q6	185	174	137	137
q7	938	747	642	642
q8	9363	1288	1121	1121
q9	6835	5142	5087	5087
q10	6938	2371	1977	1977
q11	496	317	271	271
q12	352	361	239	239
q13	17806	3639	3032	3032
q14	243	240	220	220
q15	560	483	491	483
q16	992	1001	953	953
q17	594	864	377	377
q18	7566	7142	7155	7142
q19	1483	954	562	562
q20	340	342	240	240
q21	3820	2553	2354	2354
q22	1068	1023	993	993
Total cold run time: 107271 ms
Total hot run time: 34765 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5395	5100	5146	5100
q2	249	333	227	227
q3	2148	2631	2302	2302
q4	1329	1763	1334	1334
q5	4234	4462	4605	4462
q6	222	174	134	134
q7	2047	2003	1837	1837
q8	2726	2642	2565	2565
q9	7330	7426	7458	7426
q10	3041	3355	2884	2884
q11	566	532	501	501
q12	680	776	635	635
q13	3461	3921	3567	3567
q14	293	309	280	280
q15	528	467	498	467
q16	1053	1085	1085	1085
q17	1264	1552	1423	1423
q18	8117	7670	7885	7670
q19	815	797	865	797
q20	1929	2155	1963	1963
q21	4930	4306	4321	4306
q22	1094	1041	1019	1019
Total cold run time: 53451 ms
Total hot run time: 51984 ms

@doris-robot

TPC-DS: Total hot run time: 189124 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 601b0129493b567ad3b8dc626dd47480a78c73d4, data reload: false

query1	1070	439	427	427
query2	6573	1659	1690	1659
query3	6747	227	227	227
query4	26275	23492	23177	23177
query5	4447	641	494	494
query6	362	244	232	232
query7	4654	507	304	304
query8	295	257	249	249
query9	8631	2918	2886	2886
query10	477	380	303	303
query11	15981	15048	14845	14845
query12	171	120	116	116
query13	1666	545	441	441
query14	10342	9209	9200	9200
query15	213	194	175	175
query16	7151	675	518	518
query17	962	764	616	616
query18	2001	446	341	341
query19	202	197	171	171
query20	149	133	123	123
query21	214	132	117	117
query22	4076	4163	4238	4163
query23	33986	32829	32922	32829
query24	8167	2386	2416	2386
query25	592	525	452	452
query26	1241	280	174	174
query27	2722	512	356	356
query28	4384	2259	2265	2259
query29	836	638	510	510
query30	291	230	197	197
query31	891	806	728	728
query32	90	80	78	78
query33	582	398	356	356
query34	835	894	518	518
query35	825	810	788	788
query36	980	1012	915	915
query37	123	118	94	94
query38	3537	3530	3512	3512
query39	1517	1431	1490	1431
query40	233	137	128	128
query41	65	64	60	60
query42	135	119	122	119
query43	504	506	472	472
query44	1404	867	870	867
query45	183	186	165	165
query46	845	1011	652	652
query47	1776	1834	1767	1767
query48	411	449	326	326
query49	761	512	417	417
query50	656	688	413	413
query51	3904	4000	3880	3880
query52	120	120	113	113
query53	259	284	221	221
query54	634	609	570	570
query55	102	93	95	93
query56	367	362	341	341
query57	1203	1204	1131	1131
query58	301	294	296	294
query59	2513	2727	2499	2499
query60	380	365	356	356
query61	193	187	190	187
query62	831	741	672	672
query63	236	200	198	198
query64	4688	1256	861	861
query65	4074	3972	3977	3972
query66	1189	449	368	368
query67	15686	15276	15130	15130
query68	8792	933	581	581
query69	493	336	296	296
query70	1340	1297	1307	1297
query71	586	347	326	326
query72	6150	5001	5033	5001
query73	745	617	362	362
query74	8857	9255	8730	8730
query75	4118	3246	2771	2771
query76	3717	1160	747	747
query77	811	419	337	337
query78	9643	9732	8889	8889
query79	2811	852	582	582
query80	680	600	531	531
query81	477	262	225	225
query82	465	170	141	141
query83	294	259	243	243
query84	312	115	107	107
query85	949	468	447	447
query86	349	316	306	306
query87	3769	3750	3624	3624
query88	3152	2220	2197	2197
query89	417	343	292	292
query90	1925	228	227	227
query91	173	170	138	138
query92	93	80	75	75
query93	1815	954	656	656
query94	704	418	326	326
query95	419	342	339	339
query96	484	586	286	286
query97	2933	2997	2881	2881
query98	258	226	230	226
query99	1442	1415	1299	1299
Total cold run time: 277084 ms
Total hot run time: 189124 ms

@doris-robot

ClickBench: Total hot run time: 30.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 601b0129493b567ad3b8dc626dd47480a78c73d4, data reload: false

query1	0.06	0.05	0.04
query2	0.09	0.05	0.05
query3	0.25	0.09	0.08
query4	1.61	0.11	0.12
query5	0.28	0.27	0.25
query6	1.17	0.67	0.66
query7	0.04	0.03	0.03
query8	0.06	0.05	0.04
query9	0.62	0.53	0.52
query10	0.59	0.58	0.59
query11	0.17	0.11	0.11
query12	0.15	0.12	0.12
query13	0.63	0.62	0.66
query14	1.02	1.06	1.04
query15	0.87	0.86	0.86
query16	0.40	0.40	0.39
query17	1.07	1.03	1.08
query18	0.22	0.20	0.21
query19	1.91	1.80	1.85
query20	0.02	0.01	0.02
query21	15.39	0.94	0.58
query22	0.80	1.26	0.70
query23	14.78	1.39	0.60
query24	6.70	0.89	1.06
query25	0.49	0.21	0.09
query26	0.56	0.17	0.13
query27	0.08	0.04	0.05
query28	9.98	0.95	0.43
query29	12.59	3.86	3.31
query30	0.28	0.13	0.11
query31	2.83	0.60	0.38
query32	3.24	0.57	0.48
query33	3.12	3.16	3.17
query34	16.20	5.49	4.84
query35	4.88	4.90	4.91
query36	0.71	0.51	0.52
query37	0.11	0.07	0.07
query38	0.06	0.04	0.05
query39	0.04	0.03	0.02
query40	0.19	0.17	0.15
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 104.43 s
Total hot run time: 30.07 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/1) 🎉
Increment coverage report
Complete coverage report

@morningman morningman merged commit a71e750 into apache:master Sep 10, 2025
30 of 32 checks passed
github-actions bot pushed a commit that referenced this pull request Sep 10, 2025
…log.id to cache key (#55875)

morrySnow pushed a commit that referenced this pull request Sep 11, 2025
…ing dlf.catalog.id to cache key #55875 (#55888)

Cherry-picked from #55875

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
@zy-kkk zy-kkk deleted the fix_paimon_dlf_client branch September 11, 2025 07:27
@morrySnow morrySnow mentioned this pull request Sep 22, 2025
Labels

approved (indicates a PR has been approved by one committer), dev/3.0.x, dev/3.0.x-conflict, dev/3.1.1-merged, reviewed