
@wuwenchi
Contributor

@wuwenchi wuwenchi commented Apr 24, 2025

What problem does this PR solve?

Followup #49956

Problem Summary:

When a query specifies a snapshot, the schema corresponding to that snapshot should be used for parsing; otherwise, the schema of the latest snapshot should be used.
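The rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not Doris code: `TableMeta` and `SchemaResolver` are hypothetical names, the metadata is loosely modelled on Iceberg (where each snapshot records the id of the schema it was written with), and schemas are plain strings standing in for real schema objects.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified table metadata (not a Doris or Iceberg class).
final class TableMeta {
    final long currentSnapshotId;                 // latest snapshot of the table
    final Map<Long, Integer> snapshotToSchemaId;  // snapshot id -> schema id
    final Map<Integer, String> schemas;           // schema id -> schema (plain text here)

    TableMeta(long currentSnapshotId,
              Map<Long, Integer> snapshotToSchemaId,
              Map<Integer, String> schemas) {
        this.currentSnapshotId = currentSnapshotId;
        this.snapshotToSchemaId = snapshotToSchemaId;
        this.schemas = schemas;
    }
}

final class SchemaResolver {
    // Returns the schema a query should be parsed with:
    //  - snapshot given: the schema that snapshot was written with;
    //  - no snapshot:    the schema of the latest snapshot.
    static String resolve(TableMeta meta, Optional<Long> requestedSnapshotId) {
        long snapshotId = requestedSnapshotId.orElse(meta.currentSnapshotId);
        Integer schemaId = meta.snapshotToSchemaId.get(snapshotId);
        if (schemaId == null) {
            throw new IllegalArgumentException("unknown snapshot: " + snapshotId);
        }
        return meta.schemas.get(schemaId);
    }
}
```

For example, a time-travel query pinned to an older snapshot resolves to the schema that snapshot was written with, while an unpinned query gets the current schema.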

  1. When the catalog uses the HMS type, the executor pool also needs to be initialized.
  2. Set the thread pool size to the number of CPU cores on the current machine.
  3. When no snapshot is specified, use the latest schema.
  4. When a snapshot is specified, use the schema corresponding to that snapshot.
  5. When generating a scan node, save the schema information in it instead of fetching it from the cache again, so that a cache refresh cannot change the schema the query uses.
  6. When refreshing the schema, refresh all cached schemas of the related tables.
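Items 5 and 6 can be illustrated with a small sketch, assuming a schema cache keyed by (table, schema id) so that several schema versions of one table can be cached side by side. `SchemaKey`, `SchemaCache`, and `ScanNodeSketch` are hypothetical names and do not mirror Doris's actual classes.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache key: one entry per (table, schema id) pair.
final class SchemaKey {
    final String table;
    final int schemaId;

    SchemaKey(String table, int schemaId) { this.table = table; this.schemaId = schemaId; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SchemaKey)) return false;
        SchemaKey k = (SchemaKey) o;
        return schemaId == k.schemaId && table.equals(k.table);
    }

    @Override public int hashCode() { return Objects.hash(table, schemaId); }
}

final class SchemaCache {
    private final Map<SchemaKey, String> entries = new ConcurrentHashMap<>();
    private final Function<SchemaKey, String> loader; // loads a schema on a cache miss

    SchemaCache(Function<SchemaKey, String> loader) { this.loader = loader; }

    String get(String table, int schemaId) {
        return entries.computeIfAbsent(new SchemaKey(table, schemaId), loader);
    }

    // Item 6: refreshing a table drops every cached schema version of it,
    // not only the latest one.
    void refreshTable(String table) {
        entries.keySet().removeIf(k -> k.table.equals(table));
    }

    int size() { return entries.size(); }
}

// Item 5: the scan node captures its schema once, at creation time, and never
// re-reads the cache, so a later refresh cannot change it mid-query.
final class ScanNodeSketch {
    private final String schema;

    ScanNodeSketch(SchemaCache cache, String table, int schemaId) {
        this.schema = cache.get(table, schemaId);
    }

    String schema() { return schema; }
}
```

After `refreshTable`, the next lookup reloads from the source, while an already-built scan node keeps the schema it was planned with.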

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how was it fixed?
  2. Which behaviors were modified? What was the previous behavior, what is it now, why was it changed, and what impact might it have?
  3. What features were added, and why?
  4. Which code was refactored, and why?
  5. Which functions were optimized, and what is the difference before and after the optimization?

@wuwenchi wuwenchi marked this pull request as ready for review April 24, 2025 07:34
@wuwenchi
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33871 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d22a8e24b415021a158788e6cda9eb9a8f3a2746, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	26257	5019	4990	4990
q2	2065	279	179	179
q3	10400	1255	692	692
q4	10230	1014	530	530
q5	7541	2430	2402	2402
q6	190	165	132	132
q7	947	765	622	622
q8	9323	1302	1075	1075
q9	6780	5111	5113	5111
q10	6802	2327	1869	1869
q11	471	282	260	260
q12	351	357	212	212
q13	17761	3673	3131	3131
q14	225	230	213	213
q15	530	466	481	466
q16	439	449	414	414
q17	618	872	372	372
q18	7648	7284	7033	7033
q19	1762	953	559	559
q20	322	329	215	215
q21	4252	3405	2439	2439
q22	1093	982	955	955
Total cold run time: 116007 ms
Total hot run time: 33871 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	5169	5052	5518	5052
q2	238	328	227	227
q3	2220	2654	2285	2285
q4	1426	1871	1466	1466
q5	4478	4395	4442	4395
q6	213	167	126	126
q7	1997	1948	1749	1749
q8	2619	2595	2566	2566
q9	7369	7175	7093	7093
q10	3027	3222	2776	2776
q11	578	546	494	494
q12	710	795	602	602
q13	3476	3928	3420	3420
q14	277	282	267	267
q15	510	476	470	470
q16	456	504	475	475
q17	1168	1594	1377	1377
q18	7765	7587	7388	7388
q19	836	845	943	845
q20	1973	1953	1875	1875
q21	5246	4979	4889	4889
q22	1118	1064	985	985
Total cold run time: 52869 ms
Total hot run time: 50822 ms

@doris-robot

TPC-DS: Total hot run time: 192706 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d22a8e24b415021a158788e6cda9eb9a8f3a2746, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1416	1084	1064	1064
query2	6108	1779	1789	1779
query3	11158	4643	4648	4643
query4	53072	24665	23088	23088
query5	5205	565	448	448
query6	333	193	193	193
query7	4880	509	289	289
query8	329	255	233	233
query9	5765	2621	2597	2597
query10	422	330	257	257
query11	15078	15039	14871	14871
query12	158	113	106	106
query13	1049	524	398	398
query14	10101	6341	6376	6341
query15	210	201	179	179
query16	7103	650	481	481
query17	1109	713	584	584
query18	1567	412	316	316
query19	195	211	160	160
query20	134	125	128	125
query21	214	121	105	105
query22	4483	4559	4503	4503
query23	34102	33360	33679	33360
query24	6650	2438	2394	2394
query25	465	463	429	429
query26	679	272	153	153
query27	2258	507	347	347
query28	3026	2136	2128	2128
query29	585	569	423	423
query30	278	228	196	196
query31	939	869	772	772
query32	70	64	65	64
query33	458	381	319	319
query34	771	851	535	535
query35	782	809	743	743
query36	956	998	905	905
query37	113	102	76	76
query38	4154	4152	4230	4152
query39	1683	1467	1427	1427
query40	220	118	111	111
query41	56	56	55	55
query42	125	116	109	109
query43	502	518	478	478
query44	1355	833	852	833
query45	177	174	164	164
query46	835	1026	653	653
query47	1864	1858	1819	1819
query48	381	422	310	310
query49	671	503	409	409
query50	680	781	407	407
query51	4227	4402	4134	4134
query52	108	106	100	100
query53	232	262	198	198
query54	610	568	513	513
query55	85	85	82	82
query56	332	295	299	295
query57	1177	1185	1134	1134
query58	268	264	258	258
query59	2653	2794	2625	2625
query60	365	377	331	331
query61	134	131	129	129
query62	736	780	678	678
query63	228	195	193	193
query64	1728	1062	701	701
query65	4412	4338	4258	4258
query66	731	398	297	297
query67	15720	15740	15415	15415
query68	4626	892	518	518
query69	498	298	267	267
query70	1162	1144	1181	1144
query71	414	323	289	289
query72	5841	4678	4687	4678
query73	680	585	342	342
query74	9189	9050	9005	9005
query75	3195	3172	2693	2693
query76	3468	1180	746	746
query77	532	370	296	296
query78	9926	10000	9339	9339
query79	2647	802	568	568
query80	650	522	440	440
query81	481	250	211	211
query82	304	125	92	92
query83	249	243	237	237
query84	293	106	85	85
query85	773	363	344	344
query86	349	322	295	295
query87	4442	4284	4362	4284
query88	3647	2232	2227	2227
query89	414	323	291	291
query90	1719	207	210	207
query91	141	141	110	110
query92	66	57	61	57
query93	2530	911	568	568
query94	691	397	315	315
query95	370	292	297	292
query96	489	559	283	283
query97	3114	3232	3104	3104
query98	235	206	198	198
query99	1327	1391	1309	1309
Total cold run time: 293132 ms
Total hot run time: 192706 ms

@doris-robot

ClickBench: Total hot run time: 30.24 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d22a8e24b415021a158788e6cda9eb9a8f3a2746, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.12	0.10	0.12
query3	0.25	0.19	0.19
query4	1.59	0.18	0.20
query5	0.60	0.58	0.59
query6	1.19	0.71	0.72
query7	0.03	0.02	0.01
query8	0.04	0.04	0.04
query9	0.57	0.52	0.50
query10	0.57	0.57	0.56
query11	0.15	0.11	0.11
query12	0.14	0.11	0.12
query13	0.62	0.60	0.60
query14	1.19	1.17	1.23
query15	0.87	0.85	0.84
query16	0.39	0.39	0.37
query17	1.04	1.04	1.05
query18	0.21	0.20	0.20
query19	1.94	1.81	1.82
query20	0.01	0.01	0.01
query21	15.40	0.86	0.55
query22	0.75	1.04	0.64
query23	15.14	1.33	0.64
query24	6.83	2.02	1.47
query25	0.52	0.20	0.07
query26	0.62	0.16	0.14
query27	0.06	0.04	0.04
query28	9.54	0.82	0.43
query29	12.52	4.04	3.35
query30	0.25	0.09	0.06
query31	2.82	0.58	0.38
query32	3.28	0.55	0.47
query33	2.98	3.02	3.07
query34	15.84	5.13	4.51
query35	4.56	4.52	4.50
query36	0.68	0.49	0.48
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.03
query40	0.17	0.13	0.12
query41	0.07	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 103.83 s
Total hot run time: 30.24 s

@morningman
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 33756 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7a7cd47cb8c13990e5d665a47bc6cc1f00572641, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	26019	5085	5000	5000
q2	2063	265	180	180
q3	10404	1230	680	680
q4	10218	977	568	568
q5	7530	2355	2305	2305
q6	181	160	131	131
q7	898	738	596	596
q8	9299	1235	1079	1079
q9	6779	5111	5109	5109
q10	6785	2278	1886	1886
q11	486	279	265	265
q12	362	348	226	226
q13	17757	3598	3120	3120
q14	221	234	207	207
q15	527	498	484	484
q16	450	444	398	398
q17	589	834	353	353
q18	7511	7100	7022	7022
q19	1652	940	552	552
q20	330	342	221	221
q21	3830	2631	2402	2402
q22	1042	1019	972	972
Total cold run time: 114933 ms
Total hot run time: 33756 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	5128	5051	5099	5051
q2	242	325	228	228
q3	2138	2646	2271	2271
q4	1426	1791	1381	1381
q5	4387	4360	4363	4360
q6	226	173	127	127
q7	1989	1935	1773	1773
q8	2623	2539	2499	2499
q9	7189	7234	6915	6915
q10	2980	3225	2746	2746
q11	582	513	505	505
q12	693	770	603	603
q13	3499	3944	3327	3327
q14	274	304	273	273
q15	527	487	478	478
q16	475	514	465	465
q17	1139	1552	1357	1357
q18	7696	7551	7397	7397
q19	784	801	877	801
q20	1964	2003	1835	1835
q21	5223	4880	4860	4860
q22	1089	1063	1027	1027
Total cold run time: 52273 ms
Total hot run time: 50279 ms

@doris-robot

TPC-DS: Total hot run time: 192072 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7a7cd47cb8c13990e5d665a47bc6cc1f00572641, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1389	1077	1054	1054
query2	6289	1809	1799	1799
query3	11032	4488	4368	4368
query4	53555	26046	22938	22938
query5	5221	494	483	483
query6	371	197	188	188
query7	4947	510	292	292
query8	333	250	243	243
query9	5872	2586	2596	2586
query10	425	312	248	248
query11	15176	14960	14741	14741
query12	154	114	103	103
query13	1050	493	381	381
query14	10112	6248	6404	6248
query15	210	194	188	188
query16	7009	639	476	476
query17	1083	716	560	560
query18	1554	391	320	320
query19	193	217	169	169
query20	136	122	116	116
query21	210	127	105	105
query22	4324	4545	4393	4393
query23	34239	33556	33677	33556
query24	6700	2445	2411	2411
query25	461	481	403	403
query26	666	276	156	156
query27	2165	513	334	334
query28	2946	2160	2114	2114
query29	597	572	474	474
query30	283	234	196	196
query31	864	896	820	820
query32	78	64	66	64
query33	463	383	326	326
query34	786	851	527	527
query35	796	828	745	745
query36	933	1017	903	903
query37	110	100	76	76
query38	4186	4222	4157	4157
query39	1513	1435	1429	1429
query40	215	117	106	106
query41	55	58	51	51
query42	117	109	113	109
query43	506	493	492	492
query44	1377	823	823	823
query45	182	174	168	168
query46	844	1053	645	645
query47	1825	1881	1790	1790
query48	397	419	319	319
query49	691	515	467	467
query50	651	692	423	423
query51	4207	4204	4118	4118
query52	107	104	96	96
query53	230	265	191	191
query54	624	605	518	518
query55	78	81	81	81
query56	298	314	289	289
query57	1167	1174	1137	1137
query58	269	265	257	257
query59	2677	2725	2682	2682
query60	316	328	297	297
query61	145	130	134	130
query62	757	759	673	673
query63	221	189	188	188
query64	1463	1140	795	795
query65	4516	4358	4212	4212
query66	833	396	304	304
query67	15693	15504	15269	15269
query68	7378	883	523	523
query69	544	297	268	268
query70	1157	1113	1106	1106
query71	498	321	291	291
query72	5580	4831	4893	4831
query73	1376	671	349	349
query74	8959	9203	8975	8975
query75	3750	3438	2679	2679
query76	4266	1192	744	744
query77	631	369	278	278
query78	10052	10081	9293	9293
query79	2070	878	557	557
query80	617	527	453	453
query81	484	256	220	220
query82	472	135	99	99
query83	252	248	235	235
query84	297	105	84	84
query85	887	355	313	313
query86	403	295	270	270
query87	4342	4398	4339	4339
query88	3579	2185	2174	2174
query89	401	322	280	280
query90	1784	205	214	205
query91	143	147	113	113
query92	72	59	58	58
query93	1893	948	588	588
query94	677	418	310	310
query95	367	289	287	287
query96	486	563	273	273
query97	3194	3225	3092	3092
query98	222	200	201	200
query99	1416	1405	1265	1265
Total cold run time: 297141 ms
Total hot run time: 192072 ms

@doris-robot

ClickBench: Total hot run time: 29.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7a7cd47cb8c13990e5d665a47bc6cc1f00572641, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.04	0.03
query2	0.12	0.11	0.11
query3	0.25	0.19	0.19
query4	1.59	0.19	0.19
query5	0.59	0.59	0.58
query6	1.18	0.72	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.58	0.52	0.53
query10	0.57	0.57	0.56
query11	0.16	0.10	0.11
query12	0.14	0.11	0.11
query13	0.60	0.59	0.59
query14	1.15	1.21	1.16
query15	0.90	0.85	0.85
query16	0.37	0.37	0.38
query17	1.02	1.01	1.05
query18	0.21	0.20	0.19
query19	1.86	1.80	1.79
query20	0.01	0.01	0.01
query21	15.39	0.90	0.57
query22	0.76	1.27	0.64
query23	14.88	1.38	0.59
query24	7.46	1.95	0.53
query25	0.48	0.18	0.14
query26	0.67	0.16	0.15
query27	0.05	0.05	0.04
query28	9.00	0.85	0.43
query29	12.56	4.08	3.34
query30	0.26	0.09	0.07
query31	2.83	0.61	0.39
query32	3.22	0.55	0.48
query33	3.06	3.05	3.07
query34	15.68	5.07	4.46
query35	4.53	4.50	4.51
query36	0.67	0.51	0.48
query37	0.09	0.07	0.06
query38	0.06	0.04	0.03
query39	0.03	0.03	0.03
query40	0.17	0.13	0.13
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 103.4 s
Total hot run time: 29.3 s

@github-actions
Contributor

PR approved by anyone and no changes requested.

CREATE_TIME,
USE_META_CACHE);

protected static final int ICEBERG_CATALOG_EXECUTOR_THREAD_NUM = Runtime.getRuntime().availableProcessors();
Member

Isn't it a bit of a luxury for each catalog to hold as many threads as the number of CPU cores?

Contributor

Isn't it a bit of a luxury for each catalog to hold as many threads as the number of CPU cores?

Yes, this will be optimized later. This PR does not change the original logic; it just moves the code to a new place.
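For reference, a per-catalog pool sized like `ICEBERG_CATALOG_EXECUTOR_THREAD_NUM` might look like the minimal sketch below. `CatalogExecutors` and `newCatalogExecutor` are hypothetical names; only the sizing via `Runtime.getRuntime().availableProcessors()` comes from the diff.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class CatalogExecutors {
    // Same sizing rule as ICEBERG_CATALOG_EXECUTOR_THREAD_NUM in the diff.
    static final int THREAD_NUM = Runtime.getRuntime().availableProcessors();

    // Hypothetical factory: one fixed-size pool per catalog.
    static ExecutorService newCatalogExecutor(String catalogName) {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, catalogName + "-executor-" + seq.getAndIncrement());
            t.setDaemon(true); // idle pool threads do not keep the JVM alive
            return t;
        };
        // Threads are started on demand as tasks arrive, up to THREAD_NUM,
        // and are then kept for the lifetime of the pool.
        return Executors.newFixedThreadPool(THREAD_NUM, factory);
    }
}
```

Because each catalog gets its own pool, N catalogs can hold up to N × `availableProcessors()` threads, which is the cost raised in the review; sharing one pool across catalogs, or using a smaller bound, would be one way to reduce it.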

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Apr 29, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 02306b4 into apache:master Apr 29, 2025
31 of 32 checks passed
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
morningman pushed a commit to morningman/doris that referenced this pull request Jun 24, 2025
morningman pushed a commit to morningman/doris that referenced this pull request Jun 25, 2025
morningman pushed a commit to morningman/doris that referenced this pull request Jun 30, 2025
morningman pushed a commit to morningman/doris that referenced this pull request Jun 30, 2025
morningman added a commit that referenced this pull request Jul 1, 2025

Labels

approved (Indicates a PR has been approved by one committer.), dev/3.1.0-merged, reviewed

7 participants