
Conversation

@kaka11chen
Contributor

@kaka11chen kaka11chen commented Dec 25, 2024

What problem does this PR solve?

related: apache/doris-thirdparty#270

Problem Summary:

The original merge I/O mechanism, MergeRangeFileReader, requires ranges to be read in order. In practice the ranges may be requested out of order, and MergeRangeFileReader cannot read back a range it has already passed.
Late materialization of ORC complex types needs exactly this kind of read-back on the present stream, for example: select struct_element(info, 'age'), id from test_orc_struct where struct_element(info, 'name') = 'Alice'.
With late materialization turned on, the present stream of the parent node info is read first, when name is read. When age is read afterwards, the present stream of the parent node info has to be read again. For this reason, late materialization of ORC complex types cannot currently be turned on.
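
To make the read-back problem concrete, here is a toy model of a reader that can only move forward. It is not the MergeRangeFileReader code; the class name `ForwardOnlyReader` and the offsets mentioned below are assumptions chosen for illustration only.

```cpp
#include <cstdint>

// Toy model, not Doris code: a reader that only accepts ranges in
// increasing offset order, as the problem summary describes.
class ForwardOnlyReader {
public:
    // Returns false when the requested range starts before data that
    // has already been consumed, i.e. when a read-back is attempted.
    bool read(uint64_t offset, uint64_t len) {
        if (offset < _cursor) {
            return false; // cannot go back to an earlier range
        }
        _cursor = offset + len;
        return true;
    }

private:
    uint64_t _cursor = 0; // first byte not yet consumed
};
```

If the present stream of info sat at offset 0 and the data of name at offset 40 (made-up numbers), reading info, then name, then info again for age would fail on the third call. That is the access pattern late materialization produces.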

Release note

The new merge I/O mechanism classifies the ranges read by the streams of an ORC stripe into small ranges and large ranges according to orc_once_max_read_bytes. Ranges smaller than orc_once_max_read_bytes are treated as small ranges, and ranges exceeding orc_once_max_read_bytes are treated as large ranges.
Adjacent small ranges are then merged. The maximum length of a merged range is orc_once_max_read_bytes, and the maximum distance allowed between two ranges being merged is orc_max_merge_distance_bytes. Each merged range is backed by an in-memory cache reader, and a corresponding input stream is built on top of it for the lower-layer ORC reader to read from. Large ranges are read directly through the underlying file reader. The current implementation supports reads at arbitrary positions within a merged range.
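
The classification and merging described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Range`, `MergePlan`, and `plan_merge_io` are hypothetical names, the two parameters stand in for orc_once_max_read_bytes and orc_max_merge_distance_bytes, and treating a range exactly equal to the threshold as small is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a byte range [start, end) in the file.
struct Range {
    uint64_t start;
    uint64_t end;
    uint64_t size() const { return end - start; }
};

struct MergePlan {
    std::vector<Range> merged_small; // to be served from an in-memory cache
    std::vector<Range> large;        // to be read directly from the file reader
};

MergePlan plan_merge_io(std::vector<Range> ranges,
                        uint64_t once_max_read_bytes,
                        uint64_t max_merge_distance_bytes) {
    // Merging neighbours only works on ranges ordered by start offset.
    std::sort(ranges.begin(), ranges.end(),
              [](const Range& a, const Range& b) { return a.start < b.start; });

    MergePlan plan;
    bool open = false; // true while `cur` holds a merged range being built
    Range cur{0, 0};
    for (const Range& r : ranges) {
        assert(r.start <= r.end);
        if (r.size() > once_max_read_bytes) {
            plan.large.push_back(r); // large range: bypass the cache
            continue;
        }
        const bool close_enough =
                open && r.start <= cur.end + max_merge_distance_bytes;
        const bool fits =
                open && std::max(cur.end, r.end) - cur.start <= once_max_read_bytes;
        if (close_enough && fits) {
            cur.end = std::max(cur.end, r.end); // extend the merged range
        } else {
            if (open) plan.merged_small.push_back(cur);
            cur = r; // start a new merged range
            open = true;
        }
    }
    if (open) plan.merged_small.push_back(cur);
    return plan;
}
```

With a 64-byte threshold and a 4-byte merge distance, the input ranges [0,10), [12,20), [100,110), [200,1200) would yield two merged small ranges, [0,20) and [100,110), plus one large range.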

Future Work

Currently, implementations such as OrcMergeRangeFileReader and RangeCacheFileReader must memcpy data from the cache into the result slice because of the limitations of the FileReader interface. In theory the copy can be avoided by having the slice point directly at the cache location. This can be refactored and optimized in the future.
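
The difference between the copying read and a possible zero-copy read can be sketched as below. `CachedRange`, `read_at`, and `view_at` are hypothetical names for illustration; they are not the FileReader interface or the classes in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical cache for one merged range. It holds the file bytes
// [file_offset, file_offset + data.size()).
struct CachedRange {
    uint64_t file_offset;
    std::vector<char> data;

    // True when [offset, offset + len) lies fully inside the cache.
    bool contains(uint64_t offset, size_t len) const {
        if (offset < file_offset) return false;
        const uint64_t rel = offset - file_offset;
        return rel <= data.size() && len <= data.size() - rel;
    }

    // Copying read: fills a caller-owned buffer, so a memcpy is needed.
    // Any offset inside the cached range works, in any order.
    bool read_at(uint64_t offset, char* out, size_t len) const {
        if (!contains(offset, len)) return false;
        std::memcpy(out, data.data() + (offset - file_offset), len);
        return true;
    }

    // Zero-copy alternative: hand out a pointer into the cache instead.
    // The pointer is valid only while this CachedRange stays alive and
    // unmodified; returns nullptr when the request is out of range.
    const char* view_at(uint64_t offset, size_t len) const {
        return contains(offset, len) ? data.data() + (offset - file_offset)
                                     : nullptr;
    }
};
```

The zero-copy variant trades the memcpy for a lifetime constraint: the caller must not outlive the cache, which is presumably why it needs an interface change rather than a local fix.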

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 25, 2024

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from 95af48c to 772ffb6 Compare December 25, 2024 15:54
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from 772ffb6 to 7df1d9d Compare December 25, 2024 17:34
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from 7df1d9d to 2fecd9c Compare December 25, 2024 18:12
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from 2fecd9c to ee35b47 Compare December 26, 2024 01:21
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from ee35b47 to 5b1e090 Compare December 27, 2024 08:55
@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32432 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5b1e0902555b2fac4d90160d04ecd0671bd2b6ad, data reload: false

------ Round 1 ----------------------------------
q1	17658	6117	6041	6041
q2	2055	308	162	162
q3	10498	1276	706	706
q4	10240	876	427	427
q5	7920	2244	1976	1976
q6	202	183	150	150
q7	912	730	620	620
q8	9252	1373	1172	1172
q9	5293	4896	4992	4896
q10	6747	2339	1865	1865
q11	473	277	241	241
q12	347	359	219	219
q13	17788	3658	2956	2956
q14	226	238	226	226
q15	563	491	505	491
q16	633	632	591	591
q17	575	860	322	322
q18	7258	6484	6358	6358
q19	2198	969	574	574
q20	301	311	182	182
q21	2836	2154	1946	1946
q22	365	325	311	311
Total cold run time: 104340 ms
Total hot run time: 32432 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6319	6206	6230	6206
q2	240	328	236	236
q3	2230	2650	2325	2325
q4	1421	1854	1343	1343
q5	4370	4781	4960	4781
q6	182	175	141	141
q7	2133	2008	1788	1788
q8	2624	2803	2674	2674
q9	7485	7339	7238	7238
q10	3046	3384	2812	2812
q11	580	539	494	494
q12	671	753	616	616
q13	3332	3762	3070	3070
q14	288	325	291	291
q15	571	514	507	507
q16	662	694	666	666
q17	1210	1712	1245	1245
q18	7730	7350	6953	6953
q19	787	997	1099	997
q20	1979	2032	1789	1789
q21	5502	5005	4771	4771
q22	594	631	546	546
Total cold run time: 53956 ms
Total hot run time: 51489 ms

@doris-robot

TPC-DS: Total hot run time: 190964 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5b1e0902555b2fac4d90160d04ecd0671bd2b6ad, data reload: false

query1	998	416	383	383
query2	6513	2475	2468	2468
query3	6712	228	219	219
query4	33586	23619	23448	23448
query5	4326	623	460	460
query6	298	210	197	197
query7	4614	493	323	323
query8	307	253	235	235
query9	9697	2751	2742	2742
query10	481	312	267	267
query11	18287	15459	15137	15137
query12	165	107	114	107
query13	1685	536	425	425
query14	11039	6770	7251	6770
query15	249	188	187	187
query16	8090	578	469	469
query17	1552	743	550	550
query18	2135	390	290	290
query19	204	188	148	148
query20	120	111	106	106
query21	203	120	101	101
query22	4329	4509	4321	4321
query23	34489	33612	33710	33612
query24	6466	2254	2210	2210
query25	502	444	384	384
query26	1188	266	155	155
query27	2037	463	342	342
query28	5353	2452	2424	2424
query29	751	550	425	425
query30	225	180	153	153
query31	980	924	798	798
query32	99	61	59	59
query33	498	354	288	288
query34	774	843	511	511
query35	815	801	745	745
query36	1018	1052	947	947
query37	121	96	78	78
query38	4126	4137	4155	4137
query39	1501	1461	1389	1389
query40	212	118	103	103
query41	52	47	51	47
query42	120	108	105	105
query43	523	534	499	499
query44	1337	815	831	815
query45	188	181	172	172
query46	895	1041	695	695
query47	1946	1955	1860	1860
query48	385	409	322	322
query49	764	464	382	382
query50	621	656	390	390
query51	7145	7100	6960	6960
query52	108	102	93	93
query53	232	258	182	182
query54	473	493	399	399
query55	81	78	82	78
query56	259	252	236	236
query57	1225	1186	1139	1139
query58	228	218	226	218
query59	3164	3184	3082	3082
query60	271	271	243	243
query61	109	108	111	108
query62	867	810	744	744
query63	273	192	190	190
query64	4584	982	651	651
query65	3259	3242	3270	3242
query66	1057	408	315	315
query67	15958	15863	15541	15541
query68	8984	774	509	509
query69	468	280	245	245
query70	1226	1106	1155	1106
query71	441	282	249	249
query72	5785	3849	3825	3825
query73	658	754	361	361
query74	9898	9079	9104	9079
query75	4566	3166	2638	2638
query76	4182	1166	809	809
query77	807	372	296	296
query78	9976	10089	9612	9612
query79	3547	907	596	596
query80	720	525	438	438
query81	469	265	243	243
query82	609	153	123	123
query83	202	169	146	146
query84	282	94	77	77
query85	844	370	319	319
query86	353	331	299	299
query87	4628	4635	4653	4635
query88	4433	2216	2194	2194
query89	412	327	299	299
query90	1896	194	194	194
query91	138	139	114	114
query92	65	60	51	51
query93	995	882	524	524
query94	741	386	292	292
query95	333	271	258	258
query96	483	615	289	289
query97	2752	2824	2696	2696
query98	234	203	190	190
query99	1721	1582	1437	1437
Total cold run time: 295717 ms
Total hot run time: 190964 ms

@doris-robot

ClickBench: Total hot run time: 30.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5b1e0902555b2fac4d90160d04ecd0671bd2b6ad, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.60	0.11	0.12
query5	0.43	0.42	0.39
query6	1.19	0.65	0.66
query7	0.02	0.01	0.01
query8	0.04	0.02	0.03
query9	0.58	0.50	0.50
query10	0.55	0.57	0.55
query11	0.15	0.09	0.10
query12	0.13	0.11	0.12
query13	0.61	0.61	0.59
query14	2.71	2.79	2.74
query15	0.89	0.82	0.84
query16	0.38	0.39	0.38
query17	1.11	1.04	1.03
query18	0.22	0.20	0.20
query19	1.89	1.78	1.99
query20	0.01	0.01	0.02
query21	15.35	0.94	0.58
query22	0.76	0.72	0.71
query23	15.31	1.40	0.57
query24	2.69	0.63	1.73
query25	0.23	0.15	0.08
query26	0.24	0.14	0.12
query27	0.05	0.04	0.05
query28	14.24	1.60	1.05
query29	12.57	3.91	3.21
query30	0.25	0.09	0.06
query31	2.83	0.61	0.38
query32	3.22	0.53	0.47
query33	3.19	3.01	3.08
query34	16.59	5.05	4.48
query35	4.45	4.47	4.46
query36	0.63	0.51	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.16	0.14	0.14
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.02	0.03
Total cold run time: 105.93 s
Total hot run time: 30.76 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.88% (10125/26044)
Line Coverage: 29.88% (85541/286297)
Region Coverage: 29.02% (43719/150669)
Branch Coverage: 25.55% (22296/87270)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5b1e0902555b2fac4d90160d04ecd0671bd2b6ad_5b1e0902555b2fac4d90160d04ecd0671bd2b6ad/report/index.html

@morningman
Contributor

This pull request introduces a new OrcMergeRangeFileReader class and enhances the ORC file reading process with improved profiling and optimized I/O operations. The most important changes include adding new classes and methods, updating existing methods for better performance, and incorporating new profiling capabilities.

Enhancements to ORC file reading:

Updates to ORC reader implementation:

Profiling improvements:

These changes aim to optimize the ORC file reading process by merging small I/O operations, improving profiling, and handling large I/O operations more efficiently.

@kaka11chen kaka11chen force-pushed the new_merge_io_for_orc_reader branch from 5b1e090 to d9c405d Compare January 10, 2025 01:56
@kaka11chen kaka11chen marked this pull request as ready for review January 10, 2025 01:56
@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33560 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d9c405db1cbb8caea5f8ab3c683e1af67d2d55a8, data reload: false

------ Round 1 ----------------------------------
q1	17629	6335	6279	6279
q2	2070	320	179	179
q3	10488	1312	775	775
q4	10218	924	460	460
q5	7703	2308	2087	2087
q6	218	187	150	150
q7	937	760	611	611
q8	9240	1510	1293	1293
q9	5401	5069	5028	5028
q10	6814	2322	1875	1875
q11	526	297	268	268
q12	362	388	232	232
q13	17758	3731	3094	3094
q14	250	246	220	220
q15	574	518	498	498
q16	644	613	575	575
q17	609	902	347	347
q18	7229	6459	6511	6459
q19	2474	1053	565	565
q20	308	335	203	203
q21	3049	2243	2045	2045
q22	369	346	317	317
Total cold run time: 104870 ms
Total hot run time: 33560 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6588	6508	6549	6508
q2	265	344	237	237
q3	2308	2728	2316	2316
q4	1471	1868	1367	1367
q5	4413	4901	4971	4901
q6	212	182	148	148
q7	2195	1985	1848	1848
q8	2725	2967	2844	2844
q9	7357	7262	7242	7242
q10	3042	3273	2922	2922
q11	611	521	526	521
q12	701	753	630	630
q13	3553	3855	3221	3221
q14	292	306	271	271
q15	577	517	517	517
q16	688	683	661	661
q17	1283	1801	1299	1299
q18	7773	7653	7430	7430
q19	855	1159	1309	1159
q20	2037	2041	1933	1933
q21	5822	5174	5192	5174
q22	639	629	603	603
Total cold run time: 55407 ms
Total hot run time: 53752 ms

@doris-robot

TPC-DS: Total hot run time: 194809 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d9c405db1cbb8caea5f8ab3c683e1af67d2d55a8, data reload: false

query1	1320	971	925	925
query2	6338	2331	2332	2331
query3	10994	4573	4657	4573
query4	32968	23904	23160	23160
query5	4586	608	455	455
query6	303	202	193	193
query7	3983	489	304	304
query8	291	243	234	234
query9	9435	2710	2698	2698
query10	481	324	245	245
query11	17857	15185	15121	15121
query12	151	103	102	102
query13	1528	517	421	421
query14	10767	7334	7109	7109
query15	245	199	193	193
query16	8032	644	447	447
query17	1585	768	610	610
query18	2093	424	333	333
query19	208	206	180	180
query20	130	119	123	119
query21	209	128	105	105
query22	4504	4443	4246	4246
query23	34018	33359	33261	33261
query24	6450	2312	2306	2306
query25	489	457	405	405
query26	773	278	155	155
query27	2120	460	332	332
query28	5893	2487	2449	2449
query29	619	563	425	425
query30	210	182	162	162
query31	955	875	769	769
query32	72	61	57	57
query33	481	356	331	331
query34	751	855	522	522
query35	793	796	767	767
query36	1033	1043	959	959
query37	127	106	78	78
query38	4018	4261	4343	4261
query39	1522	1442	1481	1442
query40	204	121	106	106
query41	53	54	47	47
query42	132	107	103	103
query43	536	546	486	486
query44	1371	852	850	850
query45	188	171	161	161
query46	891	1051	666	666
query47	1897	1876	1868	1868
query48	385	416	333	333
query49	733	494	400	400
query50	645	651	386	386
query51	7140	6980	6956	6956
query52	104	99	91	91
query53	228	260	179	179
query54	485	510	423	423
query55	96	84	81	81
query56	258	251	248	248
query57	1212	1177	1160	1160
query58	236	242	219	219
query59	3156	3397	3225	3225
query60	285	296	266	266
query61	123	110	113	110
query62	859	793	719	719
query63	226	190	186	186
query64	3437	1014	702	702
query65	3419	3242	3198	3198
query66	782	403	306	306
query67	15955	15719	15391	15391
query68	7785	708	529	529
query69	496	295	259	259
query70	1212	1144	1088	1088
query71	444	303	253	253
query72	6516	3831	3876	3831
query73	658	745	358	358
query74	10442	9079	8560	8560
query75	4088	3150	2667	2667
query76	3698	1171	769	769
query77	773	366	281	281
query78	10088	9996	9356	9356
query79	3776	791	584	584
query80	716	523	448	448
query81	508	264	225	225
query82	645	154	118	118
query83	170	176	146	146
query84	252	93	80	80
query85	795	369	384	369
query86	401	303	302	302
query87	4495	4513	4426	4426
query88	4764	2157	2174	2157
query89	426	330	288	288
query90	1800	191	186	186
query91	134	134	169	134
query92	63	56	52	52
query93	2490	867	521	521
query94	674	397	281	281
query95	335	266	250	250
query96	500	606	281	281
query97	2851	2931	2770	2770
query98	223	199	192	192
query99	1476	1509	1397	1397
Total cold run time: 297062 ms
Total hot run time: 194809 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 39.34% (10259/26080)
Line Coverage: 30.52% (87450/286573)
Region Coverage: 29.56% (44564/150779)
Branch Coverage: 26.11% (22811/87374)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d9c405db1cbb8caea5f8ab3c683e1af67d2d55a8_d9c405db1cbb8caea5f8ab3c683e1af67d2d55a8/report/index.html

@doris-robot

ClickBench: Total hot run time: 30.93 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d9c405db1cbb8caea5f8ab3c683e1af67d2d55a8, data reload: false

query1	0.03	0.03	0.04
query2	0.08	0.03	0.04
query3	0.23	0.06	0.07
query4	1.63	0.11	0.10
query5	0.42	0.42	0.42
query6	1.15	0.66	0.66
query7	0.03	0.02	0.01
query8	0.05	0.03	0.03
query9	0.59	0.49	0.51
query10	0.56	0.56	0.55
query11	0.15	0.11	0.10
query12	0.14	0.11	0.11
query13	0.60	0.61	0.61
query14	2.86	2.72	2.73
query15	0.90	0.82	0.83
query16	0.39	0.39	0.38
query17	0.99	1.03	1.04
query18	0.23	0.20	0.21
query19	1.97	1.95	1.88
query20	0.01	0.01	0.01
query21	15.39	0.90	0.59
query22	0.77	0.74	0.69
query23	15.29	1.45	0.56
query24	3.36	1.65	0.56
query25	0.25	0.08	0.05
query26	0.25	0.15	0.13
query27	0.06	0.04	0.07
query28	13.57	1.50	1.04
query29	12.61	3.98	3.26
query30	0.25	0.09	0.07
query31	2.85	0.61	0.39
query32	3.23	0.55	0.47
query33	3.17	3.08	3.10
query34	16.92	5.07	4.48
query35	4.41	4.49	4.48
query36	0.64	0.49	0.49
query37	0.09	0.07	0.06
query38	0.04	0.04	0.04
query39	0.04	0.03	0.02
query40	0.16	0.13	0.12
query41	0.08	0.04	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.52 s
Total hot run time: 30.93 s

@morningman morningman force-pushed the new_merge_io_for_orc_reader branch from d9c405d to 97f3975 Compare February 8, 2025 05:29
@morningman
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 31560 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 97f39752b05aba277bb2269054867979328506df, data reload: false

------ Round 1 ----------------------------------
q1	17617	5335	5092	5092
q2	2057	320	177	177
q3	10437	1262	751	751
q4	10200	1019	537	537
q5	7521	2392	2376	2376
q6	185	165	132	132
q7	908	757	600	600
q8	9309	1303	1004	1004
q9	4906	4521	4824	4521
q10	6803	2339	1879	1879
q11	475	282	251	251
q12	345	343	214	214
q13	17779	3692	3135	3135
q14	233	244	229	229
q15	511	488	484	484
q16	636	620	608	608
q17	565	868	349	349
q18	6790	6252	6250	6250
q19	1681	951	538	538
q20	316	321	186	186
q21	2758	2186	1949	1949
q22	363	324	298	298
Total cold run time: 102395 ms
Total hot run time: 31560 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5138	5173	5076	5076
q2	232	330	227	227
q3	2136	2700	2294	2294
q4	1424	1823	1336	1336
q5	4232	4176	4155	4155
q6	209	168	124	124
q7	1846	1849	1687	1687
q8	2598	2591	2581	2581
q9	7200	7191	7096	7096
q10	3005	3214	2798	2798
q11	586	510	504	504
q12	675	782	609	609
q13	3483	3853	3385	3385
q14	291	298	273	273
q15	512	466	465	465
q16	642	678	635	635
q17	1132	1654	1316	1316
q18	7571	7558	7271	7271
q19	800	820	811	811
q20	1993	2022	1915	1915
q21	5499	4992	4966	4966
q22	620	582	559	559
Total cold run time: 51824 ms
Total hot run time: 50083 ms

@doris-robot

TPC-DS: Total hot run time: 190810 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 97f39752b05aba277bb2269054867979328506df, data reload: false

query1	1312	948	934	934
query2	6216	1876	1839	1839
query3	11008	4635	4615	4615
query4	54199	25750	23095	23095
query5	5330	535	477	477
query6	411	205	185	185
query7	5192	513	302	302
query8	338	248	233	233
query9	6887	2573	2590	2573
query10	414	310	273	273
query11	15356	15080	15211	15080
query12	164	107	110	107
query13	1178	546	396	396
query14	10580	6528	6921	6528
query15	210	199	192	192
query16	7010	662	491	491
query17	1094	744	592	592
query18	1548	418	322	322
query19	214	205	180	180
query20	142	132	167	132
query21	206	124	100	100
query22	4609	4410	4341	4341
query23	34040	33490	33289	33289
query24	5689	2477	2437	2437
query25	478	469	401	401
query26	787	278	158	158
query27	2063	496	332	332
query28	2754	2445	2442	2442
query29	586	590	441	441
query30	219	184	168	168
query31	899	855	810	810
query32	78	66	61	61
query33	447	349	312	312
query34	758	859	518	518
query35	794	843	760	760
query36	947	1023	912	912
query37	133	113	82	82
query38	4355	4273	4294	4273
query39	1492	1467	1450	1450
query40	220	121	104	104
query41	55	50	48	48
query42	125	114	115	114
query43	527	517	486	486
query44	1365	829	824	824
query45	180	175	167	167
query46	908	1110	683	683
query47	1832	1861	1799	1799
query48	398	426	337	337
query49	723	508	435	435
query50	743	758	428	428
query51	4283	4350	4268	4268
query52	110	108	99	99
query53	241	267	188	188
query54	494	514	411	411
query55	79	80	81	80
query56	287	275	262	262
query57	1131	1160	1136	1136
query58	246	245	240	240
query59	2779	2877	2796	2796
query60	290	290	279	279
query61	123	122	140	122
query62	757	779	666	666
query63	230	206	194	194
query64	1900	1093	713	713
query65	3223	3146	3171	3146
query66	763	389	302	302
query67	15786	15615	15544	15544
query68	5309	785	509	509
query69	516	298	274	274
query70	1233	1137	1140	1137
query71	448	299	270	270
query72	6002	3682	3752	3682
query73	952	750	350	350
query74	9022	8922	8668	8668
query75	3361	3146	2699	2699
query76	3849	1166	763	763
query77	531	380	275	275
query78	10047	10256	9329	9329
query79	1857	807	596	596
query80	689	555	463	463
query81	497	282	239	239
query82	245	151	120	120
query83	171	166	152	152
query84	297	96	75	75
query85	736	358	306	306
query86	337	292	295	292
query87	4504	4632	4417	4417
query88	2987	2180	2231	2180
query89	404	312	290	290
query90	1888	197	199	197
query91	134	139	107	107
query92	73	59	54	54
query93	2375	992	579	579
query94	667	406	290	290
query95	355	268	260	260
query96	480	549	274	274
query97	2729	2856	2732	2732
query98	228	209	204	204
query99	1341	1404	1323	1323
Total cold run time: 294199 ms
Total hot run time: 190810 ms

@doris-robot

ClickBench: Total hot run time: 31.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 97f39752b05aba277bb2269054867979328506df, data reload: false

query1	0.03	0.03	0.06
query2	0.08	0.03	0.03
query3	0.24	0.07	0.06
query4	1.62	0.10	0.10
query5	0.40	0.42	0.39
query6	1.18	0.67	0.66
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.53	0.51
query10	0.58	0.59	0.58
query11	0.16	0.11	0.10
query12	0.14	0.11	0.11
query13	0.62	0.59	0.60
query14	2.70	2.84	2.84
query15	0.92	0.84	0.85
query16	0.36	0.38	0.38
query17	0.99	1.04	1.05
query18	0.21	0.19	0.19
query19	1.94	1.74	1.98
query20	0.02	0.01	0.01
query21	15.35	0.89	0.54
query22	0.74	1.25	0.70
query23	14.83	1.39	0.60
query24	7.15	1.91	1.20
query25	0.52	0.22	0.18
query26	0.61	0.16	0.14
query27	0.05	0.04	0.05
query28	10.17	0.82	0.44
query29	12.52	3.96	3.27
query30	0.25	0.09	0.06
query31	2.83	0.58	0.39
query32	3.23	0.54	0.45
query33	3.05	3.06	3.02
query34	15.76	5.12	4.52
query35	4.48	4.55	4.50
query36	0.67	0.50	0.47
query37	0.09	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.03	0.03
query40	0.16	0.13	0.12
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 105.53 s
Total hot run time: 31.07 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 42.59% (11167/26218)
Line Coverage: 32.62% (93920/287901)
Region Coverage: 31.79% (48180/151539)
Branch Coverage: 27.71% (24318/87764)
Coverage Report: http://coverage.selectdb-in.cc/coverage/97f39752b05aba277bb2269054867979328506df_97f39752b05aba277bb2269054867979328506df/report/index.html

@doris-robot

BE UT Coverage Report


Increment line coverage 11.84% (49/414) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 44.75% (11941/26681)
Line Coverage 34.20% (99910/292111)
Region Coverage 33.36% (51140/153304)
Branch Coverage 28.91% (25679/88822)

Contributor

@morningman morningman left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 27, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@hubgeter hubgeter left a comment


LGTM

@morningman morningman merged commit aed3e84 into apache:master Mar 3, 2025
28 of 30 checks passed
morningman pushed a commit that referenced this pull request Apr 8, 2025
…49718)

### What problem does this PR solve?

Related PR: #45966

### Release note
[opt] (orc-reader) Turn on late materialization of orc complex types.

Now that #45966 has implemented the new merge I/O mechanism, which
supports the read-back that late materialization of complex types
requires, turn on late materialization of ORC complex types in the
ORC reader.
morningman pushed a commit that referenced this pull request May 24, 2025
… of orc-reader. (#51102)

### What problem does this PR solve?

Related PR: #45966

Fix unsorted merge ranges in the new merge I/O facility of orc-reader.
Because the ranges taken from std::unordered_map<orc::StreamId, io::PrefetchRange>& ranges are not sorted, merging adjacent ranges is largely ineffective.
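
A rough sketch of the kind of fix this describes: copy the ranges out of the hash map and sort them by start offset before any adjacency-based merging. `ByteRange` and `sorted_ranges` are hypothetical names, and a `std::string` key stands in for orc::StreamId; the real types in #51102 may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for io::PrefetchRange.
struct ByteRange {
    uint64_t start_offset;
    uint64_t end_offset;
};

// std::unordered_map iterates in an unspecified order, so ranges that
// are neighbours in the file are usually not neighbours in iteration.
// Sorting by start offset restores the order that merging relies on.
std::vector<ByteRange> sorted_ranges(
        const std::unordered_map<std::string, ByteRange>& ranges) {
    std::vector<ByteRange> out;
    out.reserve(ranges.size());
    for (const auto& kv : ranges) {
        assert(kv.second.start_offset <= kv.second.end_offset);
        out.push_back(kv.second);
    }
    std::sort(out.begin(), out.end(),
              [](const ByteRange& a, const ByteRange& b) {
                  return a.start_offset < b.start_offset;
              });
    return out;
}
```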
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…apache#45966)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…pache#49718)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
… of orc-reader. (apache#51102)

kaka11chen added a commit to kaka11chen/doris that referenced this pull request Jun 20, 2025
…pache#49718)

kaka11chen added a commit to kaka11chen/doris that referenced this pull request Jun 20, 2025
…apache#45966)

kaka11chen added a commit to kaka11chen/doris that referenced this pull request Jun 24, 2025
…apache#45966)

kaka11chen added a commit to kaka11chen/doris that referenced this pull request Jun 24, 2025
… of orc-reader. (apache#51102)

morrySnow pushed a commit that referenced this pull request Jun 25, 2025
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Jun 25, 2025
…pache#49718)


Labels

approved Indicates a PR has been approved by one committer. dev/2.1.x-experimental dev/3.1.0-merged reviewed


8 participants