[opt](MergedIO) no need to merge large columns #27315

AshinGau · 2023-11-21T02:06:30Z

Proposed changes

Fix a profile bug of MergeRangeFileReader, and add a profile ApplyBytes to show the total bytes of ranges.
There's no need to merge large columns, because MergeRangeFileReader will increase the copy time.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

github-actions · 2023-11-21T02:13:04Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-11-21T02:28:00Z

clang-tidy review says "All clean, LGTM! 👍"

AshinGau · 2023-11-21T07:32:46Z

run buildall

doris-robot · 2023-11-21T08:18:20Z

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.64 seconds
stream load tsv: 567 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17099465012 Bytes

doris-robot · 2023-11-21T08:36:22Z

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 6f8778a3deff9805bbbf1f80bbc0515d4b48f4d6, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4927	4677	4679	4677
q2	372	182	158	158
q3	2045	1905	1856	1856
q4	1386	1265	1236	1236
q5	3947	3937	3980	3937
q6	247	127	130	127
q7	1410	885	888	885
q8	2731	2766	2751	2751
q9	58368	11344	9606	9606
q10	13174	3516	3530	3516
q11	375	242	238	238
q12	1682	290	290	290
q13	21856	3829	3738	3738
q14	320	294	286	286
q15	580	532	530	530
q16	664	582	587	582
q17	1153	978	940	940
q18	7734	7336	7458	7336
q19	1723	1681	1679	1679
q20	534	300	305	300
q21	7509	4014	3969	3969
q22	477	375	380	375
Total cold run time: 133214 ms
Total hot run time: 49012 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4642	4610	4580	4580
q2	335	211	249	211
q3	4000	3982	3965	3965
q4	2683	2677	2684	2677
q5	9604	9571	9568	9568
q6	242	122	117	117
q7	2993	2493	2479	2479
q8	4436	4464	4466	4464
q9	13151	13061	12945	12945
q10	4085	4196	4198	4196
q11	786	660	661	660
q12	973	805	826	805
q13	4287	3542	3593	3542
q14	406	348	350	348
q15	575	506	523	506
q16	716	665	669	665
q17	3972	3819	3874	3819
q18	9579	9091	9081	9081
q19	1815	1790	1752	1752
q20	2380	2036	2015	2015
q21	8766	8466	8408	8408
q22	894	820	780	780
Total cold run time: 81320 ms
Total hot run time: 77583 ms

morningman

LGTM

github-actions · 2023-11-22T01:41:19Z

PR approved by at least one committer and no changes requested.

github-actions · 2023-11-22T01:41:21Z

PR approved by anyone and no changes requested.

github-actions · 2023-11-23T03:32:30Z

clang-tidy review says "All clean, LGTM! 👍"

AshinGau · 2023-11-23T04:51:04Z

run buildall

doris-robot · 2023-11-23T05:26:35Z

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.24 seconds
stream load tsv: 582 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.3 seconds inserted 10000000 Rows, about 353K ops/s
storage size: 17100896244 Bytes

doris-robot · 2023-11-23T05:43:25Z

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 5d7af7132a268a392f25a0c077e28876982535f1, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4904	4623	4648	4623
q2	360	143	168	143
q3	2038	1959	1869	1869
q4	1377	1250	1245	1245
q5	3962	3932	4071	3932
q6	255	125	130	125
q7	1418	870	871	870
q8	2798	2781	2773	2773
q9	9668	9536	9457	9457
q10	3463	3534	3562	3534
q11	385	249	240	240
q12	444	294	291	291
q13	4561	3860	3780	3780
q14	313	287	284	284
q15	588	530	522	522
q16	661	598	583	583
q17	1142	975	941	941
q18	7919	7436	7440	7436
q19	1684	1687	1673	1673
q20	519	326	321	321
q21	4432	4039	3946	3946
q22	456	374	376	374
Total cold run time: 53347 ms
Total hot run time: 48962 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4589	4564	4553	4553
q2	335	232	273	232
q3	4025	4003	4003	4003
q4	2704	2688	2703	2688
q5	9677	9643	9759	9643
q6	243	122	124	122
q7	3021	2465	2469	2465
q8	4479	4446	4505	4446
q9	12881	12766	12878	12766
q10	4079	4182	4170	4170
q11	772	657	661	657
q12	982	812	804	804
q13	4274	3554	3623	3554
q14	381	341	347	341
q15	577	516	527	516
q16	742	673	660	660
q17	3878	3818	3887	3818
q18	9578	9076	8888	8888
q19	1797	1785	1777	1777
q20	2393	2052	2056	2052
q21	8793	8747	8593	8593
q22	921	773	798	773
Total cold run time: 81121 ms
Total hot run time: 77521 ms

AshinGau · 2023-11-23T07:05:19Z

run buildall

github-actions · 2023-11-23T07:11:06Z

clang-tidy review says "All clean, LGTM! 👍"

doris-robot · 2023-11-23T08:36:29Z

TeamCity be ut coverage result:
Function Coverage: 36.57% (8448/23100)
Line Coverage: 28.87% (68676/237915)
Region Coverage: 27.82% (35503/127612)
Branch Coverage: 24.56% (18115/73760)
Coverage Report: http://coverage.selectdb-in.cc/coverage/244ffc85b59fd3f41011407e26c2b821c03bea01_244ffc85b59fd3f41011407e26c2b821c03bea01/report/index.html

doris-robot · 2023-11-23T09:06:14Z

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.42 seconds
stream load tsv: 567 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.4 seconds inserted 10000000 Rows, about 352K ops/s
storage size: 17100699499 Bytes

doris-robot · 2023-11-23T09:11:19Z

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 244ffc85b59fd3f41011407e26c2b821c03bea01, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4930	4671	4678	4671
q2	360	156	159	156
q3	2050	1902	1911	1902
q4	1390	1275	1236	1236
q5	3960	3953	4025	3953
q6	251	133	136	133
q7	1419	874	884	874
q8	2787	2801	2789	2789
q9	9656	9633	9446	9446
q10	3465	3510	3570	3510
q11	377	251	236	236
q12	433	285	294	285
q13	4593	3845	3792	3792
q14	319	286	285	285
q15	588	538	521	521
q16	670	591	584	584
q17	1141	975	931	931
q18	7986	7491	7472	7472
q19	1696	1679	1691	1679
q20	542	290	332	290
q21	4405	3989	4032	3989
q22	470	371	367	367
Total cold run time: 53488 ms
Total hot run time: 49101 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4596	4605	4554	4554
q2	329	229	268	229
q3	4019	3995	3986	3986
q4	2707	2696	2710	2696
q5	9564	9640	9548	9548
q6	246	124	125	124
q7	3016	2487	2476	2476
q8	4474	4489	4484	4484
q9	12866	12730	12852	12730
q10	4072	4175	4170	4170
q11	780	689	652	652
q12	981	816	799	799
q13	4293	3596	3558	3558
q14	380	346	338	338
q15	575	533	529	529
q16	732	673	681	673
q17	3812	3880	3862	3862
q18	9595	9046	9296	9046
q19	1805	1797	1799	1797
q20	2412	2063	2048	2048
q21	8853	8633	8790	8633
q22	862	789	848	789
Total cold run time: 80969 ms
Total hot run time: 77721 ms

morningman

LGTM

github-actions · 2023-11-23T11:07:32Z

PR approved by at least one committer and no changes requested.

kaka11chen

LGTM

kaka11chen · 2023-11-22T05:14:58Z

be/src/vec/exec/format/orc/vorc_reader.cpp

 doris::io::PrefetchRange prefetch_range = {offset, offset + length};
 prefetch_ranges.emplace_back(std::move(prefetch_range));
 }
 offset += length;
 }
- // The underlying page reader will prefetch data in column.
- _file_reader.reset(new io::MergeRangeFileReader(_profile, _inner_reader, prefetch_ranges));
+ size_t num_columns = std::count_if(selected_columns.begin(), selected_columns.end(),


It should not include the first orc struct column.

1. Fix a profile bug of `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes of ranges. 2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time.

…#27497) 1. Fix a profile bug of `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes of ranges. 2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time.

* [fix](stats) Fix update rows for unique table didn't get updated properly #26968 (#27337) * [FIX](jsonb) fix jsonb in predict column #27325 (#27424) * [fix](fe) slots in having clause should be set to need materialized(#27412) (#27429) * [Bug](insert)fix insert wrong data on mv when stmt have multiple values (#27297) (#27382) fix insert wrong data on mv when stmt have multiple values * [fix](fe ut) Fix OlapQueryCacheTest failed (#27305) (#27406) 1. ``` java.lang.NullPointerException: null at org.apache.doris.catalog.Env.getCurrentSystemInfo(Env.java:793) ~[classes/:?] at org.apache.doris.qe.SimpleScheduler$UpdateBlacklistThread.run(SimpleScheduler.java:206) ~[classes/:?] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382] java.lang.NullPointerException at org.apache.doris.qe.OlapQueryCacheTest.setUp(OlapQueryCacheTest.java:226) ``` 2. ``` [ERROR] testSqlCacheKeyWithNestedViewForNereids Time elapsed: 1.962 s <<< FAILURE! java.lang.AssertionError: SELECT command denied to user 'testCluster:testUser'@'192.168.1.1' for table 'internal: testCluster:testDb: appevent' at org.apache.doris.qe.OlapQueryCacheTest.parseSqlByNereids(OlapQueryCacheTest.java:579) at org.apache.doris.qe.OlapQueryCacheTest.testSqlCacheKeyWithNestedViewForNereids(OlapQueryCacheTest.java:1338) ``` 3. ``` [ERROR] Tests run: 28, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 113.63 s <<< FAILURE! - in org.apache.doris.qe.OlapQueryCacheTest [ERROR] testCacheModeTable Time elapsed: 1.657 s <<< ERROR! java.lang.IllegalArgumentException: Value of type org.apache.doris.qe.QueryState incompatible with return type org.apache.doris.system.SystemInfoService of org.apache.doris.catalog.Env#getCurrentSystemInfo() at org.apache.doris.qe.OlapQueryCacheTest.setUp(OlapQueryCacheTest.java:156) ``` * [regression test](schema change) add some schema change regression cases (#27112) (#27418) * [fix](Nereids) result type of add precision is 1 more than expected (#27136) (#27426) * [fix](Nereids): fill miss slot in having subquery (#27177) (#27394) * [fix](memory) Fix make_top_consumption_snapshots heap-use-after-free #27434 (#27465) * [fix](function) make TIMESTAMP function DEPEND_ON_ARGUMENT (#27343) (#27458) * [fix](test) order by clause in test_map(#27390) (#27391) pick #27390 * [performance](Planner): optimize getStringValue() in DateLiteral (#27363) (#27470) - reduce cost of `getStringValue()` - original code don't consider `microsecond` part in `getStringValue()` (cherry picked from commit 044a295) * [Chore](pick) do not push down agg on aggregate column (#27356) (#27498) * [fix](stats) table not exists error msg not print objects name #27074 (#27463) * [improve](nereids) support agg function of count(const value) pushdown #26677 (#27499) support sql: select count(1)-count(not null) from table, the agg of count could push down. * [test](fe-ut) fix unstable MysqlServerTest (#27459) Need to find a unbind port for MysqlServerTest * [opt](MergedIO) no need to merge large columns (#27315) (#27497) 1. Fix a profile bug of `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes of ranges. 2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time. * [improvement](drop tablet) impr gc shutdown tablet lock (#26151) (#27478) * [doc](stats) SQL manual for stats (#27461) * [chore](merge-on-write) disable rowid conversion check for mow table by default (#27482) (#27508) * [fix](regression)Fix hive p2 case (#27466) (#27511) * [fix](statistics)Fix auto analyze remove finished job bug #27486 (#27510) * [Bug](bitmap) Fix heap-use-after-free in the bitmap functions #27411 (#27521) * [Pick](nereids) Pick: partition prune fails in case of NOT expression (#27047) (#27507) * [fix](clone) Fix engine_clone file exist (#27361) (#27536) * [chore](case) adjust timeout of broker load case #27540 * Fix auto analyze doesn't filter unsupported type bug. (#27547) Fix auto analyze doesn't filter unsupported type bug. Catch throwable in auto analyze thread for each database, otherwise the thread will quit when one database failed to create jobs and all other databases will not get analyzed. change FE config item full_auto_analyze_simultaneously_running_task_num to auto_analyze_simultaneously_running_task_num backport #27559 * [chore](fe plugin) Upgrade dependency to doris 2.0-SNAPSHOT #27522 (#27558) * [Bug](materialized-view) add limitation for duplicate expr on materialized view (#27523) (#27562) * [fix](planner)join node should output required slot from parent node #27526 (#27551) * [branch-2.0](hive) enable hive view by default (#27550) * [pick](nereids) adjust bc join and shuffle join #27113 (#27566) * [Fix](hive-transactional-table) Fix NPE when query empty hive transactional table. (#27567) --------- Co-authored-by: AKIRA <33112463+Kikyou1997@users.noreply.github.com> Co-authored-by: amory <wangqiannan@selectdb.com> Co-authored-by: Jerry Hu <mrhhsg@gmail.com> Co-authored-by: Pxl <pxl290@qq.com> Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com> Co-authored-by: Luwei <814383175@qq.com> Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com> Co-authored-by: 谢健 <jianxie0@gmail.com> Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com> Co-authored-by: jakevin <jakevingoo@gmail.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: Mingyu Chen <morningman@163.com> Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com> Co-authored-by: yujun <yu.jun.reach@gmail.com> Co-authored-by: Xin Liao <liaoxinbit@126.com> Co-authored-by: Jibing-Li <64681310+Jibing-Li@users.noreply.github.com> Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com> Co-authored-by: minghong <englefly@gmail.com> Co-authored-by: Jack Drogon <jack.xsuperman@gmail.com> Co-authored-by: Dongyang Li <hello_stephen@qq.com> Co-authored-by: zhiqiang <seuhezhiqiang@163.com> Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com> Co-authored-by: Qi Chen <kaka11.chen@gmail.com>

1. Fix a profile bug of `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes of ranges. 2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time.

…#27497) 1. Fix a profile bug of `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes of ranges. 2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time.

1. Fix a profile bug of `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes of ranges. 2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time.

AshinGau force-pushed the large_columns branch from 07ac864 to 6f8778a Compare November 21, 2023 02:21

morningman added the dev/2.0.4 label Nov 21, 2023

morningman previously approved these changes Nov 22, 2023

View reviewed changes

github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 22, 2023

github-actions bot added the reviewed label Nov 22, 2023

AshinGau dismissed morningman’s stale review via 5d7af71 November 23, 2023 03:26

AshinGau force-pushed the large_columns branch from 6f8778a to 5d7af71 Compare November 23, 2023 03:26

github-actions bot removed the approved Indicates a PR has been approved by one committer. label Nov 23, 2023

[opt](MergedIO) no need to merge large columns

244ffc8

AshinGau force-pushed the large_columns branch from 5d7af71 to 244ffc8 Compare November 23, 2023 07:05

morningman approved these changes Nov 23, 2023

View reviewed changes

github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 23, 2023

kaka11chen approved these changes Nov 23, 2023

View reviewed changes

AshinGau merged commit dd65cc1 into apache:master Nov 23, 2023
27 of 29 checks passed

AshinGau mentioned this pull request Nov 23, 2023

[opt](MergedIO) no need to merge large columns (#27315) #27497

Merged

morningman added dev/2.0.3-merged and removed dev/2.0.4 labels Nov 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[opt](MergedIO) no need to merge large columns #27315

[opt](MergedIO) no need to merge large columns #27315

AshinGau commented Nov 21, 2023 •

edited

Loading

github-actions bot commented Nov 21, 2023

github-actions bot commented Nov 21, 2023

AshinGau commented Nov 21, 2023

doris-robot commented Nov 21, 2023

doris-robot commented Nov 21, 2023

morningman left a comment

github-actions bot commented Nov 22, 2023

github-actions bot commented Nov 22, 2023

github-actions bot commented Nov 23, 2023

AshinGau commented Nov 23, 2023

doris-robot commented Nov 23, 2023

doris-robot commented Nov 23, 2023

AshinGau commented Nov 23, 2023

github-actions bot commented Nov 23, 2023

doris-robot commented Nov 23, 2023

doris-robot commented Nov 23, 2023

doris-robot commented Nov 23, 2023

morningman left a comment

github-actions bot commented Nov 23, 2023

kaka11chen left a comment

kaka11chen Nov 22, 2023

[opt](MergedIO) no need to merge large columns #27315

[opt](MergedIO) no need to merge large columns #27315

Conversation

AshinGau commented Nov 21, 2023 • edited Loading

Proposed changes

Further comments

github-actions bot commented Nov 21, 2023

github-actions bot commented Nov 21, 2023

AshinGau commented Nov 21, 2023

doris-robot commented Nov 21, 2023

doris-robot commented Nov 21, 2023

morningman left a comment

Choose a reason for hiding this comment

github-actions bot commented Nov 22, 2023

github-actions bot commented Nov 22, 2023

github-actions bot commented Nov 23, 2023

AshinGau commented Nov 23, 2023

doris-robot commented Nov 23, 2023

doris-robot commented Nov 23, 2023

AshinGau commented Nov 23, 2023

github-actions bot commented Nov 23, 2023

doris-robot commented Nov 23, 2023

doris-robot commented Nov 23, 2023

doris-robot commented Nov 23, 2023

morningman left a comment

Choose a reason for hiding this comment

github-actions bot commented Nov 23, 2023

kaka11chen left a comment

Choose a reason for hiding this comment

kaka11chen Nov 22, 2023

Choose a reason for hiding this comment

AshinGau commented Nov 21, 2023 •

edited

Loading