[performance](Planner): optimize getStringValue() in DateLiteral #27363

Merged: 2 commits, Nov 22, 2023

@jackwener jackwener commented Nov 21, 2023

Proposed changes

  • reduce the cost of getStringValue()
  • the original code doesn't consider the microsecond part in getStringValue()
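
The description does not show the new implementation. As a minimal sketch of one common way to make such a method cheaper while also emitting the fractional seconds, the code below fills a fixed-size `char[]` directly instead of going through a format string. The class name `DateTimeFormatSketch`, the `format` signature, and the `scale` parameter (number of fractional digits) are assumptions made for illustration; this is not the actual `DateLiteral` code.

```java
// Hypothetical sketch; names and signature are illustrative, not taken from Doris.
class DateTimeFormatSketch {

    // Writes `value` right-aligned into buf[start, start + width), zero-padded.
    private static void fillPadded(char[] buf, int start, int width, long value) {
        for (int i = start + width - 1; i >= start; i--) {
            buf[i] = (char) ('0' + value % 10);
            value /= 10;
        }
    }

    // Formats "yyyy-MM-dd HH:mm:ss" plus an optional ".ffffff" part.
    // scale is the number of fractional digits to print (0 to 6, assumed).
    static String format(int year, int month, int day, int hour, int minute,
                         int second, long microsecond, int scale) {
        if (scale < 0 || scale > 6) {
            throw new IllegalArgumentException("scale must be in [0, 6]");
        }
        char[] buf = new char[19 + (scale > 0 ? scale + 1 : 0)];
        fillPadded(buf, 0, 4, year);
        buf[4] = '-';
        fillPadded(buf, 5, 2, month);
        buf[7] = '-';
        fillPadded(buf, 8, 2, day);
        buf[10] = ' ';
        fillPadded(buf, 11, 2, hour);
        buf[13] = ':';
        fillPadded(buf, 14, 2, minute);
        buf[16] = ':';
        fillPadded(buf, 17, 2, second);
        if (scale > 0) {
            buf[19] = '.';
            // Keep only the leading `scale` digits of the 6-digit microsecond value.
            long frac = microsecond;
            for (int i = scale; i < 6; i++) {
                frac /= 10;
            }
            fillPadded(buf, 20, scale, frac);
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(format(2023, 11, 22, 9, 5, 7, 123456, 6));
    }
}
```

Writing each field at a known offset avoids parsing a format pattern and allocating intermediate strings on every call, which is the kind of overhead such an optimization typically targets.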

Benchmark: calling getStringValue() 1,000,000 times
original: 6789 ms
now: 2353 ms (about 2.9x faster)
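
The harness behind these numbers is not included in the PR description. A rough sketch of how such a comparison could be timed is shown below; `GetStringValueBenchSketch`, `viaStringFormat`, `viaStringBuilder`, and `timeMillis` are hypothetical names, and the two formatters are stand-ins rather than the old and new Doris implementations. Results from a loop like this depend heavily on JIT warm-up and hardware, so it will not reproduce the figures above.

```java
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical timing harness; not the benchmark used in the PR.
class GetStringValueBenchSketch {

    // Baseline stand-in: rendering through a format string.
    static String viaStringFormat(int y, int mo, int d, int h, int mi, int s) {
        return String.format(Locale.ROOT, "%04d-%02d-%02d %02d:%02d:%02d", y, mo, d, h, mi, s);
    }

    // Candidate stand-in: appending zero-padded fields to a pre-sized builder.
    static String viaStringBuilder(int y, int mo, int d, int h, int mi, int s) {
        StringBuilder sb = new StringBuilder(19);
        appendPadded(sb, y, 4).append('-');
        appendPadded(sb, mo, 2).append('-');
        appendPadded(sb, d, 2).append(' ');
        appendPadded(sb, h, 2).append(':');
        appendPadded(sb, mi, 2).append(':');
        appendPadded(sb, s, 2);
        return sb.toString();
    }

    private static StringBuilder appendPadded(StringBuilder sb, int value, int width) {
        String digits = Integer.toString(value);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits);
    }

    // Runs the task `iterations` times and returns elapsed wall-clock milliseconds.
    static long timeMillis(Supplier<String> task, int iterations) {
        long sink = 0; // consume each result so the calls are not trivially dead code
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += task.get().length();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (sink < 0) {
            throw new IllegalStateException("length sum overflowed");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Warm-up pass so both variants are JIT-compiled before timing.
        timeMillis(() -> viaStringFormat(2023, 11, 22, 9, 5, 7), 50_000);
        timeMillis(() -> viaStringBuilder(2023, 11, 22, 9, 5, 7), 50_000);
        System.out.println("format:  " + timeMillis(() -> viaStringFormat(2023, 11, 22, 9, 5, 7), n) + " ms");
        System.out.println("builder: " + timeMillis(() -> viaStringBuilder(2023, 11, 22, 9, 5, 7), n) + " ms");
    }
}
```

For measurements meant to be compared across machines or commits, a dedicated benchmarking framework is usually a safer choice than a hand-written loop.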


@jackwener
Member Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 66e6bff6d79370adc44801e9e93e8a442980741f, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4945	4648	4649	4648
q2	354	151	159	151
q3	2019	1919	1932	1919
q4	1378	1245	1239	1239
q5	3947	3969	4040	3969
q6	245	127	130	127
q7	1428	900	877	877
q8	2746	2774	2763	2763
q9	9720	9722	9577	9577
q10	3459	3519	3528	3519
q11	373	246	242	242
q12	449	297	297	297
q13	4562	3820	3821	3820
q14	319	299	278	278
q15	590	536	532	532
q16	663	587	596	587
q17	1141	960	922	922
q18	7825	7204	7349	7204
q19	1671	1659	1679	1659
q20	551	318	316	316
q21	4392	3951	3975	3951
q22	481	372	372	372
Total cold run time: 53258 ms
Total hot run time: 48969 ms
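
The two totals can be cross-checked against the rows. The report does not label its columns; the sketch below assumes each row is the query name, the cold run, two hot runs, and the better of the two hot runs, which is consistent with the totals printed above. `TpchTotalsSketch` and `totals` are hypothetical names used only for this illustration.

```java
// Hypothetical helper for checking the bot's totals; column meanings are assumed.
class TpchTotalsSketch {

    // Returns {total cold ms, total best-hot ms} for a block of report rows.
    static long[] totals(String report) {
        long cold = 0;
        long hot = 0;
        for (String line : report.split("\n")) {
            String[] cols = line.trim().split("\\s+");
            // Only rows shaped like "q1 <cold> <hot1> <hot2> <best>" are counted.
            if (cols.length != 5 || !cols[0].matches("q\\d+")) {
                continue;
            }
            cold += Long.parseLong(cols[1]);
            // Recompute the best hot time instead of trusting the last column.
            hot += Math.min(Long.parseLong(cols[2]), Long.parseLong(cols[3]));
        }
        return new long[] {cold, hot};
    }

    public static void main(String[] args) {
        String sample = "q1\t4945\t4648\t4649\t4648\n"
                + "q2\t354\t151\t159\t151\n"
                + "q3\t2019\t1919\t1932\t1919";
        long[] t = totals(sample);
        System.out.println("cold=" + t[0] + " ms, hot=" + t[1] + " ms");
    }
}
```

Feeding all 22 rows of the table above through the same method yields the reported 53258 ms cold and 48969 ms hot totals.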

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4627	4613	4579	4579
q2	343	236	252	236
q3	4012	3991	3989	3989
q4	2713	2688	2711	2688
q5	9793	9815	9754	9754
q6	245	125	128	125
q7	3030	2506	2473	2473
q8	4426	4441	4500	4441
q9	13218	13168	13113	13113
q10	4108	4196	4198	4196
q11	808	663	666	663
q12	983	820	825	820
q13	4312	3599	3563	3563
q14	388	356	365	356
q15	581	518	529	518
q16	743	664	665	664
q17	3857	3860	3912	3860
q18	9459	8950	9015	8950
q19	1773	1773	1758	1758
q20	2384	2075	2059	2059
q21	8918	8545	8928	8545
q22	904	778	760	760
Total cold run time: 81625 ms
Total hot run time: 78110 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.48 seconds
stream load tsv: 577 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.5 seconds inserted 10000000 Rows, about 350K ops/s
storage size: 17101502368 Bytes
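
The "MB/s" figures in this report appear to be bytes divided by seconds and by 1024 × 1024, with the remainder dropped; that reading is an inference from the numbers, not something the bot states. A small sketch under that assumption, using the hypothetical helper `mibPerSecond`:

```java
// Hypothetical helper; the truncating MiB/s formula is inferred from the report.
class LoadThroughputSketch {

    // Integer throughput in MiB/s, truncated toward zero.
    static long mibPerSecond(long bytes, long seconds) {
        return bytes / (seconds * 1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("tsv:     " + mibPerSecond(74807831229L, 577) + " MB/s");
        System.out.println("json:    " + mibPerSecond(2358488459L, 18) + " MB/s");
        System.out.println("orc:     " + mibPerSecond(1101869774L, 64) + " MB/s");
        System.out.println("parquet: " + mibPerSecond(861443392L, 32) + " MB/s");
    }
}
```

With the byte counts and durations listed above, this gives 123, 124, 16, and 25, matching the reported values.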

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.44 seconds
stream load tsv: 576 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17097919748 Bytes

@jackwener
Member Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit d56bd17c358a10fa54b624f155540bd71f73f500, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4886	4666	4670	4666
q2	371	145	174	145
q3	2037	1928	1834	1834
q4	1379	1265	1248	1248
q5	3973	3960	3978	3960
q6	242	131	132	131
q7	1423	868	879	868
q8	2750	2772	2753	2753
q9	9767	9598	9577	9577
q10	3449	3543	3542	3542
q11	384	245	242	242
q12	432	287	287	287
q13	4575	3786	3819	3786
q14	325	280	289	280
q15	582	547	530	530
q16	661	585	587	585
q17	1137	940	965	940
q18	7790	7453	7350	7350
q19	1650	1668	1644	1644
q20	526	290	304	290
q21	4370	3946	3940	3940
q22	476	389	371	371
Total cold run time: 53185 ms
Total hot run time: 48969 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4591	4571	4538	4538
q2	339	225	278	225
q3	4001	3992	3996	3992
q4	2686	2682	2744	2682
q5	9697	9726	9721	9721
q6	238	122	124	122
q7	3006	2463	2491	2463
q8	4421	4493	4494	4493
q9	13214	13171	13142	13142
q10	4120	4194	4191	4191
q11	821	643	673	643
q12	980	815	796	796
q13	4272	3567	3519	3519
q14	381	354	342	342
q15	581	533	523	523
q16	740	685	673	673
q17	3926	3918	3948	3918
q18	9542	9057	9056	9056
q19	1809	1758	1766	1758
q20	2384	2081	2050	2050
q21	8694	8451	8617	8451
q22	888	818	807	807
Total cold run time: 81331 ms
Total hot run time: 78105 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.03 seconds
stream load tsv: 578 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17099616652 Bytes

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 22, 2023

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

@morningman morningman left a comment

LGTM

@morningman morningman merged commit 044a295 into apache:master Nov 22, 2023
28 of 30 checks passed
jackwener added a commit to jackwener/doris that referenced this pull request Nov 23, 2023
…che#27363)

- reduce the cost of `getStringValue()`
- the original code doesn't consider the `microsecond` part in `getStringValue()`

(cherry picked from commit 044a295)
jackwener added a commit that referenced this pull request Nov 23, 2023
) (#27470)

- reduce the cost of `getStringValue()`
- the original code doesn't consider the `microsecond` part in `getStringValue()`

(cherry picked from commit 044a295)
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Nov 27, 2023
…che#27363) (apache#27470)

- reduce the cost of `getStringValue()`
- the original code doesn't consider the `microsecond` part in `getStringValue()`

(cherry picked from commit 044a295)
eldenmoon added a commit that referenced this pull request Nov 27, 2023
* [fix](stats) Fix update rows for unique table didn't get updated properly #26968 (#27337)

* [FIX](jsonb) fix jsonb in predict column #27325 (#27424)

* [fix](fe) slots in having clause should be set to need materialized(#27412) (#27429)

* [Bug](insert)fix insert wrong data on mv when stmt have multiple values (#27297) (#27382)

fix insert wrong data on mv when stmt have multiple values

* [fix](fe ut) Fix OlapQueryCacheTest failed (#27305) (#27406)

1.
```
java.lang.NullPointerException: null
        at org.apache.doris.catalog.Env.getCurrentSystemInfo(Env.java:793) ~[classes/:?]
        at org.apache.doris.qe.SimpleScheduler$UpdateBlacklistThread.run(SimpleScheduler.java:206) ~[classes/:?]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]

java.lang.NullPointerException
        at org.apache.doris.qe.OlapQueryCacheTest.setUp(OlapQueryCacheTest.java:226)
```

2.
```
[ERROR] testSqlCacheKeyWithNestedViewForNereids  Time elapsed: 1.962 s  <<< FAILURE!
java.lang.AssertionError: SELECT command denied to user 'testCluster:testUser'@'192.168.1.1' for table 'internal: testCluster:testDb: appevent'
	at org.apache.doris.qe.OlapQueryCacheTest.parseSqlByNereids(OlapQueryCacheTest.java:579)
	at org.apache.doris.qe.OlapQueryCacheTest.testSqlCacheKeyWithNestedViewForNereids(OlapQueryCacheTest.java:1338)
```

3.
```
[ERROR] Tests run: 28, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 113.63 s <<< FAILURE! - in org.apache.doris.qe.OlapQueryCacheTest
[ERROR] testCacheModeTable  Time elapsed: 1.657 s  <<< ERROR!
java.lang.IllegalArgumentException: Value of type org.apache.doris.qe.QueryState incompatible with return type org.apache.doris.system.SystemInfoService of org.apache.doris.catalog.Env#getCurrentSystemInfo()
        at org.apache.doris.qe.OlapQueryCacheTest.setUp(OlapQueryCacheTest.java:156)
```

* [regression test](schema change) add some schema change regression cases (#27112) (#27418)

* [fix](Nereids) result type of add precision is 1 more than expected (#27136) (#27426)

* [fix](Nereids): fill miss slot in having subquery (#27177) (#27394)

* [fix](memory) Fix make_top_consumption_snapshots heap-use-after-free #27434 (#27465)

* [fix](function) make TIMESTAMP function DEPEND_ON_ARGUMENT (#27343) (#27458)

* [fix](test) order by clause in test_map(#27390) (#27391)

pick #27390

* [performance](Planner): optimize getStringValue() in DateLiteral (#27363) (#27470)

- reduce the cost of `getStringValue()`
- the original code doesn't consider the `microsecond` part in `getStringValue()`

(cherry picked from commit 044a295)

* [Chore](pick) do not push down agg on aggregate column (#27356) (#27498)

* [fix](stats) table not exists error msg not print objects name #27074 (#27463)

* [improve](nereids) support agg function of count(const value) pushdown #26677 (#27499)

Support SQL such as `select count(1)-count(not null) from table`, where the count aggregate can be pushed down.

* [test](fe-ut) fix unstable MysqlServerTest (#27459)

Need to find an unbound port for MysqlServerTest

* [opt](MergedIO) no need to merge large columns (#27315) (#27497)

1. Fix a profile bug in `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes of ranges.
2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time.

* [improvement](drop tablet)  impr gc shutdown tablet lock (#26151) (#27478)

* [doc](stats) SQL manual for stats (#27461)

* [chore](merge-on-write) disable rowid conversion check for mow table by default (#27482) (#27508)

* [fix](regression)Fix hive p2 case (#27466) (#27511)

* [fix](statistics)Fix auto analyze remove finished job bug #27486 (#27510)

* [Bug](bitmap) Fix heap-use-after-free in the bitmap functions #27411 (#27521)

* [Pick](nereids) Pick: partition prune fails in case of NOT expression (#27047) (#27507)

* [fix](clone) Fix engine_clone file exist (#27361) (#27536)

* [chore](case) adjust timeout of broker load case #27540

* Fix auto analyze doesn't filter unsupported type bug. (#27547)

Fix auto analyze doesn't filter unsupported type bug.
Catch throwable in auto analyze thread for each database, otherwise the thread will quit when one database failed to create jobs and all other databases will not get analyzed.
change FE config item full_auto_analyze_simultaneously_running_task_num to auto_analyze_simultaneously_running_task_num
backport #27559

* [chore](fe plugin) Upgrade dependency to doris 2.0-SNAPSHOT #27522 (#27558)

* [Bug](materialized-view) add limitation for duplicate expr on materialized view (#27523) (#27562)

* [fix](planner)join node should output required slot from parent node #27526 (#27551)

* [branch-2.0](hive) enable hive view by default (#27550)

* [pick](nereids) adjust bc join and shuffle join #27113 (#27566)

* [Fix](hive-transactional-table) Fix NPE when query empty hive transactional table. (#27567)

---------

Co-authored-by: AKIRA <33112463+Kikyou1997@users.noreply.github.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: Pxl <pxl290@qq.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Luwei <814383175@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: 谢健 <jianxie0@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>
Co-authored-by: Jibing-Li <64681310+Jibing-Li@users.noreply.github.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: minghong <englefly@gmail.com>
Co-authored-by: Jack Drogon <jack.xsuperman@gmail.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 28, 2023
…che#27363)

- reduce the cost of `getStringValue()`
- the original code doesn't consider the `microsecond` part in `getStringValue()`
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
…che#27363) (apache#27470)

- reduce the cost of `getStringValue()`
- the original code doesn't consider the `microsecond` part in `getStringValue()`

(cherry picked from commit 044a295)
yiguolei pushed a commit to yiguolei/incubator-doris that referenced this pull request Dec 12, 2023
…che#27363) (apache#27470)

- reduce the cost of `getStringValue()`
- the original code doesn't consider the `microsecond` part in `getStringValue()`

(cherry picked from commit 044a295)
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…che#27363)

- reduce the cost of `getStringValue()`
- the original code doesn't consider the `microsecond` part in `getStringValue()`
Labels: approved (indicates a PR has been approved by one committer), dev/2.0.3-merged, reviewed
5 participants