[enhancement](nereids) improve lots of values in insert into values statement
#40202
TPC-H: Total hot run time: 38038 ms
TPC-DS: Total hot run time: 192483 ms
ClickBench: Total hot run time: 31.62 s
…ion (#48780)

### What problem does this PR solve?

When Nereids casts an invalid date literal to a date-like type, it throws an exception:

```
select '' = cast('2020-10-20' as date);
(1105, 'errCode = 2, detailMessage = date/datetime literal [] is invalid')
```

The old planner does not throw an exception here, so Nereids should not throw one either.

This PR picks code from: #40202

### Check List (For Author)

- Test
    - [x] Regression test
### What problem does this PR solve?

When parsing a date literal fails, DateTimeException is no longer thrown; every failure is reported as AnalysisException. When casting a date literal raises an exception, the FE skips parsing it and hands the cast to the BE for processing.

Related PR: #40202
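The behavior described above can be sketched roughly as follows. This is a minimal Python illustration of the control flow only, not Doris code: `AnalysisError`, `parse_date_literal`, and `fold_cast_to_date` are hypothetical names, and the tuple returned for an unfolded cast merely stands in for an expression that is left for the backend to evaluate.

```python
from datetime import date


class AnalysisError(Exception):
    """Hypothetical stand-in for the planner's AnalysisException."""


def parse_date_literal(text: str) -> date:
    # Report every low-level parse failure as one planner-level error type.
    try:
        return date.fromisoformat(text)
    except ValueError as exc:
        raise AnalysisError(f"date/datetime literal [{text}] is invalid") from exc


def fold_cast_to_date(text: str):
    # Try to fold cast(text as date) on the frontend. If the literal cannot
    # be parsed, keep the cast unevaluated so the backend decides the result.
    try:
        return ("date_literal", parse_date_literal(text))
    except AnalysisError:
        return ("cast", text, "date")
```

With this shape, a valid literal is folded to a date value, while an empty string no longer aborts planning and is passed through as an unevaluated cast.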
)

When Nereids casts an invalid date literal to a date-like type, it throws an exception:

```
select '' = cast('2020-10-20' as date);
(1105, 'errCode = 2, detailMessage = date/datetime literal [] is invalid')
```

The old planner does not throw an exception here, so Nereids should not throw one either.

This PR picks code from: apache#40202

Co-Authored-By: yujun <yu.jun.reach@gmail.com>
… statement (apache#40202)

Improve the handling of a large number of values in the `insert into values` statement by bypassing NereidsPlanner.

The main logic is:

1. `InsertUtils.normalizePlan` uses `FoldConstantRuleOnFE` to reduce the expression, e.g. `values(date(now()))`
2. `FastInsertIntoValuesPlanner` skips most of the rules when analyzing and rewriting `LogicalInlineTable` to `LogicalUnion` or `LogicalOneRowRelation`
3. fast parse date time string without date format
4. getHintMap and the normal lexer share the same tokens
5. `set enable_fast_analyze_into_values=false` forces execution of all optimize rules, for when we meet bugs in `FastInsertIntoValuesPlanner`

Test: insert 1000 rows with 1000 columns; the columns contain int, bigint, decimal(26,7), date, datetime, varchar(10 Chinese chars)

| FastInsertIntoValuesPlanner | NereidsPlanner (enable_fast_analyze_into_values=false) | Legacy optimizer in 2.1.6 | Nereids planner in 2.1.6 |
|---|---|---|---|
| 16s (bottleneck is antlr's lexer) | 32s | 16s | 80s |

If you use FastInsertIntoValuesPlanner with group commit in a transaction, the time drops to 12s.

TODO: build a custom lexer. In my hand-written lexer test, FastInsertIntoValuesPlanner without group commit drops from 16s to 12s, but it will take more effort: RegularExpression -> NFA -> DFA -> minimal DFA -> Lexer codegen

(cherry picked from commit 81f3c48)
…n collect hint map (#52627)

cherry pick part of code from pr: #40202
commitId: 81f3c48
…nbound object' and 'Insert has filtered data in strict mode' exception (#52802)

1. fix `Invalid call to sql on unbound object` when use `interval`, introduced by #40202

```sql
CREATE TABLE `test_insert_cast_interval` (
  `id` int NULL,
  `dt` date NULL
) ENGINE=OLAP
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
  "replication_allocation" = "tag.location.default: 1"
);

INSERT INTO test_insert_cast_interval values(1, date_floor('2020-02-02', interval 1 second));
(1105, 'errCode = 2, detailMessage = Invalid call to sql on unbound object')
```

2. fix `Insert has filtered data in strict mode`, introduced by #49116

```sql
CREATE TABLE `test_insert_more_string` (
  `r_regionkey` int NULL,
  `r_name` varchar(25) NULL,
  `r_comment` varchar(152) NULL
) ENGINE=OLAP
DISTRIBUTED BY HASH(`r_regionkey`) BUCKETS 1
PROPERTIES (
  "replication_allocation" = "tag.location.default: 1");

insert into test_insert_more_string values (3, "akljalkjbalkjsldkrjewokjfalksdjflaksjfdlaskjfalsdkfjalsdfjkasfdl", "aa");
(1105, 'errCode = 2, detailMessage = Insert has filtered data in strict mode')
```
…nbound object' and 'Insert has filtered data in strict mode' exception (apache#52802)

(cherry picked from commit 2c01f69)
…nto values statement (apache#40202) (apache#51925) cherry pick from apache#40202 and apache#51925
…n collect hint map (#52629)

cherry-pick part code from #40202
pr: #40202
commitId: 81f3c48
Proposed changes
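Improve the handling of a large number of values in the `insert into values` statement by bypassing NereidsPlanner.

The main logic is:

- `InsertUtils.normalizePlan` uses `FoldConstantRuleOnFE` to reduce the expression, e.g. `values(date(now()))`
- `FastInsertIntoValuesPlanner` skips most of the rules when analyzing and rewriting `LogicalInlineTable` to `LogicalUnion` or `LogicalOneRowRelation`
- `set enable_fast_analyze_into_values=false` forces execution of all optimize rules, for when we meet bugs in `FastInsertIntoValuesPlanner`

Test: insert 1000 rows with 1000 columns; the columns contain int, bigint, decimal(26,7), date, datetime, varchar(10 Chinese chars)

| FastInsertIntoValuesPlanner | NereidsPlanner (enable_fast_analyze_into_values=false) | Legacy optimizer in 2.1.6 | Nereids planner in 2.1.6 |
|---|---|---|---|
| 16s (bottleneck is antlr's lexer) | 32s | 16s | 80s |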
If you use FastInsertIntoValuesPlanner with group commit in a transaction, the time drops to 12s.
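The commit message for this change also lists "fast parse date time string without date format" among its steps. A minimal sketch of that idea, assuming only the two fixed layouts `YYYY-MM-DD` and `YYYY-MM-DD HH:MM:SS`, is shown here. `fast_parse_datetime` is a hypothetical name, the code is Python rather than the Java used in the Doris FE, and the real parser accepts more forms than this.

```python
from datetime import datetime


def fast_parse_datetime(text: str) -> datetime:
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM:SS' by fixed character positions."""
    n = len(text)
    if n not in (10, 19) or text[4] != "-" or text[7] != "-":
        raise ValueError(f"unsupported date time literal: {text!r}")
    if n == 19 and (text[10] != " " or text[13] != ":" or text[16] != ":"):
        raise ValueError(f"unsupported date time literal: {text!r}")

    def digits(start: int, end: int) -> int:
        # Read a fixed-width numeric field without consulting a format pattern.
        part = text[start:end]
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"non-digit field in literal: {text!r}")
        return int(part)

    year, month, day = digits(0, 4), digits(5, 7), digits(8, 10)
    if n == 10:
        return datetime(year, month, day)
    # The datetime constructor still validates ranges such as month 13.
    return datetime(year, month, day, digits(11, 13), digits(14, 16), digits(17, 19))
```

Reading fields by position avoids interpreting a format pattern for every value, which matters when a single statement carries a very large number of date literals.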
TODO: build a custom lexer. In my hand-written lexer test, FastInsertIntoValuesPlanner without group commit drops from 16s to 12s, but it will take more effort: RegularExpression -> NFA -> DFA -> minimal DFA -> Lexer codegen
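To give a feel for what a hand-written lexer for a values list looks like, here is a small Python sketch. It is not the lexer from the test mentioned above and not the generated RegularExpression -> NFA -> DFA pipeline; `tokenize_values` is a hypothetical name, and the token set is an assumption limited to parentheses, commas, single-quoted strings without escapes, and plain decimal numbers.

```python
def _is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def tokenize_values(sql: str):
    """Split the text of a VALUES list into (kind, text) tokens."""
    tokens = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch.isspace():
            i += 1
        elif ch in "(),":
            tokens.append(("punct", ch))
            i += 1
        elif ch == "'":
            # String state: consume up to the closing quote (no escape handling).
            j = i + 1
            while j < n and sql[j] != "'":
                j += 1
            if j == n:
                raise ValueError("unterminated string literal")
            tokens.append(("string", sql[i + 1:j]))
            i = j + 1
        elif _is_digit(ch) or (ch == "-" and i + 1 < n and _is_digit(sql[i + 1])):
            # Number state: digits with at most one decimal point.
            j = i + 1
            seen_dot = False
            while j < n and (_is_digit(sql[j]) or (sql[j] == "." and not seen_dot)):
                seen_dot = seen_dot or sql[j] == "."
                j += 1
            tokens.append(("number", sql[i:j]))
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

Each branch plays the role of one lexer state, so the scanner touches every character once and needs no general-purpose grammar machinery for the narrow token set of a values list.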