@924060929 commented Aug 30, 2024

Proposed changes

Speed up insert into values statements that carry a large number of values by bypassing NereidsPlanner.

The main logic is:

  1. InsertUtils.normalizePlan uses FoldConstantRuleOnFE to reduce expressions, e.g. values(date(now()))
  2. FastInsertIntoValuesPlanner skips most rules when analyzing and rewriting LogicalInlineTable to LogicalUnion or LogicalOneRowRelation
  3. Date-time strings are parsed on a fast path that does not use a date format
  4. getHintMap and the normal lexer share the same tokens
  5. Setting enable_fast_analyze_into_values=false forces all optimizer rules to run, as a fallback if FastInsertIntoValuesPlanner hits a bug
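
Item 3 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Doris implementation (the FE is Java); the function name `fast_parse_datetime` and the two accepted layouts are assumptions made for this example. The idea is to read the fields at fixed character offsets instead of going through a general date-format pattern.

```python
from datetime import datetime


def _num(text: str, start: int, end: int) -> int:
    """Read text[start:end] as a plain ASCII decimal number."""
    part = text[start:end]
    if not (part.isascii() and part.isdigit()):
        raise ValueError(f"date/datetime literal [{text}] is invalid")
    return int(part)


def fast_parse_datetime(text: str) -> datetime:
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM:SS' by fixed offsets.

    Hypothetical helper: only these two layouts are handled here.
    """
    n = len(text)
    if n not in (10, 19) or text[4] != "-" or text[7] != "-":
        raise ValueError(f"date/datetime literal [{text}] is invalid")
    year, month, day = _num(text, 0, 4), _num(text, 5, 7), _num(text, 8, 10)
    if n == 10:
        # datetime() itself rejects out-of-range fields with ValueError
        return datetime(year, month, day)
    if text[10] not in " T" or text[13] != ":" or text[16] != ":":
        raise ValueError(f"date/datetime literal [{text}] is invalid")
    hour, minute, second = _num(text, 11, 13), _num(text, 14, 16), _num(text, 17, 19)
    return datetime(year, month, day, hour, minute, second)


print(fast_parse_datetime("2020-10-20 12:34:56"))
```

Anything that does not match one of the two layouts is rejected up front, so a real implementation would presumably fall back to a slower, more general parser at that point.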

Test: insert 1000 rows with 1000 columns; the column types include int, bigint, decimal(26,7), date, datetime, and varchar (10 Chinese characters).

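+-----------------------------------+-------------------------------------------------------+---------------------------+--------------------------+
| FastInsertIntoValuesPlanner       | NereidsPlanner(enable_fast_analyze_into_values=false) | Legacy optimizer in 2.1.6 | Nereids planner in 2.1.6 |
+-----------------------------------+-------------------------------------------------------+---------------------------+--------------------------+
| 16s (bottleneck is ANTLR's lexer) | 32s                                                   | 16s                       | 80s                      |
+-----------------------------------+-------------------------------------------------------+---------------------------+--------------------------+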

With FastInsertIntoValuesPlanner and group commit in a transaction, the time drops to 12s.

TODO: build a custom lexer. In my test with a hand-written lexer, FastInsertIntoValuesPlanner without group commit drops from 16s to 12s, but a custom lexer takes more effort: RegularExpression -> NFA -> DFA -> minimal DFA -> Lexer codegen
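
A hand-written lexer for a values list could look roughly like the sketch below. It is Python and illustrative only: `lex_values` and the token kinds are hypothetical names, it hand-codes the scanner directly rather than following the RegularExpression -> NFA -> DFA codegen pipeline, and it covers only the few token shapes that appear in a simple values list.

```python
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (kind, text); the kinds are made up for this sketch


def lex_values(sql: str) -> Iterator[Token]:
    """Single-pass scanner for words, numbers, quoted strings and ( ) ,"""
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch.isspace():
            i += 1
        elif ch in "(),":
            yield ("PUNCT", ch)
            i += 1
        elif ch == "'":
            # Single-quoted string; a doubled quote '' stands for one quote.
            j, parts = i + 1, []
            while True:
                if j >= n:
                    raise ValueError("unterminated string literal")
                if sql[j] == "'":
                    if j + 1 < n and sql[j + 1] == "'":
                        parts.append("'")
                        j += 2
                        continue
                    break
                parts.append(sql[j])
                j += 1
            yield ("STRING", "".join(parts))
            i = j + 1
        elif ch.isdigit() or (ch in "+-" and i + 1 < n and sql[i + 1].isdigit()):
            # Optional sign, then digits and dots; no validation of the shape.
            j = i + 1
            while j < n and (sql[j].isdigit() or sql[j] == "."):
                j += 1
            yield ("NUMBER", sql[i:j])
            i = j
        elif ch.isalpha() or ch == "_":
            j = i + 1
            while j < n and (sql[j].isalnum() or sql[j] == "_"):
                j += 1
            yield ("WORD", sql[i:j])
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")


for token in lex_values("values (1, 'a''b', -2.5)"):
    print(token)
```

The scanner looks at each character a bounded number of times and keeps no backtracking state, which is the property a generated minimal-DFA lexer would also provide.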

@924060929
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38038 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c221b9226095f726f9751c308f10241dab3aaa85, data reload: false

------ Round 1 ----------------------------------
q1	17640	4840	4305	4305
q2	2018	188	178	178
q3	11669	954	1149	954
q4	10517	749	768	749
q5	7737	2866	2811	2811
q6	228	139	141	139
q7	969	628	610	610
q8	9338	2074	2086	2074
q9	7097	6507	6528	6507
q10	7005	2145	2260	2145
q11	460	240	245	240
q12	406	238	236	236
q13	18291	3037	3054	3037
q14	283	234	230	230
q15	517	495	484	484
q16	599	529	503	503
q17	983	690	709	690
q18	7313	6926	6963	6926
q19	1399	983	1009	983
q20	666	338	332	332
q21	3913	3001	2893	2893
q22	1101	1032	1012	1012
Total cold run time: 110149 ms
Total hot run time: 38038 ms
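
The columns in the Round 1 output are unlabeled. Reading them as cold run, first hot run, second hot run, and the faster of the two hot runs is an interpretation, not something the bot states, but it is consistent with the reported totals. The Python check below copies the first three timing columns from Round 1 and recomputes both totals.

```python
# (query, cold_ms, hot1_ms, hot2_ms) copied from the Round 1 rows above.
ROUND1 = [
    ("q1", 17640, 4840, 4305), ("q2", 2018, 188, 178),
    ("q3", 11669, 954, 1149), ("q4", 10517, 749, 768),
    ("q5", 7737, 2866, 2811), ("q6", 228, 139, 141),
    ("q7", 969, 628, 610), ("q8", 9338, 2074, 2086),
    ("q9", 7097, 6507, 6528), ("q10", 7005, 2145, 2260),
    ("q11", 460, 240, 245), ("q12", 406, 238, 236),
    ("q13", 18291, 3037, 3054), ("q14", 283, 234, 230),
    ("q15", 517, 495, 484), ("q16", 599, 529, 503),
    ("q17", 983, 690, 709), ("q18", 7313, 6926, 6963),
    ("q19", 1399, 983, 1009), ("q20", 666, 338, 332),
    ("q21", 3913, 3001, 2893), ("q22", 1101, 1032, 1012),
]

# Cold total: sum of the first timing column.
cold_total = sum(cold for _, cold, _, _ in ROUND1)
# Hot total: sum of the faster of the two hot runs per query.
hot_total = sum(min(hot1, hot2) for _, _, hot1, hot2 in ROUND1)

print(cold_total, hot_total)  # 110149 38038
```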

----- Round 2, with runtime_filter_mode=off -----
q1	4327	4361	4349	4349
q2	386	279	282	279
q3	2936	2661	2691	2661
q4	1991	1681	1676	1676
q5	5677	5715	5736	5715
q6	235	145	143	143
q7	2257	1867	1860	1860
q8	3310	3428	3457	3428
q9	8912	8910	8849	8849
q10	3628	3377	3394	3377
q11	596	520	507	507
q12	810	683	656	656
q13	15351	3332	3453	3332
q14	345	328	299	299
q15	548	504	496	496
q16	643	610	602	602
q17	1899	1560	1560	1560
q18	8712	8413	7960	7960
q19	2619	1593	1589	1589
q20	2178	1990	1910	1910
q21	5724	5513	5527	5513
q22	1132	1054	1025	1025
Total cold run time: 74216 ms
Total hot run time: 57786 ms

@doris-robot

TPC-DS: Total hot run time: 192483 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c221b9226095f726f9751c308f10241dab3aaa85, data reload: false

query1	1282	898	870	870
query2	6359	2041	1922	1922
query3	10604	3992	3919	3919
query4	60222	27296	23342	23342
query5	5485	515	479	479
query6	433	166	162	162
query7	5782	303	302	302
query8	297	216	206	206
query9	8961	2466	2462	2462
query10	488	268	261	261
query11	17233	15030	15251	15030
query12	155	104	100	100
query13	1556	396	379	379
query14	10831	7215	7040	7040
query15	250	175	186	175
query16	7645	456	487	456
query17	1114	629	611	611
query18	2087	309	315	309
query19	301	158	156	156
query20	125	112	109	109
query21	206	108	103	103
query22	4501	4408	4266	4266
query23	34243	33570	33340	33340
query24	5929	2879	2871	2871
query25	548	410	398	398
query26	698	161	162	161
query27	1778	288	281	281
query28	3816	2106	2090	2090
query29	730	427	417	417
query30	237	151	160	151
query31	935	764	767	764
query32	87	55	60	55
query33	486	299	278	278
query34	848	490	482	482
query35	838	724	711	711
query36	1059	941	931	931
query37	149	91	98	91
query38	3977	3868	3908	3868
query39	1471	1420	1399	1399
query40	198	121	122	121
query41	48	47	45	45
query42	116	98	97	97
query43	525	491	475	475
query44	1109	755	754	754
query45	203	165	168	165
query46	1091	788	765	765
query47	1867	1802	1795	1795
query48	371	310	302	302
query49	774	442	439	439
query50	809	424	441	424
query51	7207	7085	6983	6983
query52	102	87	88	87
query53	258	190	193	190
query54	580	467	462	462
query55	83	84	83	83
query56	280	269	271	269
query57	1161	1093	1081	1081
query58	227	243	238	238
query59	3083	2789	2733	2733
query60	304	280	364	280
query61	101	99	102	99
query62	758	654	657	654
query63	214	188	190	188
query64	2855	677	630	630
query65	3185	3117	3161	3117
query66	632	334	351	334
query67	15360	15537	15183	15183
query68	4397	563	571	563
query69	416	275	270	270
query70	1163	1124	1133	1124
query71	357	277	280	277
query72	6604	3954	4059	3954
query73	740	328	336	328
query74	9178	8819	8756	8756
query75	3383	2670	2701	2670
query76	1762	990	965	965
query77	539	324	316	316
query78	10175	9089	9287	9089
query79	1551	545	540	540
query80	1063	538	501	501
query81	563	230	238	230
query82	386	150	148	148
query83	195	144	144	144
query84	270	79	120	79
query85	932	292	302	292
query86	360	304	273	273
query87	4405	4301	4229	4229
query88	3368	2304	2303	2303
query89	395	289	284	284
query90	1942	200	190	190
query91	130	100	101	100
query92	63	50	52	50
query93	1944	547	545	545
query94	799	304	286	286
query95	347	265	261	261
query96	594	263	265	263
query97	3179	3090	3027	3027
query98	233	205	205	205
query99	1553	1281	1289	1281
Total cold run time: 310233 ms
Total hot run time: 192483 ms

@doris-robot

ClickBench: Total hot run time: 31.62 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c221b9226095f726f9751c308f10241dab3aaa85, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.06	0.05
query4	1.66	0.10	0.09
query5	0.50	0.50	0.48
query6	1.12	0.72	0.74
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.50	0.48
query10	0.55	0.55	0.54
query11	0.15	0.12	0.12
query12	0.15	0.12	0.12
query13	0.61	0.59	0.59
query14	2.15	2.05	2.07
query15	0.90	0.81	0.83
query16	0.37	0.38	0.37
query17	0.98	1.04	0.97
query18	0.21	0.21	0.21
query19	1.91	1.82	1.78
query20	0.01	0.00	0.01
query21	15.40	0.66	0.65
query22	4.33	7.74	1.46
query23	18.29	1.49	1.39
query24	2.08	0.23	0.23
query25	0.15	0.08	0.06
query26	0.27	0.18	0.17
query27	0.08	0.08	0.08
query28	13.17	1.02	1.01
query29	12.62	3.30	3.31
query30	0.24	0.06	0.05
query31	2.86	0.41	0.40
query32	3.24	0.48	0.48
query33	2.99	3.04	2.98
query34	17.29	4.33	4.38
query35	4.46	4.44	4.48
query36	0.66	0.49	0.49
query37	0.18	0.16	0.15
query38	0.15	0.14	0.14
query39	0.04	0.04	0.04
query40	0.15	0.13	0.13
query41	0.10	0.05	0.04
query42	0.06	0.06	0.05
query43	0.05	0.04	0.05
Total cold run time: 111.11 s
Total hot run time: 31.62 s

@924060929 924060929 force-pushed the opt_insert_into_values branch from c221b92 to 1b6854e Compare November 22, 2024 03:42
@924060929 924060929 marked this pull request as ready for review November 29, 2024 06:38
@924060929 924060929 force-pushed the opt_insert_into_values branch from 2a30e59 to 4fe7f08 Compare December 6, 2024 04:05
@924060929 924060929 force-pushed the opt_insert_into_values branch from 61d85d6 to 5c8f0a7 Compare December 9, 2024 06:59
@924060929 924060929 force-pushed the opt_insert_into_values branch from 13e1f0f to bee14a8 Compare December 10, 2024 03:06
@924060929 924060929 deleted the opt_insert_into_values branch March 6, 2025 06:50
yiguolei pushed a commit that referenced this pull request Mar 11, 2025
…ion (#48780)

### What problem does this PR solve?

When Nereids casts an invalid date literal to a date-like type, it throws an
exception:

```
select '' = cast('2020-10-20' as date);
(1105, 'errCode = 2, detailMessage = date/datetime literal [] is invalid')
```

The old planner does not throw an exception in this case, so Nereids should not
throw one either.

This PR picks code from: #40202

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [x] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
morrySnow pushed a commit that referenced this pull request Mar 14, 2025
### What problem does this PR solve?

When parsing a date literal fails, always throw AnalysisException instead of
DateTimeException. When casting a date literal raises an exception, skip
folding it on the FE and leave it to the BE for processing.

Related PR: #40202
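
The behavior described in this commit message can be sketched as follows. This is a Python illustration under stated assumptions, not the Doris code: `AnalysisError`, `CastToDate`, `parse_date_literal`, and `fold_cast_to_date` are hypothetical names, and "leave it to the BE" is modeled simply as returning the cast expression unchanged.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union


class AnalysisError(Exception):
    """Stand-in for the single exception type raised on parse failure."""


@dataclass(frozen=True)
class CastToDate:
    """Hypothetical expression node: cast(<string literal> as date)."""
    text: str


def parse_date_literal(text: str) -> date:
    try:
        return date.fromisoformat(text)
    except ValueError as exc:
        # Convert the low-level error into the one analysis-level type.
        raise AnalysisError(f"date/datetime literal [{text}] is invalid") from exc


def fold_cast_to_date(expr: CastToDate) -> Union[date, CastToDate]:
    """Fold the cast if the literal parses; otherwise keep it unfolded."""
    try:
        return parse_date_literal(expr.text)
    except AnalysisError:
        # Do not fail planning; a later stage evaluates the cast instead.
        return expr


print(fold_cast_to_date(CastToDate("2020-10-20")))
print(fold_cast_to_date(CastToDate("")))
```

Keeping one exception type at the parsing boundary means the folding code needs only one `except` clause to decide between folding and deferring.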
deardeng pushed a commit to deardeng/incubator-doris that referenced this pull request Apr 30, 2025
)

Co-Authored-By: yujun <yu.jun.reach@gmail.com>
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
)

924060929 added a commit to 924060929/incubator-doris that referenced this pull request Jun 19, 2025
… statement (apache#40202)

(cherry picked from commit 81f3c48)
morrySnow pushed a commit that referenced this pull request Jun 20, 2025
924060929 added a commit that referenced this pull request Jun 20, 2025
…ids.trees.expressions.NamedExpression (#51925)

Fix the error message `Cast cannot be cast to class
org.apache.doris.nereids.trees.expressions.NamedExpression`, introduced
by #40202.

The correct error message is `Unknown column 'xxx' in 'table list'
in UNBOUND_OLAP_TABLE_SINK clause`.
yiguolei pushed a commit that referenced this pull request Jul 3, 2025
…n collect hint map (#52627)

cherry pick part of code from
pr: #40202
commitId: 81f3c48 


924060929 added a commit that referenced this pull request Jul 10, 2025
…nbound object' and 'Insert has filtered data in strict mode' exception (#52802)

1. fix `Invalid call to sql on unbound object` when using `interval`,
introduced by #40202
```sql
CREATE TABLE `test_insert_cast_interval` (
  `id` int NULL,
  `dt` date NULL
) ENGINE=OLAP
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
INSERT INTO test_insert_cast_interval values(1, date_floor('2020-02-02', interval 1 second));

(1105, 'errCode = 2, detailMessage = Invalid call to sql on unbound object')
```

2. fix `Insert has filtered data in strict mode`, introduced by #49116
```sql
CREATE TABLE `test_insert_more_string` (
  `r_regionkey` int NULL,
  `r_name` varchar(25) NULL,
  `r_comment` varchar(152) NULL
) ENGINE=OLAP
DISTRIBUTED BY HASH(`r_regionkey`) BUCKETS 1
PROPERTIES ( "replication_allocation" = "tag.location.default: 1");

insert into test_insert_more_string values (3, "akljalkjbalkjsldkrjewokjfalksdjflaksjfdlaskjfalsdkfjalsdfjkasfdl", "aa");

(1105, 'errCode = 2, detailMessage = Insert has filtered data in strict mode')
```
924060929 added a commit to 924060929/incubator-doris that referenced this pull request Jul 10, 2025
…nbound object' and 'Insert has filtered data in strict mode' exception (apache#52802)

(cherry picked from commit 2c01f69)
924060929 added a commit to 924060929/incubator-doris that referenced this pull request Jul 14, 2025
924060929 added a commit to 924060929/incubator-doris that referenced this pull request Jul 14, 2025
…nbound object' and 'Insert has filtered data in strict mode' exception (apache#52802)

(cherry picked from commit 2c01f69)
dataroaring pushed a commit that referenced this pull request Aug 12, 2025
…n collect hint map (#52629)

cherry-pick part code from #40202
pr: #40202
commitId: 81f3c48 

Labels

approved Indicates a PR has been approved by one committer. dev/3.0.8-merged dev/3.1.0-merged reviewed

5 participants