Conversation

@zhiqiang-hhhh (Contributor) commented Jul 3, 2025

What problem does this PR solve?

TL;DR: Introduce virtual slot ref to eliminate redundant computation of common sub-expressions

Problem to solve

Consider the following queries:

select funcC(funcA(colA)), funcB(funcA(colA)) from table;
select funcA(colA) as sub from table where funcB(funcA(colA)) > 0;
select l2_distance(colA, [10]) as distance from table where l2_distance(colA, [10]) > 0

What these SQL statements have in common is that the same expression appears multiple times in different places: in the projection, in predicates, or during index computation (e.g., for the ANN index in Q3). Each occurrence is currently computed separately, even though the expression only needs to be computed once.

We introduce virtual slot ref to address this issue.

In the storage layer, we implement a VirtualColumnIterator. It behaves like any other ColumnIterator, except that it does not read a physical column. Instead, it is dedicated to reading the result of an expression computed by an index (for example, the distance returned by an ANN index or, in the future, the relevance score from a full-text index). Once an expression result has been computed via the index, we call VirtualColumnIterator::prepare_materialization to store the data source. If a segment does not have the corresponding index, the data source of the VirtualColumnIterator is a special ColumnNothing type. This is an important design trick: it lets the virtual slot ref elegantly handle segments that do not yet have the index built.
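
To make the storage-layer mechanism concrete, here is a minimal C++ sketch. The column types and the `next_batch` method are simplified stand-ins, not Doris's real interfaces; only `prepare_materialization` and `ColumnNothing` are taken from the description above.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for Doris column types.
struct IColumn {
    virtual ~IColumn() = default;
    virtual bool is_nothing() const { return false; }
};
struct ColumnFloat64 : IColumn {
    std::vector<double> data;
};
// ColumnNothing: a placeholder carrying no data. A segment without the
// relevant index leaves the virtual column in this state.
struct ColumnNothing : IColumn {
    bool is_nothing() const override { return true; }
};

// Sketch of the iterator: it never touches disk; it only hands back
// whatever result prepare_materialization() stored (e.g. ANN distances).
class VirtualColumnIterator {
public:
    // Called after the index has computed the expression result.
    void prepare_materialization(std::shared_ptr<IColumn> src) {
        _source = std::move(src);
    }
    std::shared_ptr<IColumn> next_batch() {
        // No index result stored: return ColumnNothing so the caller
        // knows it must evaluate the expression itself.
        if (!_source) return std::make_shared<ColumnNothing>();
        return _source;
    }
private:
    std::shared_ptr<IColumn> _source;
};
```

The ColumnNothing fallback is the signal to the layers above: "no index result here, compute the expression yourself."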

We also modify SegmentIterator. Before processing each block, we initialize the positions of the virtual slot refs in the block to ColumnNothing. Before actually returning a block, we check whether each virtual slot ref has been materialized; if not, we execute the corresponding expression (e.g., l2_distance or a score function) to materialize it, ensuring that every virtual column in the block returned to the compute layer has been materialized.
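
That finalization step can be sketched as follows, under the same simplified column model; `materialize_virtual_columns`, `VExpr`, and `Block` here are illustrative names, not the actual Doris symbols.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Minimal stand-ins (hypothetical; real Doris types are far richer).
struct IColumn { virtual ~IColumn() = default; virtual bool is_nothing() const { return false; } };
struct ColumnFloat64 : IColumn { std::vector<double> data; };
struct ColumnNothing : IColumn { bool is_nothing() const override { return true; } };

using ColumnPtr = std::shared_ptr<IColumn>;
using VExpr = std::function<ColumnPtr(const std::vector<ColumnPtr>&)>;

struct Block { std::vector<ColumnPtr> columns; };

// Sketch of the SegmentIterator post-step: any virtual column that is
// still ColumnNothing (no index produced it) is computed from its
// expression before the block leaves the storage layer.
void materialize_virtual_columns(Block& block,
                                 const std::vector<size_t>& virtual_positions,
                                 const std::vector<VExpr>& exprs) {
    for (size_t i = 0; i < virtual_positions.size(); ++i) {
        ColumnPtr& col = block.columns[virtual_positions[i]];
        if (col->is_nothing()) {
            col = exprs[i](block.columns);  // e.g. evaluate l2_distance
        }
    }
}
```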

For expression evaluation, we introduce VirtualSlotRef, which is essentially SlotRef + FunctionCall. When the expression tree executes a node of this type, it automatically checks whether the corresponding expression has been materialized: if it has, VirtualSlotRef behaves like a SlotRef; if it hasn’t, it behaves like a FunctionCall.
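
A minimal sketch of this dual behavior follows; the types are hypothetical stand-ins, as the real `VirtualSlotRef` lives in Doris's BE expression framework.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical, simplified model of expression evaluation over a block.
struct IColumn { virtual ~IColumn() = default; virtual bool is_nothing() const { return false; } };
struct ColumnFloat64 : IColumn { std::vector<double> data; };
struct ColumnNothing : IColumn { bool is_nothing() const override { return true; } };
using ColumnPtr = std::shared_ptr<IColumn>;
struct Block { std::vector<ColumnPtr> columns; };

// VirtualSlotRef = SlotRef + FunctionCall: read the slot if it is
// materialized, otherwise fall back to evaluating the expression.
class VirtualSlotRef {
public:
    VirtualSlotRef(size_t column_pos, std::function<ColumnPtr(const Block&)> expr)
        : _pos(column_pos), _expr(std::move(expr)) {}

    ColumnPtr execute(const Block& block) const {
        const ColumnPtr& col = block.columns[_pos];
        if (!col->is_nothing()) {
            return col;           // behaves like a SlotRef
        }
        return _expr(block);      // behaves like a FunctionCall
    }
private:
    size_t _pos;
    std::function<ColumnPtr(const Block&)> _expr;
};
```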

Modifications to the planner

Here’s an example to better illustrate the execution of virtual slot ref:

select func(colA) from table where func(colA) > 0;

For this SQL, our current ScanNode is:

ScanNode {
    predicates: func(colA[#0]) > 0
    final projection: func(colA[#0])
    final projection tuple id: 1
    tuple_id: 0
}

TupleDesc[id=0] {
    SlotDesc{id=0, col=colA}
}

TupleDesc[id=1] {
    SlotDesc{id=1, col=null, ..., type=float64}
}

After this PR, the plan becomes:

ScanNode {
    predicate: func(colA)[#1] > 0
    final projection: func(colA)[#1]
    final projection tuple id: 1
    tuple_id: 0
}

TupleDesc[id=0] {
    SlotDesc{id=0, col=colA},
    SlotDesc{id=1, col=virtual_column_1, expr=func(colA[#0])}
}

TupleDesc[id=1] {
    SlotDesc{id=2, name=virtual_column_1[#1]}
}

Note that we added a VirtualSlot in Tuple 0, and other places that originally required computing the expression are transformed to reference this VirtualSlot. In this way, redundant computation of common expressions is eliminated.
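
The real rewrite is the FE Nereids rule PUSH_DOWN_VIRTUAL_COLUMNS_INTO_OLAP_SCAN (Java); purely as an illustration of the idea, here is a toy common-subexpression rewrite over a minimal expression tree. All names here are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// A toy expression tree; the real Nereids expressions live in the FE.
struct Expr {
    std::string name;                         // e.g. "func", "colA"
    std::vector<std::shared_ptr<Expr>> args;
    std::string key() const {                 // structural identity
        std::string k = name + "(";
        for (auto& a : args) k += a->key() + ",";
        return k + ")";
    }
};
using ExprPtr = std::shared_ptr<Expr>;

// Count how often each composite subexpression occurs across all roots.
static void count(const ExprPtr& e, std::map<std::string, int>& freq) {
    if (!e->args.empty()) freq[e->key()]++;   // leaves don't qualify
    for (auto& a : e->args) count(a, freq);
}

// Replace every subexpression that occurs more than once with a
// reference to a single virtual slot.
static ExprPtr rewrite(const ExprPtr& e, const std::map<std::string, int>& freq) {
    if (!e->args.empty() && freq.at(e->key()) > 1) {
        return std::make_shared<Expr>(Expr{"virtual_slot[" + e->key() + "]", {}});
    }
    auto copy = std::make_shared<Expr>(*e);
    for (auto& a : copy->args) a = rewrite(a, freq);
    return copy;
}
```

Run `count` over all expression roots (projection and predicates) first, then `rewrite` each root; every repeated subexpression collapses into a reference to the same virtual slot, mirroring how the plan above references `virtual_column_1`.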

Benchmark

Disable the plan rule:

mysql> set disable_nereids_rules='PUSH_DOWN_VIRTUAL_COLUMNS_INTO_OLAP_SCAN';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT counterid,        Count(*)               AS hit_count,        Count(DISTINCT userid) AS unique_users FROM   hits WHERE  ( Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) = 'GOOGLE.COM'           OR Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) =              'GOOGLE.RU'           OR Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) LIKE              '%GOOGLE%' )        AND ( Length(Regexp_extract(referer, '^https?://([^/]+)', 1)) > 3               OR Regexp_extract(referer, '^https?://([^/]+)', 1) != ''
              OR Regexp_extract(referer, '^https?://([^/]+)', 1) IS NOT NULL )        AND eventdate = '2013-07-15' GROUP  BY counterid HAVING hit_count > 100 ORDER  BY hit_count DESC LIMIT  20;
+-----------+-----------+--------------+
| counterid | hit_count | unique_users |
+-----------+-----------+--------------+
|    105857 |   1919075 |      1412926 |
|    117917 |    200018 |        50285 |
|     99062 |    114384 |        71408 |
|      1634 |     43839 |        14975 |
|        59 |     31328 |         6668 |
|    114157 |     28852 |        19729 |
|        62 |     22549 |        14130 |
|      1483 |      8425 |         5677 |
|        38 |      5436 |         1805 |
|      1060 |      4043 |         2948 |
|     76221 |      2060 |         1325 |
|    128858 |      1690 |          825 |
|    102847 |      1500 |          350 |
|     89761 |      1419 |          274 |
|     92040 |      1180 |          978 |
|      1089 |      1067 |          961 |
|      2004 |       880 |          698 |
|      1213 |       597 |          219 |
|     77729 |       448 |          108 |
|     71099 |       289 |           70 |
+-----------+-----------+--------------+
20 rows in set (1.50 sec)

Re-enable the rule:

mysql> unset variable disable_nereids_rules;
--------------
unset variable disable_nereids_rules
--------------

Query OK, 0 rows affected (0.00 sec)

mysql> -- Query 1: the 20 sites that get the most hits from Google
mysql> SELECT counterid,
    ->        Count(*)               AS hit_count,
    ->        Count(DISTINCT userid) AS unique_users
    -> FROM   hits
    -> WHERE  ( Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) = 'GOOGLE.COM'
    ->           OR Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) =
    ->              'GOOGLE.RU'
    ->           OR Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) LIKE
    ->              '%GOOGLE%' )
    ->        AND ( Length(Regexp_extract(referer, '^https?://([^/]+)', 1)) > 3
    ->               OR Regexp_extract(referer, '^https?://([^/]+)', 1) != ''
    ->               OR Regexp_extract(referer, '^https?://([^/]+)', 1) IS NOT NULL )
    ->        AND eventdate = '2013-07-15'
    -> GROUP  BY counterid
    -> HAVING hit_count > 100
    -> ORDER  BY hit_count DESC
    -> LIMIT  20;
+-----------+-----------+--------------+
| counterid | hit_count | unique_users |
+-----------+-----------+--------------+
|    105857 |   1919075 |      1412926 |
|    117917 |    200018 |        50285 |
|     99062 |    114384 |        71408 |
|      1634 |     43839 |        14975 |
|        59 |     31328 |         6668 |
|    114157 |     28852 |        19729 |
|        62 |     22549 |        14130 |
|      1483 |      8425 |         5677 |
|        38 |      5436 |         1805 |
|      1060 |      4043 |         2948 |
|     76221 |      2060 |         1325 |
|    128858 |      1690 |          825 |
|    102847 |      1500 |          350 |
|     89761 |      1419 |          274 |
|     92040 |      1180 |          978 |
|      1089 |      1067 |          961 |
|      2004 |       880 |          698 |
|      1213 |       597 |          219 |
|     77729 |       448 |          108 |
|     71099 |       289 |           70 |
+-----------+-----------+--------------+
20 rows in set (0.57 sec)

About a 2.6× speedup (1.50 s → 0.57 s).

TODO

In the future, we can leverage virtual slot ref to implement more functionalities, including:

  1. ANN Index
  2. Relevance scoring based on full-text indexes
  3. Generated Columns (of NOT ALWAYS type)
  4. Index-Only Scan (which will require modifying SlotRef computation in SegmentIterator to be pull-based)
  5. The CSE replacement rule on the FE is very basic but sufficient for now; fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PushDownVirtualColumnsIntoOlapScanTest.java needs further modification.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas (Contributor) commented Jul 3, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhiqiang-hhhh (Contributor, Author)

run buildall

@hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 🎉
Increment coverage report
Complete coverage report

@zhiqiang-hhhh (Contributor, Author)

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.56% (1188/1439)
Line Coverage 67.35% (20731/30779)
Region Coverage 66.98% (10310/15393)
Branch Coverage 56.30% (5388/9570)

@zhiqiang-hhhh (Contributor, Author)

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.56% (1188/1439)
Line Coverage 67.33% (20725/30779)
Region Coverage 67.02% (10317/15393)
Branch Coverage 56.29% (5387/9570)

@hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 🎉
Increment coverage report
Complete coverage report

@zhiqiang-hhhh (Contributor, Author)

run buildall

@hello-stephen (Contributor)

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.56% (1188/1439)
Line Coverage 67.33% (20724/30779)
Region Coverage 66.95% (10306/15393)
Branch Coverage 56.29% (5387/9570)

@zhiqiang-hhhh (Contributor, Author)

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.56% (1188/1439)
Line Coverage 67.41% (20749/30779)
Region Coverage 67.02% (10316/15393)
Branch Coverage 56.30% (5388/9570)

@hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 🎉
Increment coverage report
Complete coverage report

@zhiqiang-hhhh (Contributor, Author)

run buildall

@zhiqiang-hhhh zhiqiang-hhhh marked this pull request as ready for review July 3, 2025 13:28
@zhiqiang-hhhh (Contributor, Author)

run buildall

@zhiqiang-hhhh zhiqiang-hhhh changed the title [feat] Virtual Column [feat] Virtual Slot Ref Jul 3, 2025
@zhiqiang-hhhh (Contributor, Author)

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.92% (1214/1464)
Line Coverage 67.42% (20971/31107)
Region Coverage 67.15% (10439/15546)
Branch Coverage 56.51% (5463/9668)

@doris-robot

TPC-H: Total hot run time: 33935 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5218a6fe4f3382e54341f8ce171bf8ab4e93b24e, data reload: false

------ Round 1 ----------------------------------
q1	17595	5242	5084	5084
q2	1969	274	177	177
q3	10433	1337	735	735
q4	10252	1031	554	554
q5	7830	2400	2355	2355
q6	178	159	129	129
q7	889	745	598	598
q8	9325	1367	1126	1126
q9	7456	5124	5180	5124
q10	6871	2389	1948	1948
q11	498	293	282	282
q12	337	349	212	212
q13	17771	3688	3060	3060
q14	234	235	218	218
q15	551	484	471	471
q16	432	420	377	377
q17	596	852	351	351
q18	7595	7212	7147	7147
q19	1225	954	584	584
q20	322	336	208	208
q21	3796	3154	2277	2277
q22	1014	996	918	918
Total cold run time: 107169 ms
Total hot run time: 33935 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5171	5101	5159	5101
q2	241	321	218	218
q3	2165	2676	2319	2319
q4	1336	1756	1311	1311
q5	4226	4645	4526	4526
q6	206	170	130	130
q7	2071	1942	1872	1872
q8	2621	2546	2561	2546
q9	7446	7320	7149	7149
q10	3176	3349	2929	2929
q11	565	501	501	501
q12	697	813	639	639
q13	3599	3939	3319	3319
q14	292	324	274	274
q15	521	472	481	472
q16	472	507	456	456
q17	1181	1575	1377	1377
q18	8292	7851	7859	7851
q19	830	875	869	869
q20	1999	1981	1964	1964
q21	4846	4380	4337	4337
q22	1033	997	1000	997
Total cold run time: 52986 ms
Total hot run time: 51157 ms

@doris-robot

TPC-DS: Total hot run time: 184427 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5218a6fe4f3382e54341f8ce171bf8ab4e93b24e, data reload: false

query1	1013	387	397	387
query2	6507	1664	1671	1664
query3	6740	207	205	205
query4	26335	23771	23066	23066
query5	4364	582	439	439
query6	298	220	201	201
query7	4638	506	295	295
query8	256	215	217	215
query9	8607	2681	2685	2681
query10	472	347	265	265
query11	15448	15001	14850	14850
query12	146	104	103	103
query13	1647	531	406	406
query14	8839	5655	5756	5655
query15	201	182	193	182
query16	7191	617	489	489
query17	1179	688	560	560
query18	1983	399	290	290
query19	190	202	154	154
query20	122	119	110	110
query21	209	123	108	108
query22	4034	4233	4186	4186
query23	33820	32878	32846	32846
query24	8437	2376	2400	2376
query25	553	503	421	421
query26	891	273	154	154
query27	2746	524	351	351
query28	4335	2154	2133	2133
query29	713	569	432	432
query30	280	216	193	193
query31	906	840	774	774
query32	68	67	60	60
query33	567	369	308	308
query34	800	839	531	531
query35	800	840	720	720
query36	967	958	875	875
query37	115	100	76	76
query38	4217	4051	4085	4051
query39	1478	1384	1404	1384
query40	215	120	104	104
query41	55	54	51	51
query42	139	111	113	111
query43	478	497	461	461
query44	1352	827	822	822
query45	178	166	160	160
query46	862	1036	639	639
query47	1724	1777	1690	1690
query48	395	427	301	301
query49	699	482	412	412
query50	651	704	441	441
query51	4165	4093	4126	4093
query52	109	109	102	102
query53	236	252	189	189
query54	568	568	497	497
query55	82	80	83	80
query56	305	303	284	284
query57	1164	1162	1121	1121
query58	265	256	283	256
query59	2506	2654	2526	2526
query60	341	313	306	306
query61	125	155	117	117
query62	816	689	653	653
query63	221	190	193	190
query64	3496	1022	638	638
query65	4302	4173	4180	4173
query66	1008	404	316	316
query67	15759	15431	15357	15357
query68	5650	902	534	534
query69	505	303	273	273
query70	1196	1197	1094	1094
query71	415	323	291	291
query72	5600	4803	4551	4551
query73	629	581	357	357
query74	8902	9105	8844	8844
query75	3179	3353	2673	2673
query76	3153	1143	714	714
query77	464	437	291	291
query78	9914	10143	9341	9341
query79	1516	836	591	591
query80	597	533	450	450
query81	492	253	221	221
query82	176	127	98	98
query83	250	260	236	236
query84	253	107	89	89
query85	745	358	310	310
query86	367	302	297	297
query87	4350	4436	4273	4273
query88	2975	2316	2313	2313
query89	372	315	275	275
query90	1816	215	214	214
query91	151	156	128	128
query92	76	64	58	58
query93	1799	964	592	592
query94	702	424	321	321
query95	386	305	298	298
query96	500	588	281	281
query97	2635	2778	2639	2639
query98	238	203	199	199
query99	1324	1373	1295	1295
Total cold run time: 265655 ms
Total hot run time: 184427 ms

@doris-robot

ClickBench: Total hot run time: 30.34 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5218a6fe4f3382e54341f8ce171bf8ab4e93b24e, data reload: false

query1	0.04	0.03	0.04
query2	0.12	0.06	0.06
query3	0.29	0.06	0.06
query4	1.62	0.08	0.08
query5	0.43	0.41	0.40
query6	1.15	0.66	0.66
query7	0.02	0.01	0.02
query8	0.06	0.05	0.05
query9	0.64	0.52	0.52
query10	0.59	0.57	0.57
query11	0.25	0.13	0.13
query12	0.26	0.14	0.14
query13	0.65	0.62	0.62
query14	0.81	0.84	0.83
query15	0.96	0.89	0.89
query16	0.38	0.39	0.38
query17	1.09	1.10	1.10
query18	0.24	0.23	0.24
query19	2.06	1.95	1.91
query20	0.01	0.01	0.02
query21	15.38	0.95	0.65
query22	0.94	1.05	0.83
query23	14.70	1.56	0.75
query24	5.01	0.60	0.30
query25	0.17	0.09	0.09
query26	0.56	0.23	0.19
query27	0.09	0.09	0.09
query28	11.12	1.20	0.58
query29	12.57	4.05	3.38
query30	0.28	0.08	0.06
query31	2.85	0.66	0.44
query32	3.23	0.60	0.50
query33	3.19	3.09	3.09
query34	16.90	5.41	4.71
query35	4.75	4.84	4.83
query36	0.65	0.52	0.49
query37	0.19	0.18	0.17
query38	0.18	0.16	0.15
query39	0.05	0.04	0.05
query40	0.18	0.16	0.19
query41	0.10	0.06	0.06
query42	0.07	0.05	0.06
query43	0.06	0.05	0.05
Total cold run time: 104.89 s
Total hot run time: 30.34 s

@doris-robot

BE UT Coverage Report

Increment line coverage 27.09% (146/539) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.11% (15408/26981)
Line Coverage 46.14% (139759/302912)
Region Coverage 45.44% (70829/155870)
Branch Coverage 40.22% (37367/92910)

@zhiqiang-hhhh zhiqiang-hhhh force-pushed the feat-virtual-column branch from e61396d to 25ebe62 Compare July 7, 2025 14:36
@zhiqiang-hhhh (Contributor, Author)

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.02% (1222/1472)
Line Coverage 67.52% (21170/31355)
Region Coverage 67.26% (10542/15673)
Branch Coverage 56.61% (5547/9798)

@doris-robot

BE UT Coverage Report

Increment line coverage 29.32% (163/556) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.28% (15509/27074)
Line Coverage 46.26% (140988/304749)
Region Coverage 45.51% (71277/156631)
Branch Coverage 40.24% (37557/93332)

@doris-robot

ClickBench: Total hot run time: 32.96 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit af5fc37ffc558c01d31ea8935f747c53d1ed8220, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.05
query3	0.25	0.07	0.08
query4	1.61	0.11	0.11
query5	0.43	0.45	0.43
query6	1.17	0.70	0.69
query7	0.02	0.02	0.01
query8	0.05	0.03	0.03
query9	0.55	0.48	0.47
query10	0.54	0.53	0.53
query11	0.15	0.10	0.11
query12	0.15	0.11	0.12
query13	0.64	0.65	0.64
query14	0.93	1.26	1.01
query15	0.92	0.91	0.92
query16	0.39	0.40	0.39
query17	1.10	1.11	1.08
query18	0.22	0.20	0.21
query19	2.00	1.93	1.85
query20	0.02	0.01	0.01
query21	15.37	0.86	0.55
query22	0.78	1.08	0.82
query23	14.83	1.14	0.61
query24	6.46	2.04	0.53
query25	0.51	0.13	0.18
query26	0.68	0.16	0.13
query27	0.06	0.06	0.06
query28	9.37	0.84	0.44
query29	12.60	3.83	3.34
query30	3.02	2.96	2.93
query31	2.81	0.56	0.39
query32	3.24	0.56	0.49
query33	3.05	3.22	3.27
query34	15.99	5.33	4.90
query35	4.87	5.04	4.87
query36	0.69	0.51	0.50
query37	0.10	0.08	0.07
query38	0.05	0.06	0.04
query39	0.04	0.02	0.03
query40	0.17	0.14	0.13
query41	0.07	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 106.11 s
Total hot run time: 32.96 s

@doris-robot

BE UT Coverage Report

Increment line coverage 40.13% (256/638) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.58% (15964/27724)
Line Coverage 46.34% (143555/309804)
Region Coverage 35.72% (108162/302775)
Branch Coverage 38.30% (47768/124727)

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 81.70% (518/634) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.09% (22069/27214)
Line Coverage 73.70% (228016/309376)
Region Coverage 61.36% (190548/310523)
Branch Coverage 65.10% (82110/126135)

@HappenLee (Contributor) left a comment:

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 28, 2025
@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

@airborne12 (Member) left a comment:

LGTM

@airborne12 airborne12 merged commit 08a9dbb into apache:master Jul 28, 2025
27 of 28 checks passed
@zhiqiang-hhhh zhiqiang-hhhh deleted the feat-virtual-column branch July 28, 2025 02:40
w41ter pushed a commit to w41ter/incubator-doris that referenced this pull request Jul 30, 2025
Co-authored-by: morrySnow <zhangwenxin@selectdb.com>
zhiqiang-hhhh added a commit to zhiqiang-hhhh/doris that referenced this pull request Aug 4, 2025
**TL;DR:** Introduce virtual slot ref to eliminate redundant computation
of common sub-expressions

Consider the following queries:
```sql
select funcC(funcA(colA)), funcB(funcA(colA)) from table;
```
```sql
select funcA(colA) as sub from table where funcB(funcA(colA)) > 0;
```
```sql
select l2_distance(colA, [10]) as distance from table where l2_distance(colA, [10]) > 0
```

The common characteristic of these SQL statements is that certain
expressions appear multiple times in different places—whether in the
projection, in predicates, or during index computation (e.g., for ANN
index in Q3). These identical repeated expressions are currently
computed multiple times, but they could actually be computed just once.

We introduce **virtual slot ref** to address this issue.

In the storage layer, we implement a `VirtualColumnIterator`. The
behavior of `VirtualColumnIterator` is identical to other
`ColumnIterator`s, except it is not used to read any physical column.
Instead, it is dedicated to reading the result of expressions computed
from the index (for example, the distance returned by an ANN index, or
in the future, the relevance score from a full-text index). Once an
expression result is computed via the index, we use
`VirtualColumnIterator::prepare_materialization` to store the data
source. If a segment does not have the corresponding index, the data
source of the `VirtualColumnIterator` will be a special `ColumnNothing`
type (this is an important design trick that allows virtual slot ref to
elegantly handle the case where a segment does not yet have the index
built).

We also modify `SegmentIterator`. Before processing each block, we first
initialize the positions of the virtual slot refs in the block as
`ColumnNothing`. Before actually returning a block, we check whether the
virtual slot refs have been materialized; if not, we execute the
expressions corresponding to the virtual slot refs (e.g., `l2_distance`
or a `score` function) to materialize them, ensuring that every virtual
column in the block returned to the computation layer has been
materialized.

For expression evaluation, we introduce `VirtualSlotRef`, which is
essentially `SlotRef` + `FunctionCall`. When the expression tree
executes a node of this type, it automatically checks whether the
corresponding expression has been materialized: if it has,
`VirtualSlotRef` behaves like a `SlotRef`; if it hasn’t, it behaves like
a `FunctionCall`.

Here’s an example to better illustrate the execution of virtual slot
ref:
```sql
select func(colA) from table where func(colA) > 0;
```
For this SQL, our current `ScanNode` is:
```
ScanNode {
    predicates: func(colA[#0]) > 0
    final projection: func(colA[#0])
    final projection tuple id: 1
    tuple_id: 0
}

TupleDesc[id=0] {
    SlotDesc{id=0, col=colA}
}

TupleDesc[id=1] {
    SlotDesc{id=1, col=null, ..., type=float64}
}
```
After this PR, our plan becomes:
```
ScanNode {
    predicate: func(colA)[#1] > 0
    final projection: func(colA)[#1]
    final projection tuple id: 1
    tuple_id: 0
}

TupleDesc[id=0] {
    SlotDesc{id=0, col=colA},
    SlotDesc{id=1, col=virtual_column_1, expr=func(colA[#0])}
}

TupleDesc[id=1] {
    SlotDesc{id=2, name=virtual_column_1[#1]}
}
```

Note that we added a `VirtualSlot` in Tuple 0, and other places that
originally required computing the expression are transformed to
reference this `VirtualSlot`. In this way, redundant computation of
common expressions is eliminated.

Disable the plan rule:
```sql
mysql> set disable_nereids_rules='PUSH_DOWN_VIRTUAL_COLUMNS_INTO_OLAP_SCAN';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT counterid,
              Count(*)               AS hit_count,
              Count(DISTINCT userid) AS unique_users
       FROM   hits
       WHERE  ( Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) = 'GOOGLE.COM'
                 OR Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) = 'GOOGLE.RU'
                 OR Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) LIKE '%GOOGLE%' )
              AND ( Length(Regexp_extract(referer, '^https?://([^/]+)', 1)) > 3
                     OR Regexp_extract(referer, '^https?://([^/]+)', 1) != ''
                     OR Regexp_extract(referer, '^https?://([^/]+)', 1) IS NOT NULL )
              AND eventdate = '2013-07-15'
       GROUP  BY counterid
       HAVING hit_count > 100
       ORDER  BY hit_count DESC
       LIMIT  20;
+-----------+-----------+--------------+
| counterid | hit_count | unique_users |
+-----------+-----------+--------------+
|    105857 |   1919075 |      1412926 |
|    117917 |    200018 |        50285 |
|     99062 |    114384 |        71408 |
|      1634 |     43839 |        14975 |
|        59 |     31328 |         6668 |
|    114157 |     28852 |        19729 |
|        62 |     22549 |        14130 |
|      1483 |      8425 |         5677 |
|        38 |      5436 |         1805 |
|      1060 |      4043 |         2948 |
|     76221 |      2060 |         1325 |
|    128858 |      1690 |          825 |
|    102847 |      1500 |          350 |
|     89761 |      1419 |          274 |
|     92040 |      1180 |          978 |
|      1089 |      1067 |          961 |
|      2004 |       880 |          698 |
|      1213 |       597 |          219 |
|     77729 |       448 |          108 |
|     71099 |       289 |           70 |
+-----------+-----------+--------------+
20 rows in set (1.50 sec)
```
Re-enable the rule:
```text
mysql> unset variable disable_nereids_rules;
--------------
unset variable disable_nereids_rules
--------------

Query OK, 0 rows affected (0.00 sec)

mysql> -- Query 1: the 20 websites receiving the most hits from Google
mysql> SELECT counterid,
    ->        Count(*)               AS hit_count,
    ->        Count(DISTINCT userid) AS unique_users
    -> FROM   hits
    -> WHERE  ( Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) = 'GOOGLE.COM'
    ->           OR Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) =
    ->              'GOOGLE.RU'
    ->           OR Upper(Regexp_extract(referer, '^https?://([^/]+)', 1)) LIKE
    ->              '%GOOGLE%' )
    ->        AND ( Length(Regexp_extract(referer, '^https?://([^/]+)', 1)) > 3
    ->               OR Regexp_extract(referer, '^https?://([^/]+)', 1) != ''
    ->               OR Regexp_extract(referer, '^https?://([^/]+)', 1) IS NOT NULL )
    ->        AND eventdate = '2013-07-15'
    -> GROUP  BY counterid
    -> HAVING hit_count > 100
    -> ORDER  BY hit_count DESC
    -> LIMIT  20;
+-----------+-----------+--------------+
| counterid | hit_count | unique_users |
+-----------+-----------+--------------+
|    105857 |   1919075 |      1412926 |
|    117917 |    200018 |        50285 |
|     99062 |    114384 |        71408 |
|      1634 |     43839 |        14975 |
|        59 |     31328 |         6668 |
|    114157 |     28852 |        19729 |
|        62 |     22549 |        14130 |
|      1483 |      8425 |         5677 |
|        38 |      5436 |         1805 |
|      1060 |      4043 |         2948 |
|     76221 |      2060 |         1325 |
|    128858 |      1690 |          825 |
|    102847 |      1500 |          350 |
|     89761 |      1419 |          274 |
|     92040 |      1180 |          978 |
|      1089 |      1067 |          961 |
|      2004 |       880 |          698 |
|      1213 |       597 |          219 |
|     77729 |       448 |          108 |
|     71099 |       289 |           70 |
+-----------+-----------+--------------+
20 rows in set (0.57 sec)
```
Roughly a 2.6x speedup (1.50 s → 0.57 s).

In the future, we can leverage virtual slot ref to implement more
functionalities, including:

1. ANN Index
2. Relevance scoring based on full-text indexes
3. Generated Columns (of NOT ALWAYS type)
4. Index-Only Scan (which will require modifying SlotRef computation in
`SegmentIterator` to be pull-based)
5. The CSE replacement rule on the FE is quite basic, but sufficient for
now. It needs further modification; see
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PushDownVirtualColumnsIntoOlapScanTest.java
---

Co-authored-by: morrySnow <zhangwenxin@selectdb.com>
yiguolei pushed a commit that referenced this pull request Aug 4, 2025
#54223)

### What problem does this PR solve?

Related PR: #52701

TimeV2 is a runtime type; it cannot be used as a VirtualSlotRef.

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
airborne12 pushed a commit that referenced this pull request Aug 18, 2025
A tablet schema that contains a virtual column should not be added into
the SchemaCache.

Related PR: #52701
airborne12 pushed a commit that referenced this pull request Aug 22, 2025
### What problem does this PR solve?

Introducing ANN index to Doris.

This pull request introduces foundational support for ANN (Approximate
Nearest Neighbor) vector index functionality in the storage engine,
including new runtime structures, configuration options, and initial
integration with the build system. The changes lay the groundwork for
ANN-based search and statistics collection, and begin integrating ANN
index support into various storage and query execution paths.

The implementation of the ANN index is based on
[faiss](https://github.com/facebookresearch/faiss).
Faiss can return distances directly, so this PR uses [virtual slot
ref](#52701) to return results from the
index.

Each data segment of Doris will have a Faiss index if the user creates a
table with an ANN index, and new segments generated by compaction will get
a Faiss index automatically.

Currently, `CREATE INDEX` and `BUILD INDEX` are not supported; the index
definition must be added to the table DDL if you want it.

**ANN Index Feature Integration:**

* Added new runtime structures and parameters for ANN index operations,
including `AnnIndexStats`, `AnnIndexParam`, `RangeSearchParams`,
`RangeSearchResult`, and others in `ann_search_params.h`, as well as
`RangeSearchRuntimeInfo` for managing ANN range search context.
* Extended `StorageReadOptions` and `RowsetReaderContext` to include
`ann_topn_runtime` for passing ANN runtime information through the
storage read path.
* Added new ANN-related statistics fields (timing and row counts) to
`OlapReaderStatistics` for monitoring ANN index operations.

**Build System and Dependency Updates:**

* Added `doris-faiss` and `doris-openblas` as submodules for ANN/vector
index support, and integrated the new `Vector` library into the build
process and as a dependency for relevant targets.

**Index Handling and Schema Integration:**

* Updated index file writer accessors and naming from "inverted_index"
to more generic "index" to accommodate ANN and other index types.
* Changed index creation logic in `SegmentFlusher` to use
`has_extra_index()` (supporting both inverted and ANN indexes) instead
of `has_inverted_index()`.

**Configuration:**

* Introduced a new configuration option `opm_threads_limit` to control
the maximum number of OpenMP threads used per Doris thread, which is
relevant for vectorized/ANN computation.

These changes set up the infrastructure required for future development
of ANN vector index features, including search, filtering, and
statistics collection.

Co-authored-by: chenlinzhong <490103404@qq.com>
Co-authored-by: morrySnow <zhangwenxin@selectdb.com>
BiteTheDDDDt pushed a commit that referenced this pull request Aug 28, 2025
…55323)

Some expressions cannot handle an empty block, such as the function
`element_at`, so materialize the virtual column in advance to avoid
errors.

Related PR: #52701
morrySnow added a commit that referenced this pull request Aug 28, 2025
…54998)

### What problem does this PR solve?

Related PR: #52701

Problem Summary:

1. Do not push down WhenClause in CASE WHEN.
2. Do not generate a virtual column that is only used once: fixes patterns
like `select func_a(x), func_b(func_a(x)), func_c(func_b(func_a(x)))`.
zhiqiang-hhhh pushed a commit to zhiqiang-hhhh/doris that referenced this pull request Aug 29, 2025
…pache#54998)

### What problem does this PR solve?

Related PR: apache#52701

Problem Summary:

1. Do not push down WhenClause in CASE WHEN.
2. Do not generate a virtual column that is only used once: fixes patterns
like `select func_a(x), func_b(func_a(x)), func_c(func_b(func_a(x)))`.
englefly pushed a commit that referenced this pull request Sep 5, 2025
### What problem does this PR solve?

Need to adjust the nullability of the virtual slot's expression.

Bug introduced by #52701.

Example as follows:

SQL: 

```sql
        SELECT t1.*, t2.*
        FROM
            tbl_adjust_virtual_slot_nullable_1 AS t1
        LEFT JOIN tbl_adjust_virtual_slot_nullable_2 AS t2
        ON  t1.c_int = t2.c_int
        WHERE
            NOT (
                    day(t2.c_date) IN (1, 3)
                AND
                    day(t2.c_date) IN (2, 3, 3)
                );
```

throw exception:

```
java.sql.SQLException: errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Could not find function dayofmonth, arg c_date return Nullable(TINYINT)
```
wenzhenghu pushed a commit to wenzhenghu/doris that referenced this pull request Sep 8, 2025
…#55694)

### What problem does this PR solve?

Need to adjust the nullability of the virtual slot's expression.

Bug introduced by apache#52701.

Example as follows:

SQL: 

```sql
        SELECT t1.*, t2.*
        FROM
            tbl_adjust_virtual_slot_nullable_1 AS t1
        LEFT JOIN tbl_adjust_virtual_slot_nullable_2 AS t2
        ON  t1.c_int = t2.c_int
        WHERE
            NOT (
                    day(t2.c_date) IN (1, 3)
                AND
                    day(t2.c_date) IN (2, 3, 3)
                );
```

throw exception:

```
java.sql.SQLException: errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Could not find function dayofmonth, arg c_date return Nullable(TINYINT)
```
morrySnow pushed a commit that referenced this pull request Sep 12, 2025
### What problem does this PR solve?

Related: #52701

1. Functions that process complex types are too complicated and may cause
many unexpected problems, e.g. returning `array<null_type>`, so do not
process them with virtual slots.
2. Lambda functions cannot be processed via virtual columns.

So stop removing common sub-expressions when we meet the above cases.
englefly pushed a commit that referenced this pull request Sep 17, 2025
### What problem does this PR solve?

Related PR: #52701

1. Do not optimize the grouping scalar function.
2. Fix the rule type of ANN top-n push down.