@uchenily
Contributor

What problem does this PR solve?

This pull request standardizes the return type of all vector distance functions to float across the codebase, ensuring consistency and improving performance for vector similarity search operations.

Related PR: #54276

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.

Array arguments passed to the distance functions must not contain NULL elements; otherwise a runtime error is raised.

In cosine_distance, if the sum of squares of x or y is 0, the function returns a distance of 2 directly to avoid division by zero.

  • Does this need documentation?
    • No.
    • Yes.
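The two behavior changes listed under "Behavior changed" can be illustrated with a minimal Python sketch. This is not the Doris BE code: the function and helper names, the choice of `ValueError`, the length check, and the assumption that cosine distance is defined as 1 minus cosine similarity are all illustrative. Only the NULL rejection, the zero-norm return value of 2, and the single-precision return type come from the description above.

```python
import math
import struct
from typing import Optional, Sequence


def to_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def cosine_distance(
    x: Sequence[Optional[float]], y: Sequence[Optional[float]]
) -> float:
    """Hypothetical stand-in for the SQL function, not the real implementation."""
    # Assumption: mismatched lengths are rejected (not stated in the PR).
    if len(x) != len(y):
        raise ValueError("arrays must have the same length")

    # From the PR: NULL elements in an array argument are a runtime error.
    if any(v is None for v in x) or any(v is None for v in y):
        raise ValueError("array arguments must not contain NULL elements")

    dot = sum(a * b for a, b in zip(x, y))
    squared_x = sum(a * a for a in x)
    squared_y = sum(b * b for b in y)

    # From the PR: a zero sum of squares returns 2 directly,
    # which avoids dividing by zero below.
    if squared_x == 0 or squared_y == 0:
        return 2.0

    # Assumption: distance = 1 - cosine similarity, narrowed to float32
    # to mirror the float return type.
    return to_float32(1.0 - dot / math.sqrt(squared_x * squared_y))
```

Under these assumptions, orthogonal vectors such as `[1.0, 0.0]` and `[0.0, 1.0]` give a distance of 1, an all-zero vector gives 2, and an array containing `None` raises an error.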

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@uchenily
Contributor Author

run buildall

@github-actions
Contributor

Possible file(s) that should be tracked in LFS detected: 🚨

The following file(s) exceed the file size limit of 1048576 bytes, as set in the .yml configuration files:

  • regression-test/data/nereids_function_p0/scalar_function/Array1.out

Consider using git-lfs to manage large files.

@github-actions bot added the lfs-detected! label on Aug 28, 2025
@doris-robot

TPC-H: Total hot run time: 34199 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4f691d9ac968444b3c108528eda39d993b42b07d, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17634	5259	5044	5044
q2	2019	313	208	208
q3	10257	1264	723	723
q4	10242	1042	509	509
q5	7569	2490	2277	2277
q6	191	174	140	140
q7	920	763	643	643
q8	9363	1351	1100	1100
q9	6913	5093	5129	5093
q10	6944	2379	1980	1980
q11	501	304	279	279
q12	358	357	242	242
q13	17779	3649	3062	3062
q14	240	250	235	235
q15	581	509	496	496
q16	427	440	384	384
q17	601	867	364	364
q18	7497	7169	7114	7114
q19	1376	958	557	557
q20	340	338	233	233
q21	3706	2523	2958	2523
q22	1052	1039	993	993
Total cold run time: 106510 ms
Total hot run time: 34199 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5190	5151	5152	5151
q2	253	332	227	227
q3	2186	2675	2280	2280
q4	1382	1800	1328	1328
q5	4195	4300	4563	4300
q6	215	166	133	133
q7	2078	1994	1864	1864
q8	2666	2678	2535	2535
q9	7388	7372	7230	7230
q10	3182	3384	2869	2869
q11	579	516	507	507
q12	662	763	640	640
q13	3550	3828	3301	3301
q14	279	300	300	300
q15	533	469	484	469
q16	468	517	449	449
q17	1189	1608	1390	1390
q18	8035	7625	7575	7575
q19	851	802	958	802
q20	2003	2086	1782	1782
q21	4717	4418	4313	4313
q22	1111	1039	993	993
Total cold run time: 52712 ms
Total hot run time: 50438 ms

@doris-robot

TPC-DS: Total hot run time: 187005 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4f691d9ac968444b3c108528eda39d993b42b07d, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1078	435	418	418
query2	6561	1764	1796	1764
query3	6761	234	228	228
query4	26837	23482	23164	23164
query5	4421	623	497	497
query6	344	236	214	214
query7	4637	526	300	300
query8	300	251	243	243
query9	8624	2878	2836	2836
query10	485	337	286	286
query11	15667	15575	15061	15061
query12	186	126	128	126
query13	1691	570	461	461
query14	9437	5918	5873	5873
query15	213	190	178	178
query16	7197	656	469	469
query17	1235	804	621	621
query18	2007	436	374	374
query19	198	196	171	171
query20	134	125	127	125
query21	222	131	113	113
query22	4104	4275	4095	4095
query23	33689	32878	32986	32878
query24	8184	2375	2413	2375
query25	582	520	455	455
query26	1231	282	170	170
query27	2732	523	352	352
query28	4345	2254	2218	2218
query29	824	594	494	494
query30	289	239	197	197
query31	915	786	744	744
query32	94	84	87	84
query33	579	395	348	348
query34	779	847	520	520
query35	845	825	780	780
query36	978	1029	953	953
query37	131	115	95	95
query38	4052	4021	4097	4021
query39	1489	1444	1443	1443
query40	241	144	134	134
query41	115	66	63	63
query42	123	117	119	117
query43	552	512	488	488
query44	1337	850	857	850
query45	183	177	174	174
query46	872	1008	652	652
query47	1755	1780	1728	1728
query48	387	424	327	327
query49	744	504	411	411
query50	638	668	398	398
query51	4095	4219	4179	4179
query52	119	116	104	104
query53	292	267	201	201
query54	619	616	545	545
query55	97	93	91	91
query56	346	335	333	333
query57	1198	1187	1114	1114
query58	286	286	280	280
query59	2654	2726	2609	2609
query60	379	372	363	363
query61	167	164	178	164
query62	787	721	678	678
query63	234	194	192	192
query64	4510	1142	877	877
query65	4319	4185	4204	4185
query66	1188	463	369	369
query67	15277	15111	15090	15090
query68	7824	932	590	590
query69	525	340	301	301
query70	1256	1195	1179	1179
query71	449	355	328	328
query72	6078	4992	5043	4992
query73	672	634	357	357
query74	8925	9341	8894	8894
query75	3394	3098	2627	2627
query76	3283	1130	738	738
query77	690	430	352	352
query78	9622	9786	8799	8799
query79	2379	830	605	605
query80	714	591	512	512
query81	504	268	232	232
query82	211	138	116	116
query83	267	268	251	251
query84	260	109	98	98
query85	860	478	430	430
query86	390	327	301	301
query87	4326	4219	4219	4219
query88	3000	2226	2208	2208
query89	397	338	302	302
query90	1991	235	234	234
query91	164	159	134	134
query92	96	76	72	72
query93	2247	1015	638	638
query94	691	417	317	317
query95	413	331	333	331
query96	487	581	285	285
query97	2657	2666	2595	2595
query98	258	229	230	229
query99	1336	1486	1279	1279
Total cold run time: 273957 ms
Total hot run time: 187005 ms

@doris-robot

ClickBench: Total hot run time: 32.98 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 4f691d9ac968444b3c108528eda39d993b42b07d, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.06	0.04	0.05
query2	0.10	0.05	0.06
query3	0.26	0.09	0.09
query4	1.61	0.12	0.12
query5	0.45	0.42	0.42
query6	1.18	0.65	0.66
query7	0.03	0.03	0.02
query8	0.06	0.05	0.05
query9	0.59	0.53	0.53
query10	0.60	0.59	0.58
query11	0.18	0.11	0.12
query12	0.16	0.13	0.12
query13	0.63	0.64	0.61
query14	0.81	0.86	0.86
query15	0.90	0.85	0.85
query16	0.39	0.40	0.39
query17	1.07	1.06	1.02
query18	0.26	0.21	0.20
query19	1.89	1.78	1.80
query20	0.02	0.01	0.01
query21	15.40	0.96	0.59
query22	0.80	1.19	0.73
query23	14.88	1.46	0.64
query24	6.61	1.73	0.91
query25	0.51	0.34	0.08
query26	0.60	0.15	0.13
query27	0.07	0.06	0.05
query28	10.39	0.96	0.44
query29	12.59	3.96	3.27
query30	3.08	3.07	3.03
query31	2.83	0.58	0.38
query32	3.24	0.55	0.47
query33	3.06	3.18	3.10
query34	16.12	5.52	4.86
query35	4.89	4.92	4.93
query36	0.73	0.51	0.50
query37	0.10	0.08	0.07
query38	0.06	0.05	0.04
query39	0.04	0.03	0.03
query40	0.18	0.16	0.14
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 107.59 s
Total hot run time: 32.98 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 16.67% (2/12) 🎉
Increment coverage report
Complete coverage report

@doris-robot

BE UT Coverage Report

Increment line coverage 54.84% (34/62) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.80% (17151/33111)
Line Coverage 37.27% (156301/419361)
Region Coverage 31.97% (119196/372842)
Branch Coverage 33.25% (52363/157482)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 98.39% (61/62) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.64% (22977/32529)
Line Coverage 56.85% (238354/419251)
Region Coverage 52.42% (198369/378388)
Branch Coverage 53.93% (85474/158480)

@uchenily closed this on Aug 28, 2025