-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improve](function)Refactor distance function return types to float #55184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
66e5255 to
ba4abda
Compare
|
run buildall |
| const auto& offsets2 = *arr2.offsets_ptr; | ||
| const auto& nested_col1 = assert_cast<const ColumnFloat64*>(arr1.nested_col.get()); | ||
| const auto& nested_col2 = assert_cast<const ColumnFloat64*>(arr2.nested_col.get()); | ||
| const auto& nested_col1 = assert_cast<const ColumnType*>(arr1.nested_col.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if input column is nullable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now all distance funcs are changed to PropagateNullable, block data type is FLOAT rather than Nullable(FLOAT), so I removed the null_column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now all distance funcs are changed to PropagateNullable, block data type is FLOAT rather than Nullable(FLOAT), so I removed the null_column.
If i am not mistaken, propageteNullable means return null if input is null, so there still could have a nullable input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can add a regression test.
set debug_skip_fold_constant=true;
select l2_distance([1.0, 2.0, 3.0], [1.0, null, 3.0]);
select l2_distance([1.0, 2.0, 3.0], null);Both of above sql return NULL in current master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhiqiang-hhhh it will be ok. because here we del the use_default_implementation_for_nulls the func in doris will dispose the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhiqiang-hhhh it will be ok. because here we del the
use_default_implementation_for_nullsthe func in doris will dispose the case
ok, i have no problem
TPC-H: Total hot run time: 34012 ms |
TPC-DS: Total hot run time: 185272 ms |
ClickBench: Total hot run time: 32.53 s |
|
run buildall |
TPC-H: Total hot run time: 34416 ms |
TPC-DS: Total hot run time: 185338 ms |
ClickBench: Total hot run time: 32.78 s |
cb6209c to
2c27b30
Compare
|
run buildall |
TPC-H: Total hot run time: 34094 ms |
TPC-DS: Total hot run time: 184567 ms |
ClickBench: Total hot run time: 32.92 s |
|
run buildall |
TPC-H: Total hot run time: 33989 ms |
TPC-DS: Total hot run time: 185107 ms |
ClickBench: Total hot run time: 32.5 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 34334 ms |
TPC-DS: Total hot run time: 187333 ms |
ClickBench: Total hot run time: 32.67 s |
Possible file(s) that should be tracked in LFS detected: 🚨The following file(s) exceeds the file size limit:
Consider using |
|
run buildall |
8600428 to
3c2d11d
Compare
Possible file(s) that should be tracked in LFS detected: 🚨The following file(s) exceeds the file size limit:
Consider using |
|
run buildall |
TPC-H: Total hot run time: 34455 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
FE UT Coverage ReportIncrement line coverage |
TPC-DS: Total hot run time: 187671 ms |
ClickBench: Total hot run time: 32.64 s |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression && UT Coverage ReportIncrement line coverage |
|
run p0 |
|
run nonConcurrent |
zhiqiang-hhhh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by anyone and no changes requested. |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
1 similar comment
FE Regression Coverage ReportIncrement line coverage |
HappenLee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
What problem does this PR solve?
This pull request standardizes the return type of all vector distance functions to float across the codebase, ensuring consistency and improving performance for vector similarity search operations.
Related PR: #54276
Release note
None
Check List (For Author)
Test
Behavior changed:
The array type in the distance function parameter cannot contain null values, otherwise a runtime error will occur.
If the sum of squares of x or y in cosine_distance is 0, in this case, return distance 2 directly to avoid division by zero.
Check List (For Reviewer who merge this PR)