-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix is_distinct from for float NaN values #5446
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Thank you @comphead -- I have some suggestions but I don't think they are required to merge this PR.
I wonder if it is time to start porting more of these kernels into arrow 🤔
@@ -80,3 +80,29 @@ select '1' from foo order by column1; | |||
# foo distinct order by | |||
statement error DataFusion error: Error during planning: For SELECT DISTINCT, ORDER BY expressions column1 must appear in select list | |||
select distinct '1' from foo order by column1; | |||
|
|||
# distincts for float nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a good test to add to https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/sqllogictests/test_files/pg_compat so they are automatically compared with postgres as part of CI (kudos to @melgenek for help there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ive duplicated in both select.slt
and pg_compat_simple.slt
Reason for that is PG doesn't support double datatype
!(left_isnull != right_isnull | ||
|| left_value.is_nan() != right_value.is_nan() | ||
|| (!left_value.is_nan() | ||
&& !right_value.is_nan() | ||
&& left_value != right_value)) | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me quite some time to see that the only difference here is that there is a !
in front of the entire expression. Would it be possible to factor out the duplication so the difference between the kernels are clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Right, I think apache/arrow-rs#960 is good place to do it. |
1::float is distinct from null v17 | ||
; | ||
---- | ||
false true false true true false true true false false true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
@@ -32,6 +32,14 @@ use std::sync::Arc; | |||
// Simple (low performance) kernels until optimized kernels are added to arrow | |||
// See https://github.com/apache/arrow-rs/issues/960 | |||
|
|||
macro_rules! distinct_float { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you -- this is great
Thanks again @comphead |
Benchmark runs are scheduled for baseline = 61fc514 and contender = 8de7ea4. 8de7ea4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #5392.
Rationale for this change
To fix wrong behavior when comparing NaN. Details in #5392
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No