Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add isnan and iszero #7274

Merged
merged 6 commits into from
Aug 16, 2023
Merged

Add isnan and iszero #7274

merged 6 commits into from
Aug 16, 2023

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Aug 12, 2023

Which issue does this PR close?

Closes #7273

Rationale for this change

In Datafusion, +NaN and -NaN are distinguished as well as +0.0 and -0.0.
For example, asin(10) returns NaN but 'inf'::DOUBLE / 'inf'::DOUBLE returns -NaN.
3.0 * 0 returns 0.0 but -3.0 * 0 returns -0.0.

So, users need to determine NaN and 0.0 like as follows.

SELECT ... WHERE x = 'NaN'::DOUBLE OR x = -'NaN'::DOUBLE;

I think it's nice to avoid such boilerplate code.

What changes are included in this PR?

This PR aims to add isnan and iszero which determine NaN and 0.0 ignoring their sign.

Are these changes tested?

Added new tests.

Are there any user-facing changes?

Yes but it doesn't break compatibility.

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 12, 2023
Comment on lines 97 to 98
| isnan(x) | predicate determining whether NaN or not |
| iszero(x) | predicate determining whether 0.0 or not |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| isnan(x) | predicate determining whether NaN or not |
| iszero(x) | predicate determining whether 0.0 or not |
| isnan(x) | predicate determining whether NaN/-NaN or not |
| iszero(x) | predicate determining whether 0.0/-0.0 or not |

Comment on lines 1198 to 1199
| BuiltinScalarFunction::Isnan
| BuiltinScalarFunction::Iszero => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider Isnan and Iszero not math expressions as listed here. The reason to give f64 high priority is also not applied too for these two expressions, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on above reason, maybe it is less confused to have them in a separate pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply.
Yes, f64 doesn't need high priority.
I've changed it.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. They are also in Spark expressions too, although seems not all similar systems support these two expressions.

Isnan,
isnan,
num,
"returns true if a given number is +NaN or -NaN otherwise returns false"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't know there are +NaN and -NaN. i thought there's only one type of NaN and (they) are unordered. unlike +0 and -0 which are distinct.

Copy link
Contributor

@tustvold tustvold Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any value with an exponent field of all 1s is a Nan, and so there are 2^N distinct values of NaN, where N is the number of mantissa bits. The same is true of -NaN

Within arrow-rs we follow the IEEE 754 total order predicate which establishes an ordering for NaNs, (and infinity, etc...)

@sarutak
Copy link
Member Author

sarutak commented Aug 15, 2023

This CI failure seems caused by #7270. I opened #7286 to fix it.

@@ -283,6 +285,32 @@ gcd(expression_x, expression_y)
- **expression_y**: Second numeric expression to operate on.
Can be a constant, column, or function, and any combination of arithmetic operators.

### `isnan`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb
Copy link
Contributor

alamb commented Aug 15, 2023

I took the liberty of merging this branch up to resolve some conflicts.

I think it can be merged when the CI passes.

Thank you @sarutak

@@ -99,3 +99,15 @@ query RRR
SELECT nanvl(asin(10), 1.0), nanvl(1.0, 2.0), nanvl(asin(10), asin(10))
----
1 1 NaN

# isnan
query BBBB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will merge it.

@liukun4515 liukun4515 merged commit cf152af into apache:main Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add isnan and iszero for FLOAD and DOUBLE
7 participants