Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparisons (BinaryExprs) to literal NULL don't work #1179

Closed
Tracked by #1184 ...
alamb opened this issue Oct 26, 2021 · 6 comments · Fixed by #2481
Closed
Tracked by #1184 ...

Comparisons (BinaryExprs) to literal NULL don't work #1179

alamb opened this issue Oct 26, 2021 · 6 comments · Fixed by #2481
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Oct 26, 2021

Describe the bug

Bug is that datafusion SQL treats a literal NULL as though it is typed ScalarValue::Utf8(None)

So then if you try to compare null to a column that is not a scalar DataFusion goes 💥

To Reproduce
(I love the new VALUES feature that @jimexist added)

> select * from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
+---------+---------+---------+
| column1 | column2 | column3 |
+---------+---------+---------+
| 1       | foo     | 2.3     |
| 2       | bar     | 5.4     |
+---------+---------+---------+
2 rows in set. Query took 0.004 seconds.
> select column1 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
Plan("'Int64 < Utf8' can't be evaluated because there isn't a common type to coerce the types to")
> select column2 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
ArrowError(ExternalError(Internal("compute_utf8_op_scalar for 'lt' failed to cast literal value NULL")))
> select column3 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
Plan("'Float64 < Utf8' can't be evaluated because there isn't a common type to coerce the types to")

Expected behavior
All three queries should return 1 column, 2 rows with all values null;

result
NULL
NULL

Additional context

This came up in the context of IOx where we are doing partial evaluation (on something that is like at the partition level) where references to columns may be rewritten into nulls. See more details on https://github.com/influxdata/influxdb_iox/issues/883

@alamb alamb added the bug Something isn't working label Oct 26, 2021
@alamb alamb self-assigned this Oct 26, 2021
@alamb
Copy link
Contributor Author

alamb commented Oct 26, 2021

FYI here is what postgres does

alamb@MacBook-Pro:/tmp/arrow-rs$ psql
psql (13.4)
Type "help" for help.

alamb=# select * from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
 column1 | column2 | column3 
---------+---------+---------
       1 | foo     |     2.3
       2 | bar     |     5.4
(2 rows)

alamb=# select column1 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
 ?column? 
----------
 
 
(2 rows)

alamb=# select column2 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
 ?column? 
----------
 
 
(2 rows)

alamb=# select column3 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
 ?column? 
----------
 
 
(2 rows)

alamb=# 

@jimexist
Copy link
Member

jimexist commented Oct 27, 2021

FYI i put \pset null ¤ into ~/.psqlrc so that null is explicit

[postgres] # select column1 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
 ?column?
----------
 ¤
 ¤
(2 rows)

@Dandandan
Copy link
Contributor

The source of the problem is here:

                        SQLExpr::Value(Value::Null) => {
                            Ok(Expr::Literal(ScalarValue::Utf8(None)))
                        }

I would propose:

  • To introduce ScalarValue::Null and use that instead for Value::Null.
  • Return DataType::Null as type for ScalarValue::Null
  • Implement the rest of changes in coercion as needed.

@jimexist
Copy link
Member

jimexist commented Oct 27, 2021

The source of the problem is here:

                        SQLExpr::Value(Value::Null) => {
                            Ok(Expr::Literal(ScalarValue::Utf8(None)))
                        }

I would propose:

  • To introduce ScalarValue::Null and use that instead for Value::Null.
  • Return DataType::Null as type for ScalarValue::Null
  • Implement the rest of changes in coercion as needed.

i wonder how this line up with typed nulls for other enum cases.

or, could we actually get rid of the Option wrapping for other types, and keep this null enum case with wrapper type instead?

diff --git a/datafusion/src/scalar.rs b/datafusion/src/scalar.rs
index 00586bf55..0ce87195c 100644
--- a/datafusion/src/scalar.rs
+++ b/datafusion/src/scalar.rs
@@ -37,32 +37,34 @@ use std::{convert::TryFrom, fmt, iter::repeat, sync::Arc};
 /// This is the single-valued counter-part of arrow’s `Array`.
 #[derive(Clone)]
 pub enum ScalarValue {
+    /// a typed or untyped null
+    Null(DataType),
     /// true or false value
-    Boolean(Option<bool>),
+    Boolean(bool)
     /// 32bit float
-    Float32(Option<f32>),
+    Float32(f32)
     /// 64bit float
-    Float64(Option<f64>),
+    Float64(f64)
     /// signed 8bit int
-    Int8(Option<i8>),
+    Int8(i8)
     /// signed 16bit int
-    Int16(Option<i16>),
+    Int16(i16)
     /// signed 32bit int
-    Int32(Option<i32>),
+    Int32(i32)
     /// signed 64bit int
-    Int64(Option<i64>),
+    Int64(i64)
     /// unsigned 8bit int
-    UInt8(Option<u8>),
+    UInt8(u8)
     /// unsigned 16bit int
-    UInt16(Option<u16>),
+    UInt16(u16)
     /// unsigned 32bit int
-    UInt32(Option<u32>),
+    UInt32(u32)
     /// unsigned 64bit int
-    UInt64(Option<u64>),
+    UInt64(u64)
     /// utf-8 encoded string.
-    Utf8(Option<String>),
+    Utf8(String)
     /// utf-8 encoded string representing a LargeString's arrow type.
-    LargeUtf8(Option<String>),
+    LargeUtf8(String)
     /// binary
     Binary(Option<Vec<u8>>),
     /// large binary
@@ -71,21 +73,21 @@ pub enum ScalarValue {
     #[allow(clippy::box_vec)]
     List(Option<Box<Vec<ScalarValue>>>, Box<DataType>),
     /// Date stored as a signed 32bit int
-    Date32(Option<i32>),
+    Date32(i32)
     /// Date stored as a signed 64bit int
-    Date64(Option<i64>),
+    Date64(i64)
     /// Timestamp Second
-    TimestampSecond(Option<i64>),
+    TimestampSecond(i64)
     /// Timestamp Milliseconds
-    TimestampMillisecond(Option<i64>),
+    TimestampMillisecond(i64)
     /// Timestamp Microseconds
-    TimestampMicrosecond(Option<i64>),
+    TimestampMicrosecond(i64)
     /// Timestamp Nanoseconds
-    TimestampNanosecond(Option<i64>),
+    TimestampNanosecond(i64)
     /// Interval with YearMonth unit
-    IntervalYearMonth(Option<i32>),
+    IntervalYearMonth(i32)
     /// Interval with DayTime unit
-    IntervalDayTime(Option<i64>),
+    IntervalDayTime(i64)
     /// struct of nested ScalarValue (boxed to reduce size_of(ScalarValue))
     #[allow(clippy::box_vec)]
     Struct(Option<Box<Vec<ScalarValue>>>, Box<Vec<Field>>),

@alamb
Copy link
Contributor Author

alamb commented Oct 27, 2021

Hey @alamb did you start working on this?
I might otherwise check it out how this is handled - maybe before / in the weekend.

Thanks @Dandandan and @jimexist -- yes I have started working on this, and came up with something simple (#1181) and uncovered a bunch of other work

To introduce ScalarValue::Null and use that instead for Value::Null.
Return DataType::Null as type for ScalarValue::Null
Implement the rest of changes in coercion as needed.

Yes I agree that would be the ideal solution -- I tried a variant of that (basically made ScalarValue::get_datatype() return DataType::Null if ScalarValue::is_null() and taught the coercion logic about it for binary operators). However, I have since abandoned that approach in the near term.

The biggest problem I hit was that the arrow-rs cast kernels doesn't support casting from Null -> other_type -- I will be filing a ticket shortly for that.

I also found that functions and binary expressions don't share the same coercion logic and that many other expressions / operators don't actually try to coerce their arguments at all (e.g. try running a query like select case when NULL then 'foo' else 'bar' end;)

I'll also be filing tickets for those items as well

@alamb alamb changed the title Comparisons to literal NULL don't work Comparisons (BinaryExprs) to literal NULL don't work Oct 27, 2021
@alamb
Copy link
Contributor Author

alamb commented Oct 27, 2021

🤔 well I thought #1181 was ready but apparently it is not -- I am about out of time to work on this issue today, so I am going to spend the rest of my time writing up tickets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants