-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comparisons (BinaryExprs) to literal NULL don't work #1179
Comments
FYI here is what postgres does
|
FYI i put
|
The source of the problem is here:
I would propose:
|
i wonder how this line up with typed nulls for other enum cases. or, could we actually get rid of the diff --git a/datafusion/src/scalar.rs b/datafusion/src/scalar.rs
index 00586bf55..0ce87195c 100644
--- a/datafusion/src/scalar.rs
+++ b/datafusion/src/scalar.rs
@@ -37,32 +37,34 @@ use std::{convert::TryFrom, fmt, iter::repeat, sync::Arc};
/// This is the single-valued counter-part of arrow’s `Array`.
#[derive(Clone)]
pub enum ScalarValue {
+ /// a typed or untyped null
+ Null(DataType),
/// true or false value
- Boolean(Option<bool>),
+ Boolean(bool)
/// 32bit float
- Float32(Option<f32>),
+ Float32(f32)
/// 64bit float
- Float64(Option<f64>),
+ Float64(f64)
/// signed 8bit int
- Int8(Option<i8>),
+ Int8(i8)
/// signed 16bit int
- Int16(Option<i16>),
+ Int16(i16)
/// signed 32bit int
- Int32(Option<i32>),
+ Int32(i32)
/// signed 64bit int
- Int64(Option<i64>),
+ Int64(i64)
/// unsigned 8bit int
- UInt8(Option<u8>),
+ UInt8(u8)
/// unsigned 16bit int
- UInt16(Option<u16>),
+ UInt16(u16)
/// unsigned 32bit int
- UInt32(Option<u32>),
+ UInt32(u32)
/// unsigned 64bit int
- UInt64(Option<u64>),
+ UInt64(u64)
/// utf-8 encoded string.
- Utf8(Option<String>),
+ Utf8(String)
/// utf-8 encoded string representing a LargeString's arrow type.
- LargeUtf8(Option<String>),
+ LargeUtf8(String)
/// binary
Binary(Option<Vec<u8>>),
/// large binary
@@ -71,21 +73,21 @@ pub enum ScalarValue {
#[allow(clippy::box_vec)]
List(Option<Box<Vec<ScalarValue>>>, Box<DataType>),
/// Date stored as a signed 32bit int
- Date32(Option<i32>),
+ Date32(i32)
/// Date stored as a signed 64bit int
- Date64(Option<i64>),
+ Date64(i64)
/// Timestamp Second
- TimestampSecond(Option<i64>),
+ TimestampSecond(i64)
/// Timestamp Milliseconds
- TimestampMillisecond(Option<i64>),
+ TimestampMillisecond(i64)
/// Timestamp Microseconds
- TimestampMicrosecond(Option<i64>),
+ TimestampMicrosecond(i64)
/// Timestamp Nanoseconds
- TimestampNanosecond(Option<i64>),
+ TimestampNanosecond(i64)
/// Interval with YearMonth unit
- IntervalYearMonth(Option<i32>),
+ IntervalYearMonth(i32)
/// Interval with DayTime unit
- IntervalDayTime(Option<i64>),
+ IntervalDayTime(i64)
/// struct of nested ScalarValue (boxed to reduce size_of(ScalarValue))
#[allow(clippy::box_vec)]
Struct(Option<Box<Vec<ScalarValue>>>, Box<Vec<Field>>), |
Thanks @Dandandan and @jimexist -- yes I have started working on this, and came up with something simple (#1181) and uncovered a bunch of other work
Yes I agree that would be the ideal solution -- I tried a variant of that (basically made The biggest problem I hit was that the arrow-rs cast kernels doesn't support casting from I also found that functions and binary expressions don't share the same coercion logic and that many other expressions / operators don't actually try to coerce their arguments at all (e.g. try running a query like I'll also be filing tickets for those items as well |
🤔 well I thought #1181 was ready but apparently it is not -- I am about out of time to work on this issue today, so I am going to spend the rest of my time writing up tickets. |
Describe the bug
Bug is that datafusion SQL treats a literal
NULL
as though it is typedScalarValue::Utf8(None)
So then if you try to compare null to a column that is not a scalar DataFusion goes 💥
To Reproduce
(I love the new
VALUES
feature that @jimexist added)Expected behavior
All three queries should return 1 column, 2 rows with all values null;
Additional context
This came up in the context of IOx where we are doing partial evaluation (on something that is like at the partition level) where references to columns may be rewritten into
null
s. See more details on https://github.com/influxdata/influxdb_iox/issues/883The text was updated successfully, but these errors were encountered: