-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support "IS TRUE/FALSE" syntax #3189
Conversation
@sarahyurick I tried some |
Good point, thanks! Dask SQL main currently throws a |
@sarahyurick Could you target this to the Also, see discussion on this feature branch approach in #3191 |
IS TRUE and IS FALSE should both be fully functional. I'm not sure what's going on with the proto files, though. After those are fixed, it should just be formatting checks and tests, if any. |
Co-authored-by: Andy Grove <andygrove73@gmail.com>
Co-authored-by: Andy Grove <andygrove73@gmail.com>
Co-authored-by: Andy Grove <andygrove73@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #3189 +/- ##
==========================================
- Coverage 85.87% 85.75% -0.13%
==========================================
Files 291 293 +2
Lines 52885 53033 +148
==========================================
+ Hits 45415 45476 +61
- Misses 7470 7557 +87
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Should I also add tests in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @sarahyurick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I would recommend is adding some sort of SQL level test -- like https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sql/expr.rs#L257 for IS NULL
The rationale for adding this would be to ensure all the plumbing is hooked up through the planner and optimizer
| Expr::IsNotNull(_) | ||
| Expr::IsTrue(_) | ||
| Expr::IsFalse(_) | ||
| Expr::Exists { .. } => Ok(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
let array_len = array.len(); | ||
let mut result_builder = BooleanBuilder::new(array_len); | ||
for i in 0..array_len { | ||
result_builder.append_value(!bool_array.is_null(i) && !bool_array.value(i)); | ||
} | ||
Ok(ColumnarValue::Array(Arc::new(result_builder.finish()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably do the same thing with less code (and more efficiently) using the FromIter
method:
let array_len = array.len(); | |
let mut result_builder = BooleanBuilder::new(array_len); | |
for i in 0..array_len { | |
result_builder.append_value(!bool_array.is_null(i) && !bool_array.value(i)); | |
} | |
Ok(ColumnarValue::Array(Arc::new(result_builder.finish()))) | |
let result: BooleanArray = bool_array.iter() | |
.map(|v| v.map(|v| !v).or(Some(false))) | |
.collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same comment applies to is_true.rs)
SQLExpr::IsTrue(expr) => Ok(Expr::IsTrue(Box::new( | ||
self.sql_expr_to_logical_expr(*expr, schema, ctes)?, | ||
))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might be able to avoid having to extend Expr
and add a PhysicalExpr
at all if you rewrote IS TRUE
in the sql planner to IS NOT DISTINCT FROM
Something like
SQLExpr::IsTrue(expr) => Ok(Expr::IsTrue(Box::new( | |
self.sql_expr_to_logical_expr(*expr, schema, ctes)?, | |
))), | |
SQLExpr::IsTrue(expr) => Ok(Expr::BinaryExpr { | |
left: Box::new(self.sql_expr_to_logical_expr(*expr, schema, ctes)?), | |
op: Operator::IsNotDistinctFrom, | |
right: Box::new(lit(true)), | |
}), |
That way a query like SELECT x IS TRUE
becomes SELECT x IS NOT DISTINCT FROM TRUE
Here is an example from postgres showing they are equivalent:
alamb=# select column1 is true, column1 is not distinct from true from (values (true), (false), (null)) as sq;
?column? | ?column?
----------+----------
t | t
f | f
f | f
(3 rows)
The same type of transformation applies to IS FALSE
Then this PR would likely be 8 lines of code and then the sql level tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be to keep the logical plan as close to the original query as possible. Different query engines will have different approaches for how to implement this in the physical plan. I also think that it is better UX for the user to see IsTrue/False in the logical plan if that's what their query contained.
I think it is a good idea for DataFusion to map the logical IsTrue /False to the physical expression IsDistinctFrom and that removes the need for the new physical expressions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think that makes sense to me. The only thing I was wondering was how this would handle non-boolean/non-null inputs, since we should error for all other datatypes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is related to the work I am starting in #3222. I think we need to do some type validation here so that if we see <expr> IS TRUE
where expr
is not boolean, then we either add a cast or throw an error if it cannot be cast to boolean.
BTW @sarahyurick -- thank you for sticking with this. It is great to see the functionality coming along so nicely 👌 |
Which issue does this PR close?
Closes #3159
What changes are included in this PR?
So far, I've added some basic changes so that IS TRUE works on a scalar (
ColumnarValue::Scalar(scalar)
), likeSELECT TRUE IS TRUE as t
,SELECT FALSE IS TRUE as t
,SELECT 2 IS TRUE as t
, etc.Mainly, I am looking for feedback on how to handle the
ColumnarValue::Array(array)
case.Please also let me know if there are any major issues with this implementation so far. Originally, @alamb suggested using
IsNotDistinctFrom
, but I was having trouble figuring out how that was supposed to work. Since we can already parseIS TRUE
the initial error (without the changes from this PR) isError: NotImplemented("Unsupported ast node IsTrue(Identifier(Ident { value: \"b\", quote_style: None })) in sqltorel")
. Whenever I would try to linkIsTrue
toIsNotDistinctFrom
, I would run into trouble becauseIsTrue
looks like<expr> IS TRUE
whileIsNotDistinctFrom
looks like<expr> IS NOT DISTINCT FROM <expr>
. Maybe I'm missing something obvious, but the way I decided to go about this implementation was to mimic the implementations for IsNull and IsNotNull, since they don't have a RHS expression either.Thanks in advance!
Update: IS TRUE should be fully functional.