-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Interval analysis for OR
expressions
#7884
Changes from all commits
608ce37
4591d82
b7e66a3
765b668
d846ac3
072588a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -432,6 +432,27 @@ impl Interval { | |
} | ||
} | ||
|
||
/// Compute the logical disjunction of this (boolean) interval with the given boolean interval. | ||
pub(crate) fn or<T: Borrow<Interval>>(&self, other: T) -> Result<Interval> { | ||
let rhs = other.borrow(); | ||
if self.get_datatype()? != DataType::Boolean | ||
|| rhs.get_datatype()? != DataType::Boolean | ||
{ | ||
return internal_err!( | ||
"Incompatible types for logical disjunction: {self:?}, {rhs:?}" | ||
); | ||
} | ||
|
||
// if either bound is always true, the result is always true. | ||
if self.is_certainly_true() || rhs.is_certainly_true() { | ||
Ok(Interval::CERTAINLY_TRUE) | ||
} else if self.is_certainly_false() && rhs.is_certainly_false() { | ||
Ok(Interval::CERTAINLY_FALSE) | ||
} else { | ||
Ok(Interval::UNCERTAIN) | ||
} | ||
} | ||
|
||
/// Compute the logical negation of this (boolean) interval. | ||
pub(crate) fn not(&self) -> Result<Self> { | ||
if !matches!(self.get_datatype()?, DataType::Boolean) { | ||
|
@@ -519,6 +540,20 @@ impl Interval { | |
)) | ||
} | ||
|
||
/// return true if this is a Boolean interval that is known to be `true` | ||
pub fn is_certainly_true(&self) -> bool { | ||
self == &Self::CERTAINLY_TRUE | ||
} | ||
|
||
/// return true if this is a Boolean interval that may be either `true` or `false` | ||
pub fn is_uncertain(&self) -> bool { | ||
self == &Self::UNCERTAIN | ||
} | ||
|
||
/// return true if this is a Boolean interval that is known to be `false` | ||
pub fn is_certainly_false(&self) -> bool { | ||
self == &Self::CERTAINLY_FALSE | ||
} | ||
pub const CERTAINLY_FALSE: Interval = Interval { | ||
lower: IntervalBound::new_closed(ScalarValue::Boolean(Some(false))), | ||
upper: IntervalBound::new_closed(ScalarValue::Boolean(Some(false))), | ||
|
@@ -738,6 +773,7 @@ pub fn apply_operator(op: &Operator, lhs: &Interval, rhs: &Interval) -> Result<I | |
Operator::Lt => Ok(lhs.lt(rhs)), | ||
Operator::LtEq => Ok(lhs.lt_eq(rhs)), | ||
Operator::And => lhs.and(rhs), | ||
Operator::Or => lhs.or(rhs), | ||
Operator::Plus => lhs.add(rhs), | ||
Operator::Minus => lhs.sub(rhs), | ||
_ => Ok(Interval::default()), | ||
|
@@ -1221,6 +1257,8 @@ mod tests { | |
|
||
#[test] | ||
fn and_test() -> Result<()> { | ||
// check that (lhs AND rhs) is equal to result | ||
// (lhs_lower, lhs_upper, rhs_lower, rhs_upper, lower_result, upper_result) | ||
let cases = vec![ | ||
(false, true, false, false, false, false), | ||
(false, false, false, true, false, false), | ||
|
@@ -1240,6 +1278,35 @@ mod tests { | |
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn or_test() -> Result<()> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When I was writing this test I found there appears to be something wrong with boolean interval handling. Specifically, I tracked it down to the code in https://github.com/apache/arrow-datafusion/blob/4591d82ccba6f52db535a377486362a35bfd84fb/datafusion/physical-expr/src/intervals/interval_aritmetic.rs#L250-L249 And I will file a follow on ticket about this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When you create an interval with an open bound, if it is a lower bound, it can be But of course, you are right. We should not give the chance to create an invalid interval. Especially for boolean intervals, open_open case always gives an invalid interval, meaning that it needs to return an error regardless of the interval values (false or true). I will have solved all of these problems with the PR I am currently working on. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The PR you are working on sounds great 🎉 . If it is possible, can you possibly break it up into smaller parts (e.g perhaps we could break the "make |
||
// check that (lhs OR rhs) is equal to result | ||
// ((lhs_lower, lhs_upper), (rhs_lower, rhs_upper), (lower_result, upper_result)) | ||
let cases = vec![ | ||
// Note there are only three valid boolean intervals: [false, false], [false, true] and [true, true], | ||
((false, false), (false, false), (false, false)), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This test might be easier to read if I rewrote it in terms of |
||
((false, false), (false, true), (false, true)), | ||
((false, false), (true, true), (true, true)), | ||
((false, true), (false, false), (false, true)), | ||
((false, true), (false, true), (false, true)), | ||
((false, true), (true, true), (true, true)), | ||
((true, true), (false, false), (true, true)), | ||
((true, true), (false, true), (true, true)), | ||
((true, true), (true, true), (true, true)), | ||
]; | ||
|
||
for case in cases { | ||
let (lhs, rhs, expected) = case; | ||
println!("case: {:?}", case); | ||
let lhs = closed_closed(Some(lhs.0), Some(lhs.1)); | ||
let rhs = closed_closed(Some(rhs.0), Some(rhs.1)); | ||
let expected = closed_closed(Some(expected.0), Some(expected.1)); | ||
println!("lhs: {:?}, rhs: {:?}, expected: {:?}", lhs, rhs, expected); | ||
assert_eq!(lhs.or(rhs)?, expected,); | ||
} | ||
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn add_test() -> Result<()> { | ||
let cases = vec![ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would appreciate any suggestions if any changes are needed here in propagate_constraints as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly changing the API here. It will be more understandable and easy to add new supports. Since the current version is written by overfitting the AND operator, adding OR support may be a challenge. Undoubtedly, it will be easier to add it after my refactoring PR arrives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good -- I don't fully understand what propagate constraints is used for. I think postponing any extra work until your refactoring is complete sounds like a good idea to me