Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Interval analysis for OR expressions #7884

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions datafusion/physical-expr/src/expressions/binary.rs
Original file line number Diff line number Diff line change
Expand Up @@ -344,8 +344,8 @@ impl PhysicalExpr for BinaryExpr {
let right_interval = children[1];

let (left, right) = if self.op.is_logic_operator() {
// TODO: Currently, this implementation only supports the AND operator
// and does not require any further propagation. In the future,
// TODO: Currently, this implementation only supports the AND/OR operator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate any suggestions if any changes are needed here in propagate_constraints as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly changing the API here. It will be more understandable and easy to add new supports. Since the current version is written by overfitting the AND operator, adding OR support may be a challenge. Undoubtedly, it will be easier to add it after my refactoring PR arrives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good -- I don't fully understand what propagate constraints is used for. I think postponing any extra work until your refactoring is complete sounds like a good idea to me

// and does not require any further propagation (??). In the future,
alamb marked this conversation as resolved.
Show resolved Hide resolved
// upon adding support for additional logical operators, this
// method will require modification to support propagating the
// changes accordingly.
Expand Down
67 changes: 67 additions & 0 deletions datafusion/physical-expr/src/intervals/interval_aritmetic.rs
Original file line number Diff line number Diff line change
Expand Up @@ -432,6 +432,27 @@ impl Interval {
}
}

/// Compute the logical disjunction of this (boolean) interval with the given boolean interval.
pub(crate) fn or<T: Borrow<Interval>>(&self, other: T) -> Result<Interval> {
let rhs = other.borrow();
if self.get_datatype()? != DataType::Boolean
&& rhs.get_datatype()? != DataType::Boolean
alamb marked this conversation as resolved.
Show resolved Hide resolved
{
return internal_err!(
"Incompatible types for logical disjunction: {self:?}, {rhs:?}"
);
}

// if either bound is always true, the result is always true.
if self.is_certainly_true() || rhs.is_certainly_true() {
Ok(Interval::CERTAINLY_TRUE)
} else if self.is_certainly_false() && rhs.is_certainly_false() {
Ok(Interval::CERTAINLY_FALSE)
} else {
Ok(Interval::UNCERTAIN)
}
}

/// Compute the logical negation of this (boolean) interval.
pub(crate) fn not(&self) -> Result<Self> {
if !matches!(self.get_datatype()?, DataType::Boolean) {
Expand Down Expand Up @@ -519,6 +540,20 @@ impl Interval {
))
}

/// return true if this is a Boolean interval that is known to be `true`
pub fn is_certainly_true(&self) -> bool {
self == &Self::CERTAINLY_TRUE
}

/// return true if this is a Boolean interval that may be either `true` or `false`
pub fn is_uncertain(&self) -> bool {
self == &Self::UNCERTAIN
}

/// return true if this is a Boolean interval that is known to be `false`
pub fn is_certainly_false(&self) -> bool {
self == &Self::CERTAINLY_FALSE
}
pub const CERTAINLY_FALSE: Interval = Interval {
lower: IntervalBound::new_closed(ScalarValue::Boolean(Some(false))),
upper: IntervalBound::new_closed(ScalarValue::Boolean(Some(false))),
Expand Down Expand Up @@ -738,6 +773,7 @@ pub fn apply_operator(op: &Operator, lhs: &Interval, rhs: &Interval) -> Result<I
Operator::Lt => Ok(lhs.lt(rhs)),
Operator::LtEq => Ok(lhs.lt_eq(rhs)),
Operator::And => lhs.and(rhs),
Operator::Or => lhs.or(rhs),
Operator::Plus => lhs.add(rhs),
Operator::Minus => lhs.sub(rhs),
_ => Ok(Interval::default()),
Expand Down Expand Up @@ -1221,6 +1257,8 @@ mod tests {

#[test]
fn and_test() -> Result<()> {
// check that (lhs AND rhs) is equal to result
// (lhs_lower, lhs_upper, rhs_lower, rhs_upper, lower_result, upper_result)
let cases = vec![
(false, true, false, false, false, false),
(false, false, false, true, false, false),
Expand All @@ -1240,6 +1278,35 @@ mod tests {
Ok(())
}

#[test]
fn or_test() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was writing this test I found there appears to be something wrong with boolean interval handling.

Specifically, open_open(.., ..) always creates the interval Interval { lower: IntervalBound { value: Boolean(true), open: false }, upper: IntervalBound { value: Boolean(false), open: false } } regardless of the arguments:

I tracked it down to the code in https://github.com/apache/arrow-datafusion/blob/4591d82ccba6f52db535a377486362a35bfd84fb/datafusion/physical-expr/src/intervals/interval_aritmetic.rs#L250-L249

And I will file a follow on ticket about this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you create an interval with an open bound, if it is a lower bound, it can be (false or (true. (false can be represented as [true. For an upper bound case, true) rounds down to false]. That's why this result comes in that way.

But of course, you are right. We should not give the chance to create an invalid interval. Especially for boolean intervals, open_open case always gives an invalid interval, meaning that it needs to return an error regardless of the interval values (false or true). I will have solved all of these problems with the PR I am currently working on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have solved all of these problems with the PR I am currently working on.

The PR you are working on sounds great 🎉 . If it is possible, can you possibly break it up into smaller parts (e.g perhaps we could break the "make Interval fields not pub in one PR, and rework how propagate_constraints works a a follow on PR?)

// check that (lhs OR rhs) is equal to result
// ((lhs_lower, lhs_upper), (rhs_lower, rhs_upper), (lower_result, upper_result))
let cases = vec![
// Note there are only three valid boolean intervals: [false, false], [false, true] and [true, true],
((false, false), (false, false), (false, false)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test might be easier to read if I rewrote it in terms of CERTAINLY_TRUE, UNCERTAIN, and CERTAINLY_FALSE`. I'll try and do that next week

((false, false), (false, true), (false, true)),
((false, false), (true, true), (true, true)),
((false, true), (false, false), (false, true)),
((false, true), (false, true), (false, true)),
((false, true), (true, true), (true, true)),
((true, true), (false, false), (true, true)),
((true, true), (false, true), (true, true)),
((true, true), (true, true), (true, true)),
];

for case in cases {
let (lhs, rhs, expected) = case;
println!("case: {:?}", case);
let lhs = closed_closed(Some(lhs.0), Some(lhs.1));
let rhs = closed_closed(Some(rhs.0), Some(rhs.1));
let expected = closed_closed(Some(expected.0), Some(expected.1));
println!("lhs: {:?}, rhs: {:?}, expected: {:?}", lhs, rhs, expected);
assert_eq!(lhs.or(rhs)?, expected,);
}
Ok(())
}

#[test]
fn add_test() -> Result<()> {
let cases = vec![
Expand Down