Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Interval analysis for OR expressions #7884

Closed
wants to merge 6 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Oct 20, 2023

Which issue does this PR close?

Closes #7883

Rationale for this change

See #7883

What changes are included in this PR?

  1. Add support for OR in interval analysis
  2. New tests

Are these changes tested?

Yes

Are there any user-facing changes?

Probably not

@github-actions github-actions bot added the physical-expr Physical Expressions label Oct 20, 2023
@@ -1240,6 +1270,32 @@ mod tests {
Ok(())
}

#[test]
fn or_test() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was writing this test I found there appears to be something wrong with boolean interval handling.

Specifically, open_open(.., ..) always creates the interval Interval { lower: IntervalBound { value: Boolean(true), open: false }, upper: IntervalBound { value: Boolean(false), open: false } } regardless of the arguments:

I tracked it down to the code in https://github.com/apache/arrow-datafusion/blob/4591d82ccba6f52db535a377486362a35bfd84fb/datafusion/physical-expr/src/intervals/interval_aritmetic.rs#L250-L249

And I will file a follow on ticket about this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you create an interval with an open bound, if it is a lower bound, it can be (false or (true. (false can be represented as [true. For an upper bound case, true) rounds down to false]. That's why this result comes in that way.

But of course, you are right. We should not give the chance to create an invalid interval. Especially for boolean intervals, open_open case always gives an invalid interval, meaning that it needs to return an error regardless of the interval values (false or true). I will have solved all of these problems with the PR I am currently working on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have solved all of these problems with the PR I am currently working on.

The PR you are working on sounds great 🎉 . If it is possible, can you possibly break it up into smaller parts (e.g perhaps we could break the "make Interval fields not pub in one PR, and rework how propagate_constraints works a a follow on PR?)

@alamb alamb marked this pull request as draft October 20, 2023 15:18
@alamb alamb marked this pull request as ready for review October 20, 2023 15:53
@alamb alamb requested a review from metesynnada October 20, 2023 15:53
@@ -344,8 +344,8 @@ impl PhysicalExpr for BinaryExpr {
let right_interval = children[1];

let (left, right) = if self.op.is_logic_operator() {
// TODO: Currently, this implementation only supports the AND operator
// and does not require any further propagation. In the future,
// TODO: Currently, this implementation only supports the AND/OR operator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate any suggestions if any changes are needed here in propagate_constraints as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly changing the API here. It will be more understandable and easy to add new supports. Since the current version is written by overfitting the AND operator, adding OR support may be a challenge. Undoubtedly, it will be easier to add it after my refactoring PR arrives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good -- I don't fully understand what propagate constraints is used for. I think postponing any extra work until your refactoring is complete sounds like a good idea to me

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
// ((lhs_lower, lhs_upper), (rhs_lower, rhs_upper), (lower_result, upper_result))
let cases = vec![
// Note there are only three valid boolean intervals: [false, false], [false, true] and [true, true],
((false, false), (false, false), (false, false)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test might be easier to read if I rewrote it in terms of CERTAINLY_TRUE, UNCERTAIN, and CERTAINLY_FALSE`. I'll try and do that next week

@alamb
Copy link
Contributor Author

alamb commented Oct 23, 2023

Marking as a draft until @berkaysynnada 's refactor PR has landed

@alamb alamb marked this pull request as draft October 23, 2023 13:19
@alamb
Copy link
Contributor Author

alamb commented Nov 20, 2023

Superseded by #8276

@alamb alamb closed this Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for OR to interval arithmetic
3 participants