make AnalysisContext aware of empty sets to represent certainly false bounds #14279

Merged

Conversation

buraksenn
Contributor

@buraksenn buraksenn commented Jan 24, 2025

Which issue does this PR close?

Closes #14226

Rationale for this change

Details from #14226:

The AnalysisContext, which the analyze method returns when refining column boundaries from a physical expression, represents an empty set the same way as an unbounded set.

For example, when the bounds cannot be shrunk, e.g., a < 0 OR a >= 0, the result is an interval of [None, None], which means [-∞, ∞], i.e., CERTAINLY_TRUE. When the bounds represent an empty set, e.g., a < 0 AND a > 0, the result is also an interval of [None, None], but it should mean CERTAINLY_FALSE.
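To make the ambiguity concrete, here is a minimal sketch that does not use DataFusion's real types. `SimpleInterval` and its `intersect` method are hypothetical stand-ins over `i64`, where a `None` bound means unbounded on that side. Wrapping the whole interval in an `Option` gives the empty set its own representation, distinct from the unbounded interval.

```rust
/// Hypothetical, simplified closed interval over `i64` (not DataFusion's `Interval`).
/// A `None` bound means "unbounded" on that side.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SimpleInterval {
    lower: Option<i64>,
    upper: Option<i64>,
}

impl SimpleInterval {
    /// The interval [-inf, +inf], i.e. no information about the column.
    const UNBOUNDED: SimpleInterval = SimpleInterval { lower: None, upper: None };

    /// Intersects two intervals. Returns `None` when the result is the
    /// empty set, so that it cannot be confused with `UNBOUNDED`.
    fn intersect(self, other: SimpleInterval) -> Option<SimpleInterval> {
        // The tighter lower bound is the larger one; a missing bound is -inf.
        let lower = match (self.lower, other.lower) {
            (Some(a), Some(b)) => Some(a.max(b)),
            (a, b) => a.or(b),
        };
        // The tighter upper bound is the smaller one; a missing bound is +inf.
        let upper = match (self.upper, other.upper) {
            (Some(a), Some(b)) => Some(a.min(b)),
            (a, b) => a.or(b),
        };
        match (lower, upper) {
            // Lower bound above upper bound: no value satisfies both sides.
            (Some(l), Some(u)) if l > u => None,
            _ => Some(SimpleInterval { lower, upper }),
        }
    }
}
```

Over integers, `a < 0` is the interval with upper bound -1 and `a > 0` is the interval with lower bound 1. Intersecting them yields `None` (certainly false), while intersecting either one with `UNBOUNDED` leaves it unchanged.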

What changes are included in this PR?

  • Change AnalysisContext boundaries to hold an Option<Interval>, with None representing the empty set

Are these changes tested?

Added unit tests. Existing tests must also pass before this is merged.

Are there any user-facing changes?

I think this breaks the public API, but I could not find any other way to do this.

@github-actions github-actions bot added the physical-expr Physical Expressions label Jan 24, 2025
Contributor

@ozankabak ozankabak left a comment

Thank you for taking this issue @buraksenn - left some feedback for minor improvements

buraksenn and others added 3 commits January 25, 2025 20:53
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
if let Some(interval) = &bound.interval {
    target_indices_and_boundaries.push((*index, interval.clone()));
} else {
    return Err(internal_datafusion_err!(
Contributor

IIUC, if someone calls analyze() with infeasible (empty) columns, then we return an error here. I think that's not what people expect. We should continue with the Nones instead. I'm working on a fix for that now.

Contributor

As I recall, infeasible sets are not currently supported in cp_solver. Let's keep it as an error for now.
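For readers following this thread, the behavior being kept (return an error as soon as any input column is already infeasible) can be sketched as below. This is an illustration only: `Boundary`, `collect_feasible`, the `(i64, i64)` interval, and the `String` error are hypothetical stand-ins, not the types or error macro used in the PR.

```rust
/// Hypothetical stand-in for a column boundary; `interval` is `None`
/// when the column's feasible set is empty.
#[derive(Debug, Clone, PartialEq)]
struct Boundary {
    column: String,
    interval: Option<(i64, i64)>,
}

/// Collects `(index, interval)` pairs for the solver, and fails fast if
/// any column is already infeasible, mirroring the "keep it as an error"
/// decision discussed above.
fn collect_feasible(
    bounds: &[(usize, Boundary)],
) -> Result<Vec<(usize, (i64, i64))>, String> {
    let mut out = Vec::with_capacity(bounds.len());
    for (index, bound) in bounds {
        if let Some(interval) = &bound.interval {
            out.push((*index, *interval));
        } else {
            // An empty input interval is reported rather than propagated.
            return Err(format!(
                "column `{}` has an empty (infeasible) interval",
                bound.column
            ));
        }
    }
    Ok(out)
}
```

The alternative raised in the review, continuing with the `None` values instead of failing, would change the return type to carry `Option` intervals through to the caller.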

@berkaysynnada
Contributor

Thank you @buraksenn. I've just sent a minor commit. If you're done with this PR, I will merge it once CI is green again.

Contributor

@berkaysynnada berkaysynnada left a comment

This looks good now. I'm merging it, and if someone comes up with another idea, then we can create a ticket.

@berkaysynnada berkaysynnada merged commit 71996fb into apache:main Jan 28, 2025
25 checks passed
@berkaysynnada berkaysynnada deleted the represent-empty-set-in-analysis-context branch January 28, 2025 14:38
@andygrove andygrove added the api change Changes the API exposed to users of the crate label Jan 31, 2025
/// For example, if the column `a` has values in the range [10, 20],
/// and there is a filter asserting that `a > 50`, then the resulting interval
/// range of `a` will be `None`.
pub interval: Option<Interval>,
Member

I added the api-change label due to making interval an Option.
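The example in the doc comment above (column `a` in [10, 20] with a filter `a > 50`) can be worked through with a small sketch. The function `refine_greater_than` and the `(i64, i64)` closed range are hypothetical and only illustrate why `None` is the natural result. They are not part of the DataFusion API.

```rust
/// Refines a closed integer range `[lo, hi]` under the filter `a > threshold`.
/// Returns `None` when no value in the range can satisfy the filter.
/// (Hypothetical helper for illustration only.)
fn refine_greater_than(range: (i64, i64), threshold: i64) -> Option<(i64, i64)> {
    let (lo, hi) = range;
    // Over integers, `a > threshold` is `a >= threshold + 1`. If the
    // addition overflows, nothing is greater than the threshold.
    let min_allowed = threshold.checked_add(1)?;
    let new_lo = lo.max(min_allowed);
    if new_lo > hi {
        None // empty set: the filter is certainly false for this column
    } else {
        Some((new_lo, hi))
    }
}
```

With the range (10, 20), a threshold of 50 gives `None`, a threshold of 15 narrows the range to (16, 20), and a threshold of 5 leaves it at (10, 20). Code that previously read `interval` directly now has to handle the `None` case, which is the API change noted above.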

Labels
api change Changes the API exposed to users of the crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Variant on AnalysisContext to represent empty-set
4 participants