-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
return absent stats when filters are pushed down #12471
return absent stats when filters are pushed down #12471
Conversation
90cd6b5
to
20c74b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @waruto210 -- I think this is a great start
I left some comments -- let us know what you think
// When filters are pushed down, we have no way of knowing the exact statistics. | ||
// Note that pruning predicate is also a kind of filter pushdown. | ||
let stats = if self.pruning_predicate.is_some() | ||
|| self.page_pruning_predicate.is_some() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think bloom filters should also belong in this list 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bloom filters are used by pruning_predicate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, maybe we can update the comments to help future readers. I left a suggestion
return absent stats when filters are pushed down
20c74b9
to
a7606b4
Compare
@alamb PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @waruto210 this looks good to me now
I do think it is worth considering more explicit test coverage for partition column pushdown, but I don't think it is required.
Thanks again for the contribution
// When filters are pushed down, we have no way of knowing the exact statistics. | ||
// Note that pruning predicate is also a kind of filter pushdown. | ||
let stats = if self.pruning_predicate.is_some() | ||
|| self.page_pruning_predicate.is_some() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, maybe we can update the comments to help future readers. I left a suggestion
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Thank you again @waruto210 |
* do not pushdown filters that can be resolved only using partition cols return absent stats when filters are pushed down * fix and add test * Update datafusion/core/src/datasource/physical_plan/parquet/mod.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * add test for partition pruning filters --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Closes #12416.
Rationale for this change
Fix the bug mentioned in #12416.
What changes are included in this PR?
Unlike what was mentioned in #12416, I chose to return absent stats because it's hard to know the selectivity of the filters, and also, for filters that can be resolved using only partition cols, there's no need to pushdown them to the TableScanExec, which would otherwise produce useless
unhandled
pruning predicate.Are these changes tested?
Yes
Are there any user-facing changes?