Add TableProvider.statistics method #3986
@isidentical @Dandandan @alamb PTAL when you can
LGTM
Thanks @andygrove
cc @isidentical
@@ -77,6 +78,11 @@ pub trait TableProvider: Sync + Send {
     ) -> Result<TableProviderFilterPushDown> {
         Ok(TableProviderFilterPushDown::Unsupported)
     }
+
+    /// Get statistics for this table, if available
+    fn statistics(&self) -> Option<Statistics> {
I wonder if any of the built-in providers can provide statistics (I am thinking Parquet) 🤔
Definitely something for a follow-on PR
👍 Filed as #3988
Looks great as well.
Does it mean that we have already decided to move the stats back to the LogicalPlan?
It seems this change would make it more hybrid. I would like to continue discussing the best way forward in the Google doc.
No. This change is independent of any existing cost based optimizations.
Benchmark runs are scheduled for baseline = 0678093 and contender = 4ea970d. 4ea970d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #3983
Rationale for this change
I would like the ability for table providers to provide statistics that can be used in logical plan optimizations, such as join reordering.
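To illustrate the kind of optimization this could enable, here is a minimal sketch of ordering join inputs by estimated row count. It is not part of this PR and not DataFusion's optimizer: the function `order_by_estimated_rows` and the `(name, row count)` tuple representation are hypothetical, chosen only to show how optional statistics might feed a reordering decision.

```rust
/// Hypothetical helper: order tables so that those with the fewest estimated
/// rows come first. Tables without statistics (`None`) are placed last and
/// keep their relative order, because the sort is stable.
pub fn order_by_estimated_rows(mut tables: Vec<(String, Option<usize>)>) -> Vec<String> {
    // Treat a missing estimate as "very large" so it sorts to the end.
    tables.sort_by_key(|(_, rows)| rows.unwrap_or(usize::MAX));
    tables.into_iter().map(|(name, _)| name).collect()
}
```

Because the row count is an `Option`, a provider that reports nothing is simply treated conservatively rather than causing an error.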
What changes are included in this PR?
Add new TableProvider.statistics method
Are there any user-facing changes?
No. The method has a default implementation.
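A minimal sketch of why the default implementation keeps this change non-breaking. The types below are simplified stand-ins, not DataFusion's real definitions: this `Statistics` struct carries only two fields, the `name` method is hypothetical, and `LegacyTable` and `InMemoryTable` are invented for illustration. The default body returning `None` is an assumption, since the diff above shows only the method signature.

```rust
/// Simplified stand-in for the `Statistics` type (assumption: the real
/// struct carries more information than these two fields).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Statistics {
    pub num_rows: Option<usize>,
    pub total_byte_size: Option<usize>,
}

/// Cut-down version of the trait, keeping only what the example needs.
pub trait TableProvider: Sync + Send {
    /// Hypothetical method so that the trait has one required item.
    fn name(&self) -> &str;

    /// Get statistics for this table, if available.
    /// Assumed default: report nothing.
    fn statistics(&self) -> Option<Statistics> {
        None
    }
}

/// An existing provider that predates the new method: it compiles unchanged
/// and inherits the default.
pub struct LegacyTable;

impl TableProvider for LegacyTable {
    fn name(&self) -> &str {
        "legacy"
    }
}

/// A provider that opts in by overriding `statistics`.
pub struct InMemoryTable {
    pub rows: Vec<i64>,
}

impl TableProvider for InMemoryTable {
    fn name(&self) -> &str {
        "in_memory"
    }

    fn statistics(&self) -> Option<Statistics> {
        Some(Statistics {
            num_rows: Some(self.rows.len()),
            total_byte_size: Some(self.rows.len() * std::mem::size_of::<i64>()),
        })
    }
}
```

With these definitions, `LegacyTable.statistics()` yields `None`, while an `InMemoryTable` holding three rows reports `num_rows` of `Some(3)`.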