Open
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
There are at least three places in DataFusion where multiple Statistics
objects are aggregated together, and they do so inconsistently:
1. get_statistics_with_limit: https://github.com/apache/arrow-datafusion/blob/e54894c39202815b14d9e7eae58f64d3a269c165/datafusion/core/src/datasource/statistics.rs#L34-L33
2. Parquet::infer_stats: https://github.com/apache/arrow-datafusion/blob/a892300a5a56c97b5b4ddc9aa4a421aaf412d0fe/datafusion/core/src/datasource/file_format/parquet.rs#L503-L581
3. Union::statistics: https://github.com/apache/arrow-datafusion/blob/c2e768052c43e4bab6705ee76befc19de383c2cb/datafusion/physical-plan/src/union.rs#L612-L611
(and we actually have another version of this in IOx)
Describe the solution you'd like
I would like to consolidate the three implementations into a single StatisticsAggregator
that knows how to aggregate multiple Statistics objects
and that is both documented and well tested.
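To make the idea concrete, here is a rough sketch of the shape such an aggregator could take. It is not an existing DataFusion API: SimpleStatistics is a hypothetical, simplified stand-in for Statistics (no per-column statistics), and the method names new, update, and build, as well as the rule that any unknown input makes the total unknown, are assumptions for illustration only.

```rust
/// Hypothetical, simplified stand-in for DataFusion's `Statistics`.
/// The real struct also carries per-column statistics, omitted here.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct SimpleStatistics {
    /// Number of rows, if known.
    pub num_rows: Option<usize>,
    /// Total size in bytes, if known.
    pub total_byte_size: Option<usize>,
    /// Whether the values above are exact rather than estimates.
    pub is_exact: bool,
}

/// Hypothetical aggregator that folds many statistics objects into one,
/// so that every call site shares a single, testable set of rules.
#[derive(Debug)]
pub struct StatisticsAggregator {
    num_rows: Option<usize>,
    total_byte_size: Option<usize>,
    is_exact: bool,
}

impl StatisticsAggregator {
    /// Start from the identity: zero rows, zero bytes, exact.
    pub fn new() -> Self {
        Self {
            num_rows: Some(0),
            total_byte_size: Some(0),
            is_exact: true,
        }
    }

    /// Fold one more statistics object into the running totals.
    pub fn update(&mut self, stats: &SimpleStatistics) {
        self.num_rows = add_known(self.num_rows, stats.num_rows);
        self.total_byte_size = add_known(self.total_byte_size, stats.total_byte_size);
        // The result is exact only if every input was exact.
        self.is_exact &= stats.is_exact;
    }

    /// Produce the aggregated statistics.
    pub fn build(self) -> SimpleStatistics {
        SimpleStatistics {
            num_rows: self.num_rows,
            total_byte_size: self.total_byte_size,
            is_exact: self.is_exact,
        }
    }
}

/// Sum two optional counts. The result is unknown (`None`) if either
/// side is unknown or if the addition would overflow.
fn add_known(a: Option<usize>, b: Option<usize>) -> Option<usize> {
    match (a, b) {
        (Some(x), Some(y)) => x.checked_add(y),
        _ => None,
    }
}
```

Whether an unknown input should poison the total or be skipped is exactly the kind of decision the three current implementations could be making differently, so a shared aggregator would need to pick one rule and document it.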
Describe alternatives you've considered
No response
Additional context
No response