
Remove redundant statistics from FileScanConfig #14937

@alamb

Description


Is your feature request related to a problem or challenge?

`FileScanConfig` has statistics (`FileScanConfig::statistics`), but so does its `file_source`. The field on `FileScanConfig`:

```rust
/// Estimated overall statistics of the files, taking `filters` into account.
/// Defaults to [`Statistics::new_unknown`].
pub statistics: Statistics,
```

And the method on `file_source`:

```rust
/// Return projected statistics
fn statistics(&self) -> datafusion_common::Result<Statistics>;
```

The fact that there are two sets of statistics means:

  1. There is potential for bugs when they get out of sync, such as the one reported in #14679 (Physical plan round trip fails in some cases after datasource refactor).
  2. Planning does redundant work, because both copies have to be maintained.
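To illustrate how two copies can drift apart, here is a minimal, self-contained sketch. It does not use DataFusion at all: `Statistics`, `FileSourceSketch`, `FileScanConfigSketch`, and `update_source_only` are hypothetical simplified stand-ins (statistics are reduced to a row count), assuming some code path updates only one of the two copies.

```rust
/// Hypothetical stand-in for DataFusion's `Statistics`; only a row count is modeled.
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
}

/// Hypothetical file source that carries its own copy of the statistics.
#[derive(Debug, Clone)]
struct FileSourceSketch {
    statistics: Statistics,
}

impl FileSourceSketch {
    fn statistics(&self) -> Statistics {
        self.statistics.clone()
    }
}

/// Hypothetical config that ALSO stores a copy, mirroring the duplication in this issue.
#[derive(Debug, Clone)]
struct FileScanConfigSketch {
    statistics: Statistics,
    file_source: FileSourceSketch,
}

impl FileScanConfigSketch {
    /// Returns true when the config's copy matches the file source's copy.
    fn in_sync(&self) -> bool {
        self.statistics == self.file_source.statistics()
    }
}

/// Hypothetical code path that updates only the file source's copy.
fn update_source_only(mut config: FileScanConfigSketch, num_rows: usize) -> FileScanConfigSketch {
    config.file_source.statistics = Statistics { num_rows: Some(num_rows) };
    config
}

fn main() {
    let stats = Statistics { num_rows: Some(100) };
    let config = FileScanConfigSketch {
        statistics: stats.clone(),
        file_source: FileSourceSketch { statistics: stats },
    };
    assert!(config.in_sync());

    // After a one-sided update, the config still says 100 rows but the source says 42.
    let updated = update_source_only(config, 42);
    println!(
        "config: {:?}, source: {:?}",
        updated.statistics,
        updated.file_source.statistics()
    );
    assert!(!updated.in_sync());
}
```

Nothing in the type system stops the two fields from disagreeing; any caller that reads the "wrong" copy silently sees stale numbers.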

Describe the solution you'd like

It would be nice to remove the duplication so that it is clear there is only a single set of statistics (held on the `DataSource`).
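A minimal sketch of the single-source-of-truth shape, under stated assumptions: `StatsSource`, `ParquetLikeSource`, and `ScanConfig` are hypothetical names (not DataFusion APIs), statistics are reduced to a row count, and the one remaining copy lives behind a trait object that the config delegates to instead of caching its own field.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for DataFusion's `Statistics`; only a row count is modeled.
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
}

/// Hypothetical trait for the component that is the sole owner of statistics.
trait StatsSource {
    fn statistics(&self) -> Statistics;
}

/// Hypothetical concrete source holding the only copy.
struct ParquetLikeSource {
    stats: Statistics,
}

impl StatsSource for ParquetLikeSource {
    fn statistics(&self) -> Statistics {
        self.stats.clone()
    }
}

/// Hypothetical config with no `statistics` field of its own.
struct ScanConfig {
    file_source: Arc<dyn StatsSource>,
}

impl ScanConfig {
    /// Delegates to the source, so there is no second copy to keep in sync.
    fn statistics(&self) -> Statistics {
        self.file_source.statistics()
    }
}

fn main() {
    let config = ScanConfig {
        file_source: Arc::new(ParquetLikeSource {
            stats: Statistics { num_rows: Some(100) },
        }),
    };
    println!("{:?}", config.statistics()); // prints "Statistics { num_rows: Some(100) }"
}
```

Because the config only forwards the call, any change to the source's statistics is visible through the config immediately; the cost is one extra indirection per lookup.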

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

enhancement (New feature or request)
