-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Minor: update documentation for PrunableStatistics #16213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -129,6 +129,7 @@ pub trait PruningStatistics { | |
} | ||
|
||
/// Prune files based on their partition values. | ||
/// | ||
/// This is used both at planning time and execution time to prune | ||
/// files based on their partition values. | ||
/// This feeds into [`CompositePruningStatistics`] to allow pruning | ||
|
@@ -137,19 +138,21 @@ pub trait PruningStatistics { | |
#[derive(Clone)] | ||
pub struct PartitionPruningStatistics { | ||
/// Values for each column for each container. | ||
/// The outer vectors represent the columns while the inner | ||
/// vectors represent the containers. | ||
/// The order must match the order of the partition columns in | ||
/// [`PartitionPruningStatistics::partition_schema`]. | ||
/// | ||
/// The outer vectors represent the columns while the inner vectors | ||
/// represent the containers. The order must match the order of the | ||
/// partition columns in [`PartitionPruningStatistics::partition_schema`]. | ||
partition_values: Vec<ArrayRef>, | ||
/// The number of containers. | ||
/// | ||
/// Stored since the partition values are column-major and if | ||
/// there are no columns we wouldn't know the number of containers. | ||
num_containers: usize, | ||
/// The schema of the partition columns. | ||
/// This must **not** be the schema of the entire file or table: | ||
/// it must only be the schema of the partition columns, | ||
/// in the same order as the values in [`PartitionPruningStatistics::partition_values`]. | ||
/// | ||
/// This must **not** be the schema of the entire file or table: it must | ||
/// only be the schema of the partition columns, in the same order as the | ||
/// values in [`PartitionPruningStatistics::partition_values`]. | ||
partition_schema: SchemaRef, | ||
} | ||
|
||
|
@@ -258,7 +261,16 @@ impl PruningStatistics for PartitionPruningStatistics { | |
} | ||
|
||
/// Prune a set of containers represented by their statistics. | ||
/// Each [`Statistics`] represents a container (e.g. a file or a partition of files). | ||
/// | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the real change of the PR -- to reword this description to try and make it easier to understand |
||
/// Each [`Statistics`] represents a "container" -- some collection of data | ||
/// that has statistics of its columns. | ||
/// | ||
/// It is up to the caller to decide what each container represents. For | ||
/// example, they can come from a file (e.g. [`PartitionedFile`]) or a set of of | ||
/// files (e.g. [`FileGroup`]) | ||
/// | ||
/// [`PartitionedFile`]: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.PartitionedFile.html | ||
/// [`FileGroup`]: https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.FileGroup.html | ||
Comment on lines
+268
to
+273
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice! |
||
#[derive(Clone)] | ||
pub struct PrunableStatistics { | ||
/// Statistics for each container. | ||
|
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a driveby cleanup -- to add some whitespace for my whitespace OCD