Add ColumnChunkMetadataBuilder clear APIs #6523

Merged 1 commit on Oct 8, 2024
43 changes: 40 additions & 3 deletions parquet/src/file/metadata/mod.rs
@@ -1205,15 +1205,34 @@ impl ColumnChunkMetaData {

/// Converts this [`ColumnChunkMetaData`] into a [`ColumnChunkMetaDataBuilder`]
pub fn into_builder(self) -> ColumnChunkMetaDataBuilder {
- ColumnChunkMetaDataBuilder(self)
+ ColumnChunkMetaDataBuilder::from(self)
}
}

- /// Builder for column chunk metadata.
+ /// Builder for [`ColumnChunkMetaData`]
///
/// This builder is used to create a new column chunk metadata or modify an
/// existing one.
///
/// # Example
/// ```no_run
/// # use parquet::file::metadata::{ColumnChunkMetaData, ColumnChunkMetaDataBuilder};
/// # fn get_column_chunk_metadata() -> ColumnChunkMetaData { unimplemented!(); }
/// let column_chunk_metadata = get_column_chunk_metadata();
/// // create a new builder from existing column chunk metadata
/// let builder = ColumnChunkMetaDataBuilder::from(column_chunk_metadata);
/// // clear the statistics:
/// let column_chunk_metadata: ColumnChunkMetaData = builder
/// .clear_statistics()
/// .build()
/// .unwrap();
/// ```
pub struct ColumnChunkMetaDataBuilder(ColumnChunkMetaData);

impl ColumnChunkMetaDataBuilder {
/// Creates new column chunk metadata builder.
///
/// See also [`ColumnChunkMetaData::builder`]
fn new(column_descr: ColumnDescPtr) -> Self {
Self(ColumnChunkMetaData {
column_descr,
@@ -1297,7 +1316,7 @@ impl ColumnChunkMetaDataBuilder {
self
}

- /// Sets optional dictionary page ofset in bytes.
+ /// Sets optional dictionary page offset in bytes.
pub fn set_dictionary_page_offset(mut self, value: Option<i64>) -> Self {
self.0.dictionary_page_offset = value;
self
@@ -1315,12 +1334,24 @@ impl ColumnChunkMetaDataBuilder {
self
}

/// Clears the statistics for this column chunk.
pub fn clear_statistics(mut self) -> Self {
Review comment (Contributor Author): Unfortunately, `set_statistics` takes `value: Statistics`, not `value: Option<Statistics>`, so there is no way to make it accept `None` without breaking the API, which we can't do until the next breaking release.

self.0.statistics = None;
self
}

/// Sets page encoding stats for this column chunk.
pub fn set_page_encoding_stats(mut self, value: Vec<PageEncodingStats>) -> Self {
self.0.encoding_stats = Some(value);
self
}

/// Clears the page encoding stats for this column chunk.
pub fn clear_page_encoding_stats(mut self) -> Self {
Review comment (Contributor Author): The same reasoning applies to this API.

self.0.encoding_stats = None;
self
}
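
The two review comments above give the reason for adding separate `clear_*` methods: the existing setters take the value directly rather than an `Option`, so widening their signature would break every existing call site, whereas a new method is purely additive. A minimal sketch of that trade-off follows. It does not use the `parquet` crate; `Stats`, `ChunkMeta`, `ChunkMetaBuilder`, and `set_then_clear` are hypothetical stand-ins, and `build` is infallible here for brevity even though the doc example in this diff calls `.unwrap()` on the real one.

```rust
// Hypothetical stand-ins for the parquet types; names and fields are
// illustrative assumptions, not the crate's real definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct Stats {
    pub null_count: u64,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct ChunkMeta {
    pub statistics: Option<Stats>,
}

/// Newtype builder wrapping the value it builds, mirroring the shape in the diff.
pub struct ChunkMetaBuilder(ChunkMeta);

impl ChunkMetaBuilder {
    pub fn new() -> Self {
        Self(ChunkMeta::default())
    }

    /// Existing setter: takes the value itself, so callers cannot pass `None`.
    /// Changing `value` to `Option<Stats>` would break every current call site.
    pub fn set_statistics(mut self, value: Stats) -> Self {
        self.0.statistics = Some(value);
        self
    }

    /// Additive method: code that already calls `set_statistics(stats)`
    /// keeps compiling unchanged.
    pub fn clear_statistics(mut self) -> Self {
        self.0.statistics = None;
        self
    }

    /// Infallible in this sketch (an assumption made to keep it short).
    pub fn build(self) -> ChunkMeta {
        self.0
    }
}

/// Sets statistics and then clears them again through the builder chain.
pub fn set_then_clear() -> ChunkMeta {
    ChunkMetaBuilder::new()
        .set_statistics(Stats { null_count: 3 })
        .clear_statistics()
        .build()
}
```

Because both methods consume and return `self`, they chain in any order, and the last call on a given field wins.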

/// Sets optional bloom filter offset in bytes.
pub fn set_bloom_filter_offset(mut self, value: Option<i64>) -> Self {
self.0.bloom_filter_offset = value;
@@ -1492,6 +1523,12 @@ impl ColumnIndexBuilder {
}
}

impl From<ColumnChunkMetaData> for ColumnChunkMetaDataBuilder {
fn from(value: ColumnChunkMetaData) -> Self {
ColumnChunkMetaDataBuilder(value)
}
}
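
With the `From` impl in place, `into_builder` and `ColumnChunkMetaDataBuilder::from` are two spellings of the same conversion, and `.into()` works wherever the target type is known. The sketch below shows that round trip on hypothetical stand-in types (`Meta`, `MetaBuilder`, `strip_statistics`) rather than the real `parquet` structs; the `String` statistics field and the infallible `build` are simplifying assumptions.

```rust
// Hypothetical stand-ins; the real types live in parquet::file::metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Meta {
    pub num_values: i64,
    pub statistics: Option<String>,
}

/// Newtype builder that starts from an existing `Meta`.
pub struct MetaBuilder(Meta);

impl From<Meta> for MetaBuilder {
    fn from(value: Meta) -> Self {
        MetaBuilder(value)
    }
}

impl Meta {
    /// Delegates to the `From` impl, as `into_builder` does in the diff.
    pub fn into_builder(self) -> MetaBuilder {
        MetaBuilder::from(self)
    }
}

impl MetaBuilder {
    pub fn clear_statistics(mut self) -> Self {
        self.0.statistics = None;
        self
    }

    /// Infallible in this sketch (an assumption made to keep it short).
    pub fn build(self) -> Meta {
        self.0
    }
}

/// Drops statistics from existing metadata and leaves every other field intact.
pub fn strip_statistics(meta: Meta) -> Meta {
    // `.into()` resolves through the `From<Meta>` impl above.
    let builder: MetaBuilder = meta.into();
    builder.clear_statistics().build()
}
```

Routing `into_builder` through `From` keeps a single construction path, so any later change to how the builder is initialized only needs to be made in one place.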

/// Builder for offset index, part of the Parquet [PageIndex].
///
/// [PageIndex]: https://github.com/apache/parquet-format/blob/master/PageIndex.md