Changelog

53.0.0 (2024-08-31)

Full Changelog

Breaking changes:

parquet_derive: Match fields by name, support reading selected fields rather than all #6269 (double-free)
Update parquet object_store dependency to 0.11.0 #6264 [parquet] (alamb)
parquet Statistics - deprecate has_* APIs and add _opt functions that return Option<T> #6216 [parquet] (Michael-J-Ward)
Expose bulk ingest in flight sql client and server #6201 [arrow] [arrow-flight] (djanderson)
Upgrade protobuf definitions to flightsql 17.0 (#6133) #6169 [arrow-flight] (alamb)
Remove automatic buffering in ipc::reader::FileReader for consistent buffering #6132 [arrow] (V0ldek)
No longer write Parquet column metadata after column chunks *and* in the footer #6117 [parquet] (etseidl)
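
The statistics change listed above (#6216) replaces boolean has_* checks with accessors that return Option<T>. The following is a minimal sketch of that pattern, not the parquet crate's actual API: ChunkStats, its fields, and can_skip_greater_than are hypothetical names invented for illustration. It shows how an Option return type separates "zero nulls" from "null count not recorded", and how a caller might treat missing statistics conservatively.

```rust
/// Hypothetical stand-in for per-chunk statistics.
/// This is NOT the parquet crate's `Statistics` type.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
}

impl ChunkStats {
    pub fn new(min: Option<i64>, max: Option<i64>, null_count: Option<u64>) -> Self {
        Self { min, max, null_count }
    }

    /// Older style: a flag the caller has to remember to check
    /// before reading the values.
    pub fn has_min_max_set(&self) -> bool {
        self.min.is_some() && self.max.is_some()
    }

    /// Newer style: absence is encoded in the return type.
    pub fn min_opt(&self) -> Option<i64> {
        self.min
    }

    pub fn max_opt(&self) -> Option<i64> {
        self.max
    }

    /// `Some(0)` means "no nulls"; `None` means "not recorded".
    pub fn null_count_opt(&self) -> Option<u64> {
        self.null_count
    }
}

/// Decide whether a chunk can be skipped for the predicate
/// `value > threshold`. When the maximum is unknown the chunk
/// cannot be pruned, so this returns false.
pub fn can_skip_greater_than(stats: &ChunkStats, threshold: i64) -> bool {
    match stats.max_opt() {
        Some(max) => max <= threshold,
        None => false,
    }
}
```

With this shape, code that previously called a has_* method and then read a value collapses into a single match or if let on the Option, and a missing statistic can no longer be mistaken for a zero.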

Implemented enhancements:

Derive PartialEq and Eq for parquet::arrow::ProjectionMask #6329 [parquet]
Allow converting empty pyarrow.RecordBatch to arrow::RecordBatch #6318 [arrow]
Parquet writer should not write any min/max data to ColumnIndex when all values are null #6315 [parquet]
Parquet: Add union method to RowSelection #6307 [parquet]
Support writing UTC adjusted time arrow array to parquet #6277 [parquet]
A better way to resize the buffer for the snappy encode/decode #6276 [parquet]
parquet_derive: support reading selected columns from parquet file #6268
Tests for invalid parquet files #6261 [parquet]
Implement date_part for Duration #6245 [arrow]
Avoid unnecessary null buffer construction when converting arrays to a different type #6243 [parquet] [arrow]
Add parquet_opendal in related projects #6235
Look into optimizing reading FixedSizeBinary arrays from parquet #6219 [parquet] [arrow]
Add benchmarks for BYTE_STREAM_SPLIT encoded Parquet FIXED_LEN_BYTE_ARRAY data #6203 [parquet]
Make it easy to write parquet to object_store -- Implement AsyncFileWriter for a type that implements obj_store::MultipartUpload for AsyncArrowWriter #6200 [parquet]
Remove test duplication in parquet statistics tests #6185 [parquet]
Support BinaryView Types in C Schema FFI #6170 [arrow]
speedup take_byte_view kernel #6167 [arrow]
Add support for StringView and BinaryView statistics in StatisticsConverter #6164 [parquet]
Support casting BinaryView --> Utf8 and LargeUtf8 #6162 [arrow]
Implement filter kernel specially for FixedSizeByteArray #6153 [arrow]
Use LevelHistogram throughout Parquet metadata #6134 [parquet]
Support DoPutStatementIngest from Arrow Flight SQL 17.0 #6124 [arrow] [arrow-flight]
ColumnMetaData should no longer be written inline with data #6115 [parquet]
Implement date_part for Interval #6113 [arrow]
Implement Into<Arc<dyn Array>> for ArrayData #6104
Allow flushing or non-buffered writes from arrow::ipc::writer::StreamWriter #6099 [arrow]
Default block_size for StringViewArray #6094 [arrow]
Remove Statistics::has_min_max_set and ValueStatistics::has_min_max_set and use Option instead #6093 [parquet]
Upgrade arrow-flight to tonic 0.12 #6072
Improve speed of row converter by skipping utf8 checks #6058 [arrow]
Extend support for BYTE_STREAM_SPLIT to FIXED_LEN_BYTE_ARRAY, INT32, and INT64 primitive types #6048 [parquet]
Release arrow-rs / parquet minor version 52.2.0 (August 2024) #5998 [parquet] [arrow]
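
Several entries above concern the Parquet BYTE_STREAM_SPLIT encoding (#6048, #6203). As a rough illustration of the byte layout only, and not the parquet crate's implementation, the sketch below scatters byte j of every fixed-width value into stream j and concatenates the streams. The function names and the flat &[u8] input are assumptions made for this example.

```rust
/// Encode fixed-width values (concatenated in `data`) by splitting them
/// into `width` byte streams: stream j holds byte j of every value.
/// Illustrative sketch only; names and signature are hypothetical.
pub fn byte_stream_split_encode(data: &[u8], width: usize) -> Vec<u8> {
    assert!(
        width > 0 && data.len() % width == 0,
        "data length must be a multiple of the value width"
    );
    let n = data.len() / width; // number of values
    let mut out = vec![0u8; data.len()];
    for (i, value) in data.chunks_exact(width).enumerate() {
        for (j, byte) in value.iter().enumerate() {
            // Byte j of value i lands at position i within stream j.
            out[j * n + i] = *byte;
        }
    }
    out
}

/// Inverse of `byte_stream_split_encode`: gather byte j of value i
/// back from stream j.
pub fn byte_stream_split_decode(encoded: &[u8], width: usize) -> Vec<u8> {
    assert!(
        width > 0 && encoded.len() % width == 0,
        "encoded length must be a multiple of the value width"
    );
    let n = encoded.len() / width;
    let mut out = vec![0u8; encoded.len()];
    for i in 0..n {
        for j in 0..width {
            out[i * width + j] = encoded[j * n + i];
        }
    }
    out
}
```

For two 4-byte values stored as [1, 2, 3, 4, 5, 6, 7, 8], encoding with width 4 yields [1, 5, 2, 6, 3, 7, 4, 8]. The transform does not shrink the data by itself; grouping corresponding bytes together tends to help a general-purpose compressor applied afterwards.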

Fixed bugs:

Invalid ColumnIndex written in parquet #6310 [parquet]
comparison_kernels benchmarks panic #6283 [arrow]
Printing schema metadata includes possibly incorrect compression level #6270 [parquet]
Don't panic when creating Field from FFI_ArrowSchema with no name #6251 [arrow]
lexsort_to_indices should not fallback to non-lexical sort if the datatype is not supported #6226 [arrow]
Parquet Statistics null_count does not distinguish between 0 and not specified #6215 [parquet]
Using a take kernel on a dense union can result in reaching "unreachable" code #6206 [arrow]
Adding sub day seconds to Date64 is ignored. #6198 [arrow]
mismatch between parquet type is_optional codes and comment #6191 [parquet]

Documentation updates:

Minor: improve filter documentation #6317 [arrow] (alamb)
Minor: Improve comments on GenericByteViewArray::bytes_iter(), prefix_iter() and suffix_iter() #6306 [arrow] (alamb)
Minor: improve RowFilter and ArrowPredicate docs #6301 [parquet] (alamb)
Improve documentation for MutableArrayData #6272 [arrow] (alamb)
Add examples to StringViewBuilder and BinaryViewBuilder #6240 [arrow] (alamb)
minor: enhance document for ParquetField #6239 [parquet] (mapleFU)
Minor: Improve Type documentation #6224 [arrow] (alamb)
Minor: Update DataType::Date64 docs #6223 [arrow] (alamb)
Add (more) Parquet Metadata Documentation #6184 [parquet] (alamb)
Add additional documentation and examples to ArrayAccessor #6141 [arrow] (alamb)
Minor: improve comments in temporal.rs tests #6140 [arrow] (alamb)
Minor: Update release schedule in README #6125 (alamb)

Closed issues:

Simplify take octokit workflow #6279
Make the bearer token visible in FlightSqlServiceClient #6253 [arrow] [arrow-flight]
Port take workflow to use octokit #6242
Remove SchemaBuilder dependency from StructArray constructors #6138 [arrow]

Merged pull requests:

Derive PartialEq and Eq for parquet::arrow::ProjectionMask #6330 [parquet] (thinkharderdev)
Support zero column RecordBatches in pyarrow integration (use RecordBatchOptions when converting a pyarrow RecordBatch) #6320 [arrow] (Michael-J-Ward)
Fix writing of invalid Parquet ColumnIndex when row group contains null pages #6319 [parquet] (adriangb)
Pass empty vectors as min/max for all null pages when building ColumnIndex #6316 [parquet] (etseidl)
Update tonic-build requirement from =0.12.0 to =0.12.2 #6314 [arrow] [arrow-flight] (dependabot[bot])
Parquet: add union method to RowSelection #6308 [parquet] (sdd)
Specialize filter for structs and sparse unions #6304 [arrow] (gstvg)
Err on try_from_le_slice #6295 [parquet] (samuelcolvin)
fix reference in doctest to size_of which is not imported by default #6286 [arrow] (rtyler)
Support writing UTC adjusted time arrays to parquet #6278 [parquet] (aykut-bozkurt)
Minor: pub use ByteView in arrow and improve documentation #6275 [arrow] (alamb)
Fix accessing name from ffi schema #6273 [arrow] (kylebarron)
Do not print compression level in schema printer #6271 [parquet] (ttencate)
ci: use octokit to add assignee #6267 (dsgibbons)
Add tests for bad parquet files #6262 [parquet] (alamb)
Add Statistics::distinct_count_opt and deprecate Statistics::distinct_count #6259 [parquet] (alamb)
Minor: move FallibleRequestStream and FallibleTonicResponseStream to a module #6258 [arrow] [arrow-flight] (alamb)
Make the bearer token visible in FlightSqlServiceClient #6254 [arrow] [arrow-flight] (ccciudatu)
Use unary() for array conversion in Parquet array readers, speed up Decimal128, Decimal256 and Float16 #6252 [parquet] [arrow] (etseidl)
Update tower requirement from 0.4.13 to 0.5.0 #6250 [arrow] [arrow-flight] (dependabot[bot])
Implement date_part for durations #6246 [arrow] (nrc)
Remove unnecessary null buffer construction when converting arrays to a different type #6244 [parquet] [arrow] (etseidl)
Implement PartialEq for GenericByteViewArray #6241 [arrow] (alamb)
Minor: Remove non standard footer from LICENSE.txt / reference to Apache Aurora #6237 (alamb)
docs: Add parquet_opendal in related projects #6236 (Xuanwo)
Avoid infinite loop in bad parquet by checking the number of rep levels #6232 [parquet] (jp0317)
Specialize Prefix/Suffix Match for Like/ILike between Array and Scalar for StringViewArray #6231 [arrow] (xinlifoobar)
fix: lexsort_to_indices should not fallback to non-lexical sort if the datatype is not supported #6225 [arrow] (viirya)
Modest improvement to FixedLenByteArray BYTE_STREAM_SPLIT arrow decoder #6222 [parquet] (etseidl)
Improve performance of FixedLengthBinary decoding #6220 [parquet] (etseidl)
Update documentation for Parquet BYTE_STREAM_SPLIT encoding #6212 [parquet] (etseidl)
Improve interval parsing #6211 [arrow] (samuelcolvin)
minor: Suggest take on interleave docs #6210 [arrow] (gstvg)
fix: Correctly handle take on dense union of a single selected type #6209 [arrow] (gstvg)
Add time dictionary coercions #6208 [arrow] (adriangb)
fix(arrow): restrict the range of temporal values produced via data_gen #6205 [arrow] (kyle-mccarthy)
Add benchmarks for BYTE_STREAM_SPLIT encoded Parquet FIXED_LEN_BYTE_ARRAY data #6204 [parquet] (etseidl)
Move ParquetMetadataWriter to its own module, update documentation #6202 [parquet] (alamb)
Add ThriftMetadataWriter for writing Parquet metadata #6197 [parquet] (adriangb)
Update zstd-sys requirement from >=2.0.0, <2.0.13 to >=2.0.0, <2.0.14 #6196 [parquet] (dependabot[bot])
fix parquet type is_optional comments #6192 [parquet] (jp0317)
Remove duplicated statistics tests in parquet #6190 [parquet] (Kev1n8)
Benchmarks for bool_and #6189 [arrow] (simonvandel)
Fix typo in documentation of Float64Array #6188 [arrow] (mesejo)
Make it clear that StatisticsConverter can not panic #6187 [parquet] (alamb)
add filter benchmark for FixedSizeBinaryArray #6186 [arrow] (chloro-pn)
Update sysinfo requirement from 0.30.12 to 0.31.2 #6182 [parquet] (dependabot[bot])
Add support for StringView and BinaryView statistics in StatisticsConverter #6181 [parquet] (Kev1n8)
Support casting between BinaryView <--> Utf8 and LargeUtf8 #6180 [arrow] (xinlifoobar)
Implement specialized filter kernel for FixedSizeByteArray #6178 [arrow] (chloro-pn)
Support StringView and BinaryView in CDataInterface #6171 [arrow] (a10y)
Optimize take kernel for BinaryViewArray and StringViewArray #6168 [arrow] (a10y)
Support Parquet BYTE_STREAM_SPLIT for INT32, INT64, and FIXED_LEN_BYTE_ARRAY primitive types #6159 [parquet] (etseidl)
Fix comparison kernel benchmarks #6147 [arrow] (samuelcolvin)
improve LIKE regex performance up to 12x #6145 [arrow] (samuelcolvin)
Optimize min_boolean and bool_and #6144 [arrow] (simonvandel)
Reduce bounds check in RowIter, add unsafe Rows::row_unchecked #6142 [arrow] (XiangpengHao)
Minor: Simplify StructArray constructors #6139 [arrow] (Rafferty97)
Implement exponential block size growing strategy for StringViewBuilder #6136 [arrow] (XiangpengHao)
Use LevelHistogram in PageIndex #6135 [parquet] (etseidl)
Add ArrowError::ArithmeticError #6130 [arrow] (andygrove)
Improve LIKE performance for "contains" style queries #6128 [arrow] (samuelcolvin)
Add BooleanArray::new_from_packed and BooleanArray::new_from_u8 #6127 [arrow] (chloro-pn)
improvements to (i)starts_with and (i)ends_with performance #6118 [arrow] (samuelcolvin)
Fix Clippy for the Rust 1.80 release #6116 [parquet] [arrow] [arrow-flight] (alamb)
added a flush method to IPC writers #6108 [arrow] (V0ldek)
Add support for level histograms added in PARQUET-2261 to ParquetMetaData #6105 [parquet] (etseidl)
Implement date_part for intervals #6071 [arrow] (nrc)
feat(parquet): Implement AsyncFileWriter for object_store::buffered::BufWriter #6013 [parquet] (Xuanwo)

* This Changelog was automatically generated by github_changelog_generator
