53.0.0 (2024-08-31)
Breaking changes:
- parquet_derive: Match fields by name, support reading selected fields rather than all #6269 (double-free)
- Update parquet object_store dependency to 0.11.0 #6264 [parquet] (alamb)
- parquet Statistics - deprecate `has_*` APIs and add `_opt` functions that return `Option<T>` #6216 [parquet] (Michael-J-Ward)
- Expose bulk ingest in flight sql client and server #6201 [arrow] [arrow-flight] (djanderson)
- Upgrade protobuf definitions to flightsql 17.0 (#6133) #6169 [arrow-flight] (alamb)
- Remove automatic buffering in `ipc::reader::FileReader` for consistent buffering #6132 [arrow] (V0ldek)
- No longer write Parquet column metadata after column chunks *and* in the footer #6117 [parquet] (etseidl)
Implemented enhancements:
- Derive `PartialEq` and `Eq` for `parquet::arrow::ProjectionMask` #6329 [parquet]
- Allow converting empty `pyarrow.RecordBatch` to `arrow::RecordBatch` #6318 [arrow]
- Parquet writer should not write any min/max data to ColumnIndex when all values are null #6315 [parquet]
- Parquet: Add `union` method to `RowSelection` #6307 [parquet]
- Support writing `UTC adjusted time` arrow array to parquet #6277 [parquet]
- A better way to resize the buffer for the snappy encode/decode #6276 [parquet]
- parquet_derive: support reading selected columns from parquet file #6268
- Tests for invalid parquet files #6261 [parquet]
- Implement `date_part` for `Duration` #6245 [arrow]
- Avoid unnecessary null buffer construction when converting arrays to a different type #6243 [parquet] [arrow]
- Add `parquet_opendal` in related projects #6235
- Look into optimizing reading FixedSizeBinary arrays from parquet #6219 [parquet] [arrow]
- Add benchmarks for `BYTE_STREAM_SPLIT` encoded Parquet `FIXED_LEN_BYTE_ARRAY` data #6203 [parquet]
- Make it easy to write parquet to object_store -- Implement `AsyncFileWriter` for a type that implements `obj_store::MultipartUpload` for `AsyncArrowWriter` #6200 [parquet]
- Remove test duplication in parquet statistics tests #6185 [parquet]
- Support BinaryView Types in C Schema FFI #6170 [arrow]
- speedup take_byte_view kernel #6167 [arrow]
- Add support for `StringView` and `BinaryView` statistics in `StatisticsConverter` #6164 [parquet]
- Support casting `BinaryView` --> `Utf8` and `LargeUtf8` #6162 [arrow]
- Implement `filter` kernel specially for `FixedSizeByteArray` #6153 [arrow]
- Use `LevelHistogram` throughout Parquet metadata #6134 [parquet]
- Support DoPutStatementIngest from Arrow Flight SQL 17.0 #6124 [arrow] [arrow-flight]
- ColumnMetaData should no longer be written inline with data #6115 [parquet]
- Implement date_part for `Interval` #6113 [arrow]
- Implement `Into<Arc<dyn Array>>` for `ArrayData` #6104
- Allow flushing or non-buffered writes from `arrow::ipc::writer::StreamWriter` #6099 [arrow]
- Default block_size for `StringViewArray` #6094 [arrow]
- Remove `Statistics::has_min_max_set` and `ValueStatistics::has_min_max_set` and use `Option` instead #6093 [parquet]
- Upgrade arrow-flight to tonic 0.12 #6072
- Improve speed of row converter by skipping utf8 checks #6058 [arrow]
- Extend support for BYTE_STREAM_SPLIT to FIXED_LEN_BYTE_ARRAY, INT32, and INT64 primitive types #6048 [parquet]
- Release arrow-rs / parquet minor version `52.2.0` (August 2024) #5998 [parquet] [arrow]
Fixed bugs:
- Invalid `ColumnIndex` written in parquet #6310 [parquet]
- comparison_kernels benchmarks panic #6283 [arrow]
- Printing schema metadata includes possibly incorrect compression level #6270 [parquet]
- Don't panic when creating `Field` from `FFI_ArrowSchema` with no name #6251 [arrow]
- lexsort_to_indices should not fallback to non-lexical sort if the datatype is not supported #6226 [arrow]
- Parquet Statistics null_count does not distinguish between `0` and not specified #6215 [parquet]
- Using a take kernel on a dense union can result in reaching "unreachable" code #6206 [arrow]
- Adding sub day seconds to Date64 is ignored. #6198 [arrow]
- mismatch between parquet type `is_optional` codes and comment #6191 [parquet]
Documentation updates:
- Minor: improve filter documentation #6317 [arrow] (alamb)
- Minor: Improve comments on GenericByteViewArray::bytes_iter(), prefix_iter() and suffix_iter() #6306 [arrow] (alamb)
- Minor: improve `RowFilter` and `ArrowPredicate` docs #6301 [parquet] (alamb)
- Improve documentation for `MutableArrayData` #6272 [arrow] (alamb)
- Add examples to `StringViewBuilder` and `BinaryViewBuilder` #6240 [arrow] (alamb)
- minor: enhance document for ParquetField #6239 [parquet] (mapleFU)
- Minor: Improve Type documentation #6224 [arrow] (alamb)
- Minor: Update `DateType::Date64` docs #6223 [arrow] (alamb)
- Add (more) Parquet Metadata Documentation #6184 [parquet] (alamb)
- Add additional documentation and examples to `ArrayAccessor` #6141 [arrow] (alamb)
- Minor: improve comments in temporal.rs tests #6140 [arrow] (alamb)
- Minor: Update release schedule in README #6125 (alamb)
Closed issues:
- Simplify take octokit workflow #6279
- Make the bearer token visible in FlightSqlServiceClient #6253 [arrow] [arrow-flight]
- Port `take` workflow to use `octokit` #6242
- Remove `SchemaBuilder` dependency from `StructArray` constructors #6138 [arrow]
Merged pull requests:
- Derive PartialEq and Eq for parquet::arrow::ProjectionMask #6330 [parquet] (thinkharderdev)
- Support zero column `RecordBatch`es in pyarrow integration (use RecordBatchOptions when converting a pyarrow RecordBatch) #6320 [arrow] (Michael-J-Ward)
- Fix writing of invalid Parquet ColumnIndex when row group contains null pages #6319 [parquet] (adriangb)
- Pass empty vectors as min/max for all null pages when building ColumnIndex #6316 [parquet] (etseidl)
- Update tonic-build requirement from =0.12.0 to =0.12.2 #6314 [arrow] [arrow-flight] (dependabot[bot])
- Parquet: add `union` method to `RowSelection` #6308 [parquet] (sdd)
- Specialize filter for structs and sparse unions #6304 [arrow] (gstvg)
- Err on `try_from_le_slice` #6295 [parquet] (samuelcolvin)
- fix reference in doctest to size_of which is not imported by default #6286 [arrow] (rtyler)
- Support writing UTC adjusted time arrays to parquet #6278 [parquet] (aykut-bozkurt)
- Minor: `pub use ByteView` in arrow and improve documentation #6275 [arrow] (alamb)
- Fix accessing name from ffi schema #6273 [arrow] (kylebarron)
- Do not print compression level in schema printer #6271 [parquet] (ttencate)
- ci: use octokit to add assignee #6267 (dsgibbons)
- Add tests for bad parquet files #6262 [parquet] (alamb)
- Add `Statistics::distinct_count_opt` and deprecate `Statistics::distinct_count` #6259 [parquet] (alamb)
- Minor: move `FallibleRequestStream` and `FallibleTonicResponseStream` to a module #6258 [arrow] [arrow-flight] (alamb)
- Make the bearer token visible in FlightSqlServiceClient #6254 [arrow] [arrow-flight] (ccciudatu)
- Use `unary()` for array conversion in Parquet array readers, speed up `Decimal128`, `Decimal256` and `Float16` #6252 [parquet] [arrow] (etseidl)
- Update tower requirement from 0.4.13 to 0.5.0 #6250 [arrow] [arrow-flight] (dependabot[bot])
- Implement date_part for durations #6246 [arrow] (nrc)
- Remove unnecessary null buffer construction when converting arrays to a different type #6244 [parquet] [arrow] (etseidl)
- Implement PartialEq for GenericByteViewArray #6241 [arrow] (alamb)
- Minor: Remove non standard footer from LICENSE.txt / reference to Apache Aurora #6237 (alamb)
- docs: Add parquet_opendal in related projects #6236 (Xuanwo)
- Avoid infinite loop in bad parquet by checking the number of rep levels #6232 [parquet] (jp0317)
- Specialize Prefix/Suffix Match for `Like/ILike` between Array and Scalar for StringViewArray #6231 [arrow] (xinlifoobar)
- fix: lexsort_to_indices should not fallback to non-lexical sort if the datatype is not supported #6225 [arrow] (viirya)
- Modest improvement to FixedLenByteArray BYTE_STREAM_SPLIT arrow decoder #6222 [parquet] (etseidl)
- Improve performance of `FixedLengthBinary` decoding #6220 [parquet] (etseidl)
- Update documentation for Parquet BYTE_STREAM_SPLIT encoding #6212 [parquet] (etseidl)
- Improve interval parsing #6211 [arrow] (samuelcolvin)
- minor: Suggest take on interleave docs #6210 [arrow] (gstvg)
- fix: Correctly handle take on dense union of a single selected type #6209 [arrow] (gstvg)
- Add time dictionary coercions #6208 [arrow] (adriangb)
- fix(arrow): restrict the range of temporal values produced via `data_gen` #6205 [arrow] (kyle-mccarthy)
- Add benchmarks for `BYTE_STREAM_SPLIT` encoded Parquet `FIXED_LEN_BYTE_ARRAY` data #6204 [parquet] (etseidl)
- Move `ParquetMetadataWriter` to its own module, update documentation #6202 [parquet] (alamb)
- Add `ThriftMetadataWriter` for writing Parquet metadata #6197 [parquet] (adriangb)
- Update zstd-sys requirement from >=2.0.0, <2.0.13 to >=2.0.0, <2.0.14 #6196 [parquet] (dependabot[bot])
- fix parquet type `is_optional` comments #6192 [parquet] (jp0317)
- Remove duplicated statistics tests in parquet #6190 [parquet] (Kev1n8)
- Benchmarks for `bool_and` #6189 [arrow] (simonvandel)
- Fix typo in documentation of Float64Array #6188 [arrow] (mesejo)
- Make it clear that `StatisticsConverter` can not panic #6187 [parquet] (alamb)
- add filter benchmark for `FixedSizeBinaryArray` #6186 [arrow] (chloro-pn)
- Update sysinfo requirement from 0.30.12 to 0.31.2 #6182 [parquet] (dependabot[bot])
- Add support for `StringView` and `BinaryView` statistics in `StatisticsConverter` #6181 [parquet] (Kev1n8)
- Support casting between BinaryView <--> Utf8 and LargeUtf8 #6180 [arrow] (xinlifoobar)
- Implement specialized filter kernel for `FixedSizeByteArray` #6178 [arrow] (chloro-pn)
- Support `StringView` and `BinaryView` in CDataInterface #6171 [arrow] (a10y)
- Optimize `take` kernel for `BinaryViewArray` and `StringViewArray` #6168 [arrow] (a10y)
- Support Parquet `BYTE_STREAM_SPLIT` for INT32, INT64, and FIXED_LEN_BYTE_ARRAY primitive types #6159 [parquet] (etseidl)
- Fix comparison kernel benchmarks #6147 [arrow] (samuelcolvin)
- improve `LIKE` regex performance up to 12x #6145 [arrow] (samuelcolvin)
- Optimize `min_boolean` and `bool_and` #6144 [arrow] (simonvandel)
- Reduce bounds check in `RowIter`, add `unsafe Rows::row_unchecked` #6142 [arrow] (XiangpengHao)
- Minor: Simplify `StructArray` constructors #6139 [arrow] (Rafferty97)
- Implement exponential block size growing strategy for `StringViewBuilder` #6136 [arrow] (XiangpengHao)
- Use `LevelHistogram` in `PageIndex` #6135 [parquet] (etseidl)
- Add ArrowError::ArithmeticError #6130 [arrow] (andygrove)
- Improve `LIKE` performance for "contains" style queries #6128 [arrow] (samuelcolvin)
- Add `BooleanArray::new_from_packed` and `BooleanArray::new_from_u8` #6127 [arrow] (chloro-pn)
- improvements to `(i)starts_with` and `(i)ends_with` performance #6118 [arrow] (samuelcolvin)
- Fix Clippy for the Rust 1.80 release #6116 [parquet] [arrow] [arrow-flight] (alamb)
- added a flush method to IPC writers #6108 [arrow] (V0ldek)
- Add support for level histograms added in PARQUET-2261 to `ParquetMetaData` #6105 [parquet] (etseidl)
- Implement date_part for intervals #6071 [arrow] (nrc)
- feat(parquet): Implement AsyncFileWriter for `object_store::buffered::BufWriter` #6013 [parquet] (Xuanwo)
* This Changelog was automatically generated by github_changelog_generator