Skip to content

Latest commit

 

History

History
792 lines (609 loc) · 88 KB

CHANGELOG.md

File metadata and controls

792 lines (609 loc) · 88 KB

Changelog

9.0.0 (2022-02-04)

Full Changelog

Breaking changes:

  • Add Send + Sync to DataType, RowGroupReader, FileReader, ChunkReader. #1264
  • Rename the function Bitmap::len to Bitmap::bit_len to clarify its meaning #1242 [parquet] [arrow] (HaoYang670)
  • Remove unused / broken memory-check feature #1222 [arrow] (jhorstmann)
  • Potentially buffer multiple RecordBatches before writing a parquet row group in ArrowWriter #1214 [parquet] [arrow] (tustvold)

Implemented enhancements:

  • Add async arrow parquet reader #1154 [parquet] [arrow] (tustvold)
  • Rename Bitmap::len to Bitmap::bit_len #1233
  • Extend CSV schema inference to allow scientific notation for floating point types #1215 [arrow]
  • Write Multiple RecordBatch to Parquet Row Group #1211
  • Add doc examples for eq_dyn etc. #1202 [arrow]
  • Add comparison kernels for BinaryArray #1108
  • impl ArrowNativeType for i128 #1098
  • Remove Copy trait bound from dyn scalar kernels #1243 [arrow] (matthewmturner)
  • Add into_inner for IPC FileWriter #1236 [arrow] (yjshen)
  • [Minor]Re-export array::builder::make_builder to make it available for downstream #1235 [arrow] (yjshen)

Fixed bugs:

  • Parquet v8.0.0 panics when reading all null column to NullArray #1245 [parquet]
  • Get Unknown configuration option rust-version when running the rust format command #1240
  • Bitmap Length Validation is Incorrect #1231 [arrow]
  • Writing sliced ListArray or MapArray ignore offsets #1226 [parquet]
  • Remove broken memory-tracking crate feature #1171
  • Revert making parquet::data_type and parquet::arrow::schema experimental #1244 [parquet] (tustvold)

Documentation updates:

Performance improvements:

  • Improve performance for arithmetic kernels with simd feature enabled (except for division/modulo) #1221 [arrow] (jhorstmann)
  • Do not concatenate identical dictionaries #1219 [arrow] (tustvold)
  • Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) #1180 [parquet] (tustvold)

Closed issues:

  • UnalignedBitChunkIterator to that iterates through already aligned u64 blocks #1227
  • Remove unused ArrowArrayReader in parquet #1197 [parquet]

Merged pull requests:

8.0.0 (2022-01-20)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Parquet reader should be able to read structs within list #1186 [parquet]
  • Disable serde_json arbitrary_precision feature flag #1174 [arrow]
  • Simplify and reduce code duplication in arithmetic.rs #1160 [arrow]
  • Return Err from JSON writer rather than panic! for unsupported types #1157 [arrow]
  • Support scalar mathematics kernels for Array and scalar value #1153 [arrow]
  • Support DecimalArray in sort kernel #1137
  • Parquet Fuzz Tests #1053
  • BooleanBufferBuilder Append Packed #1038 [arrow]
  • parquet Performance Optimization: StructArrayReader Redundant Level & Bitmap Computation #1034 [parquet]
  • Reduce Public Parquet API #1032 [parquet]
  • Add from_iter_values for binary array #1188 [arrow] (Jimexist)
  • Add support for MapArray in json writer #1149 [arrow] (helgikrs)

Fixed bugs:

  • Empty string arrays with no nulls are not equal #1208 [arrow]
  • Pretty print a RecordBatch containing Float16 triggers a panic #1193 [arrow]
  • Writing structs nested in lists produces an incorrect output #1184 [parquet]
  • Undefined behavior for GenericStringArray::from_iter_values if reported iterator upper bound is incorrect #1144 [arrow]
  • Interval comparisons with simd feature asserts #1136 [arrow]
  • RecordReader Permits Illegal Types #1132 [parquet]

Security fixes:

Documentation updates:

Performance improvements:

  • Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) #1054 [parquet] [arrow] (tustvold)
  • Improve parquet performance: Skip levels computation for required struct arrays in parquet #1035 [parquet] (tustvold)

Closed issues:

  • Generify ColumnReaderImpl and RecordReader #1040 [parquet]
  • Parquet Preserve BitMask #1037

Merged pull requests:

7.0.0 (2022-1-07)

Full Changelog

Arrow

Breaking changes:

  • pretty_format_batches now returns Result<impl Display> rather than String: #975
  • MutableBuffer::typed_data_mut is marked unsafe: #1029
  • UnionArray updated match latest Arrow spec, added UnionMode, UnionArray::new() marked unsafe: #885

New Features:

  • Support for Float16Array types #888
  • IPC support for UnionArray #654
  • Dynamic comparison kernels for scalars (e.g. eq_dyn_scalar), including DictionaryArray: #1113

Enhancements:

  • Added Schema::with_metadata and Field::with_metadata #1092
  • Support for custom datetime format for inference and parsing csv files #1112
  • Implement Array for ArrayRef for easier use #1129
  • Pretty printing display support for FixedSizeBinaryArray #1097
  • Dependency Upgrades: pyo3, parquet-format, prost, tonic
  • Avoid allocating vector of indices in lexicographical_partition_ranges#998

Parquet

Fixed bugs:

  • (parquet) Fix reading of dictionary encoded pages with null values: #1130

Changelog

6.5.0 (2021-12-23)

Full Changelog

6.4.0 (2021-12-10)

Full Changelog

6.3.0 (2021-11-26)

Full Changelog

Changes:

6.2.0 (2021-11-12)

Full Changelog

Features / Fixes:

6.1.0 (2021-10-29)

Full Changelog

Features / Fixes:

Other:

6.0.0 (2021-10-13)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Improve parquet binary writer speed by reducing allocations #819
  • Expose buffer operations #808
  • Add doc examples of writing parquet files using ArrowWriter #788

Fixed bugs:

  • JSON reader can create null struct children on empty lists #825
  • Incorrect null count for cast kernel for list arrays #815
  • minute and second temporal kernels do not respect timezone #500
  • Fix data corruption in json decoder f64-to-i64 cast #652 [arrow] (xianwill)

Documentation updates:

5.5.0 (2021-09-24)

Full Changelog

Implemented enhancements:

  • parquet should depend on a small set of arrow features #800
  • Support equality on RecordBatch #735

Fixed bugs:

  • Converting from string to timestamp uses microseconds instead of milliseconds #780
  • Document has no link to RowColumIter #762
  • length on slices with null doesn't work #744

5.4.0 (2021-09-10)

Full Changelog

Implemented enhancements:

  • Upgrade lexical-core to 0.8 #747
  • append_nulls and append_trusted_len_iter for PrimitiveBuilder #725
  • Optimize MutableArrayData::extend for null buffers #397

Fixed bugs:

  • Arithmetic with scalars doesn't work on slices #742
  • Comparisons with scalar don't work on slices #740
  • unary kernel doesn't respect offset #738
  • new_null_array creates invalid struct arrays #734
  • --no-default-features is broken for parquet #733 [parquet]
  • Bitmap::len returns the number of bytes, not bits. #730
  • Decimal logical type is formatted incorrectly by print_schema #713
  • parquet_derive does not support chrono time values #711
  • Numeric overflow when formatting Decimal type #710
  • The integration tests are not running #690

Closed issues:

  • Question: Is there no way to create a DictionaryArray with a pre-arranged mapping? #729

5.3.0 (2021-08-26)

Full Changelog

Implemented enhancements:

  • Add optimized filter kernel for regular expression matching #697
  • Can't cast from timestamp array to string array #587

Fixed bugs:

  • 'Encoding DELTA_BYTE_ARRAY is not supported' with parquet arrow readers #708
  • Support reading json string into binary data type. #701

Closed issues:

  • Resolve Issues with prettytable-rs dependency #69 [arrow]

5.2.0 (2021-08-12)

Full Changelog

Implemented enhancements:

  • Make rand an optional dependency #671
  • Remove undefined behavior in value method of boolean and primitive arrays #645
  • Avoid materialization of indices in filter_record_batch for single arrays #636
  • Add a note about arrow crate security / safety #627
  • Allow the creation of String arrays from an interator of &Option<&str> #598
  • Support arrow map datatype #395

Fixed bugs:

  • Parquet fixed length byte array columns write byte array statistics #660 [parquet]
  • Parquet boolean columns write Int32 statistics #659 [parquet]
  • Writing Parquet with a boolean column fails #657
  • JSON decoder data corruption for large i64/u64 #653
  • Incorrect min/max statistics for strings in parquet files #641 [parquet]

Closed issues:

  • Release candidate verifying script seems work on macOS #640
  • Update CONTRIBUTING #342

5.1.0 (2021-07-29)

Full Changelog

Implemented enhancements:

  • Make FFI_ArrowArray empty() public #602
  • exponential sort can be used to speed up lexico partition kernel #586
  • Implement sort() for binary array #568
  • primitive sorting can be improved and more consistent with and without limit if sorted unstably #553

Fixed bugs:

  • Confusing memory usage with CSV reader #623
  • FFI implementation deviates from specification for array release #595
  • Parquet file content is different if ~/.cargo is in a git checkout #589
  • Ensure output of MIRI is checked for success #581
  • MIRI failure in array::ffi::tests::test_struct and other ffi tests #580
  • ListArray equality check may return wrong result #570
  • cargo audit failed #561
  • ArrayData::slice() does not work for nested types such as StructArray #554

Documentation updates:

  • More examples of how to construct Arrays #301

Closed issues:

  • Implement StringBuilder::append_option #263 [arrow]

5.0.0 (2021-07-14)

Full Changelog

Breaking changes:

Implemented enhancements:

Fixed bugs:

  • Error building on master - error: cyclic package dependency: package ahash v0.7.4 depends on itself. Cycle #544
  • IPC reader panics with out of bounds error #541
  • Take kernel doesn't handle nulls and structs correctly #530 [arrow]
  • master fails to compile with default-features=false #529
  • README developer instructions out of date #523
  • Update rustc and packed_simd in CI before 5.0 release #517
  • Incorrect memory usage calculation for dictionary arrays #503 [arrow]
  • sliced null buffers lead to incorrect result in take kernel (and probably on other places) #502
  • Cast of utf8 types and list container types don't respect offset #334 [arrow]
  • fix take kernel null handling on structs #531 [arrow] (bjchambers)
  • Correct array memory usage calculation for dictionary arrays #505 [arrow] (jhorstmann)
  • parquet: improve BOOLEAN writing logic and report error on encoding fail #443 [parquet] (garyanaplan)
  • Fix bug with null buffer offset in boolean not kernel #418 [arrow] (jhorstmann)
  • respect offset in utf8 and list casts #335 [arrow] (ritchie46)
  • Fix comparison of dictionaries with different values arrays (#332) #333 [arrow] (tustvold)
  • ensure null-counts are written for all-null columns #307 [parquet] (crepererum)
  • fix invalid null handling in filter #296 [arrow] (ritchie46)
  • fix NaN handling in parquet statistics #256 (crepererum)

Documentation updates:

Merged pull requests:

4.4.0 (2021-06-24)

Full Changelog

Breaking changes:

  • migrate partition kernel to use Iterator trait #437 [arrow]
  • Remove DictionaryArray::keys_array #391 [arrow]

Implemented enhancements:

  • sort kernel boolean sort can be O(n) #447 [arrow]
  • C data interface for decimal128, timestamp, date32 and date64 #413
  • Add Decimal to CsvWriter #405
  • Use iterators to increase performance of creating Arrow arrays #200 [parquet]

Fixed bugs:

  • Release Audit Tool (RAT) is not being triggered #481
  • Security Vulnerabilities: flatbuffers: read_scalar and read_scalar_at allow transmuting values without unsafe blocks #476
  • Clippy broken after upgrade to rust 1.53 #467
  • Pull Request Labeler is not working #462
  • Arrow 4.3 release: error[E0658]: use of unstable library feature 'partition_point': new API #456
  • parquet reading hangs when row_group contains more than 2048 rows of data #349
  • Fail to build arrow #247
  • JSON reader does not implement iterator #193 [arrow]

Security fixes:

  • Ensure a successful MIRI Run on CI #227

Closed issues:

  • sort kernel has a lot of unnecessary wrapping #446
  • [Parquet] Plain encoded boolean column chunks limited to 2048 values #48 [parquet]

4.3.0 (2021-06-10)

Full Changelog

Implemented enhancements:

  • Add partitioning kernel for sorted arrays #428 [arrow]
  • Implement sort by float lists #427 [arrow]
  • Derive Eq and PartialEq for SortOptions #426 [arrow]
  • use prettier and github action to normalize markdown document syntax #399
  • window::shift can work for more than just primitive array type #392
  • Doctest for ArrayBuilder #366

Fixed bugs:

  • Boolean not kernel does not take offset of null buffer into account #417
  • my contribution not marged in 4.2 release #394
  • window::shift shall properly handle boundary cases #387
  • Parquet WriterProperties.max_row_group_size not wired up #257
  • Out of bound reads in chunk iterator #198 [arrow]

4.2.0 (2021-05-29)

Full Changelog

Breaking changes:

  • DictionaryArray::values() clones the underlying ArrayRef #313 [arrow]

Implemented enhancements:

  • Simplify shift kernel using null array #371
  • Provide Arc-based constructor for parquet::util::cursor::SliceableCursor #368
  • Add badges to crates #361
  • Consider inlining PrimitiveArray::value #328
  • Implement automated release verification script #327
  • Add wasm32 to the list of target architectures of the simd feature #316
  • add with_escape for csv::ReaderBuilder #315 [arrow]
  • IPC feature gate #310
  • csv feature gate #309 [arrow]
  • Add shrink_to / shrink_to_fit to MutableBuffer #297

Fixed bugs:

  • Incorrect crate setup instructions #364
  • Arrow-flight only register rerun-if-changed if file exists #350
  • Dictionary Comparison Uses Wrong Values Array #332
  • Undefined behavior in FFI implementation #322
  • All-null column get wrong parquet null-counts #306 [parquet]
  • Filter has inconsistent null handling #295

4.1.0 (2021-05-17)

Full Changelog

Implemented enhancements:

  • Add Send to ArrayBuilder #290 [arrow]
  • Improve performance of bound checking option #280 [arrow]
  • extend compute kernel arity to include nullary functions #276
  • Implement FFI / CDataInterface for Struct Arrays #251 [arrow]
  • Add support for pretty-printing Decimal numbers #230 [arrow]
  • CSV Reader String Dictionary Support #228 [arrow]
  • Add Builder interface for adding Arrays to record batches #210 [arrow]
  • Support auto-vectorization for min/max #209 [arrow]
  • Support LargeUtf8 in sort kernel #25 [arrow]

Fixed bugs:

  • no method named select_nth_unstable_by found for mutable reference &mut [T] #283
  • Rust 1.52 Clippy error #266
  • NaNs can break parquet statistics #255 [parquet]
  • u64::MAX does not roundtrip through parquet #254 [parquet]
  • Integration tests failing to compile (flatbuffer) #249 [arrow]
  • Fix compatibility quirks between arrow and parquet structs #245 [parquet]
  • Unable to write non-null Arrow structs to Parquet #244 [parquet]
  • schema: missing field metadata when deserialize #241 [arrow]
  • Arrow does not compile due to flatbuffers upgrade #238 [arrow]
  • Sort with limit panics for the limit includes some but not all nulls, for large arrays #235 [arrow]
  • arrow-rs contains a copy of the "format" directory #233 [arrow]
  • Fix SEGFAULT/ SIGILL in child-data ffi #206 [arrow]
  • Read list field correctly in <struct<list>> #167 [parquet]
  • FFI listarray lead to undefined behavior. #20

Security fixes:

Documentation updates:

  • Comment out the instructions in the PR template #277
  • Update links to datafusion and ballista in README.md #19
  • Update "repository" in Cargo.toml #12

Closed issues:

  • Arrow Aligned Vec #268
  • [Rust]: Tracking issue for AVX-512 #220 [arrow]
  • Umbrella issue for clippy integration #217 [arrow]
  • Support sort #215 [arrow]
  • Support stable Rust #214 [arrow]
  • Remove Rust and point integration tests to arrow-rs repo #211 [arrow]
  • ArrayData buffers are inconsistent accross implementations #207
  • 3.0.1 patch release #204
  • Document patch release process #202
  • Simplify Offset #186 [arrow]
  • Typed Bytes #185 [arrow]
  • [CI]docker-compose setup should enable caching #175
  • Improve take primitive performance #174
  • [CI] Try out buildkite #165 [arrow]
  • Update assignees in JIRA where missing #160
  • [Rust]: From<ArrayDataRef> implementations should validate data type #103 [arrow]
  • [DataFusion] Verify that projection push down does not remove aliases columns #99 [arrow]
  • [Rust][DataFusion] Implement modulus expression #98 [arrow]
  • [DataFusion] Add constant folding to expressions during logically planning #96 [arrow]
  • [DataFusion] DataFrame.collect should return RecordBatchReader #95 [arrow]
  • [Rust][DataFusion] Add FORMAT to explain plan and an easy to visualize format #94 [arrow]
  • [DataFusion] Implement metrics framework #90 [arrow]
  • [DataFusion] Implement micro benchmarks for each operator #89 [arrow]
  • [DataFusion] Implement pretty print for physical query plan #88 [arrow]
  • [Archery] Support rust clippy in the lint command #83
  • [rust][datafusion] optimize count(*) queries on parquet sources #75 [arrow]
  • [Rust][DataFusion] Improve like/nlike performance #71 [arrow]
  • [DataFusion] Implement optimizer rule to remove redundant projections #56 [arrow]
  • [DataFusion] Parquet data source does not support complex types #39 [arrow]
  • Merge utils from Parquet and Arrow #32 [arrow] [parquet]
  • Add benchmarks for Parquet #30 [parquet]
  • Mark methods that do not perform bounds checking as unsafe #28 [arrow]
  • Test issue #24 [arrow]
  • This is a test issue #11

For older versions, see apache/arrow/CHANGELOG.md

* This Changelog was automatically generated by github_changelog_generator