Historical Changelog

21.0.0 (2022-08-18)

Full Changelog

Breaking changes:

Implemented enhancements:

  • add into_inner method to ArrowWriter #2491 [parquet]
  • Remove byteorder dependency #2472 [parquet]
  • Return Structured ColumnCloseResult from GenericColumnWriter::close #2465 [parquet]
  • Push ChunkReader into SerializedPageReader #2463 [parquet]
  • Support SerializedPageReader::skip_page without OffsetIndex #2459 [parquet]
  • Support Time64/Time32 comparison #2457 [arrow]
  • Revise FromIterator for Decimal128Array to use Into instead of Borrow #2441 [parquet]
  • Support RowFilter within ParquetRecordBatchReader #2431 [parquet]
  • Remove the field StructBuilder::len #2429 [arrow]
  • Standardize creation and configuration of parquet --> Arrow readers (ParquetRecordBatchReaderBuilder) #2427 [parquet]
  • Use OffsetIndex to Prune IO in ParquetRecordBatchStream #2426 [parquet]
  • Support peek_next_page and skip_next_page in InMemoryPageReader #2406 [parquet]
  • Support casting from Utf8/LargeUtf8 to Binary/LargeBinary #2402 [arrow]
  • Support casting between Decimal128 and Decimal256 arrays #2375 [arrow]
  • Combine multiple selections into the same batch size in skip_records #2358 [parquet]
  • Add API to change timezone for timestamp array #2346 [arrow]
  • Change the output of read_buffer Arrow IPC API to return Result<_> #2342 [arrow]
  • Allow skip_records in GenericColumnReader to skip across row groups #2331 [parquet]
  • Optimize the validation of Decimal256 #2320 [arrow]
  • Implement Skip for DeltaBitPackDecoder #2281 [parquet]
  • Changes to ParquetRecordBatchStream to support row filtering in DataFusion #2270 [parquet]
  • Add ArrayReader::skip_records API #2197 [parquet]

Fixed bugs:

  • Panic in SerializedPageReader without offset index #2503 [parquet]
  • MapArray columns don't handle null values correctly #2484 [arrow]
  • There is no compiler error when using an invalid Decimal type. #2440 [arrow]
  • Flight SQL Server sends incorrect response for DoPutUpdateResult #2403 [arrow-flight]
  • AsyncFileReader No Longer Object-Safe #2372 [parquet]
  • StructBuilder Does not Verify Child Lengths #2252 [arrow]

Closed issues:

Merged pull requests:

20.0.0 (2022-08-05)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add the constant data type constructors for ListArray #2311 [arrow]
  • Update FlightSqlService trait to pass session info along #2308 [arrow-flight]
  • Optimize take_bits for non-null indices #2306 [arrow]
  • Make FFI support optional via Feature Flag ffi #2302 [arrow]
  • Mark ffi::ArrowArray::try_new as safe #2301 [arrow]
  • Remove test_utils from default arrow-rs features #2298 [arrow]
  • Remove JsonEqual trait #2296 [arrow]
  • Move with_precision_and_scale to Decimal array traits #2291 [arrow]
  • Improve readability and maybe performance of string --> numeric/time/date/timestamp cast kernels #2285 [arrow]
  • Add vectorized unpacking for 8, 16, and 64 bit integers #2276 [parquet]
  • Use initial capacity for interner hashmap #2273 [arrow]
  • Impl FromIterator for Decimal256Array #2248 [arrow]
  • Separate ArrayReader::next_batch with ArrayReader::read_records and ArrayReader::consume_batch #2236 [parquet]
  • Rename DataType::Decimal to DataType::Decimal128 #2228 [arrow]
  • Automatically Grow Parquet BitWriter Buffer #2226 [parquet]
  • Add append_option support to Decimal128Builder and Decimal256Builder #2224 [arrow]
  • Split the FixedSizeBinaryArray and FixedSizeListArray from array_binary.rs and array_list.rs #2217 [arrow]
  • Don't Box Values in PrimitiveDictionaryBuilder #2215 [arrow]
  • Use BitChunks in equal_bits #2186 [arrow]
  • Implement Hash for Schema #2182 [arrow]
  • read decimal data type from parquet file with binary physical type #2159 [parquet]
  • The GenericStringBuilder should use GenericBinaryBuilder #2156 [arrow]
  • Update Rust version to 1.62 #2143 [parquet] [arrow] [arrow-flight]
  • Check precision and scale against maximum value when constructing Decimal128 and Decimal256 #2139 [arrow]
  • Use ArrayAccessor in Decimal128Iter and Decimal256Iter #2138 [arrow]
  • Use ArrayAccessor and FromIterator in Cast Kernels #2137 [arrow]
  • Add TypedDictionaryArray for more ergonomic interaction with DictionaryArray #2136 [arrow]
  • Use ArrayAccessor in Comparison Kernels #2135 [arrow]
  • Support peek_next_page() and skip_next_page in InMemoryColumnChunkReader #2129 [parquet]
  • Lazily materialize the null buffer builder for all array builders. #2125 [arrow]
  • Do value validation for Decimal256 #2112 [arrow]
  • Support skip_def_levels for ColumnLevelDecoder #2107 [parquet]
  • Add integration test for scan rows with selection #2106 [parquet]
  • Support for casting from Utf8/String to Time32 / Time64 #2053 [arrow]
  • Update prost and tonic related crates #2268 [arrow-flight] (carols10cents)

Fixed bugs:

  • temporal conversion functions cannot work on negative input properly #2325 [arrow]
  • IPC writer should truncate string array with all empty string #2312 [arrow]
  • Error order for comparing Decimal128 or Decimal256 #2256 [arrow]
  • Fix maximum and minimum for decimal values for precision greater than 38 #2246 [arrow]
  • IntervalMonthDayNanoType::make_value() does not match C implementation #2234 [arrow]
  • FlightSqlService trait does not allow impls to do handshake #2210 [arrow-flight]
  • EnabledStatistics::None not working #2185 [parquet]
  • Boolean ArrayData Equality Incorrect Slice Handling #2184 [arrow]
  • Publicly export MapFieldNames #2118 [arrow]

Documentation updates:

  • Update instructions on How to join the slack #arrow-rust channel -- or maybe try to switch to discord?? #2192
  • [Minor] Improve arrow and parquet READMEs, document parquet feature flags #2324 [parquet] [arrow] (alamb)

Performance improvements:

Closed issues:

  • Fix wrong logic in calculate_row_count when skipping values #2328 [parquet]
  • Support filter for parquet data type #2126 [parquet]
  • Make skip value in ByteArrayDecoderDictionary avoid decoding #2088 [parquet]

Merged pull requests:

19.0.0 (2022-07-22)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Use total_cmp from std #2130 [arrow]
  • Permit parallel fetching of column chunks in ParquetRecordBatchStream #2110 [parquet]
  • The GenericBinaryBuilder should use buffer builders directly. #2104 [arrow]
  • Pass generate_decimal256_case arrow integration test #2093 [arrow]
  • Rename weekday and weekday0 kernels to num_days_from_monday and days_since_sunday #2065 [arrow]
  • Improve performance of filter_dict #2062 [arrow]
  • Improve performance of set_bits #2060 [arrow]
  • Lazily materialize the null buffer builder of BooleanBuilder #2058 [arrow]
  • BooleanArray::from_iter should omit validity buffer if all values are valid #2055 [arrow]
  • FFI_ArrowSchema should set DICTIONARY_ORDERED flag if a field's dictionary is ordered #2049 [arrow]
  • Support peek_next_page() and skip_next_page in SerializedPageReader #2043 [parquet]
  • Support FFI / C Data Interface for MapType #2037 [arrow]
  • The DecimalArrayBuilder should use FixedSizedBinaryBuilder #2026 [arrow]
  • Enable serialized_reader read specific Page by passing row ranges. #1976 [parquet]

Fixed bugs:

  • type_id and value_offset are incorrect for sliced UnionArray #2086 [arrow]
  • Boolean take kernel does not handle null indices correctly #2057 [arrow]
  • Don't double-count nulls in write_batch_with_statistics #2046 [parquet]
  • Parquet Writer Ignores Statistics specification in WriterProperties #2014 [parquet]

Documentation updates:

  • Improve docstrings + examples for as_primitive_array cast functions #2114 [arrow] (alamb)

Closed issues:

  • Why does serde_json specify the preserve_order feature in arrow package #2095 [arrow]
  • Support skip_values in DictionaryDecoder #2079 [parquet]
  • Support skip_values in ColumnValueDecoderImpl #2078 [parquet]
  • Support skip_values in ByteArrayColumnValueDecoder #2072 [parquet]
  • Several Builder::append methods returning results even though they are infallible #2071
  • Improve formatting of logical plans containing subqueries #2059
  • Return reference from UnionArray::child #2035
  • support write page index #1777 [parquet]

Merged pull requests:

18.0.0 (2022-07-08)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add DataType::Dictionary support to subtract_scalar, multiply_scalar, divide_scalar #2019 [arrow]
  • Support DictionaryArray in add_scalar kernel #2017 [arrow]
  • Enable column page index read test for all types #2010 [parquet]
  • Simplify FixedSizeBinaryBuilder #2007 [arrow]
  • Support Decimal256Builder and Decimal256Array #1999 [arrow]
  • Support DictionaryArray in unary kernel #1989 [arrow]
  • Add kernel to quickly compute comparisons on Arrays #1987 [arrow]
  • Support DictionaryArray in divide kernel #1982 [arrow]
  • Implement Into<ArrayData> for T: Array #1979 [arrow]
  • Support DictionaryArray in multiply kernel #1972 [arrow]
  • Support DictionaryArray in subtract kernel #1970 [arrow]
  • Declare DecimalArray::length as a constant #1967 [arrow]
  • Support DictionaryArray in add kernel #1950 [arrow]
  • Add builder style methods to Field #1934 [arrow]
  • Make StringDictionaryBuilder faster #1851 [arrow]
  • concat_elements_utf8 should accept arbitrary number of input arrays #1748 [arrow]

Fixed bugs:

  • Array reader for list columns fails to decode if batches fall on row group boundaries #2025 [parquet]
  • ColumnWriterImpl::write_batch_with_statistics incorrect distinct count in statistics #2016 [parquet]
  • ColumnWriterImpl::write_batch_with_statistics can write incorrect page statistics #2015 [parquet]
  • RowFormatter is not part of the public api #2008 [parquet]
  • Infinite Loop possible in ColumnReader::read_batch For Corrupted Files #1997 [parquet]
  • PrimitiveBuilder::finish_dict does not validate dictionary offsets #1978 [arrow]
  • Incorrect n_buffers in FFI_ArrowArray #1959 [arrow]
  • DecimalArray::from_fixed_size_list_array fails when offset > 0 #1958 [arrow]
  • Incorrect (but ignored) metadata written after ColumnChunk #1946 [parquet]
  • Send + Sync impl for Allocation may not be sound unless Allocation is Send + Sync as well #1944 [arrow]
  • Disallow cast from other datatypes to NullType #1923 [arrow]

Documentation updates:

  • The doc of FixedSizeListArray::value_length is incorrect. #1908 [arrow]

Closed issues:

  • Column chunk statistics of min_bytes and max_bytes return wrong size #2021 [parquet]
  • [Discussion] Refactor the Decimals by using constant generic. #2001
  • Move DecimalArray to a new file #1985 [arrow]
  • Support DictionaryArray in multiply kernel #1974
  • close function instead of mutable reference #1969 [parquet]
  • Incorrect null_count of DictionaryArray #1962 [arrow]
  • Support multi diskRanges for ChunkReader #1955 [parquet]
  • Persisting Arrow timestamps with Parquet produces missing TIMESTAMP in schema #1920 [parquet]
  • Separate get_next_page_header from get_next_page in PageReader #1834 [parquet]

Merged pull requests:

17.0.0 (2022-06-24)

Full Changelog

Breaking changes:

Implemented enhancements:

  • add a small doc example showing ArrowWriter being used with a cursor #1927 [parquet]
  • Support cast to/from NULL and DataType::Decimal #1921 [arrow]
  • Add Decimal256 API #1913 [arrow]
  • Add DictionaryArray::key function #1911 [arrow]
  • Support specifying capacities for ListArrays in MutableArrayData #1884 [arrow]
  • Explicitly declare the features used for each dependency #1876 [parquet] [arrow] [arrow-flight]
  • Add Decimal128 API and use it in DecimalArray and DecimalBuilder #1870 [arrow]
  • PrimitiveArray::from_iter should omit validity buffer if all values are valid #1856 [arrow]
  • Add from(v: Vec<Option<&[u8]>>) and from(v: Vec<&[u8]>) for FixedSizeBinaryArray #1852 [arrow]
  • Add Vec-inspired APIs to BufferBuilder #1850 [arrow]
  • PyArrow integration test for C Stream Interface #1847 [arrow]
  • Add nilike support in comparison #1845 [arrow]
  • Split up arrow::array::builder module #1843 [arrow]
  • Add quarter support in temporal kernels #1835 [arrow]
  • Rename ArrayData::validate_dictionary_offset to ArrayData::validate_values #1812 [arrow]
  • Clean up the testing code for substring kernel #1801 [arrow]
  • Speed up substring_by_char kernel #1800 [arrow]

Fixed bugs:

  • unable to write parquet file with UTC timestamp #1932 [parquet]
  • Incorrect max and min decimals #1916 [arrow]
  • dynamic_types example does not print the projection #1902 [arrow]
  • log2(0) panicked at 'attempt to subtract with overflow', parquet/src/util/bit_util.rs:148:5 #1901 [parquet]
  • Final slicing in combine_option_bitmap needs to use bit slices #1899 [arrow]
  • Dictionary IPC writer writes incorrect schema #1892 [arrow]
  • Creating a RecordBatch with null values in non-nullable fields does not cause an error #1888 [arrow]
  • Upgrade regex dependency #1874 [arrow]
  • Miri reports leaks in ffi tests #1872 [arrow]
  • AVX512 + simd binary and/or kernels slower than autovectorized version #1829 [arrow]

Documentation updates:

  • Blog post about arrow 10.0.0 - 16.0.0 #1808
  • Add README for the compute module. #1940 [arrow] (HaoYang670)
  • minor: clarify docstring on DictionaryArray::lookup_key #1910 [arrow] (alamb)
  • minor: add a diagram to docstring for DictionaryArray #1909 [arrow] (alamb)
  • Closes #1902: Print the original and projected RecordBatch in dynamic_types example #1903 [arrow] (martin-g)

Closed issues:

Merged pull requests:

16.0.0 (2022-06-10)

Full Changelog

Breaking changes:

Implemented enhancements:

  • List equality method should work on empty offset ListArray #1817 [arrow]
  • Command line tool for converting CSV to Parquet #1797 [parquet]
  • IPC writer should write validity buffer for UnionArray in V4 IPC message #1793 [arrow]
  • Add function for row alignment with page mask #1790 [parquet]
  • Rust IPC Read should be able to read V4 UnionType Array #1788 [arrow]
  • combine_option_bitmap should accept arbitrary number of input arrays. #1780 [arrow]
  • Add substring_by_char kernels for slicing on character boundaries #1768 [arrow]
  • Support reading PageIndex from column metadata #1761 [parquet]
  • Support casting from DataType::Utf8 to DataType::Boolean #1740 [arrow]
  • Make current position available in FileWriter. #1691 [parquet]
  • Support writing parquet to stdout #1687 [parquet]

Fixed bugs:

  • Incorrect Offset Validation for Sliced List Array Children #1814 [arrow]
  • Parquet Snappy Codec overwrites Existing Data in Decompression Buffer #1806 [parquet]
  • flight_data_to_arrow_batch does not support RecordBatches with no columns #1783 [arrow-flight]
  • parquet does not compile with features=["zstd"] #1630 [parquet]

Documentation updates:

Closed issues:

Merged pull requests:

15.0.0 (2022-05-27)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Rename the string kernel to concatenate_elements #1747 [arrow]
  • ArrayDataBuilder::null_bit_buffer should accept Option<Buffer> as input type #1737 [arrow]
  • Fix schema comparison for non_canonical_map when running flight test #1730 [arrow]
  • Add support in aggregate kernel for BinaryArray #1724 [arrow]
  • Fix incorrect null_count in generate_unions_case integration test #1712 [arrow]
  • Keep type ids in Union datatype to follow Arrow spec and integrate with other implementations #1690 [arrow]
  • Support Reading Alternative List Representations to Arrow From Parquet #1680 [parquet]
  • Speed up the offsets checking #1675 [arrow]
  • Separate Parquet -> Arrow Schema Conversion From ArrayBuilder #1655 [parquet]
  • Add leaf_columns argument to ArrowReader::get_record_reader_by_columns #1653 [parquet]
  • Implement string_concat kernel #1540 [arrow]
  • Improve Unit Test Coverage of ArrayReaderBuilder #1484 [parquet]

Fixed bugs:

  • Parquet write failure (from record batches) when data is nested two levels deep #1744 [parquet]
  • IPC reader may break on projection #1735 [arrow]
  • Latest nightly fails to build with feature simd #1734 [arrow]
  • Trying to write parquet file in parallel results in corrupt file #1717 [parquet]
  • Roundtrip failure when using DELTA_BINARY_PACKED #1708 [parquet]
  • ArrayData::try_new cannot always return expected error. #1707 [arrow]
  • "out of order projection is not supported" after Fix Parquet Arrow Schema Inference #1701 [parquet]
  • Rust is not interoperable with C++ for IPC schemas with dictionaries #1694 [arrow]
  • Incorrect Repeated Field Schema Inference #1681 [parquet]
  • Parquet Treats Embedded Arrow Schema as Authoritative #1663 [parquet]
  • parquet_to_arrow_schema_by_columns Incorrectly Handles Nested Types #1654 [parquet]
  • Inconsistent Arrow Schema When Projecting Nested Parquet File #1652 [parquet]
  • StructArrayReader Cannot Handle Nested Lists #1651 [parquet]
  • Bug (substring kernel): The null buffer is not aligned when offset != 0 #1639 [arrow]

Documentation updates:

  • Parquet command line tool does not install "globally" #1710 [parquet]
  • Improve integration test document to follow Arrow C++ repo CI #1742 [arrow] (viirya)

Merged pull requests:

14.0.0 (2022-05-13)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add support for DataType::Duration in ffi interface #1688 [arrow]
  • Fix generate_unions_case integration test #1676 [arrow]
  • Add DictionaryArray support for bit_length kernel #1673 [arrow]
  • Add DictionaryArray support for length kernel #1672 [arrow]
  • flight_client_scenarios integration test should receive schema from flight data #1669 [arrow]
  • Unpin Flatbuffer version dependency #1667 [arrow]
  • Add dictionary array support for substring function #1656 [arrow]
  • Exclude dict_id and dict_is_ordered from equality comparison of Field #1646 [arrow]
  • Remove StringOffsetTrait and BinaryOffsetTrait #1644 [arrow]
  • Add tests and examples for UnionArray::from(data: ArrayData) #1643 [arrow]
  • Add methods pub fn offsets_buffer, pub fn types_ids_buffer and pub fn data_buffer for ArrayDataBuilder #1640 [arrow]
  • Fix generate_nested_dictionary_case integration test failure for Rust cases #1635 [arrow]
  • Expose ArrowWriter row group flush in public API #1626 [parquet]
  • Add substring support for FixedSizeBinaryArray #1618 [arrow]
  • Add PrettyPrint for UnionArrays #1594 [arrow]
  • Add SIMD support for the length kernel #1489 [arrow]
  • Support dictionary arrays in length and bit_length #1674 [arrow] (viirya)
  • Add dictionary array support for substring function #1665 [arrow] (sunchao)
  • Add DecimalType support in new_null_array #1659 [arrow] (yjshen)

Fixed bugs:

  • Docs.rs build is broken #1695
  • Interoperability with C++ for IPC schemas with dictionaries #1694
  • UnionArray::is_null incorrect #1625 [arrow]
  • Published Parquet documentation missing arrow::async_reader #1617 [parquet]
  • Files written with Julia's Arrow.jl in IPC format cannot be read by arrow-rs #1335 [arrow]

Documentation updates:

Closed issues:

  • Make OffsetSizeTrait::IS_LARGE as a const value #1658
  • Question: Why are there 3 types of OffsetSizeTraits? #1638
  • Written Parquet file way bigger than input files #1627
  • Ensure there is a single zero in the offsets buffer for an empty ListArray. #1620
  • Filtering UnionArray Changes DataType #1595

Merged pull requests:

13.0.0 (2022-04-29)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Read/write nested dictionary under fixed size list in ipc stream reader/writer #1609 [arrow]
  • Add support for BinaryArray in substring kernel #1593 [arrow]
  • Read/write nested dictionary under large list in ipc stream reader/writer #1584 [arrow]
  • Read/write nested dictionary under map in ipc stream reader/writer #1582 [arrow]
  • Implement Clone for JSON DecoderOptions #1580 [arrow]
  • Add utf-8 validation checking to substring kernel #1575 [arrow]
  • Support casting to/from DataType::Null in cast kernel #1572 [arrow] (WinkerDu)

Fixed bugs:

  • Parquet schema should allow scale == precision for decimal type #1606 [parquet]
  • ListArray::from(ArrayData) dereferences invalid pointer when offsets are empty #1601 [arrow]
  • ArrayData Equality Incorrect Null Mask Offset Handling #1599
  • Filtering UnionArray Incorrectly Handles Runs #1598
  • [Safety] Filtering Dense UnionArray Produces Invalid Offsets #1596
  • [Safety] UnionBuilder Doesn't Check Types #1591
  • Union Layout Should Not Support Separate Validity Mask #1590
  • Incorrect nullable flag when reading maps (test_read_maps fails when force_validate is active) #1587 [parquet]
  • Output of ipc::reader::tests::projection_should_work fails validation #1548 [arrow]
  • Incorrect min/max statistics for decimals with byte-array notation #1532

Documentation updates:

Closed issues:

  • Dense UnionArray Offsets Are i32 not i8 #1597 [arrow]
  • Replace &Option<T> with Option<&T> in some APIs #1556 [parquet] [arrow]
  • Improve ergonomics of parquet::basic::LogicalType #1554 [parquet]
  • Mark the current substring function as unsafe and rename it. #1541 [arrow]
  • Requirements for Async Parquet API #1473 [parquet]

Merged pull requests:

12.0.0 (2022-04-15)

Full Changelog

Breaking changes:

  • Add ArrowReaderOptions to ParquetFileArrowReader, add option to skip decoding arrow metadata from parquet (#1459) #1558 [parquet] (tustvold)
  • Support RecordBatch with zero columns but non zero row count, add field to RecordBatchOptions (#1536) #1552 [arrow] (tustvold)
  • Consolidate JSON Reader options and DecoderOptions #1539 [arrow] (alamb)
  • Update prost, prost-derive and prost-types to 0.10, tonic, and tonic-build to 0.7 #1510 [arrow-flight] (alamb)
  • Add Json DecoderOptions and support custom format_string for each field #1451 [arrow] (sum12)

Implemented enhancements:

  • Read/write nested dictionary in ipc stream reader/writer #1565 [arrow]
  • Support FixedSizeBinary in the Arrow C data interface #1553 [arrow]
  • Support Empty Column Projection in ParquetRecordBatchReader #1537 [parquet]
  • Support RecordBatch with zero columns but non zero row count #1536 [arrow]
  • Add support for Date32/Date64 <--> String/LargeString in cast kernel #1535 [arrow]
  • Support creating arrays from externally owned memory like Vec or String #1516 [arrow]
  • Speed up the substring kernel #1511 [arrow]
  • Handle Parquet Files With Inconsistent Timestamp Units #1459 [parquet]

Fixed bugs:

  • Error Inferring Schema for LogicalType::UNKNOWN #1557 [parquet]
  • Read dictionary from nested struct in ipc stream reader panics #1549 [arrow]
  • filter produces invalid sparse UnionArrays #1547 [arrow]
  • Documentation for GenericListBuilder is not exposed. #1518 [arrow]
  • cannot read parquet file #1515 [parquet]
  • The substring kernel panics when chars > U+0x007F #1478 [arrow]
  • Hang due to infinite loop when reading some parquet files with RLE encoding and bit packing #1458 [parquet]

Documentation updates:

Closed issues:

  • Interesting benchmark results of min_max_helper #1400

Merged pull requests:

11.1.0 (2022-03-31)

Full Changelog

Implemented enhancements:

  • Implement size_hint and ExactSizedIterator for DecimalArray #1505 [arrow]
  • Support calculate length by chars for StringArray #1493 [arrow]
  • Add length kernel support for ListArray #1470 [arrow]
  • The length kernel should work with BinaryArrays #1464 [arrow]
  • FFI for Arrow C Stream Interface #1348 [arrow]
  • Improve performance of DictionaryArray::try_new() #1313 [arrow]

Fixed bugs:

  • MIRI error in math_checked_divide_op/try_from_trusted_len_iter #1496 [arrow]
  • Parquet Writer Incorrect Definition Levels for Nested NullArray #1480 [parquet]
  • FFI: ArrowArray::try_from_raw shouldn't clone #1425 [arrow]
  • Parquet reader fails to read null list. #1399 [parquet]

Documentation updates:

  • A small mistake in the doc of BinaryArray and LargeBinaryArray #1455 [arrow]
  • A small mistake in the doc of GenericBinaryArray::take_iter_unchecked #1454 [arrow]
  • Add links in the doc of BinaryOffsetSizeTrait #1453 [arrow]
  • The doc of FixedSizeBinaryArray is confusing. #1452 [arrow]
  • Clarify docs that SlicesIterator ignores null values #1504 [arrow] (alamb)
  • Update the doc of BinaryArray and LargeBinaryArray #1471 [arrow] (HaoYang670)

Closed issues:

  • packed_simd v.s. portable_simd, which should be used? #1492
  • Cleanup: Use Arrow take kernel Within parquet ListArrayReader #1482 [parquet]

Merged pull requests:

11.0.0 (2022-03-17)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Fix generate_interval_case integration test failure #1445
  • Make the doc examples of ListArray and LargeListArray more readable #1433
  • Redundant if and abs in shift() #1427
  • Improve substring kernel performance #1422 [arrow]
  • Add missing value_unchecked() of FixedSizeBinaryArray #1419
  • Remove duplicate bound check in function shift #1408
  • Support dictionary array in C data interface #1397
  • filter kernel should work with UnionArrays #1394 [arrow]
  • filter kernel should work with FixedSizeListArrays #1393 [arrow]
  • Add doc examples for creating FixedSizeListArray #1392 [arrow]
  • Update rust-version to 1.59 #1377
  • Arrow IPC projection support #1338
  • Implement basic FlightSQL Server #1386 [arrow-flight] (wangfenjin)

Fixed bugs:

  • DictionaryArray::try_new ignores validity bitmap of the keys #1429 [arrow]
  • The doc of GenericListArray is confusing #1424
  • DeltaBitPackDecoder Incorrectly Handles Non-Zero MiniBlock Bit Width Padding #1417 [parquet]
  • DeltaBitPackEncoder Pads Miniblock BitWidths With Arbitrary Values #1416 [parquet]
  • Possible unaligned write with MutableBuffer::push #1410 [arrow]
  • Integration Test is failing on master branch #1398 [arrow]

Documentation updates:

Merged pull requests:

10.0.0 (2022-03-04)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Add extract month and day in temporal.rs #1387
  • Add clone to IpcWriteOptions #1381 [arrow]
  • Support MapArray in filter kernel #1378 [arrow]
  • Add week temporal kernel #1375 [arrow]
  • Improve performance of compare_dict_op #1371 [arrow]
  • Add support for LargeUtf8 in json writer #1357 [parquet]
  • Make arrow::array::builder::MapBuilder public #1354 [arrow]
  • Refactor StructArray::from #1351 [arrow]
  • Refactor RecordBatch::validate_new_batch #1350 [arrow]
  • Remove redundant has_ methods for optional column metadata fields #1344 [parquet]
  • Add write method to JsonWriter #1340 [arrow]
  • Refactor the code of Bitmap::new #1337 [arrow]
  • Use DictionaryArray's iterator in compare_dict_op #1329 [arrow]
  • Add as_decimal_array(arr: &dyn Array) -> &DecimalArray #1312 [arrow]
  • More ergonomic / idiomatic primitive array creation from iterators #1298 [arrow]
  • Implement DictionaryArray support in eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #1201 [arrow]

Fixed bugs:

  • cargo clippy fails on the master branch #1362 [arrow]
  • ArrowArray::try_from_raw should not assume the pointers are from Arc #1333 [arrow]
  • Fix CSV Writer::new to accept delimiter and make WriterBuilder::build use it #1328 [arrow]
  • Make bounds configurable via builder when reading CSV #1327 [arrow]
  • Add with_datetime_format() to CSV WriterBuilder #1272 [arrow]

Performance improvements:

  • Improve performance of min and max aggregation kernels without nulls #1373 [arrow]

Closed issues:

  • Consider removing redundant has_XXX metadata functions in ColumnChunkMetadata #1332

Merged pull requests:

9.1.0 (2022-02-19)

Full Changelog

Implemented enhancements:

Fixed bugs:

  • len is not a parameter of MutableArrayData::extend #1316
  • module data_type is private in Rust Parquet 8.0.0 #1302 [parquet]
  • Test failure: bit_chunk_iterator #1294
  • csv_writer benchmark fails with "no such file or directory" #1292

Documentation updates:

Performance improvements:

Closed issues:

  • Expose column and offset index metadata offset #1317
  • Expose bloom filter metadata offset #1308
  • Improve ergonomics to construct DictionaryArrays from Key and Value arrays #1299
  • Make it easier to iterate over DictionaryArray #1295 [arrow]
  • (WON'T FIX) Don't Intertwine Bit and Byte Aligned Operations in BitReader #1282
  • how to create arrow::array from streamReader #1278
  • Remove scientific notation when converting floats to strings. #983

Merged pull requests:

9.0.2 (2022-02-09)

Full Changelog

Breaking changes:

  • Add Send + Sync to DataType, RowGroupReader, FileReader, ChunkReader. #1264
  • Rename the function Bitmap::len to Bitmap::bit_len to clarify its meaning #1242 [parquet] [arrow] (HaoYang670)
  • Remove unused / broken memory-check feature #1222 [arrow] (jhorstmann)
  • Potentially buffer multiple RecordBatches before writing a parquet row group in ArrowWriter #1214 [parquet] [arrow] (tustvold)

Implemented enhancements:

  • Add async arrow parquet reader #1154 [parquet] [arrow] (tustvold)
  • Rename Bitmap::len to Bitmap::bit_len #1233
  • Extend CSV schema inference to allow scientific notation for floating point types #1215 [arrow]
  • Write Multiple RecordBatch to Parquet Row Group #1211
  • Add doc examples for eq_dyn etc. #1202 [arrow]
  • Add comparison kernels for BinaryArray #1108
  • impl ArrowNativeType for i128 #1098
  • Remove Copy trait bound from dyn scalar kernels #1243 [arrow] (matthewmturner)
  • Add into_inner for IPC FileWriter #1236 [arrow] (yjshen)
  • [Minor] Re-export array::builder::make_builder to make it available for downstream #1235 [arrow] (yjshen)

Fixed bugs:

  • Parquet v8.0.0 panics when reading all null column to NullArray #1245 [parquet]
  • Get Unknown configuration option rust-version when running the rust format command #1240
  • Bitmap Length Validation is Incorrect #1231 [arrow]
  • Writing sliced ListArray or MapArray ignore offsets #1226 [parquet]
  • Remove broken memory-tracking crate feature #1171
  • Revert making parquet::data_type and parquet::arrow::schema experimental #1244 [parquet] (tustvold)

Documentation updates:

Performance improvements:

  • Improve performance for arithmetic kernels with simd feature enabled (except for division/modulo) #1221 [arrow] (jhorstmann)
  • Do not concatenate identical dictionaries #1219 [arrow] (tustvold)
  • Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) #1180 [parquet] (tustvold)

Closed issues:

  • UnalignedBitChunkIterator that iterates through already aligned u64 blocks #1227
  • Remove unused ArrowArrayReader in parquet #1197 [parquet]

Merged pull requests:

8.0.0 (2022-01-20)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Parquet reader should be able to read structs within list #1186 [parquet]
  • Disable serde_json arbitrary_precision feature flag #1174 [arrow]
  • Simplify and reduce code duplication in arithmetic.rs #1160 [arrow]
  • Return Err from JSON writer rather than panic! for unsupported types #1157 [arrow]
  • Support scalar mathematics kernels for Array and scalar value #1153 [arrow]
  • Support DecimalArray in sort kernel #1137
  • Parquet Fuzz Tests #1053
  • BooleanBufferBuilder Append Packed #1038 [arrow]
  • parquet Performance Optimization: StructArrayReader Redundant Level & Bitmap Computation #1034 [parquet]
  • Reduce Public Parquet API #1032 [parquet]
  • Add from_iter_values for binary array #1188 [arrow] (Jimexist)
  • Add support for MapArray in json writer #1149 [arrow] (helgikrs)

Fixed bugs:

  • Empty string arrays with no nulls are not equal #1208 [arrow]
  • Pretty print a RecordBatch containing Float16 triggers a panic #1193 [arrow]
  • Writing structs nested in lists produces an incorrect output #1184 [parquet]
  • Undefined behavior for GenericStringArray::from_iter_values if reported iterator upper bound is incorrect #1144 [arrow]
  • Interval comparisons with simd feature asserts #1136 [arrow]
  • RecordReader Permits Illegal Types #1132 [parquet]

Security fixes:

Documentation updates:

Performance improvements:

  • Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) #1054 [parquet] [arrow] (tustvold)
  • Improve parquet performance: Skip levels computation for required struct arrays in parquet #1035 [parquet] (tustvold)

Closed issues:

  • Generify ColumnReaderImpl and RecordReader #1040 [parquet]
  • Parquet Preserve BitMask #1037

Merged pull requests:

7.0.0 (2022-01-07)

Full Changelog

Arrow

Breaking changes:

  • pretty_format_batches now returns Result<impl Display> rather than String: #975
  • MutableBuffer::typed_data_mut is marked unsafe: #1029
  • UnionArray updated to match the latest Arrow spec, added UnionMode, UnionArray::new() marked unsafe: #885

New Features:

  • Support for Float16Array types #888
  • IPC support for UnionArray #654
  • Dynamic comparison kernels for scalars (e.g. eq_dyn_scalar), including DictionaryArray: #1113

Enhancements:

  • Added Schema::with_metadata and Field::with_metadata #1092
  • Support for custom datetime format for inference and parsing csv files #1112
  • Implement Array for ArrayRef for easier use #1129
  • Pretty printing display support for FixedSizeBinaryArray #1097
  • Dependency Upgrades: pyo3, parquet-format, prost, tonic
  • Avoid allocating vector of indices in lexicographical_partition_ranges #998

Parquet

Fixed bugs:

  • (parquet) Fix reading of dictionary encoded pages with null values: #1130

Changelog

6.5.0 (2021-12-23)

Full Changelog

6.4.0 (2021-12-10)

Full Changelog

6.3.0 (2021-11-26)

Full Changelog

Changes:

6.2.0 (2021-11-12)

Full Changelog

Features / Fixes:

6.1.0 (2021-10-29)

Full Changelog

Features / Fixes:

Other:

6.0.0 (2021-10-13)

Full Changelog

Breaking changes:

Implemented enhancements:

  • Improve parquet binary writer speed by reducing allocations #819
  • Expose buffer operations #808
  • Add doc examples of writing parquet files using ArrowWriter #788

Fixed bugs:

  • JSON reader can create null struct children on empty lists #825
  • Incorrect null count for cast kernel for list arrays #815
  • minute and second temporal kernels do not respect timezone #500
  • Fix data corruption in json decoder f64-to-i64 cast #652 [arrow] (xianwill)

Documentation updates:

5.5.0 (2021-09-24)

Full Changelog

Implemented enhancements:

  • parquet should depend on a small set of arrow features #800
  • Support equality on RecordBatch #735

Fixed bugs:

  • Converting from string to timestamp uses microseconds instead of milliseconds #780
  • Document has no link to RowColumIter #762
  • length on slices with null doesn't work #744

5.4.0 (2021-09-10)

Full Changelog

Implemented enhancements:

  • Upgrade lexical-core to 0.8 #747
  • append_nulls and append_trusted_len_iter for PrimitiveBuilder #725
  • Optimize MutableArrayData::extend for null buffers #397

Fixed bugs:

  • Arithmetic with scalars doesn't work on slices #742
  • Comparisons with scalar don't work on slices #740
  • unary kernel doesn't respect offset #738
  • new_null_array creates invalid struct arrays #734
  • --no-default-features is broken for parquet #733 [parquet]
  • Bitmap::len returns the number of bytes, not bits. #730
  • Decimal logical type is formatted incorrectly by print_schema #713
  • parquet_derive does not support chrono time values #711
  • Numeric overflow when formatting Decimal type #710
  • The integration tests are not running #690

Closed issues:

  • Question: Is there no way to create a DictionaryArray with a pre-arranged mapping? #729

5.3.0 (2021-08-26)

Full Changelog

Implemented enhancements:

  • Add optimized filter kernel for regular expression matching #697
  • Can't cast from timestamp array to string array #587

Fixed bugs:

  • 'Encoding DELTA_BYTE_ARRAY is not supported' with parquet arrow readers #708
  • Support reading json string into binary data type. #701

Closed issues:

  • Resolve Issues with prettytable-rs dependency #69 [arrow]

5.2.0 (2021-08-12)

Full Changelog

Implemented enhancements:

  • Make rand an optional dependency #671
  • Remove undefined behavior in value method of boolean and primitive arrays #645
  • Avoid materialization of indices in filter_record_batch for single arrays #636
  • Add a note about arrow crate security / safety #627
  • Allow the creation of String arrays from an iterator of &Option<&str> #598
  • Support arrow map datatype #395

Fixed bugs:

  • Parquet fixed length byte array columns write byte array statistics #660 [parquet]
  • Parquet boolean columns write Int32 statistics #659 [parquet]
  • Writing Parquet with a boolean column fails #657
  • JSON decoder data corruption for large i64/u64 #653
  • Incorrect min/max statistics for strings in parquet files #641 [parquet]

Closed issues:

  • Release candidate verifying script seems to work on macOS #640
  • Update CONTRIBUTING #342

5.1.0 (2021-07-29)

Full Changelog

Implemented enhancements:

  • Make FFI_ArrowArray empty() public #602
  • exponential sort can be used to speed up lexico partition kernel #586
  • Implement sort() for binary array #568
  • primitive sorting can be improved and more consistent with and without limit if sorted unstably #553

Fixed bugs:

  • Confusing memory usage with CSV reader #623
  • FFI implementation deviates from specification for array release #595
  • Parquet file content is different if ~/.cargo is in a git checkout #589
  • Ensure output of MIRI is checked for success #581
  • MIRI failure in array::ffi::tests::test_struct and other ffi tests #580
  • ListArray equality check may return wrong result #570
  • cargo audit failed #561
  • ArrayData::slice() does not work for nested types such as StructArray #554

Documentation updates:

  • More examples of how to construct Arrays #301

Closed issues:

  • Implement StringBuilder::append_option #263 [arrow]

5.0.0 (2021-07-14)

Full Changelog

Breaking changes:

Implemented enhancements:

Fixed bugs:

  • Error building on master - error: cyclic package dependency: package ahash v0.7.4 depends on itself. Cycle #544
  • IPC reader panics with out of bounds error #541
  • Take kernel doesn't handle nulls and structs correctly #530 [arrow]
  • master fails to compile with default-features=false #529
  • README developer instructions out of date #523
  • Update rustc and packed_simd in CI before 5.0 release #517
  • Incorrect memory usage calculation for dictionary arrays #503 [arrow]
  • sliced null buffers lead to incorrect result in take kernel (and probably on other places) #502
  • Cast of utf8 types and list container types don't respect offset #334 [arrow]
  • fix take kernel null handling on structs #531 [arrow] (bjchambers)
  • Correct array memory usage calculation for dictionary arrays #505 [arrow] (jhorstmann)
  • parquet: improve BOOLEAN writing logic and report error on encoding fail #443 [parquet] (garyanaplan)
  • Fix bug with null buffer offset in boolean not kernel #418 [arrow] (jhorstmann)
  • respect offset in utf8 and list casts #335 [arrow] (ritchie46)
  • Fix comparison of dictionaries with different values arrays (#332) #333 [arrow] (tustvold)
  • ensure null-counts are written for all-null columns #307 [parquet] (crepererum)
  • fix invalid null handling in filter #296 [arrow] (ritchie46)
  • fix NaN handling in parquet statistics #256 (crepererum)

Documentation updates:

Merged pull requests:

4.4.0 (2021-06-24)

Full Changelog

Breaking changes:

  • migrate partition kernel to use Iterator trait #437 [arrow]
  • Remove DictionaryArray::keys_array #391 [arrow]

Implemented enhancements:

  • sort kernel boolean sort can be O(n) #447 [arrow]
  • C data interface for decimal128, timestamp, date32 and date64 #413
  • Add Decimal to CsvWriter #405
  • Use iterators to increase performance of creating Arrow arrays #200 [parquet]

Fixed bugs:

  • Release Audit Tool (RAT) is not being triggered #481
  • Security Vulnerabilities: flatbuffers: read_scalar and read_scalar_at allow transmuting values without unsafe blocks #476
  • Clippy broken after upgrade to rust 1.53 #467
  • Pull Request Labeler is not working #462
  • Arrow 4.3 release: error[E0658]: use of unstable library feature 'partition_point': new API #456
  • parquet reading hangs when row_group contains more than 2048 rows of data #349
  • Fail to build arrow #247
  • JSON reader does not implement iterator #193 [arrow]

Security fixes:

  • Ensure a successful MIRI Run on CI #227

Closed issues:

  • sort kernel has a lot of unnecessary wrapping #446
  • [Parquet] Plain encoded boolean column chunks limited to 2048 values #48 [parquet]

4.3.0 (2021-06-10)

Full Changelog

Implemented enhancements:

  • Add partitioning kernel for sorted arrays #428 [arrow]
  • Implement sort by float lists #427 [arrow]
  • Derive Eq and PartialEq for SortOptions #426 [arrow]
  • use prettier and github action to normalize markdown document syntax #399
  • window::shift can work for more than just primitive array type #392
  • Doctest for ArrayBuilder #366

Fixed bugs:

  • Boolean not kernel does not take offset of null buffer into account #417
  • my contribution not merged in 4.2 release #394
  • window::shift shall properly handle boundary cases #387
  • Parquet WriterProperties.max_row_group_size not wired up #257
  • Out of bound reads in chunk iterator #198 [arrow]

4.2.0 (2021-05-29)

Full Changelog

Breaking changes:

  • DictionaryArray::values() clones the underlying ArrayRef #313 [arrow]

Implemented enhancements:

  • Simplify shift kernel using null array #371
  • Provide Arc-based constructor for parquet::util::cursor::SliceableCursor #368
  • Add badges to crates #361
  • Consider inlining PrimitiveArray::value #328
  • Implement automated release verification script #327
  • Add wasm32 to the list of target architectures of the simd feature #316
  • add with_escape for csv::ReaderBuilder #315 [arrow]
  • IPC feature gate #310
  • csv feature gate #309 [arrow]
  • Add shrink_to / shrink_to_fit to MutableBuffer #297

Fixed bugs:

  • Incorrect crate setup instructions #364
  • Arrow-flight only register rerun-if-changed if file exists #350
  • Dictionary Comparison Uses Wrong Values Array #332
  • Undefined behavior in FFI implementation #322
  • All-null column gets wrong parquet null-counts #306 [parquet]
  • Filter has inconsistent null handling #295

4.1.0 (2021-05-17)

Full Changelog

Implemented enhancements:

  • Add Send to ArrayBuilder #290 [arrow]
  • Improve performance of bound checking option #280 [arrow]
  • extend compute kernel arity to include nullary functions #276
  • Implement FFI / CDataInterface for Struct Arrays #251 [arrow]
  • Add support for pretty-printing Decimal numbers #230 [arrow]
  • CSV Reader String Dictionary Support #228 [arrow]
  • Add Builder interface for adding Arrays to record batches #210 [arrow]
  • Support auto-vectorization for min/max #209 [arrow]
  • Support LargeUtf8 in sort kernel #25 [arrow]

Fixed bugs:

  • no method named select_nth_unstable_by found for mutable reference &mut [T] #283
  • Rust 1.52 Clippy error #266
  • NaNs can break parquet statistics #255 [parquet]
  • u64::MAX does not roundtrip through parquet #254 [parquet]
  • Integration tests failing to compile (flatbuffer) #249 [arrow]
  • Fix compatibility quirks between arrow and parquet structs #245 [parquet]
  • Unable to write non-null Arrow structs to Parquet #244 [parquet]
  • schema: missing field metadata when deserialize #241 [arrow]
  • Arrow does not compile due to flatbuffers upgrade #238 [arrow]
  • Sort with limit panics when the limit includes some but not all nulls, for large arrays #235 [arrow]
  • arrow-rs contains a copy of the "format" directory #233 [arrow]
  • Fix SEGFAULT/ SIGILL in child-data ffi #206 [arrow]
  • Read list field correctly in <struct<list>> #167 [parquet]
  • FFI listarray leads to undefined behavior. #20

Security fixes:

Documentation updates:

  • Comment out the instructions in the PR template #277
  • Update links to datafusion and ballista in README.md #19
  • Update "repository" in Cargo.toml #12

Closed issues:

  • Arrow Aligned Vec #268
  • [Rust]: Tracking issue for AVX-512 #220 [arrow]
  • Umbrella issue for clippy integration #217 [arrow]
  • Support sort #215 [arrow]
  • Support stable Rust #214 [arrow]
  • Remove Rust and point integration tests to arrow-rs repo #211 [arrow]
  • ArrayData buffers are inconsistent across implementations #207
  • 3.0.1 patch release #204
  • Document patch release process #202
  • Simplify Offset #186 [arrow]
  • Typed Bytes #185 [arrow]
  • [CI] docker-compose setup should enable caching #175
  • Improve take primitive performance #174
  • [CI] Try out buildkite #165 [arrow]
  • Update assignees in JIRA where missing #160
  • [Rust]: From<ArrayDataRef> implementations should validate data type #103 [arrow]
  • [DataFusion] Verify that projection push down does not remove aliases columns #99 [arrow]
  • [Rust][DataFusion] Implement modulus expression #98 [arrow]
  • [DataFusion] Add constant folding to expressions during logical planning #96 [arrow]
  • [DataFusion] DataFrame.collect should return RecordBatchReader #95 [arrow]
  • [Rust][DataFusion] Add FORMAT to explain plan and an easy to visualize format #94 [arrow]
  • [DataFusion] Implement metrics framework #90 [arrow]
  • [DataFusion] Implement micro benchmarks for each operator #89 [arrow]
  • [DataFusion] Implement pretty print for physical query plan #88 [arrow]
  • [Archery] Support rust clippy in the lint command #83
  • [rust][datafusion] optimize count(*) queries on parquet sources #75 [arrow]
  • [Rust][DataFusion] Improve like/nlike performance #71 [arrow]
  • [DataFusion] Implement optimizer rule to remove redundant projections #56 [arrow]
  • [DataFusion] Parquet data source does not support complex types #39 [arrow]
  • Merge utils from Parquet and Arrow #32 [arrow] [parquet]
  • Add benchmarks for Parquet #30 [parquet]
  • Mark methods that do not perform bounds checking as unsafe #28 [arrow]
  • Test issue #24 [arrow]
  • This is a test issue #11