21.0.0 (2022-08-18)
Breaking changes:
- Return structured ColumnCloseResult (#2465) #2466 [parquet] (tustvold)
- Push ChunkReader into SerializedPageReader (#2463) #2464 [parquet] (tustvold)
- Revise FromIterator for Decimal128Array to use Into instead of Borrow #2442 [parquet] [arrow] (viirya)
- Use Fixed-Length Array in BasicDecimal new and raw_value #2405 [arrow] (HaoYang670)
- Remove deprecated ParquetWriter #2380 [parquet] (tustvold)
- Remove deprecated SliceableCursor and InMemoryWriteableCursor #2378 [parquet] (tustvold)
Implemented enhancements:
- add into_inner method to ArrowWriter #2491 [parquet]
- Remove byteorder dependency #2472 [parquet]
- Return Structured ColumnCloseResult from GenericColumnWriter::close #2465 [parquet]
- Push ChunkReader into SerializedPageReader #2463 [parquet]
- Support SerializedPageReader::skip_page without OffsetIndex #2459 [parquet]
- Support Time64/Time32 comparison #2457 [arrow]
- Revise FromIterator for Decimal128Array to use Into instead of Borrow #2441 [parquet]
- Support RowFilter within ParquetRecordBatchReader #2431 [parquet]
- Remove the field StructBuilder::len #2429 [arrow]
- Standardize creation and configuration of parquet --> Arrow readers (ParquetRecordBatchReaderBuilder) #2427 [parquet]
- Use OffsetIndex to Prune IO in ParquetRecordBatchStream #2426 [parquet]
- Support peek_next_page and skip_next_page in InMemoryPageReader #2406 [parquet]
- Support casting from Utf8/LargeUtf8 to Binary/LargeBinary #2402 [arrow]
- Support casting between Decimal128 and Decimal256 arrays #2375 [arrow]
- Combine multiple selections into the same batch size in skip_records #2358 [parquet]
- Add API to change timezone for timestamp array #2346 [arrow]
- Change the output of read_buffer Arrow IPC API to return Result<_> #2342 [arrow]
- Allow skip_records in GenericColumnReader to skip across row groups #2331 [parquet]
- Optimize the validation of Decimal256 #2320 [arrow]
- Implement Skip for DeltaBitPackDecoder #2281 [parquet]
- Changes to ParquetRecordBatchStream to support row filtering in DataFusion #2270 [parquet]
- Add ArrayReader::skip_records API #2197 [parquet]
Fixed bugs:
- Panic in SerializedPageReader without offset index #2503 [parquet]
- MapArray columns don't handle null values correctly #2484 [arrow]
- There is no compiler error when using an invalid Decimal type. #2440 [arrow]
- Flight SQL Server sends incorrect response for DoPutUpdateResult #2403 [arrow-flight]
- AsyncFileReader No Longer Object-Safe #2372 [parquet]
- StructBuilder Does not Verify Child Lengths #2252 [arrow]
Closed issues:
Merged pull requests:
- Fix bug in page skipping #2504 [parquet] (thinkharderdev)
- Fix MapArrayReader (#2484) (#1699) (#1561) #2500 [parquet] (tustvold)
- Add API to Retrieve Finished Writer from Parquet Writer #2498 [parquet] (jiacai2050)
- Derive Copy,Clone for BasicDecimal #2495 [arrow] (tustvold)
- remove byteorder dependency from parquet #2486 [parquet] (psvri)
- parquet-read: add support to read parquet data from stdin #2482 [parquet] (nvartolomei)
- Remove Position trait (#1163) #2479 [parquet] (tustvold)
- Add ChunkReader::get_bytes #2478 [parquet] (tustvold)
- RFC: Simplify decimal (#2440) #2477 [arrow] (tustvold)
- Use Parquet OffsetIndex to prune IO with RowSelection #2473 [parquet] (thinkharderdev)
- Remove unnecessary Option from Int96 #2471 [parquet] (tustvold)
- remove len field from StructBuilder #2468 [arrow] (psvri)
- Make Parquet reader filter APIs public (#1792) #2467 [parquet] (tustvold)
- enable ipc compression feature for integration test #2462 (liukun4515)
- Simplify implementation of Schema #2461 [arrow] (HaoYang670)
- Support skip_page missing OffsetIndex Fallback in SerializedPageReader #2460 [parquet] (Ted-Jiang)
- support time32/time64 comparison #2458 [arrow] (waitingkuo)
- Utf8array casting #2456 [arrow] (psvri)
- Remove outdated license text #2455 (alamb)
- Support RowFilter within ParquetRecordBatchReader (#2431) #2452 [parquet] (tustvold)
- benchmark: decimal builder and vec to decimal array #2450 [arrow] (liukun4515)
- Collocate Decimal Array Validation Logic #2446 [arrow] (liukun4515)
- Minor: Move From trait for Decimal256 impl to decimal.rs #2443 [arrow] (liukun4515)
- decimal benchmark: arrow reader decimal from parquet int32 and int64 #2438 [parquet] (liukun4515)
- MINOR: Simplify split_second function #2436 [arrow] (viirya)
- Add ParquetRecordBatchReaderBuilder (#2427) #2435 [parquet] (tustvold)
- refactor: refine validation for decimal128 array #2428 [arrow] (liukun4515)
- Benchmark of casting decimal arrays #2424 [arrow] (viirya)
- Test non-annotated repeated fields (#2394) #2422 [parquet] (tustvold)
- Fix #2416 Automatic version updates for github actions with dependabot #2417 (iemejia)
- Add validation logic for StructBuilder::finish #2413 [arrow] (psvri)
- test: add test for reading decimal value from primitive array reader #2411 [parquet] (liukun4515)
- Upgrade ahash to 0.8 #2410 [parquet] [arrow] (Dandandan)
- Support peek_next_page and skip_next_page in InMemoryPageReader #2407 [parquet] (Ted-Jiang)
- Fix DoPutUpdateResult #2404 [arrow-flight] (avantgardnerio)
- Implement Skip for DeltaBitPackDecoder #2393 [parquet] (Ted-Jiang)
- fix: Don't instantiate the scalar composition code quadratically for dictionaries #2391 [arrow] (Marwes)
- MINOR: Remove unused trait and some cleanup #2389 [arrow] (viirya)
- Decouple parquet fuzz tests from converter (#1661) #2386 [parquet] (tustvold)
- Rewrite Decimal and DecimalArray using const_generic #2383 [parquet] [arrow] (HaoYang670)
- Simplify BitReader (~5-10% faster) #2381 [parquet] (tustvold)
- Fix parquet clippy lints (#1254) #2377 [parquet] (tustvold)
- Cast between Decimal128 and Decimal256 arrays #2376 [arrow] (viirya)
- support compression for IPC with revamped feature flags #2369 [arrow] (alamb)
- Implement AsyncFileReader for Box<dyn AsyncFileReader> #2368 [parquet] (tustvold)
- Remove get_byte_ranges where bound #2366 [parquet] (tustvold)
- refactor: Make read_num_bytes a function instead of a macro #2364 [parquet] (Marwes)
- refactor: Group metrics into page and column metrics structs #2363 [parquet] (Marwes)
- Speed up Decimal256 validation based on bytes comparison and add benchmark test #2360 [parquet] [arrow] (liukun4515)
- Combine multiple selections into the same batch size in skip_records #2359 [parquet] (Ted-Jiang)
- Add API to change timezone for timestamp array #2347 [arrow] (viirya)
- Clean the code in field.rs and add more tests #2345 [arrow] (HaoYang670)
- Add Parquet RowFilter API #2335 [parquet] (tustvold)
- Make skip_records in complex_object_array can skip cross row groups #2332 [parquet] (Ted-Jiang)
- Integrate Record Skipping into Column Reader Fuzz Test #2315 [parquet] (Ted-Jiang)
20.0.0 (2022-08-05)
Breaking changes:
- Add more const evaluation for GenericBinaryArray and GenericListArray: add PREFIX and data type constructor #2327 [parquet] [arrow] (HaoYang670)
- Make FFI support optional, change APIs to be safe (#2302) #2303 [arrow] (tustvold)
- Remove test_utils from default features (#2298) #2299 [arrow] (tustvold)
- Rename DataType::Decimal to DataType::Decimal128 #2229 [parquet] [arrow] (viirya)
- Add Decimal128Iter and Decimal256Iter and do maximum precision/scale check #2140 [arrow] (viirya)
Implemented enhancements:
- Add the constant data type constructors for ListArray #2311 [arrow]
- Update FlightSqlService trait to pass session info along #2308 [arrow-flight]
- Optimize take_bits for non-null indices #2306 [arrow]
- Make FFI support optional via Feature Flag ffi #2302 [arrow]
- Mark ffi::ArrowArray::try_new is safe #2301 [arrow]
- Remove test_utils from default arrow-rs features #2298 [arrow]
- Remove JsonEqual trait #2296 [arrow]
- Move with_precision_and_scale to Decimal array traits #2291 [arrow]
- Improve readability and maybe performance of string --> numeric/time/date/timestamp cast kernels #2285 [arrow]
- Add vectorized unpacking for 8, 16, and 64 bit integers #2276 [parquet]
- Use initial capacity for interner hashmap #2273 [arrow]
- Impl FromIterator for Decimal256Array #2248 [arrow]
- Separate ArrayReader::next_batch with ArrayReader::read_records and ArrayReader::consume_batch #2236 [parquet]
- Rename DataType::Decimal to DataType::Decimal128 #2228 [arrow]
- Automatically Grow Parquet BitWriter Buffer #2226 [parquet]
- Add append_option support to Decimal128Builder and Decimal256Builder #2224 [arrow]
- Split the FixedSizeBinaryArray and FixedSizeListArray from array_binary.rs and array_list.rs #2217 [arrow]
- Don't Box Values in PrimitiveDictionaryBuilder #2215 [arrow]
- Use BitChunks in equal_bits #2186 [arrow]
- Implement Hash for Schema #2182 [arrow]
- read decimal data type from parquet file with binary physical type #2159 [parquet]
- The GenericStringBuilder should use GenericBinaryBuilder #2156 [arrow]
- Update Rust version to 1.62 #2143 [parquet] [arrow] [arrow-flight]
- Check precision and scale against maximum value when constructing Decimal128 and Decimal256 #2139 [arrow]
- Use ArrayAccessor in Decimal128Iter and Decimal256Iter #2138 [arrow]
- Use ArrayAccessor and FromIterator in Cast Kernels #2137 [arrow]
- Add TypedDictionaryArray for more ergonomic interaction with DictionaryArray #2136 [arrow]
- Use ArrayAccessor in Comparison Kernels #2135 [arrow]
- Support peek_next_page() and skip_next_page in InMemoryColumnChunkReader #2129 [parquet]
- Lazily materialize the null buffer builder for all array builders. #2125 [arrow]
- Do value validation for Decimal256 #2112 [arrow]
- Support skip_def_levels for ColumnLevelDecoder #2107 [parquet]
- Add integration test for scan rows with selection #2106 [parquet]
- Support for casting from Utf8/String to Time32/Time64 #2053 [arrow]
- Update prost and tonic related crates #2268 [arrow-flight] (carols10cents)
Fixed bugs:
- temporal conversion functions cannot work on negative input properly #2325 [arrow]
- IPC writer should truncate string array with all empty string #2312 [arrow]
- Error order for comparing Decimal128 or Decimal256 #2256 [arrow]
- Fix maximum and minimum for decimal values for precision greater than 38 #2246 [arrow]
- IntervalMonthDayNanoType::make_value() does not match C implementation #2234 [arrow]
- FlightSqlService trait does not allow impls to do handshake #2210 [arrow-flight]
- EnabledStatistics::None not working #2185 [parquet]
- Boolean ArrayData Equality Incorrect Slice Handling #2184 [arrow]
- Publicly export MapFieldNames #2118 [arrow]
Documentation updates:
- Update instructions on How to join the slack #arrow-rust channel -- or maybe try to switch to discord?? #2192
- [Minor] Improve arrow and parquet READMEs, document parquet feature flags #2324 [parquet] [arrow] (alamb)
Performance improvements:
- Improve speed of writing string dictionaries to parquet by skipping a copy (#1764) #2322 [parquet] [arrow] (tustvold)
Closed issues:
- Fix wrong logic in calculate_row_count when skipping values #2328 [parquet]
- Support filter for parquet data type #2126 [parquet]
- Make skip value in ByteArrayDecoderDictionary avoid decoding #2088 [parquet]
Merged pull requests:
- fix: Fix skip error in calculate_row_count. #2329 [parquet] (Ted-Jiang)
- temporal conversion functions should work on negative input properly #2326 [arrow] (viirya)
- Increase DeltaBitPackEncoder miniblock size to 64 for 64-bit integers (#2282) #2319 [parquet] (tustvold)
- Remove JsonEqual #2317 [parquet] [arrow] (viirya)
- fix: IPC writer should truncate string array with all empty string #2314 [arrow] (JasonLi-cn)
- Pass pull Request<FlightDescriptor> to FlightSqlService impls #2309 [parquet] [arrow-flight] (avantgardnerio)
- Speedup take_boolean / take_bits for non-null indices (~4 - 5x speedup) #2307 [arrow] (Dandandan)
- Add typed dictionary (#2136) #2297 [arrow] (tustvold)
- [Minor] Improve types shown in cast error messages #2295 [arrow] (alamb)
- Move with_precision_and_scale to BasicDecimalArray trait #2292 [parquet] [arrow] (viirya)
- Replace the fn get_data_type by const DATA_TYPE in BinaryArray and StringArray #2289 [arrow] (HaoYang670)
- Clean up string casts and improve performance #2284 [arrow] (alamb)
- [Minor] Add tests for temporal cast error paths #2283 [arrow] (alamb)
- Add unpack8, unpack16, unpack64 (#2276) ~10-50% faster #2278 [parquet] (tustvold)
- Fix bugs in the from_list function. #2277 [arrow] (HaoYang670)
- fix: use signed comparator to compare decimal128 and decimal256 #2275 [arrow] (liukun4515)
- Use initial capacity for interner hashmap #2272 [parquet] (Dandandan)
- Remove fallibility from parquet RleEncoder (#2226) #2259 [parquet] (tustvold)
- Fix escaped like wildcards in like_utf8/nlike_utf8 kernels #2258 [arrow] (daniel-martinez-maqueda-sap)
- Add tests for reading nested decimal arrays from parquet #2254 [parquet] (tustvold)
- feat: Implement string cast operations for Time32 and Time64 #2251 [arrow] (stuartcarnie)
- move FixedSizeList to array_fixed_size_list.rs #2250 [arrow] (HaoYang670)
- Impl FromIterator for Decimal256Array #2247 [arrow] (viirya)
- Fix max and min value for decimal precision greater than 38 #2245 [arrow] (viirya)
- Make Schema::fields and Schema::metadata pub (public) #2239 [arrow] (alamb)
- [Minor] Improve Schema metadata mismatch error #2238 [arrow] (alamb)
- Separate ArrayReader::next_batch with read_records and consume_batch #2237 [parquet] (Ted-Jiang)
- Update IntervalMonthDayNanoType::make_value() to conform to specifications #2235 [arrow] (avantgardnerio)
- Disable value validation for Decimal256 case #2232 [arrow] (viirya)
- Automatically grow parquet BitWriter (#2226) (~10% faster) #2231 [parquet] (tustvold)
- Only trigger arrow CI on changes to arrow #2227 (alamb)
- Add append_option support to decimal builders #2225 [arrow] (bphillips-exos)
- Optimized writing of byte array to parquet (#1764) (2x faster) #2221 [parquet] (tustvold)
- Increase test coverage of ArrowWriter #2220 [parquet] (tustvold)
- Update instructions on how to join the Slack channel #2219 (HaoYang670)
- Move FixedSizeBinaryArray to array_fixed_size_binary.rs #2218 [arrow] (HaoYang670)
- Avoid boxing in PrimitiveDictionaryBuilder #2216 [arrow] (tustvold)
- remove redundant CI benchmark check, cleanups #2212 [parquet] (alamb)
- Update FlightSqlService trait to proxy handshake #2211 [arrow-flight] (avantgardnerio)
- parquet: export json api with serde_json feature name #2209 [parquet] (flisky)
- Cleanup record skipping logic and tests (#2158) #2199 [parquet] (tustvold)
- Use BitChunks in equal_bits #2194 [arrow] (tustvold)
- Fix disabling parquet statistics (#2185) #2191 [parquet] (tustvold)
- Change CI names to match crate names #2189 (alamb)
- Fix offset handling in boolean_equal (#2184) #2187 [arrow] (tustvold)
- Implement Hash for Schema #2183 [arrow] (crepererum)
- Let the StringBuilder use BinaryBuilder #2181 [arrow] (HaoYang670)
- Use ArrayAccessor and FromIterator in Cast Kernels #2169 [arrow] (viirya)
- Split most arrow specific CI checks into their own workflows (reduce common CI time to 21 minutes) #2168 (alamb)
- Remove another attempt to cache target directory in action.yaml #2167 (alamb)
- Run actions on push to master, pull requests #2166 (alamb)
- Break parquet_derive and arrow_flight tests into their own workflows #2165 (alamb)
- [minor] use type aliases refine code. #2161 [parquet] (Ted-Jiang)
- parquet reader: Support reading decimals from parquet BYTE_ARRAY type #2160 [parquet] (liukun4515)
- Add integration test for scan rows with selection #2158 [parquet] (Ted-Jiang)
- Use ArrayAccessor in Comparison Kernels #2157 [arrow] (viirya)
- Implement peek_next_page and skip_next_page for `InMemoryColumnCh… #2155 [parquet] (thinkharderdev)
- Avoid decoding unneeded values in ByteArrayDecoderDictionary #2154 [parquet] (thinkharderdev)
- Only run integration tests when arrow changes #2152 (alamb)
- Break out docs CI job to its own github action #2151 (alamb)
- Do not pretend to cache rust build artifacts, speed up CI by ~20% #2150 (alamb)
- Update rust version to 1.62 #2144 [parquet] [arrow] [arrow-flight] (Ted-Jiang)
- Make MapFieldNames public (#2118) #2134 [arrow] (tustvold)
- Add ArrayAccessor trait, remove duplication in array iterators (#1948) #2133 [arrow] (tustvold)
- Lazily materialize the null buffer builder for all array builders. #2127 [arrow] (HaoYang670)
- Faster parquet DictEncoder (~20%) #2123 [parquet] (tustvold)
- Add validation for Decimal256 #2113 [arrow] (viirya)
- Support skip_def_levels for ColumnLevelDecoder #2111 [parquet] (Ted-Jiang)
- Donate object_store code from object_store_rs to arrow-rs #2081 (alamb)
- Improve validate_utf8 performance #2048 [arrow] (tfeda)
19.0.0 (2022-07-22)
Breaking changes:
- Rename DecimalArray/DecimalBuilder to Decimal128Array/Decimal128Builder #2101 [arrow]
- Change builder append methods to be infallible where possible #2103 [parquet] [arrow] (jhorstmann)
- Return reference from UnionArray::child (#2035) #2099 [arrow] (tustvold)
- Remove preserve_order feature from serde_json dependency (#2095) #2098 [parquet] [arrow] (tustvold)
- Rename weekday and weekday0 kernels to num_days_from_monday and num_days_since_sunday #2066 [arrow] (alamb)
- Remove null_count from write_batch_with_statistics #2047 [parquet] (tustvold)
Implemented enhancements:
- Use total_cmp from std #2130 [arrow]
- Permit parallel fetching of column chunks in ParquetRecordBatchStream #2110 [parquet]
- The GenericBinaryBuilder should use buffer builders directly. #2104 [arrow]
- Pass generate_decimal256_case arrow integration test #2093 [arrow]
- Rename weekday and weekday0 kernels to num_days_from_monday and days_since_sunday #2065 [arrow]
- Improve performance of filter_dict #2062 [arrow]
- Improve performance of set_bits #2060 [arrow]
- Lazily materialize the null buffer builder of BooleanBuilder #2058 [arrow]
- BooleanArray::from_iter should omit validity buffer if all values are valid #2055 [arrow]
- FFI_ArrowSchema should set DICTIONARY_ORDERED flag if a field's dictionary is ordered #2049 [arrow]
- Support peek_next_page() and skip_next_page in SerializedPageReader #2043 [parquet]
- Support FFI / C Data Interface for MapType #2037 [arrow]
- The DecimalArrayBuilder should use FixedSizeBinaryBuilder #2026 [arrow]
- Enable serialized_reader read specific Page by passing row ranges. #1976 [parquet]
Fixed bugs:
- type_id and value_offset are incorrect for sliced UnionArray #2086 [arrow]
- Boolean take kernel does not handle null indices correctly #2057 [arrow]
- Don't double-count nulls in write_batch_with_statistics #2046 [parquet]
- Parquet Writer Ignores Statistics specification in WriterProperties #2014 [parquet]
Documentation updates:
Closed issues:
- Why does serde_json specify the preserve_order feature in arrow package #2095 [arrow]
- Support skip_values in DictionaryDecoder #2079 [parquet]
- Support skip_values in ColumnValueDecoderImpl #2078 [parquet]
- Support skip_values in ByteArrayColumnValueDecoder #2072 [parquet]
- Several Builder::append methods returning results even though they are infallible #2071
- Improve formatting of logical plans containing subqueries #2059
- Return reference from UnionArray::child #2035
- support write page index #1777 [parquet]
Merged pull requests:
- Use total_cmp from std #2131 [arrow] (Dandandan)
- fix clippy #2124 (alamb)
- Fix logical merge conflict: match arms have incompatible types #2121 (alamb)
- Update GenericBinaryBuilder to use buffer builders directly. #2117 [arrow] (HaoYang670)
- Simplify null mask preservation in parquet reader #2116 [parquet] (tustvold)
- Add get_byte_ranges method to AsyncFileReader trait #2115 [parquet] (thinkharderdev)
- add test for skip_values in DictionaryDecoder and fix it #2105 [parquet] (Ted-Jiang)
- Define Decimal128Builder and Decimal128Array #2102 [parquet] [arrow] (viirya)
- Support skip_values in DictionaryDecoder #2100 [parquet] (thinkharderdev)
- Pass generate_decimal256_case integration test, add DataType::Decimal256 #2094 [parquet] [arrow] (viirya)
- DecimalBuilder should use FixedSizeBinaryBuilder #2092 [arrow] (HaoYang670)
- Array writer indirection #2091 [parquet] (tustvold)
- Remove doc hidden from GenericColumnReader #2090 [parquet] (tustvold)
- Support skip_values in ColumnValueDecoderImpl #2089 [parquet] (thinkharderdev)
- type_id and value_offset are incorrect for sliced UnionArray #2087 [arrow] (viirya)
- Add IPC truncation test case for StructArray #2083 [arrow] (viirya)
- Improve performance of set_bits by using copy_from_slice instead of setting individual bytes #2077 [arrow] (jhorstmann)
- Support skip_values in ByteArrayColumnValueDecoder #2076 [parquet] (Ted-Jiang)
- Lazily materialize the null buffer builder of boolean builder #2073 [arrow] (HaoYang670)
- Fix windows CI (#2069) #2070 (tustvold)
- Test utf8_validation checks char boundaries #2068 [arrow] (tustvold)
- feat(compute): Support doy (day of year) for temporal #2067 [arrow] (ovr)
- Support nullable indices in boolean take kernel and some optimizations #2064 [arrow] (jhorstmann)
- Improve performance of filter_dict #2063 [arrow] (viirya)
- Ignore null buffer when creating ArrayData if null count is zero #2056 [arrow] (jhorstmann)
- feat(compute): Support week0 (PostgreSQL behaviour) for temporal #2052 [arrow] (ovr)
- Set DICTIONARY_ORDERED flag for FFI_ArrowSchema #2050 [arrow] (viirya)
- Generify parquet write path (#1764) #2045 [parquet] (tustvold)
- Support peek_next_page() and skip_next_page in serialized_reader. #2044 [parquet] (Ted-Jiang)
- Support MapType in FFI #2042 [arrow] (viirya)
- Add support of converting FixedSizeBinaryArray to DecimalArray #2041 [arrow] (HaoYang670)
- Truncate IPC record batch #2040 [arrow] (viirya)
- Refine the List builder #2034 [arrow] (HaoYang670)
- Add more tests of RecordReader Batch Size Edge Cases (#2025) #2032 [parquet] (tustvold)
- Add support for adding intervals to dates #2031 [arrow] (avantgardnerio)
18.0.0 (2022-07-08)
Breaking changes:
- Fix several bugs in parquet writer statistics generation, add EnabledStatistics to control level of statistics generated #2022 [parquet] (tustvold)
- Add page index reader test for all types and support empty index. #2012 [parquet] (Ted-Jiang)
- Add Decimal256Builder and Decimal256Array; Decimal arrays now implement BasicDecimalArray trait #2000 [parquet] [arrow] (viirya)
- Simplify ColumnReader::read_batch #1995 [parquet] [arrow] (tustvold)
- Remove PrimitiveBuilder::finish_dict (#1978) #1980 [arrow] (tustvold)
- Disallow cast from other datatypes to NullType #1942 [arrow] (liukun4515)
- Add column index writer for parquet #1935 [parquet] (liukun4515)
Implemented enhancements:
- Add DataType::Dictionary support to subtract_scalar, multiply_scalar, divide_scalar #2019 [arrow]
- Support DictionaryArray in add_scalar kernel #2017 [arrow]
- Enable column page index read test for all types #2010 [parquet]
- Simplify FixedSizeBinaryBuilder #2007 [arrow]
- Support Decimal256Builder and Decimal256Array #1999 [arrow]
- Support DictionaryArray in unary kernel #1989 [arrow]
- Add kernel to quickly compute comparisons on Arrays #1987 [arrow]
- Support DictionaryArray in divide kernel #1982 [arrow]
- Implement Into<ArrayData> for T: Array #1979 [arrow]
- Support DictionaryArray in multiply kernel #1972 [arrow]
- Support DictionaryArray in subtract kernel #1970 [arrow]
- Declare DecimalArray::length as a constant #1967 [arrow]
- Support DictionaryArray in add kernel #1950 [arrow]
- Add builder style methods to Field #1934 [arrow]
- Make StringDictionaryBuilder faster #1851 [arrow]
- concat_elements_utf8 should accept arbitrary number of input arrays #1748 [arrow]
Fixed bugs:
- Array reader for list columns fails to decode if batches fall on row group boundaries #2025 [parquet]
- ColumnWriterImpl::write_batch_with_statistics incorrect distinct count in statistics #2016 [parquet]
- ColumnWriterImpl::write_batch_with_statistics can write incorrect page statistics #2015 [parquet]
- RowFormatter is not part of the public api #2008 [parquet]
- Infinite Loop possible in ColumnReader::read_batch For Corrupted Files #1997 [parquet]
- PrimitiveBuilder::finish_dict does not validate dictionary offsets #1978 [arrow]
- Incorrect n_buffers in FFI_ArrowArray #1959 [arrow]
- DecimalArray::from_fixed_size_list_array fails when offset > 0 #1958 [arrow]
- Incorrect (but ignored) metadata written after ColumnChunk #1946 [parquet]
- Send + Sync impl for Allocation may not be sound unless Allocation is Send + Sync as well #1944 [arrow]
- Disallow cast from other datatypes to NullType #1923 [arrow]
Documentation updates:
Closed issues:
- Column chunk statistics of min_bytes and max_bytes return wrong size #2021 [parquet]
- [Discussion] Refactor the Decimals by using constant generic. #2001
- Move DecimalArray to a new file #1985 [arrow]
- Support DictionaryArray in multiply kernel #1974
- close function instead of mutable reference #1969 [parquet]
- Incorrect null_count of DictionaryArray #1962 [arrow]
- Support multi diskRanges for ChunkReader #1955 [parquet]
- Persisting Arrow timestamps with Parquet produces missing TIMESTAMP in schema #1920 [parquet]
- Separate get_next_page_header from get_next_page in PageReader #1834 [parquet]
Merged pull requests:
- Consistent case in Index enumeration #2029 [parquet] (tustvold)
- Fix record delimiting on row group boundaries (#2025) #2027 [parquet] (tustvold)
- Add builder style APIs For Field: with_name, with_data_type and with_nullable #2024 [arrow] (alamb)
- Add dictionary support to subtract_scalar, multiply_scalar, divide_scalar #2020 [arrow] (viirya)
- Support DictionaryArray in add_scalar kernel #2018 [arrow] (viirya)
- Refine the FixedSizeBinaryBuilder #2013 [arrow] (HaoYang670)
- Add RowFormatter to record public API #2009 [parquet] (FabioBatSilva)
- Fix parquet test_common feature flags #2003 [parquet] (tustvold)
- Stub out Skip Records API (#1792) #1998 [parquet] [arrow-flight] (tustvold)
- Implement Into<ArrayData> for T: Array #1992 [parquet] [arrow] (heyrutvik)
- Add unary_cmp #1991 [arrow] (viirya)
- Support DictionaryArray in unary kernel #1990 [arrow] (viirya)
- Refine FixedSizeListBuilder #1988 [arrow] (HaoYang670)
- Move DecimalArray to array_decimal.rs #1986 [arrow] (HaoYang670)
- MINOR: Fix clippy error after updating rust toolchain #1984 [parquet] [arrow] [arrow-flight] (viirya)
- Support dictionary array for divide kernel #1983 [arrow] (viirya)
- Support dictionary array for subtract and multiply kernel #1971 [arrow] (viirya)
- Declare the value_length of decimal array as a const #1968 [arrow] (HaoYang670)
- Fix the behavior of from_fixed_size_list when offset > 0 #1964 [arrow] (HaoYang670)
- Calculate n_buffers in FFI_ArrowArray by data layout #1960 [arrow] (viirya)
- Fix the doc of FixedSizeListArray::value_length #1957 [arrow] (HaoYang670)
- Use InMemoryColumnChunkReader (~20% faster) #1956 [parquet] (tustvold)
- Unpin clap (#1867) #1954 [parquet] (tustvold)
- Set is_adjusted_to_utc if any timezone set (#1932) #1953 [parquet] [arrow] (tustvold)
- Add add_dyn for DictionaryArray support #1951 [arrow] (viirya)
- write ColumnMetadata after the column chunk data, not the ColumnChunk #1947 [parquet] (liukun4515)
- Require Send+Sync bounds for Allocation trait #1945 [arrow] (jhorstmann)
- Faster StringDictionaryBuilder (~60% faster) (#1851) #1861 [arrow] (tustvold)
- Arbitrary size concat elements utf8 #1787 [arrow] (Ismail-Maj)
17.0.0 (2022-06-24)
Breaking changes:
- Add validation to RecordBatch for non-nullable fields containing null values #1890 [arrow] (andygrove)
- Rename ArrayData::validate_dict_offsets to ArrayData::validate_values #1889 [arrow] (frolovdev)
- Add Decimal128 API and use it in DecimalArray and DecimalBuilder #1871 [parquet] [arrow] (viirya)
- Mark typed buffer APIs safe (#996) (#1027) #1866 [parquet] [arrow] (tustvold)
Implemented enhancements:
- add a small doc example showing ArrowWriter being used with a cursor #1927 [parquet]
- Support cast to/from NULL and DataType::Decimal #1921 [arrow]
- Add Decimal256 API #1913 [arrow]
- Add DictionaryArray::key function #1911 [arrow]
- Support specifying capacities for ListArrays in MutableArrayData #1884 [arrow]
- Explicitly declare the features used for each dependency #1876 [parquet] [arrow] [arrow-flight]
- Add Decimal128 API and use it in DecimalArray and DecimalBuilder #1870 [arrow]
- PrimitiveArray::from_iter should omit validity buffer if all values are valid #1856 [arrow]
- Add from(v: Vec<Option<&[u8]>>) and from(v: Vec<&[u8]>) for FixedSizeBinaryArray #1852 [arrow]
- Add Vec-inspired APIs to BufferBuilder #1850 [arrow]
- PyArrow integration test for C Stream Interface #1847 [arrow]
- Add nilike support in comparison #1845 [arrow]
- Split up arrow::array::builder module #1843 [arrow]
- Add quarter support in temporal kernels #1835 [arrow]
- Rename ArrayData::validate_dictionary_offset to ArrayData::validate_values #1812 [arrow]
- Clean up the testing code for substring kernel #1801 [arrow]
- Speed up substring_by_char kernel #1800 [arrow]
Fixed bugs:
- unable to write parquet file with UTC timestamp #1932 [parquet]
- Incorrect max and min decimals #1916 [arrow]
- dynamic_types example does not print the projection #1902 [arrow]
- log2(0) panicked at 'attempt to subtract with overflow', parquet/src/util/bit_util.rs:148:5 #1901 [parquet]
- Final slicing in combine_option_bitmap needs to use bit slices #1899 [arrow]
- Dictionary IPC writer writes incorrect schema #1892 [arrow]
- Creating a RecordBatch with null values in non-nullable fields does not cause an error #1888 [arrow]
- Upgrade regex dependency #1874 [arrow]
- Miri reports leaks in ffi tests #1872 [arrow]
- AVX512 + simd binary and/or kernels slower than autovectorized version #1829 [arrow]
Documentation updates:
- Blog post about arrow 10.0.0 - 16.0.0 #1808
- Add README for the compute module. #1940 [arrow] (HaoYang670)
- minor: clarify docstring on DictionaryArray::lookup_key #1910 [arrow] (alamb)
- minor: add a diagram to docstring for DictionaryArray #1909 [arrow] (alamb)
- Closes #1902: Print the original and projected RecordBatch in dynamic_types example #1903 [arrow] (martin-g)
Closed issues:
Merged pull requests:
- Set adjusted to UTC if UTC timezone (#1932) #1937 [parquet] (tustvold)
- Split up parquet::arrow::array_reader (#1483) #1933 [parquet] (tustvold)
- Add ArrowWriter doctest (#1927) #1930 [parquet] (tustvold)
- Update indexmap dependency #1929 [arrow] (tustvold)
- Complete and fixup split of arrow::array::builder module (#1843) #1928 [arrow] (tustvold)
- MINOR: Replace checked_add/sub().unwrap() with +/- #1924 [arrow] (HaoYang670)
- Support casting NULL to/from Decimal #1922 [arrow] (liukun4515)
- Update half requirement from 1.8 to 2.0 #1919 [arrow] (dependabot[bot])
- Fix max and min decimal for max precision #1917 [arrow] (viirya)
- Add Decimal256 API #1914 [arrow] (viirya)
- Add DictionaryArray::key function #1912 [arrow] (alamb)
- Fix misaligned reference and logic error in crc32 #1906 [parquet] (saethlin)
- Refine the bit_util of Parquet. #1905 [parquet] (HaoYang670)
- Use bit_slice in combine_option_bitmap #1900 [arrow] (jhorstmann)
- Issue #1876: Explicitly declare the used features for each dependency in integration_testing #1898 (martin-g)
- Issue #1876: Explicitly declare the used features for each dependency in parquet_derive_test #1897 [parquet] (martin-g)
- Issue #1876: Explicitly declare the used features for each dependency in parquet_derive #1896 (martin-g)
- Issue #1876: Explicitly declare the used features for each dependency in parquet #1895 [parquet] (martin-g)
- Minor: Add examples to docstring for weekday #1894 [arrow] (alamb)
- Correct nullable in read_dictionary #1893 [arrow] (viirya)
- Feature add weekday temporal kernel #1891 [arrow] (nl5887)
- Support specifying list capacities for MutableArrayData #1885 [arrow] (jhorstmann)
- Issue #1876: Explicitly declare the used features for each dependency in parquet #1881 [parquet] (martin-g)
- Issue #1876: Explicitly declare the used features for each dependency in arrow-flight #1880 [arrow-flight] (martin-g)
- Split up arrow::array::builder module (#1843) #1879 [arrow] (DaltonModlin)
- Fix memory leak in ffi test #1878 [arrow] (viirya)
- Issue #1876 - Explicitly declare the used features for each dependency #1877 [arrow] (martin-g)
- Fixes #1874 - Upgrade `regex` dependency to 1.5.6 #1875 [arrow] (martin-g)
- Do not print exit code from miri, instead it should be the return value of the script #1873 (jhorstmann)
- Update vendored gRPC #1869 [arrow-flight] (tustvold)
- Expose `BitSliceIterator` and `BitIndexIterator` (#1864) #1865 [arrow] (tustvold)
- Exclude some long-running tests when running under miri #1863 [arrow] (jhorstmann)
- Add vec-inspired APIs to BufferBuilder (#1850) #1860 [arrow] (tustvold)
- Omit validity buffer in PrimitiveArray::from_iter when all values are valid #1859 [arrow] (jhorstmann)
- Add two `from` methods for `FixedSizeBinaryArray` #1854 [arrow] (HaoYang670)
- Clean up the test code of `substring` kernel. #1853 [arrow] (HaoYang670)
- Add PyArrow integration test for C Stream Interface #1848 [arrow] (viirya)
- Add `nilike` support in `comparison` #1846 [arrow] (MazterQyou)
- MINOR: Remove version check from `test_command_help` #1844 [parquet] (viirya)
- Implement UnionArray FieldData using Type Erasure #1842 [arrow] (tustvold)
- Add `quarter` support in `temporal` #1836 [arrow] (MazterQyou)
- speed up `substring_by_char` by about 2.5x #1832 [arrow] (HaoYang670)
- Remove simd and avx512 bitwise kernels in favor of autovectorization #1830 [arrow] (jhorstmann)
- Refactor parquet::arrow module #1827 [parquet] (tustvold)
- docs: remove experimental marker on C Stream Interface #1821 [arrow] (wjones127)
- Separate Page IO from Page Decode #1810 [parquet] (tustvold)
16.0.0 (2022-06-10)
Breaking changes:
- Seal `ArrowNativeType` and `OffsetSizeTrait` for safety (#1028) #1819 [arrow] (tustvold)
- Improve API for `csv::infer_file_schema` by removing redundant ref #1776 [arrow] (tustvold)
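
For readers unfamiliar with what "sealing" a trait means, the following is a minimal sketch of the general Rust sealed-trait pattern. All names here (`native`, `NativeType`, `byte_width`, `buffer_len`) are hypothetical and chosen for illustration; this is not the code from #1819, only an example of why downstream crates can no longer implement a trait once it is sealed.

```rust
// Hypothetical names throughout; this sketches the general pattern only.
mod native {
    // The module is private, so code outside `native` cannot name `Sealed`
    // and therefore cannot implement `NativeType` for its own types.
    mod private {
        pub trait Sealed {}
    }

    pub trait NativeType: private::Sealed + Copy {
        /// Width of one value in bytes.
        fn byte_width() -> usize;
    }

    impl private::Sealed for i32 {}
    impl NativeType for i32 {
        fn byte_width() -> usize {
            4
        }
    }

    impl private::Sealed for i64 {}
    impl NativeType for i64 {
        fn byte_width() -> usize {
            8
        }
    }
}

use native::NativeType;

// Generic code can rely on the closed set of implementations.
fn buffer_len<T: NativeType>(values: &[T]) -> usize {
    values.len() * T::byte_width()
}

fn main() {
    println!("{}", buffer_len(&[1i32, 2, 3]));
    println!("{}", buffer_len(&[1i64]));
}
```

Code that only uses a sealed trait as a bound keeps compiling; code that provided its own implementation of the trait has to be changed.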
Implemented enhancements:
- List equality method should work on empty offset `ListArray` #1817 [arrow]
- Command line tool for convert CSV to Parquet #1797 [parquet]
- IPC writer should write validity buffer for `UnionArray` in V4 IPC message #1793 [arrow]
- Add function for row alignment with page mask #1790 [parquet]
- Rust IPC Read should be able to read V4 UnionType Array #1788 [arrow]
- `combine_option_bitmap` should accept arbitrary number of input arrays. #1780 [arrow]
- Add `substring_by_char` kernels for slicing on character boundaries #1768 [arrow]
- Support reading `PageIndex` from column metadata #1761 [parquet]
- Support casting from `DataType::Utf8` to `DataType::Boolean` #1740 [arrow]
- Make current position available in `FileWriter`. #1691 [parquet]
- Support writing parquet to `stdout` #1687 [parquet]
Fixed bugs:
- Incorrect Offset Validation for Sliced List Array Children #1814 [arrow]
- Parquet Snappy Codec overwrites Existing Data in Decompression Buffer #1806 [parquet]
- `flight_data_to_arrow_batch` does not support `RecordBatch`es with no columns #1783 [arrow-flight]
- parquet does not compile with `features=["zstd"]` #1630 [parquet]
Documentation updates:
- Update arrow module docs #1840 [arrow] (tustvold)
- Update safety disclaimer #1837 [arrow] (tustvold)
- Update ballista readme link #1765 (tustvold)
- Move changelog archive to `CHANGELOG-old.md` #1759 (alamb)
Closed issues:
- `DataType::Decimal` Non-Compliant? #1779 [arrow]
- Further simplify the offset validation #1770 [arrow]
- Best way to convert arrow to Rust native type #1760 [arrow]
- Why `Parquet` is a part of `Arrow`? #1715 [parquet] [arrow]
Merged pull requests:
- Make equals_datatype method public, enabling other modules #1838 [arrow] (nl5887)
- [Minor] Clarify `PageIterator` Documentation #1831 [parquet] (Ted-Jiang)
- Update MIRI pin #1828 (tustvold)
- Change to use `resolver v2`, test more feature flag combinations in CI, fix errors (#1630) #1822 [parquet] [arrow] (tustvold)
- Add ScalarBuffer abstraction (#1811) #1820 [arrow] (tustvold)
- Fix list equal for empty offset list array #1818 [arrow] (viirya)
- Fix Decimal and List ArrayData Validation (#1813) (#1814) #1816 [arrow] (tustvold)
- Don't overwrite existing data on snappy decompress (#1806) #1807 [parquet] (tustvold)
- Rename `arrow/benches/string_kernels.rs` to `arrow/benches/substring_kernels.rs` #1805 [arrow] (HaoYang670)
- Add public API for decoding parquet footer #1804 [parquet] (tustvold)
- Add AsyncFileReader trait #1803 [parquet] (tustvold)
- add parquet-fromcsv (#1) #1798 [parquet] (kazuk)
- Use IPC row count info in IPC reader #1796 [arrow] (viirya)
- Fix typos in the Memory and Buffers section of the docs home #1795 [arrow] (datapythonista)
- Write validity buffer for UnionArray in V4 IPC message #1794 [arrow] (viirya)
- feat:Add function for row alignment with page mask #1791 [parquet] (Ted-Jiang)
- Read and skip validity buffer of UnionType Array for V4 ipc message #1789 [arrow] [arrow-flight] (viirya)
- Add `Substring_by_char` #1784 [arrow] (HaoYang670)
- Add `ParquetFileArrowReader::try_new` #1782 [parquet] (tustvold)
- Arbitrary size combine option bitmap #1781 [arrow] (Ismail-Maj)
- Implement `ChunkReader` for `Bytes`, deprecate `SliceableCursor` #1775 [parquet] (tustvold)
- Access metadata of flushed row groups on write (#1691) #1774 [parquet] (tustvold)
- Simplify ParquetFileArrowReader Metadata API #1773 [parquet] (tustvold)
- MINOR: Unpin nightly version as packed_simd releases new version #1771 (viirya)
- Update comfy-table requirement from 5.0 to 6.0 #1769 [arrow] (dependabot[bot])
- Optionally disable `validate_decimal_precision` check in `DecimalBuilder.append_value` for interop test #1767 [arrow] (viirya)
- Minor: Clean up the code of MutableArrayData #1763 [arrow] (HaoYang670)
- Support reading PageIndex from parquet metadata, prepare for skipping pages at reading #1762 [parquet] (Ted-Jiang)
- Support casting `Utf8` to `Boolean` #1738 [arrow] (MazterQyou)
15.0.0 (2022-05-27)
Breaking changes:
- Change `ArrayDataBuilder::null_bit_buffer` to accept `Option<Buffer>` rather than `Buffer` #1739 [arrow] (HaoYang670)
- Remove `null_count` from `ArrayData::try_new()` #1721 [arrow] (HaoYang670)
- Change parquet writers to use standard `std:io::Write` rather custom `ParquetWriter` trait (#1717) (#1163) #1719 [parquet] (tustvold)
- Add explicit column mask for selection in parquet: `ProjectionMask` (#1701) #1716 [parquet] (tustvold)
- Add type_ids in Union datatype #1703 [parquet] [arrow] (viirya)
- Fix Parquet Reader's Arrow Schema Inference #1682 [parquet] [arrow] (tustvold)
Implemented enhancements:
- Rename the `string` kernel to `concatenate_elements` #1747 [arrow]
- `ArrayDataBuilder::null_bit_buffer` should accept `Option<Buffer>` as input type #1737 [arrow]
- Fix schema comparison for non_canonical_map when running flight test #1730 [arrow]
- Add support in aggregate kernel for `BinaryArray` #1724 [arrow]
- Fix incorrect null_count in `generate_unions_case` integration test #1712 [arrow]
- Keep type ids in Union datatype to follow Arrow spec and integrate with other implementations #1690 [arrow]
- Support Reading Alternative List Representations to Arrow From Parquet #1680 [parquet]
- Speed up the offsets checking #1675 [arrow]
- Separate Parquet -> Arrow Schema Conversion From ArrayBuilder #1655 [parquet]
- Add `leaf_columns` argument to `ArrowReader::get_record_reader_by_columns` #1653 [parquet]
- Implement `string_concat` kernel #1540 [arrow]
- Improve Unit Test Coverage of ArrayReaderBuilder #1484 [parquet]
Fixed bugs:
- Parquet write failure (from record batches) when data is nested two levels deep #1744 [parquet]
- IPC reader may break on projection #1735 [arrow]
- Latest nightly fails to build with feature simd #1734 [arrow]
- Trying to write parquet file in parallel results in corrupt file #1717 [parquet]
- Roundtrip failure when using DELTA_BINARY_PACKED #1708 [parquet]
- `ArrayData::try_new` cannot always return expected error. #1707 [arrow]
- "out of order projection is not supported" after Fix Parquet Arrow Schema Inference #1701 [parquet]
- Rust is not interoperability with C++ for IPC schemas with dictionaries #1694 [arrow]
- Incorrect Repeated Field Schema Inference #1681 [parquet]
- Parquet Treats Embedded Arrow Schema as Authoritative #1663 [parquet]
- parquet_to_arrow_schema_by_columns Incorrectly Handles Nested Types #1654 [parquet]
- Inconsistent Arrow Schema When Projecting Nested Parquet File #1652 [parquet]
- StructArrayReader Cannot Handle Nested Lists #1651 [parquet]
- Bug (`substring` kernel): The null buffer is not aligned when `offset != 0` #1639 [arrow]
Documentation updates:
- Parquet command line tool does not install "globally" #1710 [parquet]
- Improve integration test document to follow Arrow C++ repo CI #1742 [arrow] (viirya)
Merged pull requests:
- Test for list array equality with different offsets #1756 [arrow] (alamb)
- Rename `string_concat` to `concat_elements_utf8` #1754 [arrow] (alamb)
- Rename the `string` kernel to `concat_elements`. #1752 [arrow] (HaoYang670)
- Support writing nested lists to parquet #1746 [parquet] (tustvold)
- Pin nightly version to bypass packed_simd build error #1743 (viirya)
- Fix projection in IPC reader #1736 [arrow] (iyupeng)
- `cargo install` installs not globally #1732 [parquet] (kazuk)
- Fix schema comparison for non_canonical_map when running flight test #1731 (viirya)
- Add `min_binary` and `max_binary` aggregate kernels #1725 [arrow] (HaoYang670)
- Fix parquet benchmarks #1723 [parquet] (tustvold)
- Fix BitReader::get_batch zero extension (#1708) #1722 [parquet] (tustvold)
- Implementation string concat #1720 [arrow] (Ismail-Maj)
- Check the length of `null_bit_buffer` in `ArrayData::try_new()` #1714 [arrow] (HaoYang670)
- Fix incorrect null_count in `generate_unions_case` integration test #1713 [arrow] (viirya)
- Fix: Null buffer accounts for `offset` in `substring` kernel. #1704 [arrow] (HaoYang670)
- Minor: Refine `OffsetSizeTrait` to extend `num::Integer` #1702 [arrow] (HaoYang670)
- Fix StructArrayReader handling nested lists (#1651) #1700 [parquet] (tustvold)
- Speed up the offsets checking #1684 [arrow] (HaoYang670)
14.0.0 (2022-05-13)
Breaking changes:
- Use `bytes` in parquet rather than custom Buffer implementation (#1474) #1683 [parquet] (tustvold)
- Rename `OffsetSize::fn is_large` to `const OffsetSize::IS_LARGE` #1664 [parquet] [arrow] (HaoYang670)
- Remove `StringOffsetTrait` and `BinaryOffsetTrait` #1645 [arrow] (HaoYang670)
- Fix `generate_nested_dictionary_case` integration test failure #1636 [arrow] [arrow-flight] (viirya)
Implemented enhancements:
- Add support for `DataType::Duration` in ffi interface #1688 [arrow]
- Fix `generate_unions_case` integration test #1676 [arrow]
- Add `DictionaryArray` support for `bit_length` kernel #1673 [arrow]
- Add `DictionaryArray` support for `length` kernel #1672 [arrow]
- flight_client_scenarios integration test should receive schema from flight data #1669 [arrow]
- Unpin Flatbuffer version dependency #1667 [arrow]
- Add dictionary array support for substring function #1656 [arrow]
- Exclude dict_id and dict_is_ordered from equality comparison of `Field` #1646 [arrow]
- Remove `StringOffsetTrait` and `BinaryOffsetTrait` #1644 [arrow]
- Add tests and examples for `UnionArray::from(data: ArrayData)` #1643 [arrow]
- Add methods `pub fn offsets_buffer`, `pub fn types_ids_buffer` and `pub fn data_buffer` for `ArrayDataBuilder` #1640 [arrow]
- Fix `generate_nested_dictionary_case` integration test failure for Rust cases #1635 [arrow]
- Expose `ArrowWriter` row group flush in public API #1626 [parquet]
- Add `substring` support for `FixedSizeBinaryArray` #1618 [arrow]
- Add PrettyPrint for `UnionArray`s #1594 [arrow]
- Add SIMD support for the `length` kernel #1489 [arrow]
- Support dictionary arrays in length and bit_length #1674 [arrow] (viirya)
- Add dictionary array support for substring function #1665 [arrow] (sunchao)
- Add `DecimalType` support in `new_null_array` #1659 [arrow] (yjshen)
Fixed bugs:
- Docs.rs build is broken #1695
- Interoperability with C++ for IPC schemas with dictionaries #1694
- `UnionArray::is_null` incorrect #1625 [arrow]
- Published Parquet documentation missing `arrow::async_reader` #1617 [parquet]
- Files written with Julia's Arrow.jl in IPC format cannot be read by arrow-rs #1335 [arrow]
Documentation updates:
- Correct arrow-flight readme version #1641 [arrow-flight] (alamb)
Closed issues:
- Make `OffsetSizeTrait::IS_LARGE` as a const value #1658
- Question: Why are there 3 types of `OffsetSizeTrait`s? #1638
- Written Parquet file way bigger than input files #1627
- Ensure there is a single zero in the offsets buffer for an empty ListArray. #1620
- Filtering `UnionArray` Changes DataType #1595
Merged pull requests:
- Fix docs.rs build #1696 [parquet] (alamb)
- support duration in ffi #1689 [arrow] (ryan-jacobs1)
- fix bench command line options #1685 [parquet] [arrow] (kazuk)
- Enable branch protection #1679 (tustvold)
- Fix logical merge conflict in #1588 #1678 [parquet] (tustvold)
- Fix generate_unions_case for Rust case #1677 [arrow] (viirya)
- Receive schema from flight data #1670 (viirya)
- unpin flatbuffers dependency version #1668 [arrow] (Cheappie)
- Remove parquet dictionary converters (#1661) #1662 [parquet] (tustvold)
- Minor: simplify the function `GenericListArray::get_type` #1650 [arrow] (HaoYang670)
- Pretty Print `UnionArray`s #1648 [arrow] (tfeda)
- Exclude `dict_id` and `dict_is_ordered` from equality comparison of `Field` #1647 [arrow] (viirya)
- expose row-group flush in public api #1634 [parquet] (Cheappie)
- Add `substring` support for `FixedSizeBinaryArray` #1633 [arrow] (HaoYang670)
- Fix UnionArray is_null #1632 [arrow] (viirya)
- Do not assume dictionaries exists in footer #1631 [arrow] (pcjentsch)
- Add support for nested list arrays from parquet to arrow arrays (#993) #1588 [parquet] (tustvold)
- Add `async` into doc features #1349 [parquet] (HaoYang670)
13.0.0 (2022-04-29)
Breaking changes:
- Update `parquet::basic::LogicalType` to be more idomatic #1612 [parquet] (tfeda)
- Fix Null Mask Handling in `ArrayData`, `UnionArray`, and `MapArray` #1589 [arrow] (tustvold)
- Replace `&Option<T>` with `Option<&T>` in several `arrow` and `parquet` APIs #1571 [parquet] [arrow] (tfeda)
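
As a rough illustration of what the `&Option<T>` to `Option<&T>` change means for callers, here is a minimal sketch using only the standard library. The `Column` struct and its methods are hypothetical stand-ins, not actual `arrow` or `parquet` APIs; consult the linked pull request for the functions that were changed.

```rust
// Hypothetical type for illustration only; not an arrow or parquet API.
struct Column {
    logical_type: Option<String>,
}

impl Column {
    // Old style: a reference to the whole Option.
    fn logical_type_old(&self) -> &Option<String> {
        &self.logical_type
    }

    // New style: an Option holding a reference to the inner value.
    fn logical_type_new(&self) -> Option<&String> {
        self.logical_type.as_ref()
    }
}

fn describe(col: &Column) -> String {
    // With the old style, callers usually needed `.as_ref()` before mapping.
    let old_len = col.logical_type_old().as_ref().map(|s| s.len());
    // With the new style, the Option can be mapped directly.
    let new_len = col.logical_type_new().map(|s| s.len());
    assert_eq!(old_len, new_len);

    match col.logical_type_new() {
        Some(name) => format!("logical type: {}", name),
        None => String::from("no logical type"),
    }
}

fn main() {
    let col = Column {
        logical_type: Some(String::from("Decimal")),
    };
    println!("{}", describe(&col));
}
```

Under this assumption, migrating a call site mostly amounts to dropping an `.as_ref()` or adjusting a pattern such as `&Some(ref x)` to `Some(x)`.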
Implemented enhancements:
- Read/write nested dictionary under fixed size list in ipc stream reader/write #1609 [arrow]
- Add support for `BinaryArray` in `substring` kernel #1593 [arrow]
- Read/write nested dictionary under large list in ipc stream reader/write #1584 [arrow]
- Read/write nested dictionary under map in ipc stream reader/write #1582 [arrow]
- Implement `Clone` for JSON `DecoderOptions` #1580 [arrow]
- Add utf-8 validation checking to `substring` kernel #1575 [arrow]
- Support casting to/from `DataType::Null` in `cast` kernel #1572 [arrow] (WinkerDu)
Fixed bugs:
- Parquet schema should allow scale == precision for decimal type #1606 [parquet]
- ListArray::from(ArrayData) dereferences invalid pointer when offsets are empty #1601 [arrow]
- ArrayData Equality Incorrect Null Mask Offset Handling #1599
- Filtering UnionArray Incorrect Handles Runs #1598
- [Safety] Filtering Dense UnionArray Produces Invalid Offsets #1596
- [Safety] UnionBuilder Doesn't Check Types #1591
- Union Layout Should Not Support Separate Validity Mask #1590
- Incorrect nullable flag when reading maps ( test_read_maps fails when `force_validate` is active) #1587 [parquet]
- Output of `ipc::reader::tests::projection_should_work` fails validation #1548 [arrow]
- Incorrect min/max statistics for decimals with byte-array notation #1532
Documentation updates:
Closed issues:
- Dense UnionArray Offsets Are i32 not i8 #1597 [arrow]
- Replace `&Option<T>` with `Option<&T>` in some APIs #1556 [parquet] [arrow]
- Improve ergonomics of `parquet::basic::LogicalType` #1554 [parquet]
- Mark the current `substring` function as `unsafe` and rename it. #1541 [arrow]
- Requirements for Async Parquet API #1473 [parquet]
Merged pull requests:
- Nit: use the standard function `div_ceil` #1629 [arrow] (HaoYang670)
- Update flatbuffers requirement from =2.1.1 to =2.1.2 #1622 [arrow] (dependabot[bot])
- Fix decimals min max statistics #1621 [parquet] (atefsawaed)
- Add example readme #1615 [arrow] (alamb)
- Improve docs and examples links on main readme #1614 [arrow] (alamb)
- Read/Write nested dictionaries under FixedSizeList in IPC #1610 [arrow] (viirya)
- Add `substring` support for binary #1608 [arrow] (HaoYang670)
- Parquet: schema validation should allow scale == precision for decimal type #1607 [parquet] (sunchao)
- Don't access and validate offset buffer in ListArray::from(ArrayData) #1602 [arrow] (jhorstmann)
- Fix map nullable flag in `ParquetTypeConverter` #1592 [parquet] (viirya)
- Read/write nested dictionary under large list in ipc stream reader/writer #1585 [arrow] (viirya)
- Read/write nested dictionary under map in ipc stream reader/writer #1583 [arrow] (viirya)
- Derive `Clone` and `PartialEq` for json `DecoderOptions` #1581 [arrow] (alamb)
- Add utf-8 validation checking for `substring` #1577 [arrow] (HaoYang670)
- Use `Option<T>` rather than `Option<&T>` for copy types in substring kernel #1576 [arrow] (tustvold)
- Use littleendian arrow files for `projection_should_work` #1573 [arrow] (viirya)
12.0.0 (2022-04-15)
Breaking changes:
- Add `ArrowReaderOptions` to `ParquetFileArrowReader`, add option to skip decoding arrow metadata from parquet (#1459) #1558 [parquet] (tustvold)
- Support `RecordBatch` with zero columns but non zero row count, add field to `RecordBatchOptions` (#1536) #1552 [arrow] (tustvold)
- Consolidate JSON Reader options and `DecoderOptions` #1539 [arrow] (alamb)
- Update `prost`, `prost-derive` and `prost-types` to 0.10, `tonic`, and `tonic-build` to `0.7` #1510 [arrow-flight] (alamb)
- Add Json `DecoderOptions` and support custom `format_string` for each field #1451 [arrow] (sum12)
Implemented enhancements:
- Read/write nested dictionary in ipc stream reader/writer #1565 [arrow]
- Support `FixedSizeBinary` in the Arrow C data interface #1553 [arrow]
- Support Empty Column Projection in `ParquetRecordBatchReader` #1537 [parquet]
- Support `RecordBatch` with zero columns but non zero row count #1536 [arrow]
- Add support for `Date32`/`Date64` <--> `String`/`LargeString` in `cast` kernel #1535 [arrow]
- Support creating arrays from externally owned memory like `Vec` or `String` #1516 [arrow]
- Speed up the `substring` kernel #1511 [arrow]
- Handle Parquet Files With Inconsistent Timestamp Units #1459 [parquet]
Fixed bugs:
- Error Infering Schema for LogicalType::UNKNOWN #1557 [parquet]
- Read dictionary from nested struct in ipc stream reader panics #1549 [arrow]
- `filter` produces invalid sparse `UnionArray`s #1547 [arrow]
- Documentation for `GenericListBuilder` is not exposed. #1518 [arrow]
- cannot read parquet file #1515 [parquet]
- The `substring` kernel panics when chars > U+0x007F #1478 [arrow]
- Hang due to infinite loop when reading some parquet files with RLE encoding and bit packing #1458 [parquet]
Documentation updates:
- Improve JSON reader documentation #1559 [arrow] (alamb)
- Improve doc string for `substring` kernel #1529 [arrow] (HaoYang670)
- Expose documentation of `GenericListBuilder` #1525 [arrow] (comath)
- Add a diagram to `take` kernel documentation #1524 [arrow] (alamb)
Closed issues:
- Interesting benchmark results of `min_max_helper` #1400
Merged pull requests:
- Fix incorrect `into_buffers` for UnionArray #1567 [arrow] (viirya)
- Read/write nested dictionary in ipc stream reader/writer #1566 [arrow] (viirya)
- Support FixedSizeBinary and FixedSizeList for the C data interface #1564 [arrow] (sunchao)
- Split out ListArrayReader into separate module (#1483) #1563 [parquet] (tustvold)
- Split out `MapArray` into separate module (#1483) #1562 [parquet] (tustvold)
- Support empty projection in `ParquetRecordBatchReader` #1560 [parquet] (tustvold)
- fix infinite loop in not fully packed bit-packed runs #1555 [parquet] (tustvold)
- Add test for creating FixedSizeBinaryArray::try_from_sparse_iter failed when given all Nones #1551 [arrow] (alamb)
- Fix reading dictionaries from nested structs in ipc `StreamReader` #1550 [arrow] (dispanser)
- Add support for Date32/64 <--> String/LargeString in `cast` kernel #1534 [arrow] (yjshen)
- fix clippy errors in 1.60 #1527 [parquet] [arrow] (alamb)
- Mark `remove-old-releases.sh` executable #1522 (alamb)
- Delete duplicate code in the `sort` kernel #1519 [arrow] (HaoYang670)
- Fix reading nested lists from parquet files #1517 [parquet] (viirya)
- Speed up the `substring` kernel by about 2x #1512 [arrow] (HaoYang670)
- Add `new_from_strings` to create `MapArrays` #1507 [arrow] (viirya)
- Decouple buffer deallocation from ffi and allow creating buffers from rust vec #1494 [arrow] (jhorstmann)
11.1.0 (2022-03-31)
Implemented enhancements:
- Implement `size_hint` and `ExactSizedIterator` for DecimalArray #1505 [arrow]
- Support calculate length by chars for `StringArray` #1493 [arrow]
- Add `length` kernel support for `ListArray` #1470 [arrow]
- The length kernel should work with `BinaryArray`s #1464 [arrow]
- FFI for Arrow C Stream Interface #1348 [arrow]
- Improve performance of `DictionaryArray::try_new()` #1313 [arrow]
Fixed bugs:
- MIRI error in math_checked_divide_op/try_from_trusted_len_iter #1496 [arrow]
- Parquet Writer Incorrect Definition Levels for Nested NullArray #1480 [parquet]
- FFI: ArrowArray::try_from_raw shouldn't clone #1425 [arrow]
- Parquet reader fails to read null list. #1399 [parquet]
Documentation updates:
- A small mistake in the doc of `BinaryArray` and `LargeBinaryArray` #1455 [arrow]
- A small mistake in the doc of `GenericBinaryArray::take_iter_unchecked` #1454 [arrow]
- Add links in the doc of `BinaryOffsetSizeTrait` #1453 [arrow]
- The doc of `FixedSizeBinaryArray` is confusing. #1452 [arrow]
- Clarify docs that SlicesIterator ignores null values #1504 [arrow] (alamb)
- Update the doc of `BinaryArray` and `LargeBinaryArray` #1471 [arrow] (HaoYang670)
Closed issues:
- `packed_simd` v.s. `portable_simd`, which should be used? #1492
- Cleanup: Use Arrow take kernel Within parquet ListArrayReader #1482 [parquet]
Merged pull requests:
- Implement `size_hint` and `ExactSizedIterator` for `DecimalArray` #1506 [arrow] (alamb)
- Add `StringArray::num_chars` for calculating number of characters #1503 [arrow] (HaoYang670)
- Workaround nightly miri error in `try_from_trusted_len_iter` #1497 [arrow] (jhorstmann)
- update doc of array_binary and array_string #1491 [arrow] (HaoYang670)
- Use Arrow take kernel within ListArrayReader #1490 [parquet] (viirya)
- Add `length` kernel support for List Array #1488 [arrow] (HaoYang670)
- Support sort for `Decimal` data type #1487 [arrow] (yjshen)
- Fix reading/writing nested null arrays (#1480) (#1036) (#1399) #1481 [parquet] (tustvold)
- Implement ArrayEqual for UnionArray #1469 [arrow] (viirya)
- Support the `length` kernel on Binary Array #1465 [arrow] (HaoYang670)
- Remove Clone and copy source structs internally #1449 [arrow] (viirya)
- Fix Parquet reader for null lists #1448 [parquet] (viirya)
- Improve performance of DictionaryArray::try_new() #1435 [arrow] (jackwener)
- Add FFI for Arrow C Stream Interface #1384 [arrow] (viirya)
11.0.0 (2022-03-17)
Breaking changes:
- Replace `filter_row_groups` with `ReadOptions` in parquet SerializedFileReader #1389 [parquet] (yjshen)
- Implement projection for arrow `IPC Reader` file / streams #1339 [arrow] [arrow-flight] (Dandandan)
Implemented enhancements:
- Fix generate_interval_case integration test failure #1445
- Make the doc examples of `ListArray` and `LargeListArray` more readable #1433
- Redundant `if` and `abs` in `shift()` #1427
- Improve substring kernel performance #1422 [arrow]
- Add missing value_unchecked() of `FixedSizeBinaryArray` #1419
- Remove duplicate bound check in function `shift` #1408
- Support dictionary array in C data interface #1397
- filter kernel should work with `UnionArray`s #1394 [arrow]
- filter kernel should work with `FixedSizeListArrays`s #1393 [arrow]
- Add doc examples for creating FixedSizeListArray #1392 [arrow]
- Update `rust-version` to 1.59 #1377
- Arrow IPC projection support #1338
- Implement basic FlightSQL Server #1386 [arrow-flight] (wangfenjin)
Fixed bugs:
- DictionaryArray::try_new ignores validity bitmap of the keys #1429 [arrow]
- The doc of `GenericListArray` is confusing #1424
- DeltaBitPackDecoder Incorrectly Handles Non-Zero MiniBlock Bit Width Padding #1417 [parquet]
- DeltaBitPackEncoder Pads Miniblock BitWidths With Arbitrary Values #1416 [parquet]
- Possible unaligned write with MutableBuffer::push #1410 [arrow]
- Integration Test is failing on master branch #1398 [arrow]
Documentation updates:
- Rewrite doc of `GenericListArray` #1450 [arrow] (HaoYang670)
- Fix integration doc about build.ninja location #1438 (viirya)
Merged pull requests:
- Rewrite doc example of `ListArray` and `LargeListArray` #1447 [arrow] (HaoYang670)
- Fix generate_interval_case in integration test #1446 [arrow] (viirya)
- Fix generate_decimal128_case in integration test #1440 (viirya)
- `filter` kernel should work with FixedSizeListArrays #1434 [arrow] (viirya)
- Support nullable keys in DictionaryArray::try_new #1430 [arrow] (jhorstmann)
- remove redundant if/clamp_min/abs #1428 [arrow] (jackwener)
- Add doc example for creating `FixedSizeListArray` #1426 [arrow] (HaoYang670)
- Directly write to MutableBuffer in substring #1423 [arrow] (viirya)
- Fix possibly unaligned writes in MutableBuffer #1421 [arrow] (jhorstmann)
- Add value_unchecked() and unit test #1420 [arrow] (jackwener)
- Fix DeltaBitPack MiniBlock Bit Width Padding #1418 [parquet] (tustvold)
- Update zstd requirement from 0.10 to 0.11 #1415 [parquet] (dependabot[bot])
- Set `default-features = false` for `zstd` in the parquet crate to support `wasm32-unknown-unknown` #1414 [parquet] (kylebarron)
- Add support for `UnionArray` in `filter` kernel #1412 [arrow] (viirya)
- Remove duplicate bound check in the function `shift` #1409 [arrow] (HaoYang670)
- Add dictionary support for C data interface #1407 [arrow] (sunchao)
- Fix a small spelling mistake in docs. #1406 [arrow] (HaoYang670)
- Add unit test to check `FixedSizeBinaryArray` input all none #1405 [arrow] (jackwener)
- Move csv Parser trait and its implementations to utils module #1385 [arrow] (sum12)
10.0.0 (2022-03-04)
Breaking changes:
- Remove existing has_ methods for optional fields in `ColumnChunkMetaData` #1346 [parquet] (shanisolomon)
- Remove redundant `has_` methods in `ColumnChunkMetaData` #1345 [parquet] (shanisolomon)
Implemented enhancements:
- Add extract month and day in temporal.rs #1387
- Add clone to `IpcWriteOptions` #1381 [arrow]
- Support `MapArray` in `filter` kernel #1378 [arrow]
- Add `week` temporal kernel #1375 [arrow]
- Improve performance of `compare_dict_op` #1371 [arrow]
- Add support for LargeUtf8 in json writer #1357 [parquet]
- Make `arrow::array::builder::MapBuilder` public #1354 [arrow]
- Refactor `StructArray::from` #1351 [arrow]
- Refactor `RecordBatch::validate_new_batch` #1350 [arrow]
- Remove redundant has_ methods for optional column metadata fields #1344 [parquet]
- Add `write` method to JsonWriter #1340 [arrow]
- Refactor the code of `Bitmap::new` #1337 [arrow]
- Use DictionaryArray's iterator in `compare_dict_op` #1329 [arrow]
- Add `as_decimal_array(arr: &dyn Array) -> &DecimalArray` #1312 [arrow]
- More ergonomic / idiomatic primitive array creation from iterators #1298 [arrow]
- Implement DictionaryArray support in `eq_dyn`, `neq_dyn`, `lt_dyn`, `lt_eq_dyn`, `gt_dyn`, `gt_eq_dyn` #1201 [arrow]
Fixed bugs:
- `cargo clippy` fails on the `master` branch #1362 [arrow]
- `ArrowArray::try_from_raw` should not assume the pointers are from Arc #1333 [arrow]
- Fix CSV Writer::new to accept delimiter and make WriterBuilder::build use it #1328 [arrow]
- Make bounds configurable via builder when reading CSV #1327 [arrow]
- Add `with_datetime_format()` to CSV WriterBuilder #1272 [arrow]
Performance improvements:
Closed issues:
- Consider removing redundant has_XXX metadata functions in `ColumnChunkMetadata` #1332
Merged pull requests:
- Support extract `day` and `month` in temporal.rs #1388 [arrow] (Ted-Jiang)
- Add write method to Json Writer #1383 [arrow] (matthewmturner)
- Derive `Clone` for `IpcWriteOptions` #1382 [arrow] (matthewmturner)
- feat: support maps in MutableArrayData #1379 [arrow] (helgikrs)
- Support extract `week` in temporal.rs #1376 [arrow] (Ted-Jiang)
- Speed up the function `min_max_string` #1374 [arrow] (HaoYang670)
- Improve performance if dictionary kernels, add benchmark and add `take_iter_unchecked` #1372 [arrow] (viirya)
- Update pyo3 requirement from 0.15 to 0.16 #1369 [arrow] (dependabot[bot])
- Update contributing guide #1368 (HaoYang670)
- Allow primitive array creation from iterators of PrimitiveTypes (as well as `Option`) #1367 [arrow] (viirya)
- Update flatbuffers requirement from =2.1.0 to =2.1.1 #1364 [arrow] (dependabot[bot])
- Fix clippy lints #1363 [parquet] [arrow] (HaoYang670)
- Refactor `RecordBatch::validate_new_batch` #1361 [arrow] (HaoYang670)
- Refactor `StructArray::from` #1360 [arrow] (HaoYang670)
- Update flatbuffers requirement from =2.0.0 to =2.1.0 #1359 [arrow] (dependabot[bot])
- fix: add LargeUtf8 support in json writer #1358 [arrow] (tiphaineruy)
- Add `as_decimal_array` function #1356 [arrow] (liukun4515)
- Publicly export arrow::array::MapBuilder #1355 [arrow] (tjwilson90)
- Add with_datetime_format to csv WriterBuilder #1347 [arrow] (gsserge)
- Refactor `Bitmap::new` #1343 [arrow] (HaoYang670)
- Remove delimiter from csv Writer #1342 [arrow] (gsserge)
- Make bounds configurable in csv ReaderBuilder #1341 [arrow] (gsserge)
- `ArrowArray::try_from_raw` should not assume the pointers are from Arc #1334 [arrow] (viirya)
- Use DictionaryArray's iterator in `compare_dict_op` #1330 [arrow] (viirya)
- Implement DictionaryArray support in neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #1326 [arrow] (viirya)
- Arrow Rust + Conbench Integration #1289 (dianaclarke)
9.1.0 (2022-02-19)
Implemented enhancements:
- Exposing page encoding stats #1321
- Improve filter performance by special casing high and low selectivity predicates #1288 [arrow]
- Speed up `DeltaBitPackDecoder` #1281 [parquet]
- Fix all clippy lints in arrow crate #1255 [arrow]
- Expose page encoding `ColumnChunkMetadata` #1322 [parquet] (shanisolomon)
- Expose column index and offset index in `ColumnChunkMetadata` #1318 [parquet] (shanisolomon)
- Expose bloom filter offset in `ColumnChunkMetadata` #1309 [parquet] (shanisolomon)
- Add `DictionaryArray::try_new()` to create dictionaries from pre existing arrays #1300 [arrow] (alamb)
- Add `DictionaryArray::keys_iter`, and `take_iter` for other array types #1296 [arrow] (viirya)
- Make `rle` decoder public under `experimental` feature #1271 [parquet] (zeevm)
- Add `DictionaryArray` support in `eq_dyn` kernel #1263 [arrow] (viirya)
Fixed bugs:
- `len` is not a parameter of `MutableArrayData::extend` #1316
- module `data_type` is private in Rust Parquet 8.0.0 #1302 [parquet]
- Test failure: bit_chunk_iterator #1294
- csv_writer benchmark fails with "no such file or directory" #1292
Documentation updates:
Performance improvements:
- Vectorize DeltaBitPackDecoder, up to 5x faster decoding #1284 [parquet] (tustvold)
- Skip zero-ing primitive nulls #1280 [parquet] (tustvold)
- Add specialized filter kernels in `compute` module (up to 10x faster) #1248 [parquet] [arrow] (tustvold)
Closed issues:
- Expose column and offset index metadata offset #1317
- Expose bloom filter metadata offset #1308
- Improve ergonomics to construct `DictionaryArrays` from `Key` and `Value` arrays #1299
- Make it easier to iterate over `DictionaryArray` #1295 [arrow]
- (WON'T FIX) Don't Interwine Bit and Byte Aligned Operations in `BitReader` #1282
- how to create arrow::array from streamReader #1278
- Remove scientific notation when converting floats to strings. #983
Merged pull requests:
- Update the document of function `MutableArrayData::extend` #1336 [arrow] (HaoYang670)
- Fix clippy lint `dead_code` #1324 [arrow] (gsserge)
- fix test bug and ensure that bloom filter metadata is serialized in `to_thrift` #1320 [parquet] (shanisolomon)
- Enable more clippy lints in arrow #1315 [arrow] (gsserge)
- Fix clippy lint `clippy::type_complexity` #1310 [arrow] (gsserge)
- Fix clippy lint `clippy::float_equality_without_abs` #1305 [arrow] (gsserge)
- Fix clippy `clippy::vec_init_then_push` lint #1303 [arrow] (gsserge)
- Fix failing csv_writer bench #1293 [arrow] (andygrove)
- Changes for 9.0.2 #1291 [parquet] [arrow] [arrow-flight] (alamb)
- Fix bitmask creation also for simd comparisons with scalar #1290 [arrow] (jhorstmann)
- Fix simd comparison kernels #1286 [arrow] (jhorstmann)
- Restrict Decoder to compatible types (#1276) #1277 [parquet] (tustvold)
- Fix some clippy lints in parquet crate, rename `LevelEncoder` variants to conform to Rust standards #1273 [parquet] (HaoYang670)
- Use new DecimalArray creation API in arrow crate #1249 [arrow] (alamb)
- Improve `DecimalArray` API ergonomics: add `iter()`, `FromIterator`, `with_precision_and_scale` #1223 [arrow] (alamb)
9.0.2 (2022-02-09)
Breaking changes:
- Add `Send` + `Sync` to `DataType`, `RowGroupReader`, `FileReader`, `ChunkReader`. #1264
- Rename the function `Bitmap::len` to `Bitmap::bit_len` to clarify its meaning #1242 [parquet] [arrow] (HaoYang670)
- Remove unused / broken `memory-check` feature #1222 [arrow] (jhorstmann)
- Potentially buffer multiple `RecordBatches` before writing a parquet row group in `ArrowWriter` #1214 [parquet] [arrow] (tustvold)
Implemented enhancements:
- Add `async` arrow parquet reader #1154 [parquet] [arrow] (tustvold)
- Rename `Bitmap::len` to `Bitmap::bit_len` #1233
- Extend CSV schema inference to allow scientific notation for floating point types #1215 [arrow]
- Write Multiple RecordBatch to Parquet Row Group #1211
- Add doc examples for `eq_dyn` etc. #1202 [arrow]
- Add comparison kernels for `BinaryArray` #1108
- `impl ArrowNativeType for i128` #1098
- Remove `Copy` trait bound from dyn scalar kernels #1243 [arrow] (matthewmturner)
- Add `into_inner` for IPC `FileWriter` #1236 [arrow] (yjshen)
- [Minor]Re-export `array::builder::make_builder` to make it available for downstream #1235 [arrow] (yjshen)
Fixed bugs:
- Parquet v8.0.0 panics when reading all null column to NullArray #1245 [parquet]
- Get `Unknown configuration option rust-version` when running the rust format command #1240
- `Bitmap` Length Validation is Incorrect #1231 [arrow]
- Writing sliced `ListArray` or `MapArray` ignore offsets #1226 [parquet]
- Remove broken `memory-tracking` crate feature #1171
- Revert making `parquet::data_type` and `parquet::arrow::schema` experimental #1244 [parquet] (tustvold)
Documentation updates:
- Update parquet crate documentation and examples #1253 [parquet] [arrow] (alamb)
- Refresh parquet readme / contributing guide #1252 [parquet] (alamb)
- Add docs examples for dynamically compare functions #1250 [arrow] (HaoYang670)
- Add Rust Docs examples for UnionArray #1241 [arrow] (HaoYang670)
- Improve documentation for Bitmap #1237 [arrow] (alamb)
Performance improvements:
- Improve performance for arithmetic kernels with `simd` feature enabled (except for division/modulo) #1221 [arrow] (jhorstmann)
- Do not concatenate identical dictionaries #1219 [arrow] (tustvold)
- Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) #1180 [parquet] (tustvold)
Closed issues:
- `UnalignedBitChunkIterator` to that iterates through already aligned `u64` blocks #1227
- Remove unused `ArrowArrayReader` in parquet #1197 [parquet]
Merged pull requests:
- Upgrade clap to 3.0.0 #1261 [parquet] (Jimexist)
- Update chrono-tz requirement from 0.4 to 0.6 #1259 [arrow] (dependabot[bot])
- Update zstd requirement from 0.9 to 0.10 #1257 [parquet] (dependabot[bot])
- Fix NullArrayReader (#1245) #1246 [parquet] (tustvold)
- dyn compare for binary array #1238 [arrow] (HaoYang670)
- Remove arrow array reader (#1197) #1234 [parquet] (tustvold)
- Fix null bitmap length validation (#1231) #1232 [arrow] (tustvold)
- Faster bitmask iteration #1228 [parquet] [arrow] (tustvold)
- Add non utf8 values into the test cases of BinaryArray comparison #1220 [arrow] (HaoYang670)
- Update DECIMAL_RE to allow scientific notation in auto inferred schemas #1216 [arrow] (pjmore)
- Fix simd comparison kernels #1286 [arrow] (jhorstmann)
- Fix bitmask creation also for simd comparisons with scalar #1290 [arrow] (jhorstmann)
8.0.0 (2022-01-20)
Breaking changes:
- Return error from JSON writer rather than panic #1205 [arrow] (Ted-Jiang)
- Remove `ArrowSignedNumericType` to Simplify and reduce code duplication in arithmetic kernels #1161 [arrow] (jhorstmann)
- Restrict RecordReader and friends to scalar types (#1132) #1155 [parquet] (tustvold)
- Move more parquet functionality behind experimental feature flag (#1032) #1134 [parquet] (tustvold)
Implemented enhancements:
- Parquet reader should be able to read structs within list #1186 [parquet]
- Disable serde_json `arbitrary_precision` feature flag #1174 [arrow]
- Simplify and reduce code duplication in arithmetic.rs #1160 [arrow]
- Return `Err` from JSON writer rather than `panic!` for unsupported types #1157 [arrow]
- Support `scalar` mathematics kernels for `Array` and scalar value #1153 [arrow]
- Support `DecimalArray` in sort kernel #1137
- Parquet Fuzz Tests #1053
- BooleanBufferBuilder Append Packed #1038 [arrow]
- parquet Performance Optimization: StructArrayReader Redundant Level & Bitmap Computation #1034 [parquet]
- Reduce Public Parquet API #1032 [parquet]
- Add `from_iter_values` for binary array #1188 [arrow] (Jimexist)
- Add support for `MapArray` in json writer #1149 [arrow] (helgikrs)
Fixed bugs:
- Empty string arrays with no nulls are not equal #1208 [arrow]
- Pretty print a `RecordBatch` containing `Float16` triggers a panic #1193 [arrow]
- Writing structs nested in lists produces an incorrect output #1184 [parquet]
- Undefined behavior for `GenericStringArray::from_iter_values` if reported iterator upper bound is incorrect #1144 [arrow]
- Interval comparisons with `simd` feature asserts #1136 [arrow]
- RecordReader Permits Illegal Types #1132 [parquet]
Security fixes:
- Fix undefined behavor in GenericStringArray::from_iter_values #1145 [arrow] (alamb)
- parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040) #1082 [parquet] [arrow] (tustvold)
Documentation updates:
- Update parquet crate readme #1192 [parquet] (alamb)
- Document safety justification of some uses of `from_trusted_len_iter` #1148 [arrow] (alamb)
Performance improvements:
- Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) #1054 [parquet] [arrow] (tustvold)
- Improve parquet performance: Skip levels computation for required struct arrays in parquet #1035 [parquet] (tustvold)
Closed issues:
Merged pull requests:
- fix a bug in variable sized equality #1209 [arrow] (helgikrs)
- Pin WASM / packed SIMD tests to nightly-2022-01-17 #1204 (alamb)
- feat: add support for casting Duration/Interval to Int64Array #1196 [arrow] (e-dard)
- Add comparison support for fully qualified BinaryArray #1195 [arrow] (HaoYang670)
- Fix in display of `Float16Array` #1194 [arrow] (helgikrs)
- update nightly version for miri #1189 (Jimexist)
- feat(parquet): support for reading structs nested within lists #1187 [parquet] (helgikrs)
- fix: Fix a bug in how definition levels are calculated for nested structs in a list #1185 [parquet] (helgikrs)
- Truncate bitmask on BooleanBufferBuilder::resize: #1183 [parquet] [arrow] (tustvold)
- Add ticket reference for false positive in clippy #1181 [arrow] (alamb)
- Fix record formatting in 1.58 #1178 [parquet] (tustvold)
- Serialize i128 as JSON string #1175 [arrow] (tustvold)
- Support DecimalType in `sort` and `take` kernels #1172 [arrow] (liukun4515)
- Fix new clippy lints introduced in Rust 1.58 #1170 [parquet] [arrow] (alamb)
- Fix compilation error with simd feature #1169 [arrow] (jhorstmann)
- Fix bug while writing parquet with empty lists of structs #1166 [parquet] (helgikrs)
- Use tempfile for parquet tests #1165 [parquet] (tustvold)
- Remove left over dev/README.md file from arrow/arrow-rs split #1162 (alamb)
- Add multiply_scalar kernel #1159 [arrow] (viirya)
- Fuzz test different parquet encodings #1156 [parquet] (tustvold)
- Add subtract_scalar kernel #1152 [arrow] (viirya)
- Add add_scalar kernel #1151 [arrow] (viirya)
- Move simd right out of for_each loop #1150 [arrow] (viirya)
- Internal Remove `GenericStringArray::from_vec` and `GenericStringArray::from_opt_vec` #1147 [arrow] (alamb)
- Implement SIMD comparison operations for types with less than 4 lanes (i128) #1146 [arrow] (jhorstmann)
- Extends parquet fuzz tests to also tests nulls, dictionaries and row groups with multiple pages (#1053) #1110 [parquet] (tustvold)
- Generify ColumnReaderImpl and RecordReader (#1040) #1041 [parquet] (tustvold)
- BooleanBufferBuilder::append_packed (#1038) #1039 [arrow] (tustvold)
7.0.0 (2022-01-07)
Breaking changes:
- `pretty_format_batches` now returns `Result<impl Display>` rather than `String`: #975
- `MutableBuffer::typed_data_mut` is marked `unsafe`: #1029
- UnionArray updated match latest Arrow spec, added `UnionMode`, `UnionArray::new()` marked `unsafe`: #885
New Features:
- Support for `Float16Array` types #888
- IPC support for `UnionArray` #654
- Dynamic comparison kernels for scalars (e.g. `eq_dyn_scalar`), including `DictionaryArray`: #1113
Enhancements:
- Added `Schema::with_metadata` and `Field::with_metadata` #1092
- Support for custom datetime format for inference and parsing csv files #1112
- Implement `Array` for `ArrayRef` for easier use #1129
- Pretty printing display support for `FixedSizeBinaryArray` #1097
- Dependency Upgrades: `pyo3`, `parquet-format`, `prost`, `tonic`
- Avoid allocating vector of indices in `lexicographical_partition_ranges` #998
Fixed bugs:
- (parquet) Fix reading of dictionary encoded pages with null values: #1130
6.5.0 (2021-12-23)
- 092fc64bbb019244887ebd0d9c9a2d3e3a9aebc0 support cast decimal to decimal (#1084) (#1093)
- 01459762ed18b504e00e7b2818fce91f19188b1e Fix like regex escaping (#1085) (#1090)
- 7c748bfccbc2eac0c1138378736b70dcb7e26a5b support cast decimal to signed numeric (#1073) (#1089)
- bd3600b6483c253ae57a38928a636d39a6b7cb02 parquet: Use constant for RLE decoder buffer size (#1070) (#1088)
- 2b5c53ecd92468fd95328637a15de7f35b6fcf28 Box RleDecoder index buffer (#1061) (#1062) (#1081)
- 78721bc1a467177679ad6196b994759cf4d73377 BooleanBufferBuilder correct buffer length (#1051) (#1052) (#1080)
- 3a5e3541d3a4db61a828011ed95c8539adf1d57c support cast signed numeric to decimal (#1044) (#1079)
- 000bdb3053098255d43288aa3e8665e8b1892a6c fix(compute): LIKE escape parenthesis (#1042) (#1078)
- e0abdb9e62772a2f853974e68e744246e7f47569 Add Schema::project and RecordBatch::project functions (#1033) (#1077)
- 31911a4d6328d889d98796b896412b3997f73e13 Remove outdated safety example from doc (#1050) (#1058)
- 71ac8620993a65a7f1f57278c3495556625356b3 Use existing array type in `take` kernel (#1046) (#1057)
- 1c5902376b7f7d56cb5249db4f98a6a370ead919 Extract method to drive PageIterator -> RecordReader (#1031) (#1056)
- 7ca39361f8733b86bc0cef5ed5d74093e2c6b14d Clarify governance of arrow crate (#1030) (#1055)
6.4.0 (2021-12-10)
- 049f48559f578243935b6e512d06c4c2df360bf1 Force new cargo and target caching to fix CI (#1023) (#1024)
- ef37da3b60f71a52d5ad67e9ca810dca38b29f00 Fix a broken link and some missing styling in the main arrow crate docs (#1013) (#1019)
- f2c746a9b968714cfe05d35fcee8658371acd899 Remove out of date comment (#1008) (#1018)
- 557fc11e3b2a09a680c0cfbf38d27b13101b63fe Remove unneeded `rc` feature of serde (#990) (#1016)
- b28385e096b1cf8f5fb2773d49b160f93d94fbac Docstrings for Timestamp*Array. (#988) (#1015)
- a92672e40217670d2566a85d70b0b59fffac594c Add full data validation for ArrayData::try_new() (#1007)
- 6c8b2936d7b07e1e2f5d1d48eea425a385382dfb Add boolean comparison to scalar kernels for less then, greater than (#977) (#1005)
- 14d140aeca608a23a8a6b2c251c8f53ffd377e61 Fix some typos in code and comments (#985) (#1006)
- b4507f562fb0eddfb79840871cd2733dc0e337cd Fix warnings introduced by Rust/Clippy 1.57.0 (#1004)
6.3.0 (2021-11-26)
Changes:
- 7e51df015ce851a5de444ca08b57b38e7ee959a3 add more error test case and change the code style (#952) (#976)
- 6c570cfe98d6a7a4ec74b139b733c5c72ed10015 Support read decimal data from csv reader if user provide the schema with decimal data type (#941) (#974)
- 4fa0d4d7f7d9ca0a3da2a6dfe3eae6dc2d51a79a Adding Pretty Print Support For Fixed Size List (#958) (#968)
- 9d453a3128013c03e8ed854ded76b15cc6f28be4 Fix bug in temporal utilities due to DST being ignored. (#955) (#967)
- 1b9fd9e3fb2653236513bb7dda5aa2fa14d1d831 Inferring 2. as Float64 for issue #929 (#950) (#966)
- e6c5e1c877bd94b3d6e545567f901d9962257cf8 Fix CI for latest nightly (#970) (#973)
- c96e8de457442806e18944f0b26dd06ba4cb1aee Fix primitive sort when input contains more nulls than the given sort limit (#954) (#965)
- 094037d418381584178db1d886cad3b5024b414a Update comfy-table to 5.0 (#957) (#964)
- 9f635021eee6786c5377c891218c5f88ebce07c3 Fix csv writing of timestamps to show timezone. (#849) (#963)
- f7deba4c3a050a52608462ee8a827bb8f6364140 Adding ability to parse float from number with leading decimal (#831) (#962)
- 59f96e842d05b63882f7ba285c66a9739761cf84 add ilike comparitor (#874) (#961)
- 54023c8a5543c9f9fa4955afa01189029f3e96f5 Remove unpassable cargo publish check from verify-release-candidate.sh (#882) (#949)
6.2.0 (2021-11-12)
Features / Fixes:
- 4037933e43cad9e4de027039ce14caa65f78300a Fix validation for offsets of StructArrays (#942) (#946)
- 1af9ca5d363d870550026a7b1abcb749befbb371 implement take kernel for null arrays (#939) (#944)
- 320de1c20aefbf204f6888e2ad3663863afeba9f add checker for appending i128 to decimal builder (#928) (#943)
- dff14113884ad4246a8cafb9be579ebdb4e1481f Validate arguments to ArrayData::new and null bit buffer and buffers (#810) (#936)
- c3eae1ec56303b97c9e15263063a6a13122ef194 fix some warning about unused variables in panic tests (#894) (#933)
- e80bb018450f13a30811ffd244c42917d8bf8a62 fix some clippy warnings (#896) (#930)
- bde89463b627be3f60b5569d038ca36c434da71d feat(ipc): add support for deserializing messages with nested dictionary fields (#923) (#931)
- 792544b5fb7b84224ef9745ecb9f330663c14fb4 refactor regexp_is_match_utf8_scalar to try to mitigate miri failures (#895) (#932)
- 3f0e252811cbb6e3f7c774959787dcfec985d03e Automatically retry failed MIRI runs to work around intermittent failures (#934)
- c9a9515c46d560ced00e23ff57cb10a1c97573cb Update mod.rs (#909) (#919)
- 64ed79ece67141b92dc45b8a1d43cb9d909aa6a9 Mark boolean kernels public (#913) (#920)
- 8b95fe0bbf03588c5cc00f67365c5b0dac4d7a34 doc example mistype (#904) (#918)
- 34c5eab4862cab16fdfd5f5ed6c68dce6298dfa4 allow null array to be cast to all other types (#884) (#917)
- 3c69752e55ed0c58f5a8faed918a22b45cd93766 Fix instances of UB that cause tests to not pass under miri (#878) (#916)
- 85402148c3af03d0855e81f855715ea98a7491c5 feat(ipc): Support writing dictionaries nested in structs and unions (#870) (#915)
- 03d95e626cb0e654775fefa77786674ea41be4a2 Fix references to changelog (#905)
6.1.0 (2021-10-29)
Features / Fixes:
- b42649b0088fe7762c713a41a23c1abdf8d0496d implement eq_dyn and neq_dyn (#858) (#867)
- 01743f3f10a377c1ca857cd554acbf84155766d8 fix: fix a bug in offset calculation for unions (#863) (#871)
- 8bfff793a23f0e71008c7a9eea7a54d6b913ecff add lt_bool, lt_eq_bool, gt_bool, gt_eq_bool (#860) (#868)
- 8845e91d4ab584c822e9ee903db7069551b124af fix(ipc): Support serializing structs containing dictionaries (#848) (#865)
- 620282a0d9fdd2a8ed7e8313d17ba3dec64c80e5 Implement boolean equality kernels (#844) (#857)
- 94cddcacf785be982e69689291ce034ef00220b4 Cherry pick fix parquet_derive with default features (and fix cargo publish) (#856)
- 733fd583ddb3dbe6b4d58a809c444ee16ac0eae8 Use kernel utility for parsing timestamps in csv reader. (#832) (#853)
- 2cc64937a153f632796915d2d9869d5c2a501d28 [Minor] Fix clippy errors with new rust version (1.56) and float formatting with nightly (#845) (#850)
Other:
- bfac9e5a027e3bd78b7a1ec90c75a3e385bd66bb Test out new tarpaulin version (#852) (#866)
- 809350ced392cfc78d8a1a46228d4ffc25dea9ff Update README.md (#834) (#854)
- 70582f40dd21f5c710c4946266d0563a92b92337 [MINOR] Delete temp file from docs (#836) (#855)
- a721e00014015a7e598946b6efb9b1da8080ec85 Force fresh cargo cache key in CI (#839) (#851)
6.0.0 (2021-10-13)
Breaking changes:
- Replace `ArrayData::new()` with `ArrayData::try_new()` and `unsafe ArrayData::new_unchecked` #822 [parquet] [arrow] (alamb)
- Update Bitmap::len to return bits rather than bytes #749 [arrow] (matthewmturner)
- use sort_unstable_by in primitive sorting #552 [arrow] (Jimexist)
- New MapArray support #491 [parquet] [arrow] (nevi-me)
Implemented enhancements:
- Improve parquet binary writer speed by reducing allocations #819
- Expose buffer operations #808
- Add doc examples of writing parquet files using `ArrowWriter` #788
Fixed bugs:
- JSON reader can create null struct children on empty lists #825
- Incorrect null count for cast kernel for list arrays #815
- `minute` and `second` temporal kernels do not respect timezone #500
- Fix data corruption in json decoder f64-to-i64 cast #652 [arrow] (xianwill)
Documentation updates:
- Doctest for PrimitiveArray using from_iter_values. #694 [arrow] (novemberkilo)
- Doctests for BinaryArray and LargeBinaryArray. #625 [arrow] (novemberkilo)
- Add links in docstrings #605 [arrow] (alamb)
5.5.0 (2021-09-24)
Implemented enhancements:
Fixed bugs:
- Converting from string to timestamp uses microseconds instead of milliseconds #780
- Document has no link to `RowColumIter` #762
- length on slices with null doesn't work #744
5.4.0 (2021-09-10)
Implemented enhancements:
- Upgrade lexical-core to 0.8 #747
- `append_nulls` and `append_trusted_len_iter` for PrimitiveBuilder #725
- Optimize MutableArrayData::extend for null buffers #397
Fixed bugs:
- Arithmetic with scalars doesn't work on slices #742
- Comparisons with scalar don't work on slices #740
- `unary` kernel doesn't respect offset #738
- `new_null_array` creates invalid struct arrays #734
- --no-default-features is broken for parquet #733 [parquet]
- `Bitmap::len` returns the number of bytes, not bits. #730
- Decimal logical type is formatted incorrectly by print_schema #713
- parquet_derive does not support chrono time values #711
- Numeric overflow when formatting Decimal type #710
- The integration tests are not running #690
Closed issues:
- Question: Is there no way to create a DictionaryArray with a pre-arranged mapping? #729
5.3.0 (2021-08-26)
Implemented enhancements:
- Add optimized filter kernel for regular expression matching #697
- Can't cast from timestamp array to string array #587
Fixed bugs:
- 'Encoding DELTA_BYTE_ARRAY is not supported' with parquet arrow readers #708
- Support reading json string into binary data type. #701
Closed issues:
5.2.0 (2021-08-12)
Implemented enhancements:
- Make rand an optional dependency #671
- Remove undefined behavior in `value` method of boolean and primitive arrays #645
- Avoid materialization of indices in filter_record_batch for single arrays #636
- Add a note about arrow crate security / safety #627
- Allow the creation of String arrays from an interator of &Option<&str> #598
- Support arrow map datatype #395
Fixed bugs:
- Parquet fixed length byte array columns write byte array statistics #660 [parquet]
- Parquet boolean columns write Int32 statistics #659 [parquet]
- Writing Parquet with a boolean column fails #657
- JSON decoder data corruption for large i64/u64 #653
- Incorrect min/max statistics for strings in parquet files #641 [parquet]
Closed issues:
5.1.0 (2021-07-29)
Implemented enhancements:
- Make FFI_ArrowArray empty() public #602
- exponential sort can be used to speed up lexico partition kernel #586
- Implement sort() for binary array #568
- primitive sorting can be improved and more consistent with and without `limit` if sorted unstably #553
Fixed bugs:
- Confusing memory usage with CSV reader #623
- FFI implementation deviates from specification for array release #595
- Parquet file content is different if `~/.cargo` is in a git checkout #589
- Ensure output of MIRI is checked for success #581
- MIRI failure in `array::ffi::tests::test_struct` and other ffi tests #580
- ListArray equality check may return wrong result #570
- cargo audit failed #561
- ArrayData::slice() does not work for nested types such as StructArray #554
Documentation updates:
- More examples of how to construct Arrays #301
Closed issues:
5.0.0 (2021-07-14)
Breaking changes:
- Remove lifetime from DynComparator #543 [arrow]
- Simplify interactions with arrow flight APIs #376 [arrow-flight]
- refactor: remove lifetime from DynComparator #542 [arrow] (e-dard)
- use iterator for partition kernel instead of generating vec #438 [arrow] (Jimexist)
- Remove DictionaryArray::keys_array method #419 [arrow] (jhorstmann)
- simplify interactions with arrow flight APIs #377 [arrow-flight] (garyanaplan)
- return reference from DictionaryArray::values() (#313) #314 [arrow] (tustvold)
Implemented enhancements:
- Allow creation of StringArrays from Vec<String> #519 [arrow]
- Implement RecordBatch::concat #461 [arrow]
- Implement RecordBatch::slice() to slice RecordBatches #460 [arrow]
- Add a RecordBatch::split to split large batches into a set of smaller batches #343
- generate parquet schema from rust struct #539 [parquet] (nevi-me)
- Implement `RecordBatch::concat` #537 [arrow] (silathdiir)
- Implement function slice for RecordBatch #490 [arrow] (b41sh)
- add lexicographically partition points and ranges #424 [arrow] (Jimexist)
- allow to read non-standard CSV #326 [arrow] (kazuk)
- parquet: Speed up `BitReader`/`DeltaBitPackDecoder` #325 [parquet] (kornholi)
- ARROW-12343: [Rust] Support auto-vectorization for min/max #9 [arrow] (Dandandan)
- ARROW-12411: [Rust] Create RecordBatches from Iterators #7 [arrow] (alamb)
Fixed bugs:
- Error building on master - error: cyclic package dependency: package `ahash v0.7.4` depends on itself. Cycle #544
- IPC reader panics with out of bounds error #541
- Take kernel doesn't handle nulls and structs correctly #530 [arrow]
- master fails to compile with `default-features=false` #529
- README developer instructions out of date #523
- Update rustc and packed_simd in CI before 5.0 release #517
- Incorrect memory usage calculation for dictionary arrays #503 [arrow]
- sliced null buffers lead to incorrect result in take kernel (and probably on other places) #502
- Cast of utf8 types and list container types don't respect offset #334 [arrow]
- fix take kernel null handling on structs #531 [arrow] (bjchambers)
- Correct array memory usage calculation for dictionary arrays #505 [arrow] (jhorstmann)
- parquet: improve BOOLEAN writing logic and report error on encoding fail #443 [parquet] (garyanaplan)
- Fix bug with null buffer offset in boolean not kernel #418 [arrow] (jhorstmann)
- respect offset in utf8 and list casts #335 [arrow] (ritchie46)
- Fix comparison of dictionaries with different values arrays (#332) #333 [arrow] (tustvold)
- ensure null-counts are written for all-null columns #307 [parquet] (crepererum)
- fix invalid null handling in filter #296 [arrow] (ritchie46)
- fix NaN handling in parquet statistics #256 (crepererum)
Documentation updates:
- Improve arrow's crate's readme on crates.io #463
- Clean up README.md in advance of the 5.0 release #536 [arrow] [arrow-flight] [parquet] (alamb)
- fix readme instructions to reflect new structure #524 (marcvanheerden)
- Improve docs for NullArray, new_null_array and new_empty_array #240 [arrow] (alamb)
Merged pull requests:
- Fix default arrow build #533 [arrow] (alamb)
- Add tests for building applications using arrow with different feature flags #532 [arrow] (alamb)
- Remove unused futures dependency from arrow-flight #528 [arrow-flight] (alamb)
- CI: update rust nightly and packed_simd #525 [arrow] (ritchie46)
- Support `StringArray` creation from String Vec #522 [arrow] (silathdiir)
- Fix parquet benchmark schema #513 [parquet] (nevi-me)
- Fix parquet definition levels #511 [parquet] (nevi-me)
- Fix for primitive and boolean take kernel for nullable indices with an offset #509 [arrow] (jhorstmann)
- Bump flatbuffers #499 [arrow] (PsiACE)
- implement second/minute helpers for temporal #493 [arrow] (ovr)
- special case concatenating single element array shortcut #492 [arrow] (Jimexist)
- update docs to reflect recent changes (joins and window functions) #489 (Jimexist)
- Update rand, proc-macro and zstd dependencies #488 [arrow] [arrow-flight] [parquet] (alamb)
- Doctest for GenericListArray. #474 [arrow] (novemberkilo)
- remove stale comment on `ArrayData` equality and update unit tests #472 (Jimexist)
- remove unused patch file #471 (Jimexist)
- fix clippy warnings for rust 1.53 #470 (Jimexist)
- Fix PR labeler #468 (Dandandan)
- Tweak dev backporting docs #466 (alamb)
- Unvendor Archery #459 (kszucs)
- Add sort boolean benchmark #457 (alamb)
- Add C data interface for decimal128 and timestamp #453 [arrow] (alippai)
- Implement the Iterator trait for the json Reader. #451 [arrow] (LaurentMazare)
- Update release docs + release email template #450 (alamb)
- remove clippy unnecessary wraps suppresions in cast kernel #449 (Jimexist)
- Use partition for bool sort #448 (Jimexist)
- remove unnecessary wraps in sort #445 (Jimexist)
- Python FFI bridge for Schema, Field and DataType #439 [arrow] (kszucs)
- Update release Readme.md #436 (alamb)
- Derive Eq and PartialEq for SortOptions #425 (tustvold)
- refactor lexico sort for future code reuse #423 (Jimexist)
- Reenable MIRI check on PRs #421 (alamb)
- Sort by float lists #420 (medwards)
- Fix out of bounds read in bit chunk iterator #416 (jhorstmann)
- Doctests for DecimalArray. #414 (novemberkilo)
- Add Decimal to CsvWriter and improve debug display #406 (alippai)
- MINOR: update install instruction #400 (alippai)
- use prettier to auto format md files #398 (Jimexist)
- window::shift to work for all array types #388 (Jimexist)
- add more tests for window::shift and handle boundary cases #386 (Jimexist)
- Implement faster arrow array reader #384 (yordan-pavlov)
- Add set_bit to BooleanBufferBuilder to allow mutating bit in index #383 (boazberman)
- make sure that only concat preallocates buffers #382 (ritchie46)
- Respect max rowgroup size in Arrow writer #381 [parquet] (nevi-me)
- Fix typo in release script, update release location #380 (alamb)
- Doctests for FixedSizeBinaryArray #378 (novemberkilo)
- Simplify shift kernel using new_null_array #370 (Dandandan)
- allow `SliceableCursor` to be constructed from an `Arc` directly #369 (crepererum)
- Add doctest for ArrayBuilder #367 (alippai)
- Fix version in readme #365 (domoritz)
- Remove superfluous space #363 (domoritz)
- Add crate badges #362 (domoritz)
- Disable MIRI check until it runs cleanly on CI #360 (alamb)
- Only register Flight.proto with cargo if it exists #351 (tustvold)
- Reduce memory usage of concat (large)utf8 #348 (ritchie46)
- Fix filter UB and add fast path #341 (ritchie46)
- Automatic cherry-pick script #339 (alamb)
- Doctests for BooleanArray. #338 (novemberkilo)
- feature gate ipc reader/writer #336 (ritchie46)
- Add ported Rust release verification script #331 (wesm)
- Doctests for StringArray and LargeStringArray. #330 (novemberkilo)
- inline PrimitiveArray::value #329 (ritchie46)
- Enable wasm32 as a target architecture for the SIMD feature #324 (roee88)
- Fix undefined behavior in FFI and enable MIRI checks on CI #323 (roee88)
- Mutablebuffer::shrink_to_fit #318 [arrow] (ritchie46)
- Add (simd) modulus op #317 (gangliao)
- feature gate csv functionality #312 [arrow] (ritchie46)
- [Minor] Version upgrades #304 (Dandandan)
- Remove old release scripts #293 (alamb)
- Add Send to the ArrayBuilder trait #291 (Max-Meldrum)
- Added changelog generator script and configuration. #289 (jorgecarleitao)
- manually bump development version #288 (nevi-me)
- Fix FFI and add support for Struct type #287 (roee88)
- Fix subtraction underflow when sorting string arrays with many nulls #285 (medwards)
- Speed up bound checking in `take` #281 (Dandandan)
- Update PR template by commenting out instructions #278 (nevi-me)
- Added Decimal support to pretty-print display utility (#230) #273 (mgill25)
- Fix null struct and list roundtrip #270 (nevi-me)
- 1.52 clippy fixes #267 (nevi-me)
- Fix typo in csv/reader.rs #265 (domoritz)
- Fix empty Schema::metadata deserialization error #260 (hulunbier)
- update datafusion and ballista doc links #259 (Jimexist)
- support full u32 and u64 roundtrip through parquet #258 [parquet] (crepererum)
- [MINOR] Added env to run rust in integration. #253 (jorgecarleitao)
- [Minor] Made integration tests always run. #248 (jorgecarleitao)
- fix parquet max_definition for non-null structs #246 (nevi-me)
- Disabled rebase needed until demonstrate working. #243 (jorgecarleitao)
- pin flatbuffers to 0.8.4 #239 (ritchie46)
- sort_primitive result is capped to the min of limit or values.len #236 (medwards)
- Read list field correctly #234 [parquet] (nevi-me)
- Fix code examples for RecordBatch::try_from_iter #231 (alamb)
- Support string dictionaries in csv reader (#228) #229 (tustvold)
- support LargeUtf8 in sort kernel #26 (ritchie46)
- Removed unused files #22 (jorgecarleitao)
- ARROW-12504: Buffer::from_slice_ref set correct capacity #18 [arrow] (tustvold)
- Add GitHub templates #17 (andygrove)
- ARROW-12493: Add support for writing dictionary arrays to CSV and JSON #16 [arrow] (tustvold)
- ARROW-12426: [Rust] Fix concatentation of arrow dictionaries #15 [arrow] (tustvold)
- Update repository and homepage urls #14 [arrow] [arrow-flight] [parquet] (Dandandan)
- Added rebase-needed bot #13 (jorgecarleitao)
- Added Integration tests against arrow #10 (jorgecarleitao)
4.4.0 (2021-06-24)
Breaking changes:
- migrate partition kernel to use Iterator trait #437 [arrow]
- Remove DictionaryArray::keys_array #391 [arrow]
Implemented enhancements:
- sort kernel boolean sort can be O(n) #447 [arrow]
- C data interface for decimal128, timestamp, date32 and date64 #413
- Add Decimal to CsvWriter #405
- Use iterators to increase performance of creating Arrow arrays #200 [parquet]
Fixed bugs:
- Release Audit Tool (RAT) is not being triggered #481
- Security Vulnerabilities: flatbuffers: `read_scalar` and `read_scalar_at` allow transmuting values without `unsafe` blocks #476
- Clippy broken after upgrade to rust 1.53 #467
- Pull Request Labeler is not working #462
- Arrow 4.3 release: error[E0658]: use of unstable library feature 'partition_point': new API #456
- parquet reading hangs when row_group contains more than 2048 rows of data #349
- Fail to build arrow #247
- JSON reader does not implement iterator #193 [arrow]
Security fixes:
- Ensure a successful MIRI Run on CI #227
Closed issues:
- sort kernel has a lot of unnecessary wrapping #446
- [Parquet] Plain encoded boolean column chunks limited to 2048 values #48 [parquet]
4.3.0 (2021-06-10)
Implemented enhancements:
- Add partitioning kernel for sorted arrays #428 [arrow]
- Implement sort by float lists #427 [arrow]
- Derive Eq and PartialEq for SortOptions #426 [arrow]
- use prettier and github action to normalize markdown document syntax #399
- window::shift can work for more than just primitive array type #392
- Doctest for ArrayBuilder #366
Fixed bugs:
- Boolean `not` kernel does not take offset of null buffer into account #417
- my contribution not marged in 4.2 release #394
- window::shift shall properly handle boundary cases #387
- Parquet `WriterProperties.max_row_group_size` not wired up #257
- Out of bound reads in chunk iterator #198 [arrow]
4.2.0 (2021-05-29)
Breaking changes:
Implemented enhancements:
- Simplify shift kernel using null array #371
- Provide `Arc`-based constructor for `parquet::util::cursor::SliceableCursor` #368
- Add badges to crates #361
- Consider inlining PrimitiveArray::value #328
- Implement automated release verification script #327
- Add wasm32 to the list of target architectures of the simd feature #316
- add with_escape for csv::ReaderBuilder #315 [arrow]
- IPC feature gate #310
- csv feature gate #309 [arrow]
- Add `shrink_to`/`shrink_to_fit` to `MutableBuffer` #297
Fixed bugs:
- Incorrect crate setup instructions #364
- Arrow-flight only register rerun-if-changed if file exists #350
- Dictionary Comparison Uses Wrong Values Array #332
- Undefined behavior in FFI implementation #322
- All-null column get wrong parquet null-counts #306 [parquet]
- Filter has inconsistent null handling #295
4.1.0 (2021-05-17)
Implemented enhancements:
- Add Send to ArrayBuilder #290 [arrow]
- Improve performance of bound checking option #280 [arrow]
- extend compute kernel arity to include nullary functions #276
- Implement FFI / CDataInterface for Struct Arrays #251 [arrow]
- Add support for pretty-printing Decimal numbers #230 [arrow]
- CSV Reader String Dictionary Support #228 [arrow]
- Add Builder interface for adding Arrays to record batches #210 [arrow]
- Support auto-vectorization for min/max #209 [arrow]
- Support LargeUtf8 in sort kernel #25 [arrow]
Fixed bugs:
- no method named `select_nth_unstable_by` found for mutable reference `&mut [T]` #283
- Rust 1.52 Clippy error #266
- NaNs can break parquet statistics #255 [parquet]
- u64::MAX does not roundtrip through parquet #254 [parquet]
- Integration tests failing to compile (flatbuffer) #249 [arrow]
- Fix compatibility quirks between arrow and parquet structs #245 [parquet]
- Unable to write non-null Arrow structs to Parquet #244 [parquet]
- schema: missing field `metadata` when deserialize #241 [arrow]
- Arrow does not compile due to flatbuffers upgrade #238 [arrow]
- Sort with limit panics for the limit includes some but not all nulls, for large arrays #235 [arrow]
- arrow-rs contains a copy of the "format" directory #233 [arrow]
- Fix SEGFAULT/ SIGILL in child-data ffi #206 [arrow]
- Read list field correctly in <struct<list>> #167 [parquet]
- FFI listarray lead to undefined behavior. #20
Security fixes:
Documentation updates:
- Comment out the instructions in the PR template #277
- Update links to datafusion and ballista in README.md #19
- Update "repository" in Cargo.toml #12
Closed issues:
- Arrow Aligned Vec #268
- [Rust]: Tracking issue for AVX-512 #220 [arrow]
- Umbrella issue for clippy integration #217 [arrow]
- Support sort #215 [arrow]
- Support stable Rust #214 [arrow]
- Remove Rust and point integration tests to arrow-rs repo #211 [arrow]
- ArrayData buffers are inconsistent across implementations #207
- 3.0.1 patch release #204
- Document patch release process #202
- Simplify Offset #186 [arrow]
- Typed Bytes #185 [arrow]
- [CI]docker-compose setup should enable caching #175
- Improve take primitive performance #174
- [CI] Try out buildkite #165 [arrow]
- Update assignees in JIRA where missing #160
- [Rust]: From<ArrayDataRef> implementations should validate data type #103 [arrow]
- [DataFusion] Verify that projection push down does not remove aliases columns #99 [arrow]
- [Rust][DataFusion] Implement modulus expression #98 [arrow]
- [DataFusion] Add constant folding to expressions during logically planning #96 [arrow]
- [DataFusion] DataFrame.collect should return RecordBatchReader #95 [arrow]
- [Rust][DataFusion] Add FORMAT to explain plan and an easy to visualize format #94 [arrow]
- [DataFusion] Implement metrics framework #90 [arrow]
- [DataFusion] Implement micro benchmarks for each operator #89 [arrow]
- [DataFusion] Implement pretty print for physical query plan #88 [arrow]
- [Archery] Support rust clippy in the lint command #83
- [rust][datafusion] optimize count(*) queries on parquet sources #75 [arrow]
- [Rust][DataFusion] Improve like/nlike performance #71 [arrow]
- [DataFusion] Implement optimizer rule to remove redundant projections #56 [arrow]
- [DataFusion] Parquet data source does not support complex types #39 [arrow]
- Merge utils from Parquet and Arrow #32 [arrow] [parquet]
- Add benchmarks for Parquet #30 [parquet]
- Mark methods that do not perform bounds checking as unsafe #28 [arrow]
- Test issue #24 [arrow]
- This is a test issue #11