9.0.0 (2022-02-04)
Breaking changes:
- Add Send + Sync to DataType, RowGroupReader, FileReader, ChunkReader. #1264
- Rename the function Bitmap::len to Bitmap::bit_len to clarify its meaning #1242 [parquet] [arrow] (HaoYang670)
- Remove unused / broken memory-check feature #1222 [arrow] (jhorstmann)
- Potentially buffer multiple RecordBatches before writing a parquet row group in ArrowWriter #1214 [parquet] [arrow] (tustvold)
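The Bitmap::len to Bitmap::bit_len rename is easier to follow with a small model of why a bare len is ambiguous for a bit-packed buffer. The sketch below is only an illustration: ToyBitmap, from_bools, byte_len and is_set are hypothetical names, not the arrow crate's API, and the LSB-first bit order is an assumption of the sketch.

```rust
/// Hypothetical stand-in for a bit-packed validity bitmap.
/// This is NOT the arrow crate's `Bitmap`; it only models the naming issue.
pub struct ToyBitmap {
    bytes: Vec<u8>,
}

impl ToyBitmap {
    /// Packs one bit per entry, least-significant bit first (an assumption of this sketch).
    pub fn from_bools(values: &[bool]) -> Self {
        let mut bytes = vec![0u8; (values.len() + 7) / 8];
        for (i, &v) in values.iter().enumerate() {
            if v {
                bytes[i / 8] |= 1u8 << (i % 8);
            }
        }
        ToyBitmap { bytes }
    }

    /// Capacity in bits; a multiple of 8 because storage is whole bytes.
    pub fn bit_len(&self) -> usize {
        self.bytes.len() * 8
    }

    /// Size of the backing storage in bytes.
    pub fn byte_len(&self) -> usize {
        self.bytes.len()
    }

    /// Returns whether bit `i` is set; panics if `i` is past the storage.
    pub fn is_set(&self, i: usize) -> bool {
        (self.bytes[i / 8] & (1u8 << (i % 8))) != 0
    }
}
```

With ten input values the toy bitmap stores two bytes, so the two lengths differ by a factor of eight; naming the unit in the method makes it harder to confuse them.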
Implemented enhancements:
- Add async arrow parquet reader #1154 [parquet] [arrow] (tustvold)
- Rename Bitmap::len to Bitmap::bit_len #1233
- Extend CSV schema inference to allow scientific notation for floating point types #1215 [arrow]
- Write Multiple RecordBatch to Parquet Row Group #1211
- Add doc examples for eq_dyn etc. #1202 [arrow]
- Add comparison kernels for BinaryArray #1108
- impl ArrowNativeType for i128 #1098
- Remove Copy trait bound from dyn scalar kernels #1243 [arrow] (matthewmturner)
- Add into_inner for IPC FileWriter #1236 [arrow] (yjshen)
- [Minor] Re-export array::builder::make_builder to make it available for downstream #1235 [arrow] (yjshen)
Fixed bugs:
- Parquet v8.0.0 panics when reading all null column to NullArray #1245 [parquet]
- Get "Unknown configuration option rust-version" when running the rust format command #1240
- Bitmap Length Validation is Incorrect #1231 [arrow]
- Writing sliced ListArray or MapArray ignore offsets #1226 [parquet]
- Remove broken memory-tracking crate feature #1171
- Revert making parquet::data_type and parquet::arrow::schema experimental #1244 [parquet] (tustvold)
Documentation updates:
- Update parquet crate documentation and examples #1253 [parquet] [arrow] (alamb)
- Refresh parquet readme / contributing guide #1252 [parquet] (alamb)
- Add docs examples for dynamically compare functions #1250 [arrow] (HaoYang670)
- Add Rust Docs examples for UnionArray #1241 [arrow] (HaoYang670)
- Improve documentation for Bitmap #1237 [arrow] (alamb)
Performance improvements:
- Improve performance for arithmetic kernels with simd feature enabled (except for division/modulo) #1221 [arrow] (jhorstmann)
- Do not concatenate identical dictionaries #1219 [arrow] (tustvold)
- Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) #1180 [parquet] (tustvold)
Closed issues:
- UnalignedBitChunkIterator that iterates through already aligned u64 blocks #1227
- Remove unused ArrowArrayReader in parquet #1197 [parquet]
Merged pull requests:
- Upgrade clap to 3.0.0 #1261 [parquet] (Jimexist)
- Update chrono-tz requirement from 0.4 to 0.6 #1259 [arrow] (dependabot[bot])
- Update zstd requirement from 0.9 to 0.10 #1257 [parquet] (dependabot[bot])
- Fix NullArrayReader (#1245) #1246 [parquet] (tustvold)
- dyn compare for binary array #1238 [arrow] (HaoYang670)
- Remove arrow array reader (#1197) #1234 [parquet] (tustvold)
- Fix null bitmap length validation (#1231) #1232 [arrow] (tustvold)
- Faster bitmask iteration #1228 [parquet] [arrow] (tustvold)
- Add non utf8 values into the test cases of BinaryArray comparison #1220 [arrow] (HaoYang670)
- Update DECIMAL_RE to allow scientific notation in auto inferred schemas #1216 [arrow] (pjmore)
8.0.0 (2022-01-20)
Breaking changes:
- Return error from JSON writer rather than panic #1205 [arrow] (Ted-Jiang)
- Remove ArrowSignedNumericType to Simplify and reduce code duplication in arithmetic kernels #1161 [arrow] (jhorstmann)
- Restrict RecordReader and friends to scalar types (#1132) #1155 [parquet] (tustvold)
- Move more parquet functionality behind experimental feature flag (#1032) #1134 [parquet] (tustvold)
Implemented enhancements:
- Parquet reader should be able to read structs within list #1186 [parquet]
- Disable serde_json arbitrary_precision feature flag #1174 [arrow]
- Simplify and reduce code duplication in arithmetic.rs #1160 [arrow]
- Return Err from JSON writer rather than panic! for unsupported types #1157 [arrow]
- Support scalar mathematics kernels for Array and scalar value #1153 [arrow]
- Support DecimalArray in sort kernel #1137
- Parquet Fuzz Tests #1053
- BooleanBufferBuilder Append Packed #1038 [arrow]
- parquet Performance Optimization: StructArrayReader Redundant Level & Bitmap Computation #1034 [parquet]
- Reduce Public Parquet API #1032 [parquet]
- Add from_iter_values for binary array #1188 [arrow] (Jimexist)
- Add support for MapArray in json writer #1149 [arrow] (helgikrs)
Fixed bugs:
- Empty string arrays with no nulls are not equal #1208 [arrow]
- Pretty print a RecordBatch containing Float16 triggers a panic #1193 [arrow]
- Writing structs nested in lists produces an incorrect output #1184 [parquet]
- Undefined behavior for GenericStringArray::from_iter_values if reported iterator upper bound is incorrect #1144 [arrow]
- Interval comparisons with simd feature asserts #1136 [arrow]
- RecordReader Permits Illegal Types #1132 [parquet]
Security fixes:
- Fix undefined behavior in GenericStringArray::from_iter_values #1145 [arrow] (alamb)
- parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040) #1082 [parquet] [arrow] (tustvold)
Documentation updates:
- Update parquet crate readme #1192 [parquet] (alamb)
- Document safety justification of some uses of from_trusted_len_iter #1148 [arrow] (alamb)
Performance improvements:
- Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) #1054 [parquet] [arrow] (tustvold)
- Improve parquet performance: Skip levels computation for required struct arrays in parquet #1035 [parquet] (tustvold)
Closed issues:
Merged pull requests:
- fix a bug in variable sized equality #1209 [arrow] (helgikrs)
- Pin WASM / packed SIMD tests to nightly-2022-01-17 #1204 (alamb)
- feat: add support for casting Duration/Interval to Int64Array #1196 [arrow] (e-dard)
- Add comparison support for fully qualified BinaryArray #1195 [arrow] (HaoYang670)
- Fix in display of Float16Array #1194 [arrow] (helgikrs)
- update nightly version for miri #1189 (Jimexist)
- feat(parquet): support for reading structs nested within lists #1187 [parquet] (helgikrs)
- fix: Fix a bug in how definition levels are calculated for nested structs in a list #1185 [parquet] (helgikrs)
- Truncate bitmask on BooleanBufferBuilder::resize: #1183 [parquet] [arrow] (tustvold)
- Add ticket reference for false positive in clippy #1181 [arrow] (alamb)
- Fix record formatting in 1.58 #1178 [parquet] (tustvold)
- Serialize i128 as JSON string #1175 [arrow] (tustvold)
- Support DecimalType in sort and take kernels #1172 [arrow] (liukun4515)
- Fix new clippy lints introduced in Rust 1.58 #1170 [parquet] [arrow] (alamb)
- Fix compilation error with simd feature #1169 [arrow] (jhorstmann)
- Fix bug while writing parquet with empty lists of structs #1166 [parquet] (helgikrs)
- Use tempfile for parquet tests #1165 [parquet] (tustvold)
- Remove left over dev/README.md file from arrow/arrow-rs split #1162 (alamb)
- Add multiply_scalar kernel #1159 [arrow] (viirya)
- Fuzz test different parquet encodings #1156 [parquet] (tustvold)
- Add subtract_scalar kernel #1152 [arrow] (viirya)
- Add add_scalar kernel #1151 [arrow] (viirya)
- Move simd right out of for_each loop #1150 [arrow] (viirya)
- Internal Remove GenericStringArray::from_vec and GenericStringArray::from_opt_vec #1147 [arrow] (alamb)
- Implement SIMD comparison operations for types with less than 4 lanes (i128) #1146 [arrow] (jhorstmann)
- Extends parquet fuzz tests to also test nulls, dictionaries and row groups with multiple pages (#1053) #1110 [parquet] (tustvold)
- Generify ColumnReaderImpl and RecordReader (#1040) #1041 [parquet] (tustvold)
- BooleanBufferBuilder::append_packed (#1038) #1039 [arrow] (tustvold)
7.0.0 (2022-01-07)
Breaking changes:
- pretty_format_batches now returns Result<impl Display> rather than String: #975
- MutableBuffer::typed_data_mut is marked unsafe: #1029
- UnionArray updated to match latest Arrow spec, added UnionMode, UnionArray::new() marked unsafe: #885
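For callers, the pretty_format_batches change listed above mostly means handling an error and calling to_string() (or formatting with {}) on the returned value. The following is a minimal sketch of that migration pattern using only the standard library; ToyTable, format_old and format_new are hypothetical names and do not reproduce arrow's actual signatures or error type.

```rust
use std::fmt::{self, Display};

/// Hypothetical table type standing in for a set of record batches.
pub struct ToyTable {
    pub rows: Vec<(String, i64)>,
}

impl Display for ToyTable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (name, value) in &self.rows {
            writeln!(f, "{} | {}", name, value)?;
        }
        Ok(())
    }
}

/// Old style: formats eagerly and hands back an owned String.
pub fn format_old(rows: Vec<(String, i64)>) -> String {
    ToyTable { rows }.to_string()
}

/// New style: failures are reported to the caller, and the text is only
/// produced when the returned value is displayed.
pub fn format_new(rows: Vec<(String, i64)>) -> Result<impl Display, String> {
    if rows.is_empty() {
        // Error condition chosen only for illustration.
        return Err("no rows to format".to_string());
    }
    Ok(ToyTable { rows })
}
```

A caller that previously used the String directly would, under this pattern, write something like format_new(rows)?.to_string(), or pass the value straight to println! without allocating an intermediate String.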
New Features:
- Support for Float16Array types #888
- IPC support for UnionArray #654
- Dynamic comparison kernels for scalars (e.g. eq_dyn_scalar), including DictionaryArray: #1113
Enhancements:
- Added Schema::with_metadata and Field::with_metadata #1092
- Support for custom datetime format for inference and parsing csv files #1112
- Implement Array for ArrayRef for easier use #1129
- Pretty printing display support for FixedSizeBinaryArray #1097
- Dependency Upgrades: pyo3, parquet-format, prost, tonic
- Avoid allocating vector of indices in lexicographical_partition_ranges #998
Fixed bugs:
- (parquet) Fix reading of dictionary encoded pages with null values: #1130
6.5.0 (2021-12-23)
- 092fc64bbb019244887ebd0d9c9a2d3e3a9aebc0 support cast decimal to decimal (#1084) (#1093)
- 01459762ed18b504e00e7b2818fce91f19188b1e Fix like regex escaping (#1085) (#1090)
- 7c748bfccbc2eac0c1138378736b70dcb7e26a5b support cast decimal to signed numeric (#1073) (#1089)
- bd3600b6483c253ae57a38928a636d39a6b7cb02 parquet: Use constant for RLE decoder buffer size (#1070) (#1088)
- 2b5c53ecd92468fd95328637a15de7f35b6fcf28 Box RleDecoder index buffer (#1061) (#1062) (#1081)
- 78721bc1a467177679ad6196b994759cf4d73377 BooleanBufferBuilder correct buffer length (#1051) (#1052) (#1080)
- 3a5e3541d3a4db61a828011ed95c8539adf1d57c support cast signed numeric to decimal (#1044) (#1079)
- 000bdb3053098255d43288aa3e8665e8b1892a6c fix(compute): LIKE escape parenthesis (#1042) (#1078)
- e0abdb9e62772a2f853974e68e744246e7f47569 Add Schema::project and RecordBatch::project functions (#1033) (#1077)
- 31911a4d6328d889d98796b896412b3997f73e13 Remove outdated safety example from doc (#1050) (#1058)
- 71ac8620993a65a7f1f57278c3495556625356b3 Use existing array type in take kernel (#1046) (#1057)
- 1c5902376b7f7d56cb5249db4f98a6a370ead919 Extract method to drive PageIterator -> RecordReader (#1031) (#1056)
- 7ca39361f8733b86bc0cef5ed5d74093e2c6b14d Clarify governance of arrow crate (#1030) (#1055)
6.4.0 (2021-12-10)
- 049f48559f578243935b6e512d06c4c2df360bf1 Force new cargo and target caching to fix CI (#1023) (#1024)
- ef37da3b60f71a52d5ad67e9ca810dca38b29f00 Fix a broken link and some missing styling in the main arrow crate docs (#1013) (#1019)
- f2c746a9b968714cfe05d35fcee8658371acd899 Remove out of date comment (#1008) (#1018)
- 557fc11e3b2a09a680c0cfbf38d27b13101b63fe Remove unneeded rc feature of serde (#990) (#1016)
- b28385e096b1cf8f5fb2773d49b160f93d94fbac Docstrings for Timestamp*Array. (#988) (#1015)
- a92672e40217670d2566a85d70b0b59fffac594c Add full data validation for ArrayData::try_new() (#1007)
- 6c8b2936d7b07e1e2f5d1d48eea425a385382dfb Add boolean comparison to scalar kernels for less than, greater than (#977) (#1005)
- 14d140aeca608a23a8a6b2c251c8f53ffd377e61 Fix some typos in code and comments (#985) (#1006)
- b4507f562fb0eddfb79840871cd2733dc0e337cd Fix warnings introduced by Rust/Clippy 1.57.0 (#1004)
6.3.0 (2021-11-26)
Changes:
- 7e51df015ce851a5de444ca08b57b38e7ee959a3 add more error test case and change the code style (#952) (#976)
- 6c570cfe98d6a7a4ec74b139b733c5c72ed10015 Support read decimal data from csv reader if user provide the schema with decimal data type (#941) (#974)
- 4fa0d4d7f7d9ca0a3da2a6dfe3eae6dc2d51a79a Adding Pretty Print Support For Fixed Size List (#958) (#968)
- 9d453a3128013c03e8ed854ded76b15cc6f28be4 Fix bug in temporal utilities due to DST being ignored. (#955) (#967)
- 1b9fd9e3fb2653236513bb7dda5aa2fa14d1d831 Inferring 2. as Float64 for issue #929 (#950) (#966)
- e6c5e1c877bd94b3d6e545567f901d9962257cf8 Fix CI for latest nightly (#970) (#973)
- c96e8de457442806e18944f0b26dd06ba4cb1aee Fix primitive sort when input contains more nulls than the given sort limit (#954) (#965)
- 094037d418381584178db1d886cad3b5024b414a Update comfy-table to 5.0 (#957) (#964)
- 9f635021eee6786c5377c891218c5f88ebce07c3 Fix csv writing of timestamps to show timezone. (#849) (#963)
- f7deba4c3a050a52608462ee8a827bb8f6364140 Adding ability to parse float from number with leading decimal (#831) (#962)
- 59f96e842d05b63882f7ba285c66a9739761cf84 add ilike comparator (#874) (#961)
- 54023c8a5543c9f9fa4955afa01189029f3e96f5 Remove unpassable cargo publish check from verify-release-candidate.sh (#882) (#949)
6.2.0 (2021-11-12)
Features / Fixes:
- 4037933e43cad9e4de027039ce14caa65f78300a Fix validation for offsets of StructArrays (#942) (#946)
- 1af9ca5d363d870550026a7b1abcb749befbb371 implement take kernel for null arrays (#939) (#944)
- 320de1c20aefbf204f6888e2ad3663863afeba9f add checker for appending i128 to decimal builder (#928) (#943)
- dff14113884ad4246a8cafb9be579ebdb4e1481f Validate arguments to ArrayData::new and null bit buffer and buffers (#810) (#936)
- c3eae1ec56303b97c9e15263063a6a13122ef194 fix some warning about unused variables in panic tests (#894) (#933)
- e80bb018450f13a30811ffd244c42917d8bf8a62 fix some clippy warnings (#896) (#930)
- bde89463b627be3f60b5569d038ca36c434da71d feat(ipc): add support for deserializing messages with nested dictionary fields (#923) (#931)
- 792544b5fb7b84224ef9745ecb9f330663c14fb4 refactor regexp_is_match_utf8_scalar to try to mitigate miri failures (#895) (#932)
- 3f0e252811cbb6e3f7c774959787dcfec985d03e Automatically retry failed MIRI runs to work around intermittent failures (#934)
- c9a9515c46d560ced00e23ff57cb10a1c97573cb Update mod.rs (#909) (#919)
- 64ed79ece67141b92dc45b8a1d43cb9d909aa6a9 Mark boolean kernels public (#913) (#920)
- 8b95fe0bbf03588c5cc00f67365c5b0dac4d7a34 doc example mistype (#904) (#918)
- 34c5eab4862cab16fdfd5f5ed6c68dce6298dfa4 allow null array to be cast to all other types (#884) (#917)
- 3c69752e55ed0c58f5a8faed918a22b45cd93766 Fix instances of UB that cause tests to not pass under miri (#878) (#916)
- 85402148c3af03d0855e81f855715ea98a7491c5 feat(ipc): Support writing dictionaries nested in structs and unions (#870) (#915)
- 03d95e626cb0e654775fefa77786674ea41be4a2 Fix references to changelog (#905)
6.1.0 (2021-10-29)
Features / Fixes:
- b42649b0088fe7762c713a41a23c1abdf8d0496d implement eq_dyn and neq_dyn (#858) (#867)
- 01743f3f10a377c1ca857cd554acbf84155766d8 fix: fix a bug in offset calculation for unions (#863) (#871)
- 8bfff793a23f0e71008c7a9eea7a54d6b913ecff add lt_bool, lt_eq_bool, gt_bool, gt_eq_bool (#860) (#868)
- 8845e91d4ab584c822e9ee903db7069551b124af fix(ipc): Support serializing structs containing dictionaries (#848) (#865)
- 620282a0d9fdd2a8ed7e8313d17ba3dec64c80e5 Implement boolean equality kernels (#844) (#857)
- 94cddcacf785be982e69689291ce034ef00220b4 Cherry pick fix parquet_derive with default features (and fix cargo publish) (#856)
- 733fd583ddb3dbe6b4d58a809c444ee16ac0eae8 Use kernel utility for parsing timestamps in csv reader. (#832) (#853)
- 2cc64937a153f632796915d2d9869d5c2a501d28 [Minor] Fix clippy errors with new rust version (1.56) and float formatting with nightly (#845) (#850)
Other:
- bfac9e5a027e3bd78b7a1ec90c75a3e385bd66bb Test out new tarpaulin version (#852) (#866)
- 809350ced392cfc78d8a1a46228d4ffc25dea9ff Update README.md (#834) (#854)
- 70582f40dd21f5c710c4946266d0563a92b92337 [MINOR] Delete temp file from docs (#836) (#855)
- a721e00014015a7e598946b6efb9b1da8080ec85 Force fresh cargo cache key in CI (#839) (#851)
6.0.0 (2021-10-13)
Breaking changes:
- Replace ArrayData::new() with ArrayData::try_new() and unsafe ArrayData::new_unchecked #822 [parquet] [arrow] (alamb)
- Update Bitmap::len to return bits rather than bytes #749 [arrow] (matthewmturner)
- use sort_unstable_by in primitive sorting #552 [arrow] (Jimexist)
- New MapArray support #491 [parquet] [arrow] (nevi-me)
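The ArrayData::new() replacement above follows a common Rust convention: a fallible, validating try_new paired with an unsafe new_unchecked for callers who can guarantee the invariants themselves. Here is a minimal sketch of that convention with a hypothetical ToyOffsets type; the invariants it checks are invented for illustration and are not the validation arrow performs.

```rust
/// Hypothetical offsets buffer; a stand-in, not arrow's `ArrayData`.
#[derive(Debug)]
pub struct ToyOffsets {
    offsets: Vec<usize>,
}

impl ToyOffsets {
    /// Checked constructor: rejects empty or decreasing offsets.
    pub fn try_new(offsets: Vec<usize>) -> Result<Self, String> {
        if offsets.is_empty() {
            return Err("offsets must contain at least one entry".to_string());
        }
        if offsets.windows(2).any(|w| w[0] > w[1]) {
            return Err("offsets must be non-decreasing".to_string());
        }
        Ok(ToyOffsets { offsets })
    }

    /// Unchecked constructor that skips validation.
    ///
    /// # Safety
    /// The caller must guarantee the same invariants `try_new` checks:
    /// `offsets` is non-empty and non-decreasing.
    pub unsafe fn new_unchecked(offsets: Vec<usize>) -> Self {
        ToyOffsets { offsets }
    }

    /// Number of elements described by the offsets.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

Code that used an infallible constructor would, under this pattern, either propagate the error from try_new or wrap new_unchecked in an unsafe block with a comment justifying why the invariants hold.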
Implemented enhancements:
- Improve parquet binary writer speed by reducing allocations #819
- Expose buffer operations #808
- Add doc examples of writing parquet files using ArrowWriter #788
Fixed bugs:
- JSON reader can create null struct children on empty lists #825
- Incorrect null count for cast kernel for list arrays #815
- minute and second temporal kernels do not respect timezone #500
- Fix data corruption in json decoder f64-to-i64 cast #652 [arrow] (xianwill)
Documentation updates:
- Doctest for PrimitiveArray using from_iter_values. #694 [arrow] (novemberkilo)
- Doctests for BinaryArray and LargeBinaryArray. #625 [arrow] (novemberkilo)
- Add links in docstrings #605 [arrow] (alamb)
5.5.0 (2021-09-24)
Implemented enhancements:
Fixed bugs:
- Converting from string to timestamp uses microseconds instead of milliseconds #780
- Document has no link to RowColumIter #762
- length on slices with null doesn't work #744
5.4.0 (2021-09-10)
Implemented enhancements:
- Upgrade lexical-core to 0.8 #747
- append_nulls and append_trusted_len_iter for PrimitiveBuilder #725
- Optimize MutableArrayData::extend for null buffers #397
Fixed bugs:
- Arithmetic with scalars doesn't work on slices #742
- Comparisons with scalar don't work on slices #740
- unary kernel doesn't respect offset #738
- new_null_array creates invalid struct arrays #734
- --no-default-features is broken for parquet #733 [parquet]
- Bitmap::len returns the number of bytes, not bits. #730
- Decimal logical type is formatted incorrectly by print_schema #713
- parquet_derive does not support chrono time values #711
- Numeric overflow when formatting Decimal type #710
- The integration tests are not running #690
Closed issues:
- Question: Is there no way to create a DictionaryArray with a pre-arranged mapping? #729
5.3.0 (2021-08-26)
Implemented enhancements:
- Add optimized filter kernel for regular expression matching #697
- Can't cast from timestamp array to string array #587
Fixed bugs:
- 'Encoding DELTA_BYTE_ARRAY is not supported' with parquet arrow readers #708
- Support reading json string into binary data type. #701
Closed issues:
5.2.0 (2021-08-12)
Implemented enhancements:
- Make rand an optional dependency #671
- Remove undefined behavior in value method of boolean and primitive arrays #645
- Avoid materialization of indices in filter_record_batch for single arrays #636
- Add a note about arrow crate security / safety #627
- Allow the creation of String arrays from an iterator of &Option<&str> #598
- Support arrow map datatype #395
Fixed bugs:
- Parquet fixed length byte array columns write byte array statistics #660 [parquet]
- Parquet boolean columns write Int32 statistics #659 [parquet]
- Writing Parquet with a boolean column fails #657
- JSON decoder data corruption for large i64/u64 #653
- Incorrect min/max statistics for strings in parquet files #641 [parquet]
Closed issues:
5.1.0 (2021-07-29)
Implemented enhancements:
- Make FFI_ArrowArray empty() public #602
- exponential sort can be used to speed up lexico partition kernel #586
- Implement sort() for binary array #568
- primitive sorting can be improved and more consistent with and without limit if sorted unstably #553
Fixed bugs:
- Confusing memory usage with CSV reader #623
- FFI implementation deviates from specification for array release #595
- Parquet file content is different if ~/.cargo is in a git checkout #589
- Ensure output of MIRI is checked for success #581
- MIRI failure in array::ffi::tests::test_struct and other ffi tests #580
- ListArray equality check may return wrong result #570
- cargo audit failed #561
- ArrayData::slice() does not work for nested types such as StructArray #554
Documentation updates:
- More examples of how to construct Arrays #301
Closed issues:
5.0.0 (2021-07-14)
Breaking changes:
- Remove lifetime from DynComparator #543 [arrow]
- Simplify interactions with arrow flight APIs #376 [arrow-flight]
- refactor: remove lifetime from DynComparator #542 [arrow] (e-dard)
- use iterator for partition kernel instead of generating vec #438 [arrow] (Jimexist)
- Remove DictionaryArray::keys_array method #419 [arrow] (jhorstmann)
- simplify interactions with arrow flight APIs #377 [arrow-flight] (garyanaplan)
- return reference from DictionaryArray::values() (#313) #314 [arrow] (tustvold)
Implemented enhancements:
- Allow creation of StringArrays from Vec<String> #519 [arrow]
- Implement RecordBatch::concat #461 [arrow]
- Implement RecordBatch::slice() to slice RecordBatches #460 [arrow]
- Add a RecordBatch::split to split large batches into a set of smaller batches #343
- generate parquet schema from rust struct #539 [parquet] (nevi-me)
- Implement RecordBatch::concat #537 [arrow] (silathdiir)
- Implement function slice for RecordBatch #490 [arrow] (b41sh)
- add lexicographically partition points and ranges #424 [arrow] (Jimexist)
- allow to read non-standard CSV #326 [arrow] (kazuk)
- parquet: Speed up BitReader/DeltaBitPackDecoder #325 [parquet] (kornholi)
- ARROW-12343: [Rust] Support auto-vectorization for min/max #9 [arrow] (Dandandan)
- ARROW-12411: [Rust] Create RecordBatches from Iterators #7 [arrow] (alamb)
Fixed bugs:
- Error building on master - error: cyclic package dependency: package ahash v0.7.4 depends on itself. Cycle #544
- IPC reader panics with out of bounds error #541
- Take kernel doesn't handle nulls and structs correctly #530 [arrow]
- master fails to compile with default-features=false #529
- README developer instructions out of date #523
- Update rustc and packed_simd in CI before 5.0 release #517
- Incorrect memory usage calculation for dictionary arrays #503 [arrow]
- sliced null buffers lead to incorrect result in take kernel (and probably on other places) #502
- Cast of utf8 types and list container types don't respect offset #334 [arrow]
- fix take kernel null handling on structs #531 [arrow] (bjchambers)
- Correct array memory usage calculation for dictionary arrays #505 [arrow] (jhorstmann)
- parquet: improve BOOLEAN writing logic and report error on encoding fail #443 [parquet] (garyanaplan)
- Fix bug with null buffer offset in boolean not kernel #418 [arrow] (jhorstmann)
- respect offset in utf8 and list casts #335 [arrow] (ritchie46)
- Fix comparison of dictionaries with different values arrays (#332) #333 [arrow] (tustvold)
- ensure null-counts are written for all-null columns #307 [parquet] (crepererum)
- fix invalid null handling in filter #296 [arrow] (ritchie46)
- fix NaN handling in parquet statistics #256 (crepererum)
Documentation updates:
- Improve arrow's crate's readme on crates.io #463
- Clean up README.md in advance of the 5.0 release #536 [arrow] [arrow-flight] [parquet] (alamb)
- fix readme instructions to reflect new structure #524 (marcvanheerden)
- Improve docs for NullArray, new_null_array and new_empty_array #240 [arrow] (alamb)
Merged pull requests:
- Fix default arrow build #533 [arrow] (alamb)
- Add tests for building applications using arrow with different feature flags #532 [arrow] (alamb)
- Remove unused futures dependency from arrow-flight #528 [arrow-flight] (alamb)
- CI: update rust nightly and packed_simd #525 [arrow] (ritchie46)
- Support StringArray creation from String Vec #522 [arrow] (silathdiir)
- Fix parquet benchmark schema #513 [parquet] (nevi-me)
- Fix parquet definition levels #511 [parquet] (nevi-me)
- Fix for primitive and boolean take kernel for nullable indices with an offset #509 [arrow] (jhorstmann)
- Bump flatbuffers #499 [arrow] (PsiACE)
- implement second/minute helpers for temporal #493 [arrow] (ovr)
- special case concatenating single element array shortcut #492 [arrow] (Jimexist)
- update docs to reflect recent changes (joins and window functions) #489 (Jimexist)
- Update rand, proc-macro and zstd dependencies #488 [arrow] [arrow-flight] [parquet] (alamb)
- Doctest for GenericListArray. #474 [arrow] (novemberkilo)
- remove stale comment on ArrayData equality and update unit tests #472 (Jimexist)
- remove unused patch file #471 (Jimexist)
- fix clippy warnings for rust 1.53 #470 (Jimexist)
- Fix PR labeler #468 (Dandandan)
- Tweak dev backporting docs #466 (alamb)
- Unvendor Archery #459 (kszucs)
- Add sort boolean benchmark #457 (alamb)
- Add C data interface for decimal128 and timestamp #453 [arrow] (alippai)
- Implement the Iterator trait for the json Reader. #451 [arrow] (LaurentMazare)
- Update release docs + release email template #450 (alamb)
- remove clippy unnecessary wraps suppressions in cast kernel #449 (Jimexist)
- Use partition for bool sort #448 (Jimexist)
- remove unnecessary wraps in sort #445 (Jimexist)
- Python FFI bridge for Schema, Field and DataType #439 [arrow] (kszucs)
- Update release Readme.md #436 (alamb)
- Derive Eq and PartialEq for SortOptions #425 (tustvold)
- refactor lexico sort for future code reuse #423 (Jimexist)
- Reenable MIRI check on PRs #421 (alamb)
- Sort by float lists #420 (medwards)
- Fix out of bounds read in bit chunk iterator #416 (jhorstmann)
- Doctests for DecimalArray. #414 (novemberkilo)
- Add Decimal to CsvWriter and improve debug display #406 (alippai)
- MINOR: update install instruction #400 (alippai)
- use prettier to auto format md files #398 (Jimexist)
- window::shift to work for all array types #388 (Jimexist)
- add more tests for window::shift and handle boundary cases #386 (Jimexist)
- Implement faster arrow array reader #384 (yordan-pavlov)
- Add set_bit to BooleanBufferBuilder to allow mutating bit in index #383 (boazberman)
- make sure that only concat preallocates buffers #382 (ritchie46)
- Respect max rowgroup size in Arrow writer #381 [parquet] (nevi-me)
- Fix typo in release script, update release location #380 (alamb)
- Doctests for FixedSizeBinaryArray #378 (novemberkilo)
- Simplify shift kernel using new_null_array #370 (Dandandan)
- allow SliceableCursor to be constructed from an Arc directly #369 (crepererum)
- Add doctest for ArrayBuilder #367 (alippai)
- Fix version in readme #365 (domoritz)
- Remove superfluous space #363 (domoritz)
- Add crate badges #362 (domoritz)
- Disable MIRI check until it runs cleanly on CI #360 (alamb)
- Only register Flight.proto with cargo if it exists #351 (tustvold)
- Reduce memory usage of concat (large)utf8 #348 (ritchie46)
- Fix filter UB and add fast path #341 (ritchie46)
- Automatic cherry-pick script #339 (alamb)
- Doctests for BooleanArray. #338 (novemberkilo)
- feature gate ipc reader/writer #336 (ritchie46)
- Add ported Rust release verification script #331 (wesm)
- Doctests for StringArray and LargeStringArray. #330 (novemberkilo)
- inline PrimitiveArray::value #329 (ritchie46)
- Enable wasm32 as a target architecture for the SIMD feature #324 (roee88)
- Fix undefined behavior in FFI and enable MIRI checks on CI #323 (roee88)
- Mutablebuffer::shrink_to_fit #318 [arrow] (ritchie46)
- Add (simd) modulus op #317 (gangliao)
- feature gate csv functionality #312 [arrow] (ritchie46)
- [Minor] Version upgrades #304 (Dandandan)
- Remove old release scripts #293 (alamb)
- Add Send to the ArrayBuilder trait #291 (Max-Meldrum)
- Added changelog generator script and configuration. #289 (jorgecarleitao)
- manually bump development version #288 (nevi-me)
- Fix FFI and add support for Struct type #287 (roee88)
- Fix subtraction underflow when sorting string arrays with many nulls #285 (medwards)
- Speed up bound checking in take #281 (Dandandan)
- Update PR template by commenting out instructions #278 (nevi-me)
- Added Decimal support to pretty-print display utility (#230) #273 (mgill25)
- Fix null struct and list roundtrip #270 (nevi-me)
- 1.52 clippy fixes #267 (nevi-me)
- Fix typo in csv/reader.rs #265 (domoritz)
- Fix empty Schema::metadata deserialization error #260 (hulunbier)
- update datafusion and ballista doc links #259 (Jimexist)
- support full u32 and u64 roundtrip through parquet #258 [parquet] (crepererum)
- [MINOR] Added env to run rust in integration. #253 (jorgecarleitao)
- [Minor] Made integration tests always run. #248 (jorgecarleitao)
- fix parquet max_definition for non-null structs #246 (nevi-me)
- Disabled rebase needed until demonstrate working. #243 (jorgecarleitao)
- pin flatbuffers to 0.8.4 #239 (ritchie46)
- sort_primitive result is capped to the min of limit or values.len #236 (medwards)
- Read list field correctly #234 [parquet] (nevi-me)
- Fix code examples for RecordBatch::try_from_iter #231 (alamb)
- Support string dictionaries in csv reader (#228) #229 (tustvold)
- support LargeUtf8 in sort kernel #26 (ritchie46)
- Removed unused files #22 (jorgecarleitao)
- ARROW-12504: Buffer::from_slice_ref set correct capacity #18 [arrow] (tustvold)
- Add GitHub templates #17 (andygrove)
- ARROW-12493: Add support for writing dictionary arrays to CSV and JSON #16 [arrow] (tustvold)
- ARROW-12426: [Rust] Fix concatenation of arrow dictionaries #15 [arrow] (tustvold)
- Update repository and homepage urls #14 [arrow] [arrow-flight] [parquet] (Dandandan)
- Added rebase-needed bot #13 (jorgecarleitao)
- Added Integration tests against arrow #10 (jorgecarleitao)
4.4.0 (2021-06-24)
Breaking changes:
- migrate partition kernel to use Iterator trait #437 [arrow]
- Remove DictionaryArray::keys_array #391 [arrow]
Implemented enhancements:
- sort kernel boolean sort can be O(n) #447 [arrow]
- C data interface for decimal128, timestamp, date32 and date64 #413
- Add Decimal to CsvWriter #405
- Use iterators to increase performance of creating Arrow arrays #200 [parquet]
Fixed bugs:
- Release Audit Tool (RAT) is not being triggered #481
- Security Vulnerabilities: flatbuffers: read_scalar and read_scalar_at allow transmuting values without unsafe blocks #476
- Clippy broken after upgrade to rust 1.53 #467
- Pull Request Labeler is not working #462
- Arrow 4.3 release: error[E0658]: use of unstable library feature 'partition_point': new API #456
- parquet reading hangs when row_group contains more than 2048 rows of data #349
- Fail to build arrow #247
- JSON reader does not implement iterator #193 [arrow]
Security fixes:
- Ensure a successful MIRI Run on CI #227
Closed issues:
- sort kernel has a lot of unnecessary wrapping #446
- [Parquet] Plain encoded boolean column chunks limited to 2048 values #48 [parquet]
4.3.0 (2021-06-10)
Implemented enhancements:
- Add partitioning kernel for sorted arrays #428 [arrow]
- Implement sort by float lists #427 [arrow]
- Derive Eq and PartialEq for SortOptions #426 [arrow]
- use prettier and github action to normalize markdown document syntax #399
- window::shift can work for more than just primitive array type #392
- Doctest for ArrayBuilder #366
Fixed bugs:
- Boolean not kernel does not take offset of null buffer into account #417
- my contribution not merged in 4.2 release #394
- window::shift shall properly handle boundary cases #387
- Parquet WriterProperties.max_row_group_size not wired up #257
- Out of bound reads in chunk iterator #198 [arrow]
4.2.0 (2021-05-29)
Breaking changes:
Implemented enhancements:
- Simplify shift kernel using null array #371
- Provide Arc-based constructor for parquet::util::cursor::SliceableCursor #368
- Add badges to crates #361
- Consider inlining PrimitiveArray::value #328
- Implement automated release verification script #327
- Add wasm32 to the list of target architectures of the simd feature #316
- add with_escape for csv::ReaderBuilder #315 [arrow]
- IPC feature gate #310
- csv feature gate #309 [arrow]
- Add shrink_to / shrink_to_fit to MutableBuffer #297
Fixed bugs:
- Incorrect crate setup instructions #364
- Arrow-flight only register rerun-if-changed if file exists #350
- Dictionary Comparison Uses Wrong Values Array #332
- Undefined behavior in FFI implementation #322
- All-null column get wrong parquet null-counts #306 [parquet]
- Filter has inconsistent null handling #295
4.1.0 (2021-05-17)
Implemented enhancements:
- Add Send to ArrayBuilder #290 [arrow]
- Improve performance of bound checking option #280 [arrow]
- extend compute kernel arity to include nullary functions #276
- Implement FFI / CDataInterface for Struct Arrays #251 [arrow]
- Add support for pretty-printing Decimal numbers #230 [arrow]
- CSV Reader String Dictionary Support #228 [arrow]
- Add Builder interface for adding Arrays to record batches #210 [arrow]
- Support auto-vectorization for min/max #209 [arrow]
- Support LargeUtf8 in sort kernel #25 [arrow]
Fixed bugs:
- no method named select_nth_unstable_by found for mutable reference &mut [T] #283
- Rust 1.52 Clippy error #266
- NaNs can break parquet statistics #255 [parquet]
- u64::MAX does not roundtrip through parquet #254 [parquet]
- Integration tests failing to compile (flatbuffer) #249 [arrow]
- Fix compatibility quirks between arrow and parquet structs #245 [parquet]
- Unable to write non-null Arrow structs to Parquet #244 [parquet]
- schema: missing field metadata when deserialize #241 [arrow]
- Arrow does not compile due to flatbuffers upgrade #238 [arrow]
- Sort with limit panics for the limit includes some but not all nulls, for large arrays #235 [arrow]
- arrow-rs contains a copy of the "format" directory #233 [arrow]
- Fix SEGFAULT/ SIGILL in child-data ffi #206 [arrow]
- Read list field correctly in <struct<list>> #167 [parquet]
- FFI listarray lead to undefined behavior. #20
Security fixes:
Documentation updates:
- Comment out the instructions in the PR template #277
- Update links to datafusion and ballista in README.md #19
- Update "repository" in Cargo.toml #12
Closed issues:
- Arrow Aligned Vec #268
- [Rust]: Tracking issue for AVX-512 #220 [arrow]
- Umbrella issue for clippy integration #217 [arrow]
- Support sort #215 [arrow]
- Support stable Rust #214 [arrow]
- Remove Rust and point integration tests to arrow-rs repo #211 [arrow]
- ArrayData buffers are inconsistent across implementations #207
- 3.0.1 patch release #204
- Document patch release process #202
- Simplify Offset #186 [arrow]
- Typed Bytes #185 [arrow]
- [CI]docker-compose setup should enable caching #175
- Improve take primitive performance #174
- [CI] Try out buildkite #165 [arrow]
- Update assignees in JIRA where missing #160
- [Rust]: From<ArrayDataRef> implementations should validate data type #103 [arrow]
- [DataFusion] Verify that projection push down does not remove aliases columns #99 [arrow]
- [Rust][DataFusion] Implement modulus expression #98 [arrow]
- [DataFusion] Add constant folding to expressions during logically planning #96 [arrow]
- [DataFusion] DataFrame.collect should return RecordBatchReader #95 [arrow]
- [Rust][DataFusion] Add FORMAT to explain plan and an easy to visualize format #94 [arrow]
- [DataFusion] Implement metrics framework #90 [arrow]
- [DataFusion] Implement micro benchmarks for each operator #89 [arrow]
- [DataFusion] Implement pretty print for physical query plan #88 [arrow]
- [Archery] Support rust clippy in the lint command #83
- [rust][datafusion] optimize count(*) queries on parquet sources #75 [arrow]
- [Rust][DataFusion] Improve like/nlike performance #71 [arrow]
- [DataFusion] Implement optimizer rule to remove redundant projections #56 [arrow]
- [DataFusion] Parquet data source does not support complex types #39 [arrow]
- Merge utils from Parquet and Arrow #32 [arrow] [parquet]
- Add benchmarks for Parquet #30 [parquet]
- Mark methods that do not perform bounds checking as unsafe #28 [arrow]
- Test issue #24 [arrow]
- This is a test issue #11
For older versions, see apache/arrow/CHANGELOG.md
* This Changelog was automatically generated by github_changelog_generator