From 768fa5ebae8ef22544df830d7e84ed3185c01855 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Mon, 1 Aug 2022 17:33:23 +0200 Subject: [PATCH 01/22] Version 9.0.0 post skeleton --- _posts/2022-08-01-9.0.0-release.md | 79 ++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) create mode 100644 _posts/2022-08-01-9.0.0-release.md diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md new file mode 100644 index 000000000000..f40d51e25285 --- /dev/null +++ b/_posts/2022-08-01-9.0.0-release.md @@ -0,0 +1,79 @@ +--- +layout: post +title: "Apache Arrow 9.0.0 Release" +date: "2022-08-01 00:00:00" +author: pmc +categories: [release] +--- + + + +The Apache Arrow team is pleased to announce the 9.0.0 release. This covers +over 3 months of development work and includes [**473 resolved issues**][1] +from [**YYY distinct contributors**][2]. See the Install Page to learn how to +get the libraries for your platform. + +The release notes below are not exhaustive and only expose selected highlights +of the release. Many other bugfixes and improvements have been made: we refer +you to the [complete changelog][3]. + +## Community + +Since the 8.0.0 release, Dewey Dunnington, Alenka Frim and Rok Mihevc +have been invited to be committers. +Thanks for your contributions and participation in the project! + +## Columnar Format Notes + +## Arrow Flight RPC notes + +## C++ notes + +## C# notes + +## Go notes + +## Java notes + +## JavaScript notes + +## Python notes + +## R notes + +For more on what’s in the 9.0.0 R package, see the [R changelog][4]. + +## Ruby and C GLib notes + +### Ruby + +### C GLib + +## Rust notes + +The Rust projects have moved to separate repositories outside the +main Arrow monorepo. For notes on the 19.0.0 release of the Rust +implementation, see the [Arrow Rust changelog][5]. 
+ +[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%209.0.0 +[2]: {{ site.baseurl }}/release/9.0.0.html#contributors +[3]: {{ site.baseurl }}/release/9.0.0.html#changelog +[4]: {{ site.baseurl }}/docs/r/news/ +[5]: https://github.com/apache/arrow-rs/blob/19.0.0/CHANGELOG.md From 1d25e3344dfc4cd609b69abc499944d8cbf54ebc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Tue, 2 Aug 2022 10:57:32 +0200 Subject: [PATCH 02/22] Update _posts/2022-08-01-9.0.0-release.md Co-authored-by: David Li --- _posts/2022-08-01-9.0.0-release.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index f40d51e25285..5f11ba0c6128 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -44,6 +44,10 @@ Thanks for your contributions and participation in the project! ## Arrow Flight RPC notes +Arrow Flight is now available in MacOS M1 Python wheels ([ARROW-16779](https://issues.apache.org/jira/browse/ARROW-16779)). +Arrow Flight SQL is now buildable on Windows ([ARROW-16902](https://issues.apache.org/jira/browse/ARROW-16902)). +Ruby now exposes more of the Flight and Flight SQL APIs (various JIRAs). + ## C++ notes ## C# notes From 2c6fad19d4e5d0c450d27d1901d56c4115fc931e Mon Sep 17 00:00:00 2001 From: Eric Erhardt Date: Tue, 2 Aug 2022 11:00:39 -0500 Subject: [PATCH 03/22] Add C# notes for 9.0.0 --- _posts/2022-08-01-9.0.0-release.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 5f11ba0c6128..d8a62d25b110 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -52,6 +52,10 @@ Ruby now exposes more of the Flight and Flight SQL APIs (various JIRAs). 
## C# notes +#### New Features + +* Added support for Add support for Time32Array and Time64Array ([ARROW-16660](https://github.com/apache/arrow/pull/13279)) + ## Go notes ## Java notes From f6ba42eb4abdc2e7dc0bf20bda5279511ffb1ad5 Mon Sep 17 00:00:00 2001 From: Antoine Pitrou Date: Tue, 2 Aug 2022 18:27:32 +0200 Subject: [PATCH 04/22] Add C++ notes. --- _posts/2022-08-01-9.0.0-release.md | 77 ++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 5f11ba0c6128..ad46a9d3ccdf 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -50,6 +50,83 @@ Ruby now exposes more of the Flight and Flight SQL APIs (various JIRAs). ## C++ notes +STL-like iteration is now provided over chunked arrays (ARROW-602). + +### Compute + +The C++ compute and execution engine is now officially named "Acero", though +its C++ namespaces have not changed. + +New light-weight data holder abstractions have been introduced in order +to reduce the overhead of invoking compute functions and kernels, especially +at the small data sizes desirable for efficient parallelization (typically +L1- or L2-sized). Specifically, the non-owning `ArraySpan` and `ExecSpan` +structures have internally superseded the much heavier `ExecBatch`, which +is still supported for compatibility at the API level +(ARROW-16756, ARROW-16824, ARROW-16852). + +In a similar vein, the `ValueDescr` class was removed and `ScalarKernel` +implementations now always receive at least one non-scalar input, removing +the special case where a `ScalarKernel` needs to output a scalar rather than +an array. The higher-level compute APIs still allow executing a scalar function +over all-scalar inputs; but those scalars are internally broadcasted to +1-element arrays so as to simplify kernel implementation (ARROW-16757). + +Timestamp comparison is now supported (ARROW-16425). 
+ +A cumulative sum function is implemented over numeric inputs (ARROW-13530). + +Temporal rounding functions received additional options to control how +rounding is done (ARROW-14821). + +Improper computation of the "mode" function on boolean input was fixed +(ARROW-17096). + +Function registries can now be nested (ARROW-16677). + +### Dataset + +The `autogenerate_column_names` option for CSV reading is now handled correctly +(ARROW-16436). + +Fix `InMemoryDataset::ReplaceSchema` to actually replace the schema +(ARROW-16085). + +Fix `FilenamePartitioning` to properly support null values (ARROW-16302). + +### Filesystem + +A number of bugfixes and improvements were made to the Google Cloud Storage +filesystem implementation (ARROW-14892). + +By default, the S3 filesystem implementation does not create or drop buckets +anymore (ARROW-15906). This is a compatibility-breaking change intended +to prevent user errors from having potentially catastrophic consequences. +Options have been added to restore the previous behavior if necessary. + +### Parquet + +The default Parquet version is now 2.4 for writing, enabling use of +more recent logical types by default (ARROW-12203). + +Non-nullable fields are now handled correctly by the Parquet reader +(ARROW-16116). + +Reading encrypted files should now be thread-safe (ARROW-14114). + +Statistics equality now works correctly with minmax (ARROW-16487). + +The minimum Thrift version required for building is now 0.13 (ARROW-16721). + +The Thrift deserialization limits can now be configured to accomodate for +data files with very large metadata (ARROW-16546). + +### Substrait + +The Substrait spec has been updated to 0.6.0 (ARROW-16816). In addition, a +larger subset of the Substrait specification is now supported (ARROW-15587, +ARROW-15590, ARROW-15901, ARROW-16657, ARROW-15591). 
+ ## C# notes ## Go notes From 0aab865eb92d2336d4cf666074a754c921c59e07 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 3 Aug 2022 12:19:57 +0200 Subject: [PATCH 05/22] Update _posts/2022-08-01-9.0.0-release.md Co-authored-by: Matt Topol --- _posts/2022-08-01-9.0.0-release.md | 48 ++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 9893084acae3..e4cad8cda2c2 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -135,6 +135,54 @@ ARROW-15590, ARROW-15901, ARROW-16657, ARROW-15591). ## Go notes +### Security + +* Updated testify dependency to address CVE-2022-28948. ([ARROW-16759](https://issues.apache.org/jira/browse/ARROW-16759)) (This was also backported to previous versions and released as patch versions v6.0.2, v7.0.1, and v8.0.1) + +### Arrow + +#### New Features + +* Dictionary Scalars are now available ([ARROW-16323](https://issues.apache.org/jira/browse/ARROW-16323)) +* Introduced a DictionaryUnifier object along with functions for unifying Chunked Arrays and Tables ([ARROW-16324](https://issues.apache.org/jira/browse/ARROW-16324)) +* New CSV examples added to documentation to demonstrate error handling ([ARROW-16450](https://issues.apache.org/jira/browse/ARROW-16450)) +* CSV Reader now supports arrow.TimestampType ([ARROW-16504](https://issues.apache.org/jira/browse/ARROW-16504)) +* JSON parsing for Temporal Types now allows passing numeric values in addition to strings for parsing.
Timezones will be properly parsed if they exist in the string and a function was added to retrieve a time.Location object from a TimestampType ([ARROW-16551](https://issues.apache.org/jira/browse/ARROW-16551)) +* New utilities added to decimal128 for rescaling and easy conversion to and from float32/float64 ([ARROW-16552](https://issues.apache.org/jira/browse/ARROW-16552)) +* Arrow DataType interface now has a LayoutMethod which returns the physical layout of the given datatype such as the number of buffers, types, etc. This matches the behavior of the layout() methods in C++ for data types. ([ARROW-16556](https://issues.apache.org/jira/browse/ARROW-16556)) +* Added a SliceBuffer function to the memory package to allow better re-using of memory across buffer objects ([ARROW-16557](https://issues.apache.org/jira/browse/ARROW-16557)) +* Dictionary Arrays can now be concatenated using array.Concatenate ([ARROW-17095](https://issues.apache.org/jira/browse/ARROW-17095)) + +#### Bug Fixes + +* ipc.FileReader now properly uses the memory.Allocator interface ([ARROW-16002](https://issues.apache.org/jira/browse/ARROW-16002)) +* Addressed issue with Integration tests between Go and Java ([ARROW-16441](https://issues.apache.org/jira/browse/ARROW-16441)) +* RecordBuilder.UnmarshalJSON now properly ignores extra unknown fields rather than panic'ing ([ARROW-16456](https://issues.apache.org/jira/browse/ARROW-16456)) +* StructBuilder.UnmarshalJSON will no longer fail and panic when Nullable fields are missing ([ARROW-16502](https://issues.apache.org/jira/browse/ARROW-16502)) +* ipc.Reader no longer silently accepts string columns with invalid offsets, preventing unexpected panics later when writing or accessing the resulting arrays. 
([ARROW-16831](https://issues.apache.org/jira/browse/ARROW-16831)) +* Arrow CSV reader no longer clobbers its reported errors and properly surfaces them ([ARROW-16926](https://issues.apache.org/jira/browse/ARROW-16926)) + +### Parquet + +#### New Features + +* The CreatedBy version string for the Parquet writer will now correctly reflect the library version, and will be updated by the release scripts ([ARROW-16484](https://issues.apache.org/jira/browse/ARROW-16484)) +* Parquet bit_packing functions now have ARM64 NEON implementations for performance ([ARROW-16486](https://issues.apache.org/jira/browse/ARROW-16486)) +* It is now possible to customize the root node in the Parquet writer instead of hardcoding it to be named "schema" with a repetition type of Repeated. This was needed to allow producing files similar to spark where the root node has a repetition type of Required. It still defaults to the spec definition of Repeated. ([ARROW-16561](https://issues.apache.org/jira/browse/ARROW-16561)) +* parquet_reader CLI mainprog has been enhanced to dump values out as JSON and CSV along with setting an output file instead of just dumping to the terminal. ([ARROW-16934](https://issues.apache.org/jira/browse/ARROW-16934)) + +#### Bug Fixes + +* Fixed a memory leak with Parquet page reading ([ARROW-16473](https://issues.apache.org/jira/browse/ARROW-16473)) +* Parquet Reader properly parallelizes column reads when the parallel option is set to true. ([ARROW-16530](https://issues.apache.org/jira/browse/ARROW-16530)) +* Fixed bug in the Bool decoder for plain encoding ([ARROW-16563](https://issues.apache.org/jira/browse/ARROW-16563)) +* Fixed a bug in the Parquet bool column reader where it failed to properly skip rows ([ARROW-16638](https://issues.apache.org/jira/browse/ARROW-16638)) +* Fixed the flakey travis ARM64 builds by reducing the size of a test case in the pqarrow unit tests to reduce the memory usage for the tests. 
([ARROW-16669](https://issues.apache.org/jira/browse/ARROW-16669)) +* Parquet writer now properly handles writing arrow.NULL type arrays ([ARROW-16749](https://issues.apache.org/jira/browse/ARROW-16749)) +* Column level dictionary encoding configuration for Parquet writing now correctly respects the input value ([ARROW-16813](https://issues.apache.org/jira/browse/ARROW-16813)) +* Memory leak in DeltaByteArray encoding fixed ([ARROW-16983](https://issues.apache.org/jira/browse/ARROW-16983)) + + ## Java notes ## JavaScript notes From b3cca5a3e22af9243fa87d338aaf88513ae9eb62 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 3 Aug 2022 12:20:15 +0200 Subject: [PATCH 06/22] Update _posts/2022-08-01-9.0.0-release.md Co-authored-by: Dominik Moritz --- _posts/2022-08-01-9.0.0-release.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index e4cad8cda2c2..783056dce4b2 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -187,6 +187,9 @@ ARROW-15590, ARROW-15901, ARROW-16657, ARROW-15591). 
## JavaScript notes +* [ARROW-16371: [JS] Fix error iterating tables with no batches](https://github.com/apache/arrow/pull/13287) +* [ARROW-16704: [JS] Handle case where `tableFromIPC` input is an async `RecordBatchReader`](https://github.com/apache/arrow/pull/13278) + ## Python notes ## R notes From b3ce1c34308650da5e73ce60ede73eb56ff7bc76 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 3 Aug 2022 12:20:59 +0200 Subject: [PATCH 07/22] Update _posts/2022-08-01-9.0.0-release.md Co-authored-by: Larry White --- _posts/2022-08-01-9.0.0-release.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 783056dce4b2..132cd81df99f 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -184,7 +184,26 @@ ARROW-15590, ARROW-15901, ARROW-16657, ARROW-15591). ## Java notes +#### New Features +* Allow overriding column nullability in arrow-jdbc ([#13558](https://github.com/apache/arrow/pull/13558)) +* Enable skip BOUNDS_CHECKING with setBytes and getBytes of ArrowBuf ([#13161](https://github.com/apache/arrow/pull/13161)) +* Initialize JNI components on use instead of statically ([#13146](https://github.com/apache/arrow/pull/13146)) +* Provide explicit JDBC column type mapping ([#13166](https://github.com/apache/arrow/pull/13166)) +* Allow duplicated field names in Java C data interface ([#13247](https://github.com/apache/arrow/pull/13247)) +* Improve and document StackTrace ([#12656](https://github.com/apache/arrow/pull/12656)) +* Keep more context when marshaling errors through JNI ([#13246](https://github.com/apache/arrow/pull/13246)) +* Make RoundingMode configurable to handle inconsistent scale in BigDecimals ([#13433](https://github.com/apache/arrow/pull/13433)) +* Improve Java dev experience with IntelliJ ([#13017](https://github.com/apache/arrow/pull/13017)) +* Implement ArrowArrayStream 
([#13465](https://github.com/apache/arrow/pull/13465)) +#### Bug Fixes +* Fix variable-width vectors in integration JSON writer ([#13676](https://github.com/apache/arrow/pull/13676)) +* Handle empty JDBC ResultSet ([#13049](https://github.com/apache/arrow/pull/13049)) +* Fix hasNext() in ArrowVectorIterator ([#13107](https://github.com/apache/arrow/pull/13107)) +* Fix ArrayConsumer when using ArrowVectorIterator ([#12692](https://github.com/apache/arrow/pull/12692)) +* Update Gandiva Protobuf library to enable builds on Apple M1 ([#13121](https://github.com/apache/arrow/pull/13121)) +* Patch dataset module testing failure with JSE11+ ([#13200](https://github.com/apache/arrow/pull/13200)) +* Don't duplicate generated Protobuf classes between flight-core and flight-sql ([#13596](https://github.com/apache/arrow/pull/13596)) ## JavaScript notes * [ARROW-16371: [JS] Fix error iterating tables with no batches](https://github.com/apache/arrow/pull/13287) From d9b073c477d35476abaa6867e07f524caff06cea Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 3 Aug 2022 13:04:41 +0200 Subject: [PATCH 08/22] Add JIRA links to JIRA issues --- _posts/2022-08-01-9.0.0-release.md | 44 +++++++++++++++--------------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 132cd81df99f..3fe83ae77a8e 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -50,7 +50,7 @@ Ruby now exposes more of the Flight and Flight SQL APIs (various JIRAs). ## C++ notes -STL-like iteration is now provided over chunked arrays (ARROW-602). +STL-like iteration is now provided over chunked arrays ([ARROW-602](https://issues.apache.org/jira/browse/ARROW-602)). ### Compute @@ -63,69 +63,69 @@ at the small data sizes desirable for efficient parallelization (typically L1- or L2-sized).
Specifically, the non-owning `ArraySpan` and `ExecSpan` structures have internally superseded the much heavier `ExecBatch`, which is still supported for compatibility at the API level -(ARROW-16756, ARROW-16824, ARROW-16852). +([ARROW-16756](https://issues.apache.org/jira/browse/ARROW-16756), [ARROW-16824](https://issues.apache.org/jira/browse/ARROW-16824), [ARROW-16852](https://issues.apache.org/jira/browse/ARROW-16852)). In a similar vein, the `ValueDescr` class was removed and `ScalarKernel` implementations now always receive at least one non-scalar input, removing the special case where a `ScalarKernel` needs to output a scalar rather than an array. The higher-level compute APIs still allow executing a scalar function over all-scalar inputs; but those scalars are internally broadcasted to -1-element arrays so as to simplify kernel implementation (ARROW-16757). +1-element arrays so as to simplify kernel implementation ([ARROW-16757](https://issues.apache.org/jira/browse/ARROW-16757)). -Timestamp comparison is now supported (ARROW-16425). +Timestamp comparison is now supported ([ARROW-16425](https://issues.apache.org/jira/browse/ARROW-16425)). -A cumulative sum function is implemented over numeric inputs (ARROW-13530). +A cumulative sum function is implemented over numeric inputs ([ARROW-13530](https://issues.apache.org/jira/browse/ARROW-13530)). Temporal rounding functions received additional options to control how -rounding is done (ARROW-14821). +rounding is done ([ARROW-14821](https://issues.apache.org/jira/browse/ARROW-14821)). Improper computation of the "mode" function on boolean input was fixed -(ARROW-17096). +([ARROW-17096](https://issues.apache.org/jira/browse/ARROW-17096)). -Function registries can now be nested (ARROW-16677). +Function registries can now be nested ([ARROW-16677](https://issues.apache.org/jira/browse/ARROW-16677)). ### Dataset The `autogenerate_column_names` option for CSV reading is now handled correctly -(ARROW-16436). 
+([ARROW-16436](https://issues.apache.org/jira/browse/ARROW-16436)). Fix `InMemoryDataset::ReplaceSchema` to actually replace the schema -(ARROW-16085). +([ARROW-16085](https://issues.apache.org/jira/browse/ARROW-16085)). -Fix `FilenamePartitioning` to properly support null values (ARROW-16302). +Fix `FilenamePartitioning` to properly support null values ([ARROW-16302](https://issues.apache.org/jira/browse/ARROW-16302)). ### Filesystem A number of bugfixes and improvements were made to the Google Cloud Storage -filesystem implementation (ARROW-14892). +filesystem implementation ([ARROW-14892](https://issues.apache.org/jira/browse/ARROW-14892)). By default, the S3 filesystem implementation does not create or drop buckets -anymore (ARROW-15906). This is a compatibility-breaking change intended +anymore ([ARROW-15906](https://issues.apache.org/jira/browse/ARROW-15906)). This is a compatibility-breaking change intended to prevent user errors from having potentially catastrophic consequences. Options have been added to restore the previous behavior if necessary. ### Parquet The default Parquet version is now 2.4 for writing, enabling use of -more recent logical types by default (ARROW-12203). +more recent logical types by default ([ARROW-12203](https://issues.apache.org/jira/browse/ARROW-12203)). Non-nullable fields are now handled correctly by the Parquet reader -(ARROW-16116). +([ARROW-16116](https://issues.apache.org/jira/browse/ARROW-16116)). -Reading encrypted files should now be thread-safe (ARROW-14114). +Reading encrypted files should now be thread-safe ([ARROW-14114](https://issues.apache.org/jira/browse/ARROW-14114)). -Statistics equality now works correctly with minmax (ARROW-16487). +Statistics equality now works correctly with minmax ([ARROW-16487](https://issues.apache.org/jira/browse/ARROW-16487)). -The minimum Thrift version required for building is now 0.13 (ARROW-16721). 
+The minimum Thrift version required for building is now 0.13 ([ARROW-16721](https://issues.apache.org/jira/browse/ARROW-16721)). The Thrift deserialization limits can now be configured to accomodate for -data files with very large metadata (ARROW-16546). +data files with very large metadata ([ARROW-16546](https://issues.apache.org/jira/browse/ARROW-16546)). ### Substrait -The Substrait spec has been updated to 0.6.0 (ARROW-16816). In addition, a -larger subset of the Substrait specification is now supported (ARROW-15587, -ARROW-15590, ARROW-15901, ARROW-16657, ARROW-15591). +The Substrait spec has been updated to 0.6.0 ([ARROW-16816](https://issues.apache.org/jira/browse/ARROW-16816)). In addition, a +larger subset of the Substrait specification is now supported ([ARROW-15587](https://issues.apache.org/jira/browse/ARROW-15587), +[ARROW-15590](https://issues.apache.org/jira/browse/ARROW-15590), ARROW-15901, ARROW-16657, ARROW-15591). ## C# notes From 095823c22e93a0c392bfac714acd71ebab3c5c00 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 3 Aug 2022 15:53:14 +0200 Subject: [PATCH 09/22] Apply suggestions from code review Co-authored-by: Neal Richardson --- _posts/2022-08-01-9.0.0-release.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 3fe83ae77a8e..155a48daf846 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -129,8 +129,6 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt ## C# notes -#### New Features - * Added support for Add support for Time32Array and Time64Array ([ARROW-16660](https://github.com/apache/arrow/pull/13279)) ## Go notes @@ -213,6 +211,8 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt ## R notes +Highlights include several new `dplyr` verbs, including `glimpse()` and `union_all()`, as well as many more datetime 
functions from `lubridate`. There is also experimental support for user-defined scalar functions in the query engine, and most packages include native support for datasets in Google Cloud Storage (opt-in in the Linux full source build). + For more on what’s in the 9.0.0 R package, see the [R changelog][4]. ## Ruby and C GLib notes From cc26082e085e519f664429d40240272a4f82c375 Mon Sep 17 00:00:00 2001 From: Eric Erhardt Date: Wed, 3 Aug 2022 09:33:20 -0500 Subject: [PATCH 10/22] Add bug fixes for C# --- _posts/2022-08-01-9.0.0-release.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 155a48daf846..8db350165c2a 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -129,7 +129,14 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt ## C# notes -* Added support for Add support for Time32Array and Time64Array ([ARROW-16660](https://github.com/apache/arrow/pull/13279)) +#### New Features + +* Added support for Time32Array and Time64Array ([ARROW-16660](https://github.com/apache/arrow/pull/13279)) + +#### Bug Fixes + +* When using TableFromRecordBatches, the resulting table columns have no data array. ([ARROW-13129](https://github.com/apache/arrow/pull/10562)) +* Fix intermittent test failures due to async memory management bug. 
([ARROW-16978](https://github.com/apache/arrow/pull/13573)) ## Go notes From 33d350d9bdba878dc2a1024563b0acb88f05d788 Mon Sep 17 00:00:00 2001 From: Ian Cook Date: Thu, 4 Aug 2022 12:06:15 -0400 Subject: [PATCH 11/22] Make all Arrow C++ Substrait Jira references into links --- _posts/2022-08-01-9.0.0-release.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 8db350165c2a..95291867f702 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -125,7 +125,10 @@ data files with very large metadata ([ARROW-16546](https://issues.apache.org/jir The Substrait spec has been updated to 0.6.0 ([ARROW-16816](https://issues.apache.org/jira/browse/ARROW-16816)). In addition, a larger subset of the Substrait specification is now supported ([ARROW-15587](https://issues.apache.org/jira/browse/ARROW-15587), -[ARROW-15590](https://issues.apache.org/jira/browse/ARROW-15590), ARROW-15901, ARROW-16657, ARROW-15591). +[ARROW-15590](https://issues.apache.org/jira/browse/ARROW-15590), +[ARROW-15901](https://issues.apache.org/jira/browse/ARROW-15901), +[ARROW-16657](https://issues.apache.org/jira/browse/ARROW-16657), +[ARROW-15591](https://issues.apache.org/jira/browse/ARROW-15591)). 
## C# notes From 1fe13de64de097d771f24802de6e7a5f738670a9 Mon Sep 17 00:00:00 2001 From: Ian Cook Date: Thu, 4 Aug 2022 12:06:57 -0400 Subject: [PATCH 12/22] Adjust Jira/PR references in JavaScript notes for consistency --- _posts/2022-08-01-9.0.0-release.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 95291867f702..55d0704ba413 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -214,8 +214,8 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt * Don't duplicate generated Protobuf classes between flight-core and flight-sql ([#13596](https://github.com/apache/arrow/pull/13596)) ## JavaScript notes -* [ARROW-16371: [JS] Fix error iterating tables with no batches](https://github.com/apache/arrow/pull/13287) -* [ARROW-16704: [JS] Handle case where `tableFromIPC` input is an async `RecordBatchReader`](https://github.com/apache/arrow/pull/13278) +* Fix error iterating tables with no batches ([ARROW-16371](https://issues.apache.org/jira/browse/ARROW-16371)) +* Handle case where `tableFromIPC` input is an async `RecordBatchReader` ([ARROW-16704](https://issues.apache.org/jira/browse/ARROW-16704)) ## Python notes From 5f25b844de7dbf40b9fbbefd4e2d1bbfe531311a Mon Sep 17 00:00:00 2001 From: Ian Cook Date: Thu, 4 Aug 2022 12:08:05 -0400 Subject: [PATCH 13/22] Add missing line break --- _posts/2022-08-01-9.0.0-release.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 55d0704ba413..e3671703b704 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -212,6 +212,7 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt * Update Gandiva Protobuf library to enable builds on Apple M1 ([#13121](https://github.com/apache/arrow/pull/13121)) * Patch dataset module testing 
failure with JSE11+ ([#13200](https://github.com/apache/arrow/pull/13200)) * Don't duplicate generated Protobuf classes between flight-core and flight-sql ([#13596](https://github.com/apache/arrow/pull/13596)) + ## JavaScript notes * Fix error iterating tables with no batches ([ARROW-16371](https://issues.apache.org/jira/browse/ARROW-16371)) From 4667bfd6a09f7c9787ea0bf358d1923801a1d374 Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Fri, 5 Aug 2022 09:54:29 +0200 Subject: [PATCH 14/22] add python notes --- _posts/2022-08-01-9.0.0-release.md | 31 ++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index e3671703b704..2f8909b6b1c2 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -220,6 +220,37 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt ## Python notes +Compatibility notes: + +* PyArrow now requires Python >= 3.7 (ARROW-16474). + +* The default behaviour regarding memory mapping has changed in several APIs (reading of Feather or Parquet files, IPC RecordBatchFileReader and RecordBatchStreamReader) to disable memory mapping by default (ARROW-16382). + +* The default Parquet version is now 2.4 for writing, enabling use of +more recent logical types by default such as unsigned integers ([ARROW-12203](https://issues.apache.org/jira/browse/ARROW-12203)). One can specify `version="2.6"` to also enable support for nanosecond timestamps. Use `version="1.0"` to restore the old behaviour and maximizes file compatibility. + +* Some deprecated APIs (deprecated at least since pyarrow 1.0.0) have been removed: IPC methods in the top-level namespace, the `Value` scalar classes and the `pyarrow.compat` module (ARROW-17010). + +New features: + +* Google Cloud Storage (GCS) File System support is now available in the Python bindings (ARROW-14892). 
+ +* The `Table.filter()` method now supports passing an expression in addition to a boolean array (ARROW-16469). + +* When implementing extension types in Python, it is now possible to also customize which Python scalar gets returned (in `Array.to_pylist()` or `Scalar.as_py()`) by subclassing `ExtensionScalar` (ARROW-13612, ARROW-17065). + +* It is now possible to register User Defined Functions (UDF) for scalar functions using `register_scalar_function` (ARROW-15639). + +* Basic support for consuming a Substrait plan has been exposed in Python as `pyarrow.substrait.run_query` (ARROW-15779). + +* The `cast` method and compute kernel now exposes the fine grained options in addition to safe/unsafe casting (ARROW-15365). + +In addition, this release includes several bug fixes and documention improvements (such as expanded examples in docstrings (ARROW-16091)). + +Further, the Python bindings benefit from improvements in the C++ library +(e.g. new compute functions); see the C++ notes above for additional details. + + ## R notes Highlights include several new `dplyr` verbs, including `glimpse()` and `union_all()`, as well as many more datetime functions from `lubridate`. There is also experimental support for user-defined scalar functions in the query engine, and most packages include native support for datasets in Google Cloud Storage (opt-in in the Linux full source build). 
From 1b39c2881563a887943572896c182f21ec84288d Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Fri, 5 Aug 2022 09:56:34 +0200 Subject: [PATCH 15/22] update links --- _posts/2022-08-01-9.0.0-release.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 2f8909b6b1c2..961a81415c84 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -222,30 +222,30 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt Compatibility notes: -* PyArrow now requires Python >= 3.7 (ARROW-16474). +* PyArrow now requires Python >= 3.7 ([ARROW-16474](https://issues.apache.org/jira/browse/ARROW-16474)). -* The default behaviour regarding memory mapping has changed in several APIs (reading of Feather or Parquet files, IPC RecordBatchFileReader and RecordBatchStreamReader) to disable memory mapping by default (ARROW-16382). +* The default behaviour regarding memory mapping has changed in several APIs (reading of Feather or Parquet files, IPC RecordBatchFileReader and RecordBatchStreamReader) to disable memory mapping by default ([ARROW-16382](https://issues.apache.org/jira/browse/ARROW-16382)). * The default Parquet version is now 2.4 for writing, enabling use of more recent logical types by default such as unsigned integers ([ARROW-12203](https://issues.apache.org/jira/browse/ARROW-12203)). One can specify `version="2.6"` to also enable support for nanosecond timestamps. Use `version="1.0"` to restore the old behaviour and maximizes file compatibility. -* Some deprecated APIs (deprecated at least since pyarrow 1.0.0) have been removed: IPC methods in the top-level namespace, the `Value` scalar classes and the `pyarrow.compat` module (ARROW-17010). 
+* Some deprecated APIs (deprecated at least since pyarrow 1.0.0) have been removed: IPC methods in the top-level namespace, the `Value` scalar classes and the `pyarrow.compat` module ([ARROW-17010](https://issues.apache.org/jira/browse/ARROW-17010)). New features: -* Google Cloud Storage (GCS) File System support is now available in the Python bindings (ARROW-14892). +* Google Cloud Storage (GCS) File System support is now available in the Python bindings ([ARROW-14892](https://issues.apache.org/jira/browse/ARROW-14892)). -* The `Table.filter()` method now supports passing an expression in addition to a boolean array (ARROW-16469). +* The `Table.filter()` method now supports passing an expression in addition to a boolean array ([ARROW-16469](https://issues.apache.org/jira/browse/ARROW-16469)). -* When implementing extension types in Python, it is now possible to also customize which Python scalar gets returned (in `Array.to_pylist()` or `Scalar.as_py()`) by subclassing `ExtensionScalar` (ARROW-13612, ARROW-17065). +* When implementing extension types in Python, it is now possible to also customize which Python scalar gets returned (in `Array.to_pylist()` or `Scalar.as_py()`) by subclassing `ExtensionScalar` ([ARROW-13612](https://issues.apache.org/jira/browse/ARROW-13612), ([ARROW-17065](https://issues.apache.org/jira/browse/ARROW-17065))). -* It is now possible to register User Defined Functions (UDF) for scalar functions using `register_scalar_function` (ARROW-15639). +* It is now possible to register User Defined Functions (UDF) for scalar functions using `register_scalar_function` ([ARROW-15639](https://issues.apache.org/jira/browse/ARROW-15639)). -* Basic support for consuming a Substrait plan has been exposed in Python as `pyarrow.substrait.run_query` (ARROW-15779). +* Basic support for consuming a Substrait plan has been exposed in Python as `pyarrow.substrait.run_query` ([ARROW-15779](https://issues.apache.org/jira/browse/ARROW-15779)). 
-* The `cast` method and compute kernel now exposes the fine grained options in addition to safe/unsafe casting (ARROW-15365). +* The `cast` method and compute kernel now exposes the fine grained options in addition to safe/unsafe casting ([ARROW-15365](https://issues.apache.org/jira/browse/ARROW-15365)). -In addition, this release includes several bug fixes and documention improvements (such as expanded examples in docstrings (ARROW-16091)). +In addition, this release includes several bug fixes and documention improvements (such as expanded examples in docstrings ([ARROW-16091](https://issues.apache.org/jira/browse/ARROW-16091))). Further, the Python bindings benefit from improvements in the C++ library (e.g. new compute functions); see the C++ notes above for additional details. From 3f3946c939519cf4ab06c01a2dac7138aa9a991f Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Fri, 5 Aug 2022 09:59:29 +0200 Subject: [PATCH 16/22] add rank kernel --- _posts/2022-08-01-9.0.0-release.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 961a81415c84..3d03d4fe2b6e 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -76,6 +76,8 @@ Timestamp comparison is now supported ([ARROW-16425](https://issues.apache.org/j A cumulative sum function is implemented over numeric inputs ([ARROW-13530](https://issues.apache.org/jira/browse/ARROW-13530)). +A rank vector kernel has been added ([ARROW-16234](https://issues.apache.org/jira/browse/ARROW-16234)). + Temporal rounding functions received additional options to control how rounding is done ([ARROW-14821](https://issues.apache.org/jira/browse/ARROW-14821)). 
From 91e80393bbf7776fb423b614ebf0eaf9bd175cf2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Fri, 5 Aug 2022 11:10:09 +0200 Subject: [PATCH 17/22] Update _posts/2022-08-01-9.0.0-release.md Co-authored-by: Ian Cook --- _posts/2022-08-01-9.0.0-release.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 3d03d4fe2b6e..26dd4fa5ea60 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -74,7 +74,8 @@ over all-scalar inputs; but those scalars are internally broadcasted to Timestamp comparison is now supported ([ARROW-16425](https://issues.apache.org/jira/browse/ARROW-16425)). -A cumulative sum function is implemented over numeric inputs ([ARROW-13530](https://issues.apache.org/jira/browse/ARROW-13530)). +A cumulative sum function is implemented over numeric inputs ([ARROW-13530](https://issues.apache.org/jira/browse/ARROW-13530)). Note that this is a vector +function so cannot be used in an Acero ExecPlan. A rank vector kernel has been added ([ARROW-16234](https://issues.apache.org/jira/browse/ARROW-16234)). From ccbe8de48f6ffdc6fc9a951af6f1d69d28224627 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Fri, 5 Aug 2022 11:15:08 +0200 Subject: [PATCH 18/22] Fix some typos and update number of issues solved and contributors --- _posts/2022-08-01-9.0.0-release.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index 26dd4fa5ea60..f70f8d80d673 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -26,12 +26,12 @@ limitations under the License. The Apache Arrow team is pleased to announce the 9.0.0 release. This covers -over 3 months of development work and includes [**473 resolved issues**][1] -from [**YYY distinct contributors**][2]. 
See the Install Page to learn how to +over 3 months of development work and includes [**509 resolved issues**][1] +from [**114 distinct contributors**][2]. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights -of the release. Many other bugfixes and improvements have been made: we refer +of the release. Many other bug fixes and improvements have been made: we refer you to the [complete changelog][3]. ## Community @@ -99,7 +99,7 @@ Fix `FilenamePartitioning` to properly support null values ([ARROW-16302](https: ### Filesystem -A number of bugfixes and improvements were made to the Google Cloud Storage +A number of bug fixes and improvements were made to the Google Cloud Storage filesystem implementation ([ARROW-14892](https://issues.apache.org/jira/browse/ARROW-14892)). By default, the S3 filesystem implementation does not create or drop buckets @@ -121,7 +121,7 @@ Statistics equality now works correctly with minmax ([ARROW-16487](https://issue The minimum Thrift version required for building is now 0.13 ([ARROW-16721](https://issues.apache.org/jira/browse/ARROW-16721)). -The Thrift deserialization limits can now be configured to accomodate for +The Thrift deserialization limits can now be configured to accommodate for data files with very large metadata ([ARROW-16546](https://issues.apache.org/jira/browse/ARROW-16546)). 
### Substrait @@ -168,7 +168,7 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt * ipc.FileReader now properly uses the memory.Allocator interface ([ARROW-16002](https://issues.apache.org/jira/browse/ARROW-16002)) * Addressed issue with Integration tests between Go and Java ([ARROW-16441](https://issues.apache.org/jira/browse/ARROW-16441)) -* RecordBuilder.UnmarshalJSON now properly ignores extra unknown fields rather than panic'ing ([ARROW-16456](https://issues.apache.org/jira/browse/ARROW-16456)) +* RecordBuilder.UnmarshalJSON now properly ignores extra unknown fields rather than panicking ([ARROW-16456](https://issues.apache.org/jira/browse/ARROW-16456)) * StructBuilder.UnmarshalJSON will no longer fail and panic when Nullable fields are missing ([ARROW-16502](https://issues.apache.org/jira/browse/ARROW-16502)) * ipc.Reader no longer silently accepts string columns with invalid offsets, preventing unexpected panics later when writing or accessing the resulting arrays. ([ARROW-16831](https://issues.apache.org/jira/browse/ARROW-16831)) * Arrow CSV reader no longer clobbers its reported errors and properly surfaces them ([ARROW-16926](https://issues.apache.org/jira/browse/ARROW-16926)) @@ -179,7 +179,7 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt * The CreatedBy version string for the Parquet writer will now correctly reflect the library version, and will be updated by the release scripts ([ARROW-16484](https://issues.apache.org/jira/browse/ARROW-16484)) * Parquet bit_packing functions now have ARM64 NEON implementations for performance ([ARROW-16486](https://issues.apache.org/jira/browse/ARROW-16486)) -* It is now possible to customize the root node in the Parquet writer instead of hardcoding it to be named "schema" with a repetition type of Repeated. This was needed to allow producing files similar to spark where the root node has a repetition type of Required. 
It still defaults to the spec definition of Repeated. ([ARROW-16561](https://issues.apache.org/jira/browse/ARROW-16561)) +* It is now possible to customize the root node in the Parquet writer instead of hardcoding it to be named "schema" with a repetition type of Repeated. This was needed to allow producing files similar to Apache Spark where the root node has a repetition type of Required. It still defaults to the spec definition of Repeated. ([ARROW-16561](https://issues.apache.org/jira/browse/ARROW-16561)) * parquet_reader CLI mainprog has been enhanced to dump values out as JSON and CSV along with setting an output file instead of just dumping to the terminal. ([ARROW-16934](https://issues.apache.org/jira/browse/ARROW-16934)) #### Bug Fixes From 2b29845755e4f6cb48e1520b9c910e2d0a115e51 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Sat, 6 Aug 2022 22:56:24 +0900 Subject: [PATCH 19/22] Add Linux packages, C GLib and Ruby updates --- _posts/2022-08-01-9.0.0-release.md | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index f70f8d80d673..bc9f0c89187d 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -46,7 +46,13 @@ Thanks for your contributions and participation in the project! Arrow Flight is now available in MacOS M1 Python wheels ([ARROW-16779](https://issues.apache.org/jira/browse/ARROW-16779)). Arrow Flight SQL is now buildable on Windows ([ARROW-16902](https://issues.apache.org/jira/browse/ARROW-16902)). -Ruby now exposes more of the Flight and Flight SQL APIs (various JIRAs). +Ruby now exposes more of the Flight and Flight SQL APIs (various JIRAs). + +## Linux packages notes + +AlmaLinux 9 is now supported. ([ARROW-16745](https://issues.apache.org/jira/browse/ARROW-16745)) + +AmazonLinux 2 aarch64 is now supported. 
([ARROW-16477](https://issues.apache.org/jira/browse/ARROW-16477)) ## C++ notes @@ -139,7 +145,7 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt * Added support for Time32Array and Time64Array ([ARROW-16660](https://github.com/apache/arrow/pull/13279)) -#### Bug Fixes +#### Bug Fixes * When using TableFromRecordBatches, the resulting table columns have no data array. ([ARROW-13129](https://github.com/apache/arrow/pull/10562)) * Fix intermittent test failures due to async memory management bug. ([ARROW-16978](https://github.com/apache/arrow/pull/13573)) @@ -164,7 +170,7 @@ larger subset of the Substrait specification is now supported ([ARROW-15587](htt * Added a SliceBuffer function to the memory package to allow better re-using of memory across buffer objects ([ARROW-16557](https://issues.apache.org/jira/browse/ARROW-16557)) * Dictionary Arrays can now be concatenated using array.Concatenate ([ARROW-17095](https://issues.apache.org/jira/browse/ARROW-17095)) -#### Bug Fixes +#### Bug Fixes * ipc.FileReader now properly uses the memory.Allocator interface ([ARROW-16002](https://issues.apache.org/jira/browse/ARROW-16002)) * Addressed issue with Integration tests between Go and Java ([ARROW-16441](https://issues.apache.org/jira/browse/ARROW-16441)) @@ -262,10 +268,26 @@ For more on what’s in the 9.0.0 R package, see the [R changelog][4]. ## Ruby and C GLib notes +FlightSQL is now supported but there are minimum features for now. + +More Flight features are now supported. + ### Ruby +`Enumerable` compatible methods such as `#min` and `#max` on `Arrow::Array`, `Arrow::ChunkedArray` and `Arrow::Column` are implemented by C++'s [compute functions]({{ site.baseurl }}/docs/cpp/compute.html). This improves performance. ([ARROW-15222](https://issues.apache.org/jira/browse/ARROW-15222)) + +This release fixed some memory leaks. 
([ARROW-14790](https://issues.apache.org/jira/browse/ARROW-14790)) + +This release improved support for interval type arrays such as `Arrow::MonthIntervalArray`. ([ARROW-16206](https://issues.apache.org/jira/browse/ARROW-16206)) + +This release improved auto data type conversion. ([ARROW-16874](https://issues.apache.org/jira/browse/ARROW-16874)) + ### C GLib +Vala is now supported. ([ARROW-15671](https://issues.apache.org/jira/browse/ARROW-15671)). See [`c_glib/example/vala/`](https://github.com/apache/arrow/tree/apache-arrow-9.0.0/c_glib/example/vala) for examples. + +`GArrowQunatileOptions` is added. ([ARROW-16623](https://issues.apache.org/jira/browse/ARROW-16623)) + ## Rust notes The Rust projects have moved to separate repositories outside the From 4fee7a04817245cf4f7c8a9d69444bc128bb4907 Mon Sep 17 00:00:00 2001 From: Weston Pace Date: Mon, 8 Aug 2022 16:58:41 -0700 Subject: [PATCH 20/22] Update 2022-08-01-9.0.0-release.md Added info for ARROW-14182 and ARROW-15498 which improve hash join performance. --- _posts/2022-08-01-9.0.0-release.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index bc9f0c89187d..a29c2cc8f464 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -78,6 +78,15 @@ an array. The higher-level compute APIs still allow executing a scalar function over all-scalar inputs; but those scalars are internally broadcasted to 1-element arrays so as to simplify kernel implementation ([ARROW-16757](https://issues.apache.org/jira/browse/ARROW-16757)). +Some performance improvements were made to the hash join node. These changes +do not require additional configuration. The hash join exec node has been +improved to more efficiently use CPU cache and make better use of available +vectorization hardware ([ARROW-14182](https://issues.apache.org/jira/browse/ARROW-14182)). 
+ +Some plans containing a sequence of hash join operators will now use bloom +filters to eliminate rows earlier in the plan, reducing the overall CPU +cost of the plan ([ARROW-15498](https://issues.apache.org/jira/browse/ARROW-15498)). + Timestamp comparison is now supported ([ARROW-16425](https://issues.apache.org/jira/browse/ARROW-16425)). A cumulative sum function is implemented over numeric inputs ([ARROW-13530](https://issues.apache.org/jira/browse/ARROW-13530)). Note that this is a vector From 9117b9edc91cf9d4eb99644df9035dd815406641 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Tue, 16 Aug 2022 13:02:04 +0900 Subject: [PATCH 21/22] Fix a typo --- _posts/2022-08-01-9.0.0-release.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-01-9.0.0-release.md index a29c2cc8f464..bb4d7b921ed1 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-01-9.0.0-release.md @@ -295,7 +295,7 @@ This release improved auto data type conversion. ([ARROW-16874](https://issues.a Vala is now supported. ([ARROW-15671](https://issues.apache.org/jira/browse/ARROW-15671)). See [`c_glib/example/vala/`](https://github.com/apache/arrow/tree/apache-arrow-9.0.0/c_glib/example/vala) for examples. -`GArrowQunatileOptions` is added. ([ARROW-16623](https://issues.apache.org/jira/browse/ARROW-16623)) +`GArrowQuantileOptions` is added. 
([ARROW-16623](https://issues.apache.org/jira/browse/ARROW-16623)) ## Rust notes From f7fa93df88cd8efb36b9847ec900351c965e3886 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Tue, 16 Aug 2022 16:07:02 +0900 Subject: [PATCH 22/22] Update date --- ...22-08-01-9.0.0-release.md => 2022-08-16-9.0.0-release.md} | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) rename _posts/{2022-08-01-9.0.0-release.md => 2022-08-16-9.0.0-release.md} (99%) diff --git a/_posts/2022-08-01-9.0.0-release.md b/_posts/2022-08-16-9.0.0-release.md similarity index 99% rename from _posts/2022-08-01-9.0.0-release.md rename to _posts/2022-08-16-9.0.0-release.md index bb4d7b921ed1..f69e3edfc26b 100644 --- a/_posts/2022-08-01-9.0.0-release.md +++ b/_posts/2022-08-16-9.0.0-release.md @@ -1,7 +1,7 @@ --- layout: post title: "Apache Arrow 9.0.0 Release" -date: "2022-08-01 00:00:00" +date: "2022-08-16 00:00:00" author: pmc categories: [release] --- @@ -295,7 +295,8 @@ This release improved auto data type conversion. ([ARROW-16874](https://issues.a Vala is now supported. ([ARROW-15671](https://issues.apache.org/jira/browse/ARROW-15671)). See [`c_glib/example/vala/`](https://github.com/apache/arrow/tree/apache-arrow-9.0.0/c_glib/example/vala) for examples. -`GArrowQuantileOptions` is added. ([ARROW-16623](https://issues.apache.org/jira/browse/ARROW-16623)) +`GArrowQuantil +eOptions` is added. ([ARROW-16623](https://issues.apache.org/jira/browse/ARROW-16623)) ## Rust notes