Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-39519: [Swift] Fix null count when using reader #39520

Merged
merged 1 commit into from
Jan 13, 2024
Merged

Conversation

abandy
Copy link
Contributor

@abandy abandy commented Jan 8, 2024

Currently the reader is not properly setting the null count when building an array from a stream. This PR adds a fix for this.

@abandy abandy requested a review from kou as a code owner January 8, 2024 23:47
@abandy abandy changed the title GH-37938:[Swift] fix null count when using reader GH-39519:[Swift] fix null count when using reader Jan 8, 2024
Copy link

github-actions bot commented Jan 8, 2024

⚠️ GitHub issue #39519 has been automatically assigned in GitHub to PR creator.

@kou kou changed the title GH-39519:[Swift] fix null count when using reader GH-39519: [Swift] Fix null count when using reader Jan 9, 2024
@kou
Copy link
Member

kou commented Jan 9, 2024

This approach passes nullCount from Apache Arrow data (.arrow/.arrows data) to ArrowArray via ArrowNullBuffer, right?
The C++ implementation passes it to arrow::Array (arrow::ArrayData) directly:

out->null_count = node->null_count();

If we use this approach in Swift too, we will use API something like the following:

diff --git a/swift/Arrow/Sources/Arrow/ArrowReader.swift b/swift/Arrow/Sources/Arrow/ArrowReader.swift
index d9dc1bdb47..c985b37ca6 100644
--- a/swift/Arrow/Sources/Arrow/ArrowReader.swift
+++ b/swift/Arrow/Sources/Arrow/ArrowReader.swift
@@ -88,7 +88,7 @@ public class ArrowReader {
             let valueBuffer = loadInfo.recordBatch.buffers(at: loadInfo.bufferIndex + 2)!
             let arrowValueBuffer = makeBuffer(valueBuffer, fileData: loadInfo.fileData,
                                               length: UInt(node.length), messageOffset: loadInfo.messageOffset)
-            return makeArrayHolder(loadInfo.field, buffers: [arrowNullBuffer, arrowOffsetBuffer, arrowValueBuffer])
+            return makeArrayHolder(loadInfo.field, buffers: [arrowNullBuffer, arrowOffsetBuffer, arrowValueBuffer], UInt(node.nullCount))
         } catch let error as ArrowError {
             return .failure(error)
         } catch {
diff --git a/swift/Arrow/Sources/Arrow/ArrowReaderHelper.swift b/swift/Arrow/Sources/Arrow/ArrowReaderHelper.swift
index fa52160478..13d6209dee 100644
--- a/swift/Arrow/Sources/Arrow/ArrowReaderHelper.swift
+++ b/swift/Arrow/Sources/Arrow/ArrowReaderHelper.swift
@@ -126,7 +126,8 @@ private func makeFixedHolder<T>(
 
 func makeArrayHolder( // swiftlint:disable:this cyclomatic_complexity
     _ field: org_apache_arrow_flatbuf_Field,
-    buffers: [ArrowBuffer]
+    buffers: [ArrowBuffer],
+    nullCount: UInt
 ) -> Result<ArrowArrayHolder, ArrowError> {
     let type = field.typeType
     switch type {

What do you think this approach?

@abandy
Copy link
Contributor Author

abandy commented Jan 13, 2024

Hi @kou, I have made the requested updates. Please review when you get a chance. Thank you!

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit e632364 into apache:main Jan 13, 2024
7 checks passed
@kou kou removed the awaiting review Awaiting review label Jan 13, 2024
@github-actions github-actions bot added the awaiting merge Awaiting merge label Jan 13, 2024
Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit e632364.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

clayburn pushed a commit to clayburn/arrow that referenced this pull request Jan 23, 2024
Currently the reader is not properly setting the null count when building an array from a stream.  This PR adds a fix for this.

* Closes: apache#39519

Authored-by: Alva Bandy <abandy@live.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
Currently the reader is not properly setting the null count when building an array from a stream.  This PR adds a fix for this.

* Closes: apache#39519

Authored-by: Alva Bandy <abandy@live.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Feb 28, 2024
Currently the reader is not properly setting the null count when building an array from a stream.  This PR adds a fix for this.

* Closes: apache#39519

Authored-by: Alva Bandy <abandy@live.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Mar 8, 2024
Currently the reader is not properly setting the null count when building an array from a stream.  This PR adds a fix for this.

* Closes: apache#39519

Authored-by: Alva Bandy <abandy@live.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@abandy abandy deleted the GH-39519 branch April 2, 2024 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Swift] Null count is not being set when reading from stream
2 participants