Fix for primitive and boolean take kernel for nullable indices with an offset #509

jhorstmann · 2021-06-29T19:04:28Z

Which issue does this PR close?

Closes #502.

While implementing the fix I noticed a similar issue in the boolean take kernel which is now also fixed.

Rationale for this change

When reusing the validity buffer of an array for a newly created array, the offset of the original array has to be taken into account. The original array might have an offset > 0, but the new array will usually start at 0, so a slice of the validity buffer has to be taken.
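To make the rationale concrete, here is a minimal, dependency-free Rust sketch (not the arrow crate's code) of an LSB-first validity bitmap, which is the bit order Arrow uses. The helpers `get_bit` and `slice_bits` are hypothetical. A view with offset 2 has to read its validity starting at bit 2, and copying those bits into a fresh bitmap lets the new array start at bit 0.

```rust
/// Returns whether bit `i` is set in an LSB-first packed bitmap
/// (bit 0 is the least significant bit of byte 0).
fn get_bit(bits: &[u8], i: usize) -> bool {
    (bits[i / 8] & (1 << (i % 8))) != 0
}

/// Hypothetical helper: copies `len` bits starting at bit `offset`
/// into a new bitmap whose first bit is at position 0.
fn slice_bits(bits: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        if get_bit(bits, offset + i) {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

fn main() {
    // Validity of a 10-element array in which elements 3 and 7 are null.
    let validity = [0b0111_0111u8, 0b0000_0011];
    // A view of that array with offset 2 and length 4 covers elements 2..6.
    let (offset, len) = (2, 4);

    // Wrong: reading the shared buffer from bit 0 ignores the offset.
    let wrong: Vec<bool> = (0..len).map(|i| get_bit(&validity, i)).collect();

    // Right: slice the buffer so the new array can start at bit 0.
    let sliced = slice_bits(&validity, offset, len);
    let right: Vec<bool> = (0..len).map(|i| get_bit(&sliced, i)).collect();

    println!("wrong: {:?}", wrong); // [true, true, true, false]
    println!("right: {:?}", right); // [true, false, true, true]
}
```

Copying bit by bit keeps the sketch short; a real kernel would work on whole bytes or words where possible.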

What changes are included in this PR?

Are there any user-facing changes?

No

codecov-commenter · 2021-06-29T19:20:14Z

Codecov Report

Merging #509 (2423fad) into master (99b1c90) will increase coverage by 0.12%.
The diff coverage is 100.00%.

❗ Current head 2423fad differs from pull request most recent head 541b5ad. Consider uploading reports for the commit 541b5ad to get more accurate results

@@            Coverage Diff             @@
##           master     #509      +/-   ##
==========================================
+ Coverage   82.64%   82.76%   +0.12%     
==========================================
  Files         165      165              
  Lines       45703    45724      +21     
==========================================
+ Hits        37769    37845      +76     
+ Misses       7934     7879      -55

Impacted Files	Coverage Δ
arrow/src/array/array_binary.rs	`92.23% <ø> (+2.10%)`	⬆️
arrow/src/array/array_boolean.rs	`94.01% <ø> (+3.10%)`	⬆️
arrow/src/array/array_dictionary.rs	`88.38% <ø> (+3.81%)`	⬆️
arrow/src/array/array_list.rs	`94.88% <ø> (+2.06%)`	⬆️
arrow/src/array/array_primitive.rs	`94.60% <ø> (-0.10%)`	⬇️
arrow/src/array/array_string.rs	`97.76% <ø> (+1.71%)`	⬆️
arrow/src/array/array_struct.rs	`89.24% <ø> (+1.39%)`	⬆️
arrow/src/array/array_union.rs	`89.26% <ø> (+2.33%)`	⬆️
arrow/src/array/null.rs	`83.78% <ø> (-2.89%)`	⬇️
arrow/src/array/array.rs	`80.90% <100.00%> (+4.04%)`	⬆️
... and 10 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 99b1c90...541b5ad.

alamb

I reviewed the logic and the tests carefully and this looks great to me. Thank you @jhorstmann

FYI @ritchie46

alamb · 2021-06-29T21:00:56Z

arrow/src/compute/kernels/take.rs

@@ -516,7 +522,7 @@ where
        nulls = match indices.data_ref().null_buffer() {
            Some(buffer) => Some(buffer_bin_and(
                buffer,
-                0,
+                indices.offset(),
                &null_buf.into(),
                0,


Is it correct that this 0 is due to the fact that null_buf was constructed via

let mut null_buf = MutableBuffer::new(num_byte).with_bitset(num_byte, true);

in the same else clause?

Yes, null_buf is newly constructed and initialized starting from 0, while the first buffer and offset pair are coming from the indices array which might have a non-0 offset.
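The situation in this answer, where each input bitmap carries its own bit offset and the result starts at bit 0, can be modelled without the arrow crate. `bitmap_and` below is a hypothetical stand-in, not arrow's `buffer_bin_and`. The all-set right-hand bitmap plays the role of the freshly built `null_buf`, so its offset is 0, while the left-hand bitmap belongs to a sliced indices array whose offset must be passed through.

```rust
fn get_bit(bits: &[u8], i: usize) -> bool {
    (bits[i / 8] & (1 << (i % 8))) != 0
}

/// Hypothetical offset-aware AND: combines `len` bits of `left` starting at
/// bit `left_offset` with `len` bits of `right` starting at bit `right_offset`.
/// The result is a fresh bitmap whose first bit is at position 0.
fn bitmap_and(left: &[u8], left_offset: usize, right: &[u8], right_offset: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        if get_bit(left, left_offset + i) && get_bit(right, right_offset + i) {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

fn main() {
    // Validity of a 6-element indices array; slots 0 and 4 are null.
    let indices_validity = [0b0010_1110u8];
    // The kernel sees a slice with offset 1 and length 3 (slots 1..4, all valid).
    let (offset, len) = (1, 3);
    // Newly built bitmap, all bits set, starting at bit 0 (like `null_buf`).
    let null_buf = [0xFFu8];

    let buggy = bitmap_and(&indices_validity, 0, &null_buf, 0, len);
    let fixed = bitmap_and(&indices_validity, offset, &null_buf, 0, len);
    println!("buggy: {:03b}", buggy[0]); // 110: first output slot wrongly null
    println!("fixed: {:03b}", fixed[0]); // 111: all three slots valid
}
```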

There was a proposal before to push the offsets down into all buffers instead of storing them in the array. That way we wouldn't need to care about which array a buffer originally belonged to. But if we did that, we'd still need a better abstraction for the validity bitmap and only access it via (chunked) iterators. I'm also not sure whether such a change would have an effect on FFI usage.
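A rough sketch of the proposal in the comment above, assuming a hypothetical `BitmapSlice` type (not part of the arrow crate) that stores its own bit offset and exposes bits only through an iterator, so an offset can no longer be dropped at a call site. It deliberately leaves the FFI question open.

```rust
/// Hypothetical bitmap view that carries its own bit offset, so callers never
/// need to know which array the underlying bytes originally belonged to.
#[derive(Clone, Copy)]
struct BitmapSlice<'a> {
    bytes: &'a [u8],
    offset: usize, // bit offset into `bytes`
    len: usize,    // number of bits visible through this view
}

impl<'a> BitmapSlice<'a> {
    fn new(bytes: &'a [u8], offset: usize, len: usize) -> Self {
        assert!(offset + len <= bytes.len() * 8, "view out of bounds");
        BitmapSlice { bytes, offset, len }
    }

    /// Narrows the view; offsets compose here instead of at every call site.
    fn slice(self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        BitmapSlice { bytes: self.bytes, offset: self.offset + offset, len }
    }

    /// The only way to read bits: the offset is applied internally.
    fn iter(self) -> impl Iterator<Item = bool> + 'a {
        let bytes = self.bytes;
        (self.offset..self.offset + self.len).map(move |i| (bytes[i / 8] & (1 << (i % 8))) != 0)
    }
}

/// AND of two equally long views, returned as plain booleans for clarity.
fn and_bits(a: BitmapSlice, b: BitmapSlice) -> Vec<bool> {
    assert_eq!(a.len, b.len, "length mismatch");
    a.iter().zip(b.iter()).map(|(x, y)| x && y).collect()
}

fn main() {
    let indices_validity = [0b0010_1110u8];
    let null_buf = [0xFFu8];
    // Slice the indices' validity; the view remembers offset 1 by itself.
    let indices = BitmapSlice::new(&indices_validity, 0, 6).slice(1, 3);
    let fresh = BitmapSlice::new(&null_buf, 0, 3);
    println!("{:?}", and_bits(indices, fresh)); // [true, true, true]
}
```

A production version would iterate over whole (chunked) words rather than single bits, as the comment suggests.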

alamb · 2021-06-29T21:03:05Z

arrow/src/compute/kernels/take.rs

+        );
+
+        test_take_primitive_arrays_non_null::<Int64Type>(
+            vec![0, 1, 2, 3, 4, 5, 6],


Would it make sense to use different values here than the indices -- perhaps something like

Suggested change

-            vec![0, 1, 2, 3, 4, 5, 6],
+            vec![0, 10, 20, 30, 40, 50, 60],

That way it is clear from this context alone that only 20 and 30 should be returned.

Good idea, done
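With the data values distinct from the indices, the scenario is easy to model. The sketch below is a hypothetical, dependency-free model of a take over non-null values with nullable, sliced indices; the `Indices` struct and `take` function are illustrative and are neither the arrow crate's API nor the PR's actual test. The indices share their parent's buffers and carry an offset of 1. Because slot `i` is read at `offset + i` for both the index value and its validity bit, only 20 and 30 come back; reading the validity bit at `i` instead would mark the first output slot null, since the parent's slot 0 is null.

```rust
fn get_bit(bits: &[u8], i: usize) -> bool {
    (bits[i / 8] & (1 << (i % 8))) != 0
}

/// Hypothetical sliced index array: buffers shared with the parent array,
/// plus the logical offset and length of this slice.
struct Indices<'a> {
    values: &'a [u32],
    validity: &'a [u8],
    offset: usize,
    len: usize,
}

/// Gathers from non-null `values`; a null index produces a null output slot.
/// The output starts at 0, so input slot `i` lives at `offset + i`
/// in both the index buffer and the validity bitmap.
fn take(values: &[i64], indices: &Indices) -> Vec<Option<i64>> {
    (0..indices.len)
        .map(|i| {
            let pos = indices.offset + i;
            if get_bit(indices.validity, pos) {
                Some(values[indices.values[pos] as usize])
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    let values = vec![0i64, 10, 20, 30, 40, 50, 60];
    // Parent indices [null, 2, 3, 1]; the null slot's stored value is arbitrary.
    let parent_values = [0u32, 2, 3, 1];
    let parent_validity = [0b0000_1110u8];
    // Slice of the parent with offset 1 and length 2: [2, 3].
    let indices = Indices { values: &parent_values, validity: &parent_validity, offset: 1, len: 2 };
    println!("{:?}", take(&values, &indices)); // [Some(20), Some(30)]
}
```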

ritchie46 · 2021-06-30T06:15:04Z

Nice. Great that this is fixed before the 5.0 release!


alamb

Looking good -- thanks again @jhorstmann

…n offset (#509)

* Fix for take kernel with nullable indices and nonnull values
* Fix for boolean take kernel when indices have an offset
* Use different values for data so they cannot be confused with the indices

…n offset (#509) (#516)

* Fix for take kernel with nullable indices and nonnull values
* Fix for boolean take kernel when indices have an offset
* Use different values for data so they cannot be confused with the indices

Co-authored-by: Jörn Horstmann <git@jhorstmann.net>

jhorstmann added 2 commits June 29, 2021 20:48

Fix for take kernel with nullable indices and nonnull values

1652d2b

Fix for boolean take kernel when indices have an offset

42aca71

github-actions bot added the arrow Changes to the arrow crate label Jun 29, 2021

alamb approved these changes Jun 29, 2021


Use different values for data so they cannot be confused with the indices

541b5ad

alamb approved these changes Jun 30, 2021


alamb merged commit b63c407 into apache:master Jun 30, 2021

alamb added the cherry-picked label Jun 30, 2021

alamb mentioned this pull request Jun 30, 2021

Cherry pick Fix for primitive and boolean take kernel for nullable indices with an offset to active_release #516

Merged
