-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-34315: [C++] Correct is_null kernel for Union and RunEndEncoded logical nulls #37642
base: main
Are you sure you want to change the base?
Conversation
…ded logical nulls
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
@@ -82,6 +85,72 @@ static void SetNanBits(const ArraySpan& arr, uint8_t* out_bitmap, int64_t out_of | |||
} | |||
} | |||
|
|||
static void SetSparseUnionLogicalNullBits(const ArraySpan& span, uint8_t* out_bitmap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth reusing the choose
kernel for this case?
} | ||
|
||
template <typename RunEndCType> | ||
void SetREELogicalNullBits(const ArraySpan& span, uint8_t* out_bitmap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it seems we could trivially create an ree($index_type, bool)
, then return that directly or decode it using existing utilities
int64_t out_offset) { | ||
const auto* dense_union_type = | ||
::arrow::internal::checked_cast<const DenseUnionType*>(span.type); | ||
DCHECK_LE(span.child_data.size(), 128); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someday if we have more dense union utilities the same trick as for sparse unions could be used here
Rationale for this change
Currently the is_null kernel always returns all-False for union and run-end-encoded types, since those don't have a top-level validity buffer. Update the kernel to take the logical nulls into account.
Are these changes tested?
Yes, both in Python and C++
Are there any user-facing changes?
Yes, this changes (corrects) the behaviour of the is_null kernel.
Closes #35036
pa.compute.is_null()
returns incorrect answer for dense union arrays and segfaults for dense union scalars #34315