-
Notifications
You must be signed in to change notification settings - Fork 811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Array
Logical Nullability
#4691
Conversation
@@ -311,11 +311,7 @@ pub fn not(left: &BooleanArray) -> Result<BooleanArray, ArrowError> { | |||
/// assert_eq!(a_is_null, BooleanArray::from(vec![false, false, true])); | |||
/// ``` | |||
pub fn is_null(input: &dyn Array) -> Result<BooleanArray, ArrowError> { | |||
let values = match input.nulls() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer need to special case NullArray, whilst also now correctly handling DictionaryArray with value nulls
@@ -1253,4 +1286,20 @@ mod tests { | |||
assert_eq!(v, expected, "{idx}"); | |||
} | |||
} | |||
|
|||
#[test] | |||
fn test_iterator_nulls() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test for #4616
/// are present. For example a [`DictionaryArray`] with nullable values will still return true, | ||
/// even if the nulls present in [`DictionaryArray::values`] are not referenced by any key, | ||
/// and therefore would not appear in [`Array::logical_nulls`]. | ||
fn is_nullable(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debated calling this method is_required instead, but figured it was better to stick with consistent terminology, and this matches the method on Field
@@ -107,22 +109,12 @@ impl Array for NullArray { | |||
None | |||
} | |||
|
|||
/// Returns whether the element at `index` is null. | |||
/// All elements of a `NullArray` are always null. | |||
fn is_null(&self, _index: usize) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change, although I expect the impact to minimal. I debated making is_null and friends always return logical nullability, but this would have been inconsistent with ArrayData and generally seemed like it might just cause more confusion
|
||
#[test] | ||
#[cfg(feature = "dyn_cmp_dict")] | ||
fn test_dictionary_nested_nulls() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a test for #4688
@@ -4016,4 +4016,30 @@ mod tests { | |||
vec![None, None, None, Some(5.1), Some(5.1), Some(3.0), Some(1.2)], | |||
); | |||
} | |||
|
|||
#[test] | |||
fn test_lexicographic_comparator_null_dict_values() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test for #2687
} | ||
|
||
#[test] | ||
fn test_null_dictionary_values() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test for #4690
ab28cfe
to
721d65e
Compare
721d65e
to
55a420b
Compare
@@ -277,6 +317,10 @@ impl Array for ArrayRef { | |||
self.as_ref().nulls() | |||
} | |||
|
|||
fn logical_nulls(&self) -> Option<NullBuffer> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered having this eagerly constructed and stored on the arrays, but I felt this would be more surprising as it would make the arrays diverge from the arrow specification. Whilst this does raise the prospect of computing this NullBuffer multiple times, the cases where this is required are fairly rare. We can always optimise in future should this become a problem
fn nulls(&self) -> Option<&NullBuffer>; | ||
|
||
/// Returns the logical null buffer of this array if any | ||
/// | ||
/// In most cases this will be the same as [`Array::nulls`], except for: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Array::nulls
should be deprecated and a non-deprecated version physical_nulls
should be introduced. Users will misuse nulls
because:
- most people will read over the small note within the doc string
nulls
is easier to find thanlogical_nulls
- there are two options of "which null", and we offer an implicit default (because it is not prefixed with a semantic) that is clearly the wrong one for many use cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whilst I do sort of agree, I'm not sure this wouldn't just cause more confusion, as this isn't a distinction found in the arrow spec. Would we also deprecate is_null, is_valid, etc... I think the current API hasn't been a major issue thus far, as the cases where it matters are rare.
FWIW once we have StringView I think there is basically no reason to use DictionaryArray or RunArray
Which issue does this PR close?
Closes #4690
Closes #4689
Closes #4688
Closes #2687
Closes #4616
Relates to #4565
Relates to #4835
Rationale for this change
Various types have nullability that is not accurately reflected in their reported NullBuffer, this causes incorrect behaviour for various kernels that have special handling of the null buffers.
What changes are included in this PR?
Adds two methods to Array:
This logical null mask can then be used in the places where the logic relies on logical nullability
Are there any user-facing changes?