-
Notifications
You must be signed in to change notification settings - Fork 811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Physical null and logical null are confusing concepts #4840
Comments
I'd like to provide some though from a user perspective. When processing data where null is very common, it's natural to looking for a way to reduce the comsuption of those null values. As it's known that some part of the data is missing and we of cause can make optimization based on that. But I find it seems to be difficult at present. I can't use (BTW, if I want write this part of data to parquet, or passing/compute it under a given schema, I can only build a corresponding array, and fills From whether the type is null and whether the value is null, we can give four (!!) types of null. When the type is null, test function like By listing them down, some questions come to my mind:
Physical and logical null are truly confusing. But it there any way to make it intuitive and easy to use 🤔 |
I agree @waynexia I don't quite follow your examples with 4 different types of null. If you need to quickly create an array that represents entirely null of a single type, there is It is fast to check for an array that contains only nulls by using let num_nulls = array.nulls().map(|nulls| nulls.null_count()).unwrap_or(0)
let is_all_null = num_nulls = array.len(); Though as the docs say, this isn't correct for Dictionary or REE arrays |
Yes, I'm considering the "strictly necessary" consumption. API could be something formed later.
Sorry for my unclear description 🥲 Let me make another try. Here is a table of those four types:
Arrays created by The problem is What confuses me is From my perspective as a user, an array like In many scenarios I met, |
This might be a little off-topic. I'd like to propose a new array similar to I don't know if it's still an option to not have "logical null" and "physical null". Maybe overriding |
This sounds very much like what datafusion
I think the I made #5434 to try and clarify this even more |
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The arrow-rs library (now) makes a distinction between physical nulls and logical nulls as the same distinction is made in the Arrow specification (though the terms physical and logical nulls are not used, to my knoweldge)
The issue is that for certain array types computing if an element is very fast (consult a pre-existing bitmap) but for others can be quite slow (e.g. a dictionary where both the keys and values must be consulted for nullness)
The method named
Array::is_null
returns the (fast) physical nullness, but is deeply confusing for for certain types -- see #4835 and #4838 (comment) from @crepererum.We have tried to clarify the difference in #4838 but it is still confusing
Describe the solution you'd like
I am not sure -- @crepererum suggests in #4838 (comment)
However, there are downsides to this too
Describe alternatives you've considered
The documentation changes may be enough, but I think the issue is important enough to track here
Additional context
The text was updated successfully, but these errors were encountered: