Skip to content
This repository has been archived by the owner on May 4, 2019. It is now read-only.

[RFC/WIP] Few additional methods #155

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

[RFC/WIP] Few additional methods #155

wants to merge 9 commits into from

Conversation

alyst
Copy link
Member

@alyst alyst commented Oct 4, 2016

Some methods that I introduced, while rebasing JuliaData/DataFrames.jl#850

  • isnull(A::NullableArray) to get the boolean array of A nulls
  • unsafe_isnull(A::NullableArray, i::Int...) to check for nulls without checking the validity of the index
  • AbstractNullableArray{T, N} typealias of AbstractArray{Nullable{T}, N} (however AbstractNullableArray (without params) is not considered by Julia more specific than AbstractArray, I don't know if it's a known feature/bug)
  • isless{T,S}(a::Nullable{T}, b::S) to compare nullables vs nonnullables (nonnullable is always less)

@nalimilan
Copy link
Member

nalimilan commented Oct 4, 2016

isnull(A::NullableArray) to get the boolean array of A nulls

This conflicts with the new definition of isnull(::Any) in Julia 0.6. I'd prefer using isnull.(A), which should work if we fix broadcast.

unsafe_isnull(A::NullableArray, i::Int...) to check for nulls without checking the validity of the index

Can't we use the @inbounds machinery combined with isnull(A, i) for that?

isless{T,S}(a::Nullable{T}, b::S) to compare nullables vs nonnullables (nonnullable is always less)

Why would you want this sort order? I'd rather use lifting semantics after wrapping b inside a Nullable, but we shouldn't do that without a clear plan of what functions need this. EDIT: reading the code, I see that only null values are sorted last. So that's just an extension of isless to mixed cases. I would be on board with that, but we should also do the same for isequal, as well as for all operators. This deserves a separate issue (see #85).

@alyst
Copy link
Member Author

alyst commented Oct 4, 2016

isnull(A::NullableArray) to get the boolean array of A nulls

This conflicts with the new definition of isnull(::Any) in Julia 0.6. I'd prefer using isnull.(A), which should work if we fix broadcast.

Would isnull.() be able to just return A.isnull?

unsafe_isnull(A::NullableArray, i::Int...) to check for nulls without checking the validity of the index

Can't we use the @inbounds machinery combined with isnull(A, i) for that?

I must admit my understanding of what @inbounds can do is very limited. Would it see across the function boundary? The other problem is that @inbounds Expr IIRC returns nothing, which is sometimes inconvenient.

isless{T,S}(a::Nullable{T}, b::S) to compare nullables vs nonnullables (nonnullable is always less)

... I would be on board with that, but we should also do the same for isequal, as well as for all operators. This deserves a separate issue (see #85).

Actually, this one I don't really need, so I will remove it from the PR. Anyway, if it has to be defined, it should go to Base.

@codecov-io
Copy link

codecov-io commented Oct 4, 2016

Codecov Report

Merging #155 into master will decrease coverage by 7.54%.
The diff coverage is 45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
- Coverage   65.43%   57.88%   -7.55%     
==========================================
  Files          14       13       -1     
  Lines         732      862     +130     
==========================================
+ Hits          479      499      +20     
- Misses        253      363     +110
Impacted Files Coverage Δ
src/typedefs.jl 100% <ø> (ø) ⬆️
src/indexing.jl 76.19% <33.33%> (-1.86%) ⬇️
src/primitives.jl 95.4% <40%> (-4.6%) ⬇️
src/reduce.jl 83.58% <55.55%> (+3.12%) ⬆️
src/broadcast.jl 57.64% <0%> (-36.95%) ⬇️
src/operators.jl 51.21% <0%> (-26.29%) ⬇️
src/map.jl 87.87% <0%> (-3.43%) ⬇️
src/subarray0_5.jl 33.33% <0%> (ø) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 212b3b8...3dab600. Read the comment docs.

@nalimilan
Copy link
Member

Would isnull.() be able to just return A.isnull?

It would, by overloading broadcast(::typeof(isnull), ::NullableArray). Though we should check whether it's OK to return the internal field without a copy. It could certainly be confusing.

I must admit my understanding of what @inbounds can do is very limited. Would it see across the function boundary? The other problem is that @inbounds Expr IIRC returns nothing, which is sometimes inconvenient.

You just need to add @boundscheck at the beginning of the line which does the bounds checking, and mark the getindex function as @inline. See http://docs.julialang.org/en/release-0.5/devdocs/boundscheck/. Do you have an example where the fact that @inbounds doesn't return anything would really be a problem?

@davidagold
Copy link
Contributor

davidagold commented Oct 5, 2016

Though we should check whether it's OK to return the internal field without a copy. It could certainly be confusing.

Agreed. I think I'd prefer a copy. This might be ameliorated by switching over to a :hasvalue field for NullableArrays and then have only a generic method for isnull but not one for hasvalue so that hasvalue(X::NullableArray) could select the :hasvalue field.

EDIT: Or we could just require that anybody who wants to modify the internals of a NullableArray do their own dirty work and explicitly use getfield. I think I like that the best.

@alyst
Copy link
Member Author

alyst commented Oct 5, 2016

Is there any possibility to return a readonly view of the A.isnull?

@davidagold
Copy link
Contributor

I don't think so. That seems like a weird special case. What's wrong with Base.getfield?

@alyst
Copy link
Member Author

alyst commented Oct 6, 2016

@davidagold I thought the reason to return a copy of A.isnull is to avoid accidental modifications of the field. OTOH in most of the cases isnull(A) would be used for reading only, and copying would be unnecessary performance penalty. It would be nice if isnull(A) returns something that implements AbstractArray{Bool} interface without setindex!() and doesn't allocate. Then the same could be done for NullableCategoricalArray, which doesn't have the :isnull field.

@nalimilan
Copy link
Member

Anyway as David said we need to change isnull to hasvalue field for consistency with e.g. Arrow, Pandas 2 or PostgreSQL. I guess we could return a generator, like isnull(x::NullableArray) = (!x.hasvalue[i] for i in eachindex(x.hasvalue)). That would also work for CategoricalArrays. But it could be surprising for users...

@cjprybol
Copy link
Contributor

bump re: hasvalue

Anyway as David said we need to change isnull to hasvalue field for consistency with e.g. Arrow, Pandas 2 or PostgreSQL. I guess we could return a generator, like isnull(x::NullableArray) = (!x.hasvalue[i] for i in eachindex(x.hasvalue)). That would also work for CategoricalArrays. But it could be surprising for users...

@nalimilan
Copy link
Member

I still think moving to hasvalue is a good idea in general, but given the expected changes in the representation of Nullable in Base (as a Union), and given that changing that field won't really improve the user experience, I'd rather wait until the future of NullableArrays is clearer. We don't really need more disruptive changes at the moment.

* mapreduce() returns Nullable{T}(mr_empty) instead of just mr_empty
* added tests for [map]reduce() over non-empty collection that
  contain no non-null
should have the same effect as @propagate_inbounds within @inbounds
blocks
since _mapreduce_skipnull() counts the number of nulls anyway, it
doesn't need the `missingdata` parameter
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants