Plotting DataArrays #262

Closed
Alexander-Barth opened this issue Dec 13, 2016 · 10 comments

Comments

@Alexander-Barth

It would be great if PyPlot also supported Julia's DataArray type and mapped it internally to NumPy's masked arrays.

For pcolor I use the following to add the functionality myself.

using DataArrays
using PyPlot
using PyCall
@pyimport numpy.ma as ma

pyma(S) =  pycall(ma.array, Any, S.data, mask=S.na)
PyPlot.pcolor(x,y,z::DataArray; kws...) = pcolor(x,y,pyma(z); kws...)
PyPlot.pcolor(z::DataArray; kws...) = pcolor(pyma(z); kws...)

In my domain (geospatial data analysis), missing data are quite common, and it would be very useful if this kind of functionality were already present in PyPlot.

@Alexander-Barth
Author

I have seen that other conversion rules (e.g. from range to 1d array) are already present:
https://github.com/JuliaPy/PyPlot.jl/blob/master/src/PyPlot.jl#L544

I could try to make a pull request, if the developers agree to have DataArrays as an additional dependency.

@Alexander-Barth
Author

Or with NullableArrays:

using NullableArrays
using PyPlot
using PyCall
@pyimport numpy.ma as ma

# convert Julia's NullableArrays to Numpy's masked array
na2pyma(S) =  pycall(ma.array, Any, S.values, mask=S.isnull)
PyPlot.pcolor(x,y,z::NullableArray; kws...) = pcolor(x,y,na2pyma(z); kws...)
PyPlot.pcolor(z::NullableArray; kws...) = pcolor(na2pyma(z); kws...)

Test with:

a = randn(10,10); X = NullableArray(a, a .> 1.7)
pcolor(X)

@stevengj
Member

stevengj commented Jan 23, 2017

It would be cleaner to simply define PyObject(a::NullableArray) and PyObject(a::DataArray) constructors.

However, it would be nicer to do this if we can define a small package (required by PyCall, DataArrays, and NullableArrays) that defines a common API, something like

module AbstractMissingData
export ismissing, densevalues, missingvalues
ismissing(a) = Val{false}() # default fallback is no missing data
ismissing(a, i::Integer...) = false
densevalues(a) = a
missingvalues(a) = falses(size(a)) # default fallback: nothing is missing
end

Then e.g. NullableArrays would do:

import AbstractMissingData
AbstractMissingData.ismissing(a::NullableArray) = Val{true}()
Base.@propagate_inbounds AbstractMissingData.ismissing(a::NullableArray, i::Integer...) = a.isnull[i...]
AbstractMissingData.densevalues(a::NullableArray) = a.values
AbstractMissingData.missingvalues(a::NullableArray) = a.isnull

and similarly for DataArrays. PyCall could then define something like:

using AbstractMissingData
array2py(a::AbstractArray, missing::Val{true}) = pycall(numpyarray, PyObject, densevalues(a), mask=missingvalues(a))
array2py(a::AbstractArray, missing::Val{false}) = .... ordinary array conversion ...
PyObject(a::AbstractArray) = array2py(a, ismissing(a))

and it would work for both NullableArrays and DataArrays without introducing a dependency on those packages.
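The trait-style dispatch described above can be tried without any of the packages involved. The following is a minimal sketch in current Julia syntax, not PyCall's actual implementation: `MaskedVector`, `hasmissing`, and `to_plot_input` are hypothetical names introduced only for illustration (`hasmissing` stands in for the proposed `ismissing`, which would now clash with the `Base` function of the same name), and the conversion returns a named tuple instead of calling into Python.

```julia
# Toy container standing in for NullableArray/DataArray (hypothetical type):
# dense values plus a Boolean mask, where `true` marks a missing entry.
struct MaskedVector{T} <: AbstractVector{T}
    values::Vector{T}
    mask::Vector{Bool}
end
Base.size(a::MaskedVector) = size(a.values)
Base.IndexStyle(::Type{<:MaskedVector}) = IndexLinear()
Base.getindex(a::MaskedVector, i::Int) = a.values[i]

# Interface functions with fallbacks for arrays that carry no missing data.
hasmissing(a) = Val(false)
densevalues(a) = a
missingvalues(a) = falses(size(a))

# A container opts in by adding methods to the interface functions.
hasmissing(a::MaskedVector) = Val(true)
densevalues(a::MaskedVector) = a.values
missingvalues(a::MaskedVector) = a.mask

# The consumer dispatches on the trait value and never names a concrete type.
to_plot_input(a::AbstractArray) = to_plot_input(a, hasmissing(a))
to_plot_input(a::AbstractArray, ::Val{true}) =
    (data = densevalues(a), mask = missingvalues(a))
to_plot_input(a::AbstractArray, ::Val{false}) = (data = a, mask = nothing)

plain = [1.0, 2.0, 3.0]
masked = MaskedVector([1.0, 2.0, 3.0], [false, true, false])

println(to_plot_input(plain))
println(to_plot_input(masked))
```

In the real proposal, the `Val{true}` branch would build the masked array (for example through `ma.array`, as in the snippets earlier in this thread) rather than a named tuple.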

@stevengj
Member

cc @andreasnoack, @nalimilan

@andreasnoack

cc: @davidagold

@nalimilan

These functions kind of already exist in NullableArrays: they are isnull(A), isnull(A, i...) and dropnull(A). In Julia 0.6, isnull(x::Any) = false too. I guess we could move them to a small package. OTOH, if highest performance isn't required (which might be the case for plotting), you can just index into AbstractArray{<:Nullable}, call isnull(A[i]) or do [x.value for x in A if !isnull(x)]. That only requires Base types, and hopefully it will get optimized in the future.

@stevengj
Member

stevengj commented Jan 31, 2017

@nalimilan, it would be good to have a solution that worked for DataArrays too.

The point is you don't need to move the functionality into a small package, you just need to move the interface into a small package, so that all interested packages can use the same interface without depending on one another.
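To make the dependency structure concrete, here is a rough sketch that uses three modules in one file as stand-ins for three packages. All module, type, and function names (`MissingInterface`, `ProviderPkg`, `ConsumerPkg`, `MaskedData`, `count_missing`) are hypothetical. The provider and the consumer each depend only on the interface module, never on each other.

```julia
# Stand-in for the small interface-only package: function names and fallbacks.
module MissingInterface
export densevalues, missingvalues
densevalues(a) = a
missingvalues(a) = falses(size(a))   # fallback: nothing is missing
end

# Stand-in for a missing-data package; it only extends the interface.
module ProviderPkg
import ..MissingInterface
struct MaskedData
    values::Vector{Float64}
    mask::Vector{Bool}                # `true` marks a missing entry
end
MissingInterface.densevalues(a::MaskedData) = a.values
MissingInterface.missingvalues(a::MaskedData) = a.mask
end

# Stand-in for a consumer such as a plotting or conversion package.
# It knows the interface, but nothing about ProviderPkg.
module ConsumerPkg
using ..MissingInterface
count_missing(a) = count(missingvalues(a))
end

x = ProviderPkg.MaskedData([1.0, 2.0], [true, false])
println(ConsumerPkg.count_missing(x))
println(ConsumerPkg.count_missing([1, 2, 3]))
```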

@Alexander-Barth
Author

I am new to Julia and I initially asked for DataArrays because it was the first package I found for representing missing data. However, it seems that the development effort is now going into NullableArrays.
JuliaStats/DataArrays.jl#177 (comment)

I would already be happy with a solution that works with NullableArrays.

The proposed solution to use a constructor function is indeed way more elegant than what I proposed.
However, it seems that I need to make the conversion explicitly:

using NullableArrays
using PyPlot
using PyCall
@pyimport numpy.ma as ma
PyObject(a::NullableArray) =  pycall(ma.array, Any, a.values, mask=a.isnull)
a = randn(10,10); X = NullableArray(a, a .> 1.7)
# just pcolor(X) does not work
pcolor(PyObject(X)) 

Thank you very much for your insight so far on this issue!

@nalimilan

@stevengj Well, yeah, my solution required depending on DataArrays. That idea of a generic interface is certainly worth discussing. We could also include it in AbstractTables to avoid creating yet another small interface package.

@Alexander-Barth
Author

It is better to use Union{T,Missing} instead of DataArrays.
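For reference, an array with element type `Union{T,Missing}` can be split into the same two pieces that the earlier snippets pass to `ma.array` (dense values plus a Boolean mask) using only `Base` functions. This is a minimal sketch: `split_missing` and its `fillvalue` keyword are hypothetical names, and the hand-off to PyCall is left out.

```julia
# Hypothetical helper: separate an array that may contain `missing`
# into dense values and a Boolean mask (`true` marks a missing entry).
function split_missing(A::AbstractArray; fillvalue = 0.0)
    mask = ismissing.(A)             # elementwise test for `missing`
    vals = coalesce.(A, fillvalue)   # replace each `missing` by a placeholder
    return vals, mask
end

A = [1.5 missing; missing 4.0]       # Matrix{Union{Missing, Float64}}
vals, mask = split_missing(A)

println(vals)
println(mask)
```

The placeholder stored at masked positions is arbitrary, since a masked array ignores those entries.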
