Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Operations for DataArray that skip NA values #354

Closed
wants to merge 2 commits into from

Conversation

nfoti
Copy link
Contributor

@nfoti nfoti commented Aug 22, 2013

I've implemented a few operations on DataArrays that can skip NA values. This behavior can eventually be tied to a flag to turn it on or off, but for now I just wanted to get something working. I've implemented the approach outlined by @simonster in Issue #325 using the NumericalExtenstions package.

Also, since NumericExtensions doesn't work with BitArrays I've had to convert the na member to an Array{Bool,N}. I realize this isn't very efficient right now, but it's hopefully a short-term solution.

@johnmyleswhite
Copy link
Contributor

Very cool. Is there any way to patch NumericExtensions to handle BitArray's?

@simonster
Copy link
Contributor

I'm sorry for not realizing this earlier, but there's a wrinkle when using this on arrays of non-bits types, since it's possible (likely, even) that values that are NA have #undef in da.data and will throw on access. There's rarely a reason to perform these operations on non-bits types, but they should probably work.

@nfoti
Copy link
Contributor Author

nfoti commented Aug 22, 2013

You're right that for an arbitrary map(NAOrZero(), da.data, da.na) one could run into the problem you mention. But most of the functions I wrote only really make sense for numbers and so should always be a bitstype. I can specialize the version of evaluate to only be defined for Numbers and maybe define a different version for non-bitstypes that won't break on the case you mention.

@nfoti
Copy link
Contributor Author

nfoti commented Aug 22, 2013

Also, I attempted to patch NumericExtensions to work with BitArrays but it turned out out to be more involved than originally thought. I created an issue for this in NumericExtensions (#10).

@simonster
Copy link
Contributor

Union types and abstract types aren't bits types, so a DataVector{Real} or DataVector{Union(Int64, Float64)} could still have #undefs. Generally, people shouldn't use types like this, but these functions still ought to work if they do.

Non-bits arrays aren't going to be fast anyway, so using an simple but slow approach seems fine, but I'm not sure it's possible to define methods that only apply to arrays of bits types using multiple dispatch, since e.g. Real <: Number. You might just have to use isbits.

@nfoti
Copy link
Contributor Author

nfoti commented Aug 22, 2013

Using isbits to switch between a fast implementation for bitstypes and an
implementation that checks for #undef for non-bitstypes is probably fine.
Given the amount of work these functions need to do the if overhead is
probably negligible.

It's too bad there's no way to use multiple dispatch to do this though.

After thinking about how I'd implement it though, what should the behavior
be if an #undef is encountered? It seems like an error actually is what
you'd want as there's no obvious default. For the Real case above 0 is
probably a decent choice, but what about a user-defined type that has an
#undef?

On Thu, Aug 22, 2013 at 11:43 AM, Simon Kornblith
notifications@github.comwrote:

Union types and abstract types aren't bits types, so a DataVector{Real}or DataVector{Union(Int64,
Float64)} could still have #undefs. Generally, people shouldn't use types
like this, but these functions still ought to work if they do.

Non-bits arrays aren't going to be fast anyway, so using an simple but
slow approach seems fine, but I'm not sure it's possible to define methods
that only apply to arrays of bits types using multiple dispatch, since e.g. Real
<: Number. You might just have to use isbits.


Reply to this email directly or view it on GitHubhttps://github.com//pull/354#issuecomment-23115130
.

@simonster
Copy link
Contributor

Fixed by JuliaStats/DataArrays.jl#101

@simonster simonster closed this Jul 17, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants