Clean up basic functions like mean and std #325
Comments
Would it be possible to change the API to these things by making it |
Yeah, I'd be happy with something like that. So I take it the compiler is able to specialize on keyword arguments with distinct types? What really bothers me with these functions is that we have to duplicate so much functionality from Base. I've started trying to exploit more of the |
No, it's actually not. That's why the way the sorting API works is by translating keyword calls into a non-keyword call which can get specialized on the argument types. For example, |
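A minimal sketch of the translation described above, written for current Julia (1.x) rather than the version this thread targeted. The names `SortDirection`, `Ascending`, `Descending`, `mysort`, and `_mysort` are hypothetical and are not the actual sorting API. They only illustrate the pattern: the keyword-accepting method converts the keyword into a typed positional argument, and ordinary dispatch then picks a specialized inner method.

```julia
# Hypothetical direction types; each keyword choice maps to a distinct type.
abstract type SortDirection end
struct Ascending <: SortDirection end
struct Descending <: SortDirection end

# Inner positional methods: dispatch selects one method per direction type.
_mysort(v::AbstractVector, ::Ascending) = sort(v)
_mysort(v::AbstractVector, ::Descending) = sort(v, rev=true)

# Outer keyword method: a thin wrapper that forwards to the positional form.
mysort(v::AbstractVector; rev::Bool=false) =
    _mysort(v, rev ? Descending() : Ascending())

mysort([3, 1, 2])            # [1, 2, 3]
mysort([3, 1, 2], rev=true)  # [3, 2, 1]
```

The wrapper stays trivial, so all of the real work lives in methods that are selected by the types of their positional arguments.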
I should clarify that |
I'm torn between saying that such duplication is unavoidable, and considering providing some mechanism for plugging in custom behaviors for certain collection/element type combinations. Hard to design such a generic mechanism but probably not impossible. |
For now, I think we should go through and accept the duplication. As we figure things out, we may be able to design a more generic system. |
Is there any progress on this? I would like to use DataFrames for some upcoming work and I would definitely like to have the functionality to operate along dimensions of |
I was holding off while @simonster was working on operators. Anything that gets this moving forward would be welcome in my book. I suspect just nailing the interface for |
A straightforward way to implement some of this is to define operators that skip Alternatively, it may be better to write cache-friendly versions of these functions for |
Sorry for the delay on operators. I'll try to finish things up this weekend. I don't think it's possible to get |
Yes, the I don't understand how you were thinking of using |
Re:

    type NAOrZero <: BinaryFunctor; end
    evaluate(::NAOrZero, x, y::Bool) = y ? x : zero(typeof(x))
    sum(darray::DataArray, dims) = mapreduce(NAOrZero(), Add(), darray.data, darray.na, dims)

I don't think this quite works because the NumericExtensions version of |
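For anyone following the idea in the snippet above, here is a rough sketch of a masked sum along a dimension in current Julia (1.x), using only Base. It does not use NumericExtensions and is not how DataArrays implements `sum`. `masked_sum` is a hypothetical name, and the sketch assumes that `na[i] == true` marks a missing entry, so flagged entries contribute zero.

```julia
# Sum `data` along `dims`, treating entries flagged in `na` as zero.
# Assumption: na[i] == true marks a missing entry.
function masked_sum(data::AbstractArray{T}, na::AbstractArray{Bool}, dims) where {T<:Number}
    size(data) == size(na) ||
        throw(DimensionMismatch("data and mask must have the same size"))
    # Replace flagged entries with zero, then reduce along the requested dimensions.
    return sum(ifelse.(na, zero(T), data); dims=dims)
end

data = [1.0 2.0; 3.0 4.0]
na   = [false true; false false]

masked_sum(data, na, 1)  # 1×2 matrix: [4.0 4.0]
masked_sum(data, na, 2)  # 2×1 matrix with entries 1.0 and 7.0
```

Note that the broadcast allocates a temporary array the size of `data`, so this is a readability sketch and not a cache-friendly implementation.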
That makes a lot of sense. When you're done updating |
Fixed by JuliaStats/DataArrays.jl#101 |
Our current implementation of `mean`, `std`, etc. is kind of a mess. The biggest issue is that the `failNA`, `replaceNA`, `removeNA` trichotomy we have only works well for `DataVector`. I'm up for keeping these functions around for use with `DataVector` objects, but they don't make any sense for higher-order tensors.

In addition, our functions don't work properly on higher-order tensors because you can't pass in a set of dimensions for splicing. This means that we have things like `colsums(x)`, whereas the Julian thing to do would be `sum(x, 2)`.

I'm going to go through and start removing some of this functionality through deprecations.

In addition, I'm going to start writing custom `NA` handling keyword arguments for each function. This seems like more work, but it's the only sane thing to do: a call to `svd` for a `DataMatrix` needs to offer an option like `impute = false`, because the solution to having any `NA` entries isn't described in terms of `failNA`, `replaceNA` or `removeNA`. Sadly, I think this is the rule, not the exception: `NA` handling has to be handled in a special way for each function.
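To make the proposal concrete, here is a hedged sketch of what a dimension-aware reduction with its own NA keyword could look like. It is written for current Julia (1.x), where `missing` stands in for `NA` and the dimension is passed as `dims`. The function name `na_mean` and the keyword `skipna` are hypothetical and do not come from DataFrames or DataArrays.

```julia
using Statistics

# Hypothetical dimension-aware mean with a function-specific NA keyword.
# Assumption: `missing` plays the role of NA.
function na_mean(x::AbstractArray, dims; skipna::Bool=false)
    if !skipna
        # Default behaviour: a missing entry propagates into its slice's result.
        return mean(x; dims=dims)
    end
    # Drop missing entries slice by slice before averaging.
    return mapslices(v -> mean(skipmissing(v)), x; dims=dims)
end

x = [1.0 missing; 3.0 4.0]

na_mean(x, 1; skipna=true)  # 1×2 matrix: [2.0 4.0]
na_mean(x, 2; skipna=true)  # 2×1 matrix with entries 1.0 and 3.5
```

A single call with a `dims` argument covers what `colsums`-style helpers do, and the keyword is free to mean something different for each function, which is the point made above about `svd` and `impute`.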