RFC: Clean up operators #351
Is this generally good to merge? I'd like to get something (even if it's a draft) merged soon, so that I can finish the work I've started splitting DataArrays out of DataFrames.
May need additional tests
It passes the current tests, but it might be worth adding some tests to
That would be great. I'll give it a review now. Sorry for taking so long to look at this. Please ping me in the future if you think I'm neglecting something. Or just merge it if you're happy with it.
I can move the code in PR #354 to the new repo when it's made.
That would be great. And the same comments I made are relevant there: if I'm behind with a PR, please ping me. I now have more to do than I can easily manage, so any reminders are really helpful.
    function similar{T}(d::DataArray{T}, dims::Dims)
        DataArray(Array(T, dims), trues(dims))
    end
    similar(d::DataArray, T, dims::Dims) = DataArray(Array(T, dims), trues(dims))
Can we revert this change? I've edited this on my own and now feel like `similar` should not initialize the `na` bit mask.
I can change this not to initialize the `na` mask, but I still think we only need a single `similar` function with all the arguments. The one- and two-argument versions of `similar` in Base will just call this one.
That's true. I'll make that change on my end.
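A minimal sketch of the point agreed on above, written in current Julia syntax rather than the syntax used in the diff. `ToyDataArray` is a hypothetical stand-in, not the real `DataArray` type, and the sketch assumes that Base's shorter `similar` fallbacks for `AbstractArray` forward to the three-argument method. The `na` mask is allocated but deliberately left uninitialized.

```julia
# Hypothetical stand-in for DataArray; for illustration only.
struct ToyDataArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    na::BitArray{N}     # true where the entry is NA
end

# Minimal AbstractArray interface. getindex ignores the mask to keep the
# sketch short; a real implementation would have to handle NA here.
Base.size(d::ToyDataArray) = size(d.data)
Base.getindex(d::ToyDataArray, i::Int...) = d.data[i...]

# The single method carrying all the arguments. Neither the data nor the
# mask is initialized, so their contents are arbitrary until written.
function Base.similar(d::ToyDataArray, ::Type{T}, dims::Dims) where {T}
    ToyDataArray(Array{T}(undef, dims), BitArray(undef, dims))
end

d = ToyDataArray([1.0, 2.0, 3.0], falses(3))
s1 = similar(d)               # one-argument form
s2 = similar(d, Int)          # two-argument form
s3 = similar(d, Int, (2, 3))  # full form
```

All three calls should end up in the method defined here, so `s1`, `s2`, and `s3` are `ToyDataArray`s and not plain `Array`s. A caller has to fill both the data and the mask before reading from the result.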
Okay, I've added some additional tests and fixed two bugs they picked up, one preexisting and one new. I think this should be good to merge.
Ok. Let's merge this. Then I'll start the split into DataFrames and DataArrays. After that, we can review this stuff again.
RFC: Clean up operators
There are two main goals here: to improve performance by allowing type inference to happen for most of these operations, and to reduce the amount of repetitive code. See #327 for more background.
Some notes:
- […] `dataframe_blocks.jl`. My main grievance is that they make it hard to tell what's being defined where without jumping around in the file. A secondary issue is that the operator categories that make sense in `operators.jl` don't necessarily make sense elsewhere. For example, `./` needs to be defined separately in `operators.jl` for type reasons, so it's not in `array_arithmetic_operators`, and at the moment this also means it's not handled in `dataframe_blocks.jl`.
- […] `isna` and `data` methods that take indices? The `isna` method would return a `Bool`, and the `data` method would return `dv[i]` if `dv[i] != NA` and could return anything otherwise. This would permit efficient type inference without accessing fields directly, which would let me remove the special cases for DataArrays and speed up other AbstractDataArrays.
- […] `col*` and `row*`, but these could use some more work. The API should probably change to be more Julian (Make API more Julian #159) and there are other considerations as well (see Clean up basic functions like mean and std #325 and Implement `na_rm` for math functions? #259).
- […] `all`. Whether we return `NA` or `false` depends on the order of the vector, i.e., `all([false, NA]) == false` whereas `all([NA, false]) == NA`. I haven't changed this, since I wanted the existing tests to pass, but is this really what we want?
- […] `rle` methods, since I'm not sure they're sufficiently commonly used to be worth optimizing.
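The indexed `isna`/`data` accessors and the `all` question raised in the notes can be sketched together. This is an illustration in current Julia syntax, not the code from this PR: `ToyDataVector`, `all_firsthit`, and `all_threevalued` are hypothetical names, and Base's `missing` stands in for `NA`.

```julia
# Hypothetical stand-in for a DataArray-like vector; for illustration only.
struct ToyDataVector{T}
    values::Vector{T}   # payload; entries under a set mask bit are arbitrary
    na::BitVector       # true where the entry is NA
end

# Indexed accessors in the spirit of the proposal: isna always returns a
# Bool and data always returns a T, so callers never touch fields directly.
isna(dv::ToyDataVector, i::Integer) = dv.na[i]
data(dv::ToyDataVector, i::Integer) = dv.values[i]

# Order-dependent reduction: stops at the first false or the first NA,
# whichever comes first. `missing` stands in for NA.
function all_firsthit(dv::ToyDataVector{Bool})
    for i in eachindex(dv.values)
        isna(dv, i) && return missing
        data(dv, i) || return false
    end
    return true
end

# Three-valued reduction: a false anywhere wins, regardless of position.
function all_threevalued(dv::ToyDataVector{Bool})
    seen_na = false
    for i in eachindex(dv.values)
        if isna(dv, i)
            seen_na = true
        elseif !data(dv, i)
            return false
        end
    end
    return seen_na ? missing : true
end

a = ToyDataVector([false, true], BitArray([false, true]))  # [false, NA]
b = ToyDataVector([true, false], BitArray([true, false]))  # [NA, false]
```

With these inputs, `all_firsthit` gives `false` for `a` and the NA stand-in for `b`, which is the order dependence described in the notes. `all_threevalued` gives `false` for both, at the cost of scanning past an NA to look for a later `false`.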