-
Notifications
You must be signed in to change notification settings - Fork 11
WIP: Consolidate hcat, append, merge, vcat and split functional overlaps #70
base: master
Are you sure you want to change the base?
Conversation
removes hcat, append now adds columns to end of DataTable, merge concatenates DataTables horizontally, and vcat still concatenates vertically. append! and vcat no longer perform the same function. hcat and merge no longer perform the same function.
@@ -201,7 +201,11 @@ function combine(ga::GroupApplied) | |||
idx[j + (1:n)] = gd.idx[start] | |||
j += n | |||
end | |||
hcat!(gd.parent[idx, gd.cols], valscat) | |||
if isa(valscat, DataTable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I broke this during my changes it helped me understand how combine
and my changes to aggregate
in #65 differ. This hcat!
call was handling both DataTables and Vectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand. valscat = vcat(ga.vals)
and ga.vals
is necessarily a Vector{<:AbstractDataTable}
. So valscat
is always an AbstractDataTable
. Why do you need to call append!
when it's not a DataTable
?
@@ -109,7 +109,7 @@ module TestDataTable | |||
dt = DataTable(a=[1, 2], b=[3., 4.]) | |||
dt2 = DataTable(b=["a", "b"], c=[:c, :d]) | |||
@test isequal(merge!(dt, dt2), dt) | |||
@test isequal(dt, DataTable(a=[1, 2], b=["a", "b"], c=[:c, :d])) | |||
@test isequal(dt, DataTable(a=[1, 2], b=[3., 4.], b_1=["a", "b"], c=[:c, :d])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was another point of confusion. Depending on which method you used to horizontally concatenate, you could either have columns overwrite one another or have column names updated so duplicates are unique. Is everyone ok with defaulting to not overwriting column names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generic merge(!)
methods use the value from the last argument in case of conflict, so I think we should respect this. It could make sense to also support passing a combine
function.
Maybe that's an argument to keep hcat
, with a different behavior. For that function, the situation is very similar to that faced by NamedArray
or AxisArray
: duplicate column names are not allowed, so either an error must be thrown, or names have to be made unique. I would suggest throwing an error by default, with a boolean keyword argument to make names unique when needed; but other solutions can be considered (we could discuss that with the authors of the two named array packages).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Could you add docstrings explaining in a few words the DataTable
-specific behavior? I.e. that it operates on columns (AFAICT that's the only precision which is needed beyond the generic docstring from Base). Also, if that's possible a few deprecations wouldn't hurt. Finally, I see you made some methods more generic by accepting any AbstractDataTable
as first argument. This probably needs to throw an error for SubDataTable
(and be tested)?
|
||
# catch-all to cover cases where indexing returns a DataTable and copy doesn't | ||
Base.hcat(dt::AbstractDataTable, x) = hcat!(dt[:, :], x) | ||
Base.hcat(dt1::AbstractDataTable, dt2::AbstractDataTable) = hcat!(dt1[:, :], dt2) | ||
Base.merge(dt::AbstractDataTable, dtn::AbstractDataTable...) = merge!(dt[:, :], dtn...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deepcopy
would be more natural, even if the result is the same?
Base.hcat(dt1::AbstractDataTable, dt2::AbstractDataTable) = hcat!(dt1[:, :], dt2) | ||
Base.merge(dt::AbstractDataTable, dtn::AbstractDataTable...) = merge!(dt[:, :], dtn...) | ||
|
||
function Base.append!(dt1::AbstractDataTable, x::AbstractVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a one-liner here and below.
@@ -39,6 +39,7 @@ export @~, | |||
SubDataTable, | |||
|
|||
aggregate, | |||
append, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I can see the point of implementing generic functions which exist in Base, I don't like the idea of creating append
, with such a general name and so similar to append!
. Especially since merge
does the same thing.
@@ -109,7 +109,7 @@ module TestDataTable | |||
dt = DataTable(a=[1, 2], b=[3., 4.]) | |||
dt2 = DataTable(b=["a", "b"], c=[:c, :d]) | |||
@test isequal(merge!(dt, dt2), dt) | |||
@test isequal(dt, DataTable(a=[1, 2], b=["a", "b"], c=[:c, :d])) | |||
@test isequal(dt, DataTable(a=[1, 2], b=[3., 4.], b_1=["a", "b"], c=[:c, :d])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generic merge(!)
methods use the value from the last argument in case of conflict, so I think we should respect this. It could make sense to also support passing a combine
function.
Maybe that's an argument to keep hcat
, with a different behavior. For that function, the situation is very similar to that faced by NamedArray
or AxisArray
: duplicate column names are not allowed, so either an error must be thrown, or names have to be made unique. I would suggest throwing an error by default, with a boolean keyword argument to make names unique when needed; but other solutions can be considered (we could discuss that with the authors of the two named array packages).
@@ -201,7 +201,11 @@ function combine(ga::GroupApplied) | |||
idx[j + (1:n)] = gd.idx[start] | |||
j += n | |||
end | |||
hcat!(gd.parent[idx, gd.cols], valscat) | |||
if isa(valscat, DataTable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand. valscat = vcat(ga.vals)
and ga.vals
is necessarily a Vector{<:AbstractDataTable}
. So valscat
is always an AbstractDataTable
. Why do you need to call append!
when it's not a DataTable
?
removes hcat and hcat!, append(!) now adds columns to end of DataTable, merge(!)
concatenates DataTables horizontally, and vcat still concatenates
vertically. append! and vcat no longer perform the same function. hcat(!)
and merge no longer perform the same function.
Aims to resolve #52