Clean up colwise #485

johnmyleswhite · 2014-01-21T02:15:32Z

Instead of offering functions colmean, colstd, etc., I'd like to lean more heavily on colwise. Right now its behavior is surprising, although I think I understand the logic.

    using DataFrames
    df = DataFrame(A = 1:4, B = randn(4))
    colwise(cumsum, df) # -> Evaluates to Array{Any}

I assume we do this because the return values of a function may differ in length across columns, which means that we can't do better than returning a generic Array{Any}. That might be the right approach, but it's worth making sure that we prefer this very general strategy over something that would produce a more easily interpreted DataFrame.
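To illustrate the concern about differing lengths, here is a minimal sketch in plain Julia. It assumes a vector of columns (`cols`, a hypothetical stand-in for a DataFrame) and uses a comprehension in place of `colwise`, so it does not reproduce the actual DataFrames implementation.

```julia
# Hypothetical stand-in for a DataFrame: a plain vector of columns.
cols = Any[[1, 1, 2, 2], [0.5, 0.7, 0.5, 2.0]]

# cumsum preserves length, so every column yields 4 values.
same_len = [cumsum(c) for c in cols]

# unique may shrink a column, so result lengths can differ.
diff_len = [unique(c) for c in cols]

println(map(length, same_len))
println(map(length, diff_len))
```

Results like `diff_len` cannot be stored as equal-length columns, which is the case the general return type accommodates.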


nalimilan · 2014-01-21T12:47:07Z

For most uses the length of the result will be the same for all columns, so indeed it would be nice to get a more specific type. I think it would be natural to return the same type that a list comprehension would return.
colwise() could even be defined as a comprehension, i.e. colwise(mean, df) = [mean(df[i]) for i in 1:length(df)].

Now, currently there's issue JuliaLang/julia#5258, but in the long term this would not return an Array{Any}.
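The comprehension-based definition suggested above could be sketched as follows. This is an illustration only: `colwise_sketch` is a hypothetical name, and the columns are passed as a plain vector rather than a DataFrame so the example runs without any packages.

```julia
# Hypothetical helper: apply f to each column and collect the results.
# `cols` is assumed to be any iterable of column vectors.
colwise_sketch(f, cols) = [f(c) for c in cols]

cols = [[1.0, 2.0, 3.0, 4.0], [0.5, 1.5, 2.5, 3.5]]
sums = colwise_sketch(sum, cols)

println(sums)
println(typeof(sums))  # element type is whatever the comprehension infers
```

With columns that share one element type, as here, the comprehension yields a concretely typed vector of scalars rather than a container of Any.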

mkborregaard · 2016-09-12T12:41:28Z

Revisiting this 2 years later, colwise still returns an array of Any, where each element is a Vector, even when the result of each operation is a scalar. Example:

df = DataFrame(a = repeat([1,2,3,4], outer = [2]), b = repeat([2,3], outer = [4]), c = randn(8))
cs = colwise(sum, df)

gives

3-element Array{Any,1}:
 [20]    
 [20]    
 [5.0978]

I feel this is not convenient for further work on the column results, e.g. I may want to make a histogram of the column sums. I then have to

using Plots; histogram(vcat(cs...))
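The flattening step can be tried without Plots. The sketch below hard-codes the output shown above as `cs` (an assumption for illustration, not a call to `colwise`) and compares splatting with `reduce`.

```julia
# Stand-in for the colwise(sum, df) result shown above.
cs = Any[[20], [20], [5.0978]]

# Splatting concatenates the one-element vectors into one flat vector.
flat_splat = vcat(cs...)

# reduce avoids splatting, which can be preferable with many columns.
flat_reduce = reduce(vcat, cs)

println(flat_splat)
println(flat_reduce)
```

Either form produces a flat numeric vector that could be passed on to a plotting call, but both are extra steps the caller has to remember.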

cjprybol · 2017-03-10T20:51:01Z

@mkborregaard this has been addressed in DataTables by JuliaData/DataTables.jl#28

mkborregaard · 2017-03-10T22:26:26Z

Great, thanks for the heads up! Must be nice to be able to close a three year old issue :-)

johnmyleswhite mentioned this issue Jan 22, 2014

Stop treating DataFrames like matrices #484

Merged

nalimilan added this to the 0.9.0 milestone Dec 2, 2016

This was referenced Mar 7, 2017

Enhance joining and grouping JuliaData/DataTables.jl#17

Merged

update colwise JuliaData/DataTables.jl#28

Merged

cjprybol mentioned this issue Aug 18, 2017

WIP: DataTables.jl Backport #1214

Closed

4 tasks

quinnj closed this as completed Sep 7, 2017
