Add across etc. for easier working with multiple columns #296

Open
pdeffebach opened this issue Sep 9, 2021 · 7 comments

Comments

@pdeffebach
Collaborator

As referenced on Discourse, many users have trouble with complicated source => fun => dest pairs. Something like dplyr's across() in R might help.

It could live here, or in a separate package that we re-export, as we already do with Chain.jl.

@bkamins
Member

bkamins commented Sep 9, 2021

The question is whether we should add across, or instead start promoting comprehensions over broadcasting (which is nice and clean, but maybe too magical).

The only challenge is that while Not(:x) .=> fun will soon work, [n => fun for n in Not(:x)] cannot be made to work, because Not(:x) only means something relative to a particular data frame. So maybe an @across macro in DataFramesMeta.jl that is data-frame context sensitive (rewriting Not(:x) etc. as appropriate) would indeed make sense?
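For reference, a comprehension can already work with plain DataFrames.jl if the selector is resolved against the data frame first, using names(df, Not(:x)). The sketch below assumes a small toy data frame, and the "_doubled" suffix and the doubling function are illustrative choices only; this is not a proposal for what @across would generate.

```julia
using DataFrames

df = DataFrame(x = 1:3, y = 4:6, z = 7:9)

# Not(:x) is a column selector, not a collection, so a comprehension cannot
# iterate it directly. names(df, Not(:x)) resolves it against df first and
# returns the selected column names as strings.
cols = names(df, Not(:x))

# One source => function => destination pair per selected column.
# The "_doubled" suffix is only an example naming scheme.
specs = [n => ByRow(v -> 2v) => n * "_doubled" for n in cols]

out = transform(df, specs)
```

The cost is that df has to be named twice (once to resolve the selector, once in transform), which is exactly the bookkeeping a context-sensitive macro could hide.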

@nalimilan
Member

As I said in JuliaData/DataFrames.jl#2870, I'd rather wait a bit after Between, All, Cols and Not gain support for .=> before deciding whether adding across would really make people's lives easier. I'm concerned that people find across natural simply because they are used to it.

@pdeffebach
Collaborator Author

I agree. I think we can make a lot of progress helping people ease into src => fun => dest. If that approach remains too hard even after more tutorials and community knowledge build up, then we can consider across.

@xiaodaigh
Contributor

using DataFrames, DataFrameMacros


df = DataFrame(a = 1:3)

function return2(a)
  (a = a/2, b = a/3)
end
transform(df, :a =>  return2 => [:c, :d])

The return2 function above returns two values, which I'd like to assign to two new columns. I don't think that's possible currently with this package.

I wish we could do something there.

@pdeffebach
Collaborator Author

This is definitely possible!

julia> df = DataFrame(a = 1:3);

julia> function return2(a)
         (a = a/2, b = a/3)
       end;

julia> @rtransform df $[:c, :d] = return2(:a)
3×3 DataFrame
 Row │ a      c        d        
     │ Int64  Float64  Float64  
─────┼──────────────────────────
   1 │     1      0.5  0.333333
   2 │     2      1.0  0.666667
   3 │     3      1.5  1.0

@pdeffebach
Collaborator Author

What isn't possible is applying the same transformation to many columns.

You can get close with AsTable, but I don't know how you would add a suffix.

julia> df = DataFrame(rand(5, 3), :auto)
5×3 DataFrame
 Row │ x1        x2        x3       
     │ Float64   Float64   Float64  
─────┼──────────────────────────────
   1 │ 0.57637   0.682894  0.738253
   2 │ 0.678677  0.273173  0.906671
   3 │ 0.848709  0.970005  0.289846
   4 │ 0.6376    0.154395  0.271188
   5 │ 0.223121  0.73541   0.576734

julia> transform(df, All() .=> ByRow(t -> t + 1)) 
5×6 DataFrame
 Row │ x1        x2        x3        x1_function  x2_function  x3_functio ⋯
     │ Float64   Float64   Float64   Float64      Float64      Float64    ⋯
─────┼─────────────────────────────────────────────────────────────────────
   1 │ 0.57637   0.682894  0.738253      1.57637      1.68289      1.7382 ⋯
   2 │ 0.678677  0.273173  0.906671      1.67868      1.27317      1.9066
   3 │ 0.848709  0.970005  0.289846      1.84871      1.97001      1.2898
   4 │ 0.6376    0.154395  0.271188      1.6376       1.15439      1.2711
   5 │ 0.223121  0.73541   0.576734      1.22312      1.73541      1.5767 ⋯
                                                           1 column omitted

julia> @rtransform df $AsTable = map(t -> t + 1, AsTable(All()))
5×3 DataFrame
 Row │ x1       x2       x3      
     │ Float64  Float64  Float64 
─────┼───────────────────────────
   1 │ 1.57637  1.68289  1.73825
   2 │ 1.67868  1.27317  1.90667
   3 │ 1.84871  1.97001  1.28985
   4 │ 1.6376   1.15439  1.27119
   5 │ 1.22312  1.73541  1.57673
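Outside the macros, plain DataFrames.jl can attach a suffix by broadcasting a vector of destination names alongside the sources, giving one source => ByRow(fun) => destination pair per column. This is a sketch on a small made-up data frame; the "_plus1" suffix is purely illustrative.

```julia
using DataFrames

df = DataFrame(x1 = [1.0, 2.0], x2 = [3.0, 4.0], x3 = [5.0, 6.0])

cols = names(df)

# Pair each column with the same row-wise function and an explicit
# destination name built by appending an example suffix.
out = transform(df, cols .=> ByRow(t -> t + 1) .=> cols .* "_plus1")
```

Because .=> is right-associative, this expands to "x1" => (ByRow(...) => "x1_plus1") and so on, so the destination names line up with the sources element by element.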

You also couldn't do combinations of columns, say [:x1, :x2] and [:x1, :x3].
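With plain DataFrames.jl (not the macros), combinations can be spelled out as a comprehension over multi-column sources. A minimal sketch, assuming the combinations are listed by hand and that a row-wise sum is the operation of interest; both are illustrative choices.

```julia
using DataFrames

df = DataFrame(x1 = [1, 2], x2 = [10, 20], x3 = [100, 200])

# Illustrative list of column combinations to process together.
combos = [["x1", "x2"], ["x1", "x3"]]

# For each combination: multi-column source => row-wise sum => joined name.
# ByRow(+) receives one value per source column, i.e. +(x1, x2) for each row.
specs = [c => ByRow(+) => join(c, "_") for c in combos]

out = transform(df, specs)
```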

@xiaodaigh
Contributor

Thanks


4 participants