
@_ within the selector Fun part of DataFrames.select #16

Closed
ianqsong opened this issue Aug 3, 2020 · 6 comments


ianqsong commented Aug 3, 2020

It seems @_ currently does not work in the following:

using DataFrames, Underscores
df = DataFrame(a=1:3, b=4:6, c=7:9);

df1 = select(df, Not(:b) => ByRow(@_(_1 + _2)))

MethodError: no method matching +(::var"#3#5", ::var"#4#6")
Closest candidates are:
  +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:529
Stacktrace:
 [1] top-level scope at In[18]:1

df2 = select(df, :a => @_(_ .^2))

MethodError: no method matching ^(::var"#20#21", ::Int64)
Closest candidates are:
  ^(!Matched::Float16, ::Integer) at math.jl:885
  ^(!Matched::Regex, ::Integer) at regex.jl:712
  ^(!Matched::Missing, ::Integer) at missing.jl:155
  ...
Stacktrace:
 [1] macro expansion at ./none:0 [inlined]
 [2] literal_pow at ./none:0 [inlined]
 [3] _broadcast_getindex_evalf at ./broadcast.jl:631 [inlined]
 [4] _broadcast_getindex at ./broadcast.jl:604 [inlined]
 [5] getindex at ./broadcast.jl:564 [inlined]
 [6] copy at ./broadcast.jl:830 [inlined]
 [7] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(Base.literal_pow),Tuple{Base.RefValue{typeof(^)},Base.RefValue{var"#20#21"},Base.RefValue{Val{2}}}}) at ./broadcast.jl:820
 [8] top-level scope at In[23]:1
Collaborator

mcabbott commented Aug 3, 2020

@_ needs to be outside the function which accepts a function as an argument. Thus your first example should be:

@_ ByRow(_1 + _2)  # equiv. ByRow(+), or ByRow((_1,_2) -> _1 + _2)

The second one is harder. I think what you want is select(df, @_ Pair(:a, _ .^2)). The infix form => also works on version 2.0 of this package, but that is (probably) going to change, and it will not work on master.
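To see why the placement matters, here is a rough sketch in plain Julia of what the two spellings amount to. It needs no packages. The closures are hand-written stand-ins (the names first_arg, second_arg and rowfun are hypothetical), not the exact code the macro generates, and map is used in place of ByRow.

```julia
# `@_(_1 + _2)`: the outermost call is `+`, so each placeholder argument
# becomes its own closure and `+` is asked to add two functions.
first_arg  = (a, b) -> a   # hypothetical stand-in for the closure built from `_1`
second_arg = (a, b) -> b   # hypothetical stand-in for the closure built from `_2`

err = try
    first_arg + second_arg   # no `+` method exists for two functions
catch e
    e
end
println(err isa MethodError)

# `@_ ByRow(_1 + _2)`: the outermost call is `ByRow`, so its whole argument
# becomes one two-argument closure, which can then be applied row by row.
rowfun = (a, b) -> a + b
println(map(rowfun, 1:3, 7:9))
```

The first half reproduces the kind of MethodError reported above; the second half is the shape that works.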

ianqsong closed this as completed Aug 3, 2020
Owner

c42f commented Aug 8, 2020

Exactly, thanks Michael.

The new DataFrames API conventions use => heavily in a way which is very specialized to DataFrames. This makes it a bit difficult to use a general purpose package like Underscores nicely with the special purpose syntax in DataFrames.

In the "True spirit of Underscores.jl", it's select which is the higher order function accepting other functions, so in principle we'd like it if placing the @_ outside of select worked (just like normal higher order functions):

df1 = @_ select(df, Not(:b) => ByRow(_1 + _2))
df2 = @_ select(df, :a =>_ .^2)

But unfortunately this doesn't seem possible(?) because select doesn't actually accept plain functions; rather it accepts functions which are annotated with symbols (Not(:b), :a), or further decorated with wrappers like ByRow.

I do wonder whether there's any way to resolve this, or whether DataFrames syntax is just too specialized for this to work in a general way.

@mcabbott
Collaborator

I suppose it wouldn't be impossible to special-case => for this purpose, on grounds that it's hard to imagine someone wanting the current behaviour:

@_ fun(x, :a =>_ .^2) # fun(x, ξ -> (:a => ξ.^2))

But it is one more rule to know, which makes @_ a bit more opaque. And it may introduce other weird edge cases that I haven't thought of.

xref #12 for thinking about the rules.
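The difference between the current behaviour and the DataFrames-style reading can be seen without either package. A minimal sketch in plain Julia, where closure_of_pair and pair_of_closure are hypothetical names chosen for illustration:

```julia
# Current reading of `:a => _ .^ 2` as an argument: one closure that returns a Pair.
closure_of_pair = ξ -> (:a => ξ .^ 2)

# DataFrames-style reading: a Pair whose second element is the closure.
# The parentheses matter, because `->` binds more loosely than `=>`.
pair_of_closure = :a => (ξ -> ξ .^ 2)

println(closure_of_pair isa Function)
println(pair_of_closure isa Pair)
println(closure_of_pair([1, 2, 3]))         # calling it yields a Pair
println(pair_of_closure.second([1, 2, 3]))  # the function sits inside the Pair
```

A function like select that expects the second shape has no use for the first, which is why a special case for => would be needed.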

Owner

c42f commented Aug 27, 2020

> I suppose it wouldn't be impossible to special-case => for this purpose

Certainly, but I feel like there are cases where a closure which returns a pair might be desired. I don't have a very natural example, but I feel like it would be weird if the following didn't work:

julia> @_ map(_=>_^2+1, 1:3) |> Dict
Dict{Int64,Int64} with 3 entries:
  2 => 5
  3 => 10
  1 => 2

This isn't exactly the syntax above, and I suppose we could go to extra lengths to pattern match things which "happen to look like" DataFrames usage. But overall I feel like that will lead to trouble :)
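For comparison, the closure in that example corresponds to an ordinary anonymous function returning a Pair. A plain-Julia sketch of the same computation, assuming that is how the placeholder expression is read (pairs_vec and squares_plus_one are hypothetical names):

```julia
# Each element x of 1:3 is mapped to the pair x => x^2 + 1.
pairs_vec = map(x -> (x => x^2 + 1), 1:3)

# A Dict can be built directly from a collection of pairs.
squares_plus_one = Dict(pairs_vec)
println(squares_plus_one)
```

Special-casing => to mean "pair of selector and function" would take this reading away.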

@mcabbott
Collaborator

Ah that's a good example, someone will do that for sure.

Maybe the place for a macro which understands the special syntax used with DataFrames is DataFrames itself. It's possible that a @D_ with a few extra rules could be built on top of this package.

Owner

c42f commented Sep 3, 2020

> Maybe the place for a macro which understands the special syntax used with DataFrames, is DataFrames. It's possible that a @D_ with a few extra rules could be built on top of this package.

I agree that's for the best.

I'd be quite happy to generalize the functions which do the Expr manipulation a little, as required by DataFrames, so that DataFrames can reuse the work we've put into this package.
