Another attempt at an astable flag #298
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,6 +22,7 @@ In addition, DataFramesMeta provides | |
convenient syntax. | ||
* `@byrow` for applying functions to each row of a data frame (only supported inside other macros). | ||
* `@passmissing` for propagating missing values inside row-wise DataFramesMeta.jl transformations. | ||
* `@astable` to create multiple columns within a single transformation. | ||
* `@chain`, from [Chain.jl](https://github.com/jkrumbiegel/Chain.jl) for piping the above macros together, similar to [magrittr](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html)'s | ||
`%>%` in R. | ||
|
||
|
@@ -396,11 +397,37 @@ julia> @rtransform df @passmissing x = parse(Int, :x_str) | |
3 │ missing missing | ||
``` | ||
|
||
## Creating multiple columns at once with `@astable` | ||
|
||
Often new variables may depend on the same intermediate calculations. `@astable` makes it easy to create multiple | ||
new variables in the same operation, yet have them share | ||
information. | ||
|
||
In a single block, all assignments of the form `:y = f(:x)` | ||
or `$y = f(:x)` at the top level generate new columns. | |
|
||
``` | ||
julia> df = DataFrame(a = [1, 2, 3], b = [400, 500, 600]); | ||
|
||
julia> @transform df @astable begin | ||
ex = extrema(:b) | ||
:b_first = :b .- first(ex) | ||
:b_last = :b .- last(ex) | ||
end | ||
3×4 DataFrame | ||
Row │ a b b_first b_last | ||
│ Int64 Int64 Int64 Int64 | ||
─────┼─────────────────────────────── | ||
1 │ 1 400 0 -200 | ||
2 │ 2 500 100 -100 | ||
3 │ 3 600 200 0 | ||
``` | ||
|
||
|
||
## [Working with column names programmatically with `$`](@id dollar) | ||
|
||
DataFramesMeta provides the special syntax `$` for referring to | ||
columns in a data frame via a `Symbol`, string, or column position as either | ||
a literal or a variable. | ||
columns in a data frame via a `Symbol`, string, or column position as either a literal or a variable. | ||
While we are at it given our recent discussion on Discourse, I think it is essential to mention when the
I will do this as another PR. In summary, you can't use other macros which use
To be clear why I stress it so much. With DataFrames.jl my answer to users is: if you learn Julia Base then you will know exactly how DataFrames.jl works. With DataFramesMeta.jl unfortunately this is not the case as it is a DSL so we need to be very precise how things work in documentation. |
||
|
||
```julia | ||
df = DataFrame(A = 1:3, B = [2, 1, 2]) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -350,6 +350,120 @@ macro passmissing(args...) | |
throw(ArgumentError("@passmissing only works inside DataFramesMeta macros.")) | ||
end | ||
|
||
""" | ||
|
||
astable(args...) | ||
|
||
|
||
Return a `NamedTuple` from a single transformation inside DataFramesMeta.jl macros. | ||
|
||
`@astable` acts on a single block. It works through all top-level expressions | ||
and collects all such expressions of the form `:y = ...`, i.e. assignments to a | ||
|
||
`Symbol`, which is a syntax error outside of DataFramesMeta.jl macros. At the end of the | ||
expression, all assignments are collected into a `NamedTuple` to be used | ||
with the `AsTable` destination in the DataFrames.jl transformation | ||
mini-language. | ||
|
||
Concretely, the expressions | ||
|
||
``` | ||
df = DataFrame(a = 1) | ||
|
||
@rtransform df @astable begin | ||
:x = 1 | ||
y = 50 | ||
:z = :x + y + :a | ||
end | ||
``` | ||
|
||
become the pair | ||
|
||
``` | ||
function f(a) | ||
x_t = 1 | ||
|
||
y = 50 | ||
z_t = x_t + y + a | ||
|
||
(; x = x_t, z = z_t) | ||
end | ||
|
||
transform(df, [:a] => ByRow(f) => AsTable) | ||
``` | ||
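To see what the pair above produces, the hand-written `f` can be applied row by row in plain Julia. This is only a sketch: the broadcast `f.(...)` stands in for `ByRow(f)`, and the real expansion may differ in details such as the generated variable names.

```julia
# Hand-written stand-in for the function that `@astable` is described as generating.
function f(a)
    x_t = 1            # from `:x = 1`
    y = 50             # plain local variable, never becomes a column
    z_t = x_t + y + a  # from `:z = :x + y + :a`
    (; x = x_t, z = z_t)
end

# Broadcasting over the values of column `a` mimics `ByRow(f)`.
rows = f.([1, 2, 3])

# Each element is a NamedTuple; `AsTable` would turn its fields into columns `x` and `z`.
println(rows)
```

Only the names on the left of `=` in the final `NamedTuple` become columns, which is why `y` stays private to the block.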
|
||
`@astable` has two major advantages at the cost of increasing complexity. | ||
First, `@astable` makes it easy to create multiple columns from a single | ||
transformation, which share a scope. For example, `@astable` allows | ||
for the following | ||
|
||
``` | ||
@transform df @astable begin | ||
m = mean(:x) | ||
:x_demeaned = :x .- m | ||
:x2_demeaned = :x2 .- m | ||
end | ||
``` | ||
|
||
The creation of `:x_demeaned` and `:x2_demeaned` both share the variable `m`, | ||
|
||
which does not need to be calculated twice. | ||
|
||
Second, `@astable` is useful when performing intermediate calculations | ||
and storing their results in new columns. For example, the following fails. | ||
|
||
``` | ||
@rtransform df begin | ||
:new_col_1 = :x + :y | ||
:new_col_2 = :new_col_1 + :z | ||
end | ||
``` | ||
|
||
This is because DataFrames.jl does not guarantee sequential evaluation of | |
transformations. `@astable` solves this problem: | |
|
||
@rtransform df @astable begin | ||
:new_col_1 = :x + :y | ||
:new_col_2 = :new_col_1 + :z | ||
end | ||
|
||
Column assignment in `@astable` follows the same rules as | ||
column assignment more generally. Construct a new column | ||
from a string by escaping it with `$DOLLAR`. | ||
can you add an example of this?
added. |
||
|
||
### Examples | ||
|
||
``` | ||
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]); | ||
|
||
julia> d = @rtransform df @astable begin | ||
:x = 1 | ||
y = 5 | ||
:z = :x + y | ||
end | ||
3×4 DataFrame | ||
Row │ a b x z | ||
│ Int64 Int64 Int64 Int64 | ||
─────┼──────────────────────────── | ||
1 │ 1 4 1 6 | ||
2 │ 2 5 1 6 | ||
3 │ 3 6 1 6 | ||
|
||
julia> df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80]); | ||
|
||
julia> @by df :a @astable begin | ||
ex = extrema(:b) | ||
:min_b = first(ex) | ||
:max_b = last(ex) | ||
end | ||
2×3 DataFrame | ||
Row │ a min_b max_b | ||
│ Int64 Int64 Int64 | ||
─────┼───────────────────── | ||
1 │ 1 5 6 | ||
2 │ 2 70 80 | ||
``` | ||
|
||
""" | ||
macro astable(args...) | ||
throw(ArgumentError("@astable only works inside DataFramesMeta macros.")) | ||
end | ||
|
||
############################################################################## | ||
## | ||
## @with | ||
|
@@ -1546,17 +1660,6 @@ function combine_helper(x, args...; deprecation_warning = false) | |
|
||
exprs, outer_flags = create_args_vector(args...) | ||
|
||
fe = first(exprs) | ||
if length(exprs) == 1 && | ||
get_column_expr(fe) === nothing && | ||
!(fe.head == :(=) || fe.head == :kw) | ||
|
||
@warn "Returning a Table object from @by and @combine now requires `$(DOLLAR)AsTable` on the LHS." | ||
|
||
lhs = Expr(:$, :AsTable) | ||
exprs = ((:($lhs = $fe)),) | ||
end | ||
|
||
t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs) | ||
|
||
quote | ||
|
@@ -1666,16 +1769,6 @@ end | |
function by_helper(x, what, args...) | ||
# Only allow one argument when returning a Table object | ||
exprs, outer_flags = create_args_vector(args...) | ||
fe = first(exprs) | ||
if length(exprs) == 1 && | ||
get_column_expr(fe) === nothing && | ||
!(fe.head == :(=) || fe.head == :kw) | ||
|
||
@warn "Returning a Table object from @by and @combine now requires `\$AsTable` on the LHS." | ||
|
||
lhs = Expr(:$, :AsTable) | ||
exprs = ((:($lhs = $fe)),) | ||
end | ||
|
||
t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
function conditionally_add_symbols!(inputs_to_function::AbstractDict, | ||
lhs_assignments::OrderedCollections.OrderedDict, col) | ||
# if it's already been assigned at top-level, | ||
# don't add it to the inputs | ||
if haskey(lhs_assignments, col) | ||
return lhs_assignments[col] | ||
else | ||
return addkey!(inputs_to_function, col) | ||
end | ||
|
||
end | ||
|
||
replace_syms_astable!(inputs_to_function::AbstractDict, | ||
lhs_assignments::OrderedCollections.OrderedDict, x) = x | ||
replace_syms_astable!(inputs_to_function::AbstractDict, | ||
lhs_assignments::OrderedCollections.OrderedDict, q::QuoteNode) = | ||
conditionally_add_symbols!(inputs_to_function, lhs_assignments, q) | ||
|
||
function replace_syms_astable!(inputs_to_function::AbstractDict, | ||
lhs_assignments::OrderedCollections.OrderedDict, e::Expr) | ||
if onearg(e, :^) | ||
return e.args[2] | ||
end | ||
|
||
col = get_column_expr(e) | ||
if col !== nothing | ||
return conditionally_add_symbols!(inputs_to_function, lhs_assignments, col) | ||
elseif e.head == :. | ||
return replace_dotted_astable!(inputs_to_function, lhs_assignments, e) | ||
else | ||
return mapexpr(x -> replace_syms_astable!(inputs_to_function, lhs_assignments, x), e) | ||
end | ||
end | ||
|
||
protect_replace_syms_astable!(inputs_to_function::AbstractDict, | ||
lhs_assignments::OrderedCollections.OrderedDict, e) = e | ||
protect_replace_syms_astable!(inputs_to_function::AbstractDict, | ||
lhs_assignments::OrderedCollections.OrderedDict, e::Expr) = | ||
replace_syms_astable!(inputs_to_function, lhs_assignments, e) | ||
|
||
function replace_dotted_astable!(inputs_to_function::AbstractDict, | ||
lhs_assignments::OrderedCollections.OrderedDict, e) | ||
x_new = replace_syms_astable!(inputs_to_function, lhs_assignments, e.args[1]) | ||
y_new = protect_replace_syms_astable!(inputs_to_function, lhs_assignments, e.args[2]) | ||
Expr(:., x_new, y_new) | ||
end | ||
|
||
is_column_assigment(ex) = false | ||
function is_column_assigment(ex::Expr) | ||
ex.head == :(=) && (get_column_expr(ex.args[1]) !== nothing) | ||
end | ||
|
||
# Taken from MacroTools.jl | ||
# No docstring so assumed unstable | ||
block(ex) = isexpr(ex, :block) ? ex : :($ex;) | ||
|
||
function get_source_fun_astable(ex; exprflags = deepcopy(DEFAULT_FLAGS)) | ||
inputs_to_function = Dict{Any, Symbol}() | ||
lhs_assignments = OrderedCollections.OrderedDict{Any, Symbol}() | ||
|
||
# Make sure all top-level assignments are | ||
# in the args vector | ||
ex = block(MacroTools.flatten(ex)) | ||
exprs = map(ex.args) do arg | ||
if is_column_assigment(arg) | ||
lhs = get_column_expr(arg.args[1]) | ||
rhs = arg.args[2] | ||
new_ex = replace_syms_astable!(inputs_to_function, lhs_assignments, rhs) | ||
if haskey(inputs_to_function, lhs) | ||
new_lhs = inputs_to_function[lhs] | ||
lhs_assignments[lhs] = new_lhs | ||
else | ||
new_lhs = addkey!(lhs_assignments, lhs) | ||
|
||
end | ||
|
||
Expr(:(=), new_lhs, new_ex) | ||
else | ||
replace_syms_astable!(inputs_to_function, lhs_assignments, arg) | ||
end | ||
end | ||
source = :(DataFramesMeta.make_source_concrete($(Expr(:vect, keys(inputs_to_function)...)))) | ||
|
||
inputargs = Expr(:tuple, values(inputs_to_function)...) | ||
nt_iterator = (:(Symbol($k) => $v) for (k, v) in lhs_assignments) | ||
nt_expr = Expr(:tuple, Expr(:parameters, nt_iterator...)) | ||
body = Expr(:block, Expr(:block, exprs...), nt_expr) | ||
|
||
fun = quote | ||
$inputargs -> begin | ||
$body | ||
end | ||
end | ||
|
||
# TODO: Add passmissing support by | ||
# checking if any input arguments missing, | ||
# and if-so, making a named tuple with | ||
# missing values | ||
if exprflags[BYROW_SYM][] | ||
fun = :(ByRow($fun)) | ||
end | ||
|
||
return source, fun | ||
end |
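The `NamedTuple` construction at the end of `get_source_fun_astable` can be tried in isolation with Base Julia only. In this sketch the contents of `lhs_assignments` are hypothetical stand-ins (a `QuoteNode` key mapped to a local variable name), chosen to mirror the `x_t`/`z_t` example in the docstring; it is not the package's actual intermediate state.

```julia
# Hypothetical stand-in for `lhs_assignments`: column name => local variable name.
lhs_assignments = [QuoteNode(:x) => :x_t, QuoteNode(:z) => :z_t]

# Same construction as in the diff: one `Symbol(k) => v` pair per output column,
# placed after the semicolon of a tuple literal, i.e. `(; Symbol(:x) => x_t, ...)`.
nt_iterator = (:(Symbol($k) => $v) for (k, v) in lhs_assignments)
nt_expr = Expr(:tuple, Expr(:parameters, nt_iterator...))

# Wrap the expression in an anonymous function so the local names are bound.
fun = eval(:((x_t, z_t) -> $nt_expr))

# `invokelatest` avoids world-age issues if this runs inside a function.
result = Base.invokelatest(fun, 1, 52)
println(result)
```

Building the tuple from `key => value` pairs rather than `key = value` is what lets the column name be any expression that evaluates to a `Symbol`.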
maybe add what `$y` has to resolve to (I understand it has to be `Symbol`, or strings are also accepted?)
Good catch. Turns out I was allowing unexpected behavior and patched the code.
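As an illustration of the `$y` form asked about above, here is a minimal sketch that assumes a released DataFramesMeta.jl version in which the left-hand side may be a variable holding a string. The names `new_col` and `f_a` are made up for the example, and the behavior at this particular commit may differ.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])
new_col = "New Column"   # assumed: a string is accepted on the left-hand side via `$`

result = @rtransform df @astable begin
    f_a = first(:a)              # local variable, not a column
    $new_col = :a + :b + f_a     # column named by the value of `new_col`
    :y = :a * :b                 # column named by a literal Symbol
end
```

Because `@rtransform` works row by row, `:a` is a scalar inside the block, so `first(:a)` simply returns that row's value of `a`.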