Another attempt at an astable flag #298

Merged
merged 29 commits into from
Sep 24, 2021
Changes from 10 commits
Commits
a8701c8
initial attempt
pdeffebach Sep 14, 2021
9b997a6
finally working
pdeffebach Sep 15, 2021
d639560
start adding tests
pdeffebach Sep 15, 2021
b77e8ca
more tests
pdeffebach Sep 16, 2021
3cdf0d5
more tests
pdeffebach Sep 16, 2021
b878fbb
add docstring
pdeffebach Sep 16, 2021
2344a2e
tests pass
pdeffebach Sep 16, 2021
6557def
add ByRow in docstring
pdeffebach Sep 16, 2021
6002def
add type annotation
pdeffebach Sep 21, 2021
08a1c4b
better docs
pdeffebach Sep 21, 2021
581b2cf
more docs fixes
pdeffebach Sep 21, 2021
7cc8947
update index.md
pdeffebach Sep 21, 2021
0eca67d
Apply suggestions from code review
pdeffebach Sep 21, 2021
a4ab9a6
Merge branch 'astable_2' of https://github.com/pdeffebach/DataFramesM…
pdeffebach Sep 21, 2021
ab9bae4
clean named tuple creation
pdeffebach Sep 22, 2021
495f08a
add example with string
pdeffebach Sep 22, 2021
01cb5e7
grouping tests
pdeffebach Sep 22, 2021
01fb3b7
Update src/macros.jl
pdeffebach Sep 22, 2021
915191c
changes
pdeffebach Sep 23, 2021
a331fc2
Merge branch 'astable_2' of https://github.com/pdeffebach/DataFramesM…
pdeffebach Sep 23, 2021
2ce4d9e
fix some errors
pdeffebach Sep 23, 2021
57b4051
add macro check
pdeffebach Sep 23, 2021
da7674d
add errors for bad flag combo
pdeffebach Sep 23, 2021
285e3ac
better grouping tests
pdeffebach Sep 23, 2021
713eaf0
Update src/parsing_astable.jl
pdeffebach Sep 23, 2021
4e01c4a
add snipper to transform, select, combine, by
pdeffebach Sep 23, 2021
09c692a
add mutating tests
pdeffebach Sep 23, 2021
ae26da8
get rid of debugging printin
pdeffebach Sep 24, 2021
a7fd1a2
Apply suggestions from code review
pdeffebach Sep 24, 2021
32 changes: 30 additions & 2 deletions docs/src/index.md
@@ -22,6 +22,7 @@ In addition, DataFramesMeta provides
convenient syntax.
* `@byrow` for applying functions to each row of a data frame (only supported inside other macros).
* `@passmissing` for propagating missing values inside row-wise DataFramesMeta.jl transformations.
* `@astable` to create multiple columns within a single transformation.
* `@chain`, from [Chain.jl](https://github.com/jkrumbiegel/Chain.jl) for piping the above macros together, similar to [magrittr](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html)'s
`%>%` in R.

@@ -396,11 +397,38 @@ julia> @rtransform df @passmissing x = parse(Int, :x_str)
3 │ missing missing
```

## Creating multiple columns at once with `@astable`

Often new variables may depend on the same intermediate calculations. `@astable` makes it easy to create multiple
new variables in the same operation, yet have them share
information.

In a single block, all assignments of the form `:y = f(:x)`
or `$y = f(:x)` at the top-level generate new columns. In the 2nd example, `y`
must be a string or `Symbol`.

```
julia> df = DataFrame(a = [1, 2, 3], b = [400, 500, 600]);

julia> @transform df @astable begin
ex = extrema(:b)
:b_first = :b .- first(ex)
:b_last = :b .- last(ex)
end
3×4 DataFrame
Row │ a b b_first b_last
│ Int64 Int64 Int64 Int64
─────┼───────────────────────────────
1 │ 1 400 0 -200
2 │ 2 500 100 -100
3 │ 3 600 200 0
```
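
As a rough illustration of what the `@astable` block above does, the same result can be written directly in the DataFrames.jl mini-language with an `AsTable` destination. This is a minimal sketch rather than the code the macro actually generates; the helper name `b_offsets` is hypothetical and chosen only for this example.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [400, 500, 600])

# Hypothetical helper standing in for the function that `@astable` builds.
# The intermediate value `ex` is computed once and shared by both outputs.
function b_offsets(b)
    ex = extrema(b)
    # Each field of the returned NamedTuple becomes a new column.
    return (b_first = b .- first(ex), b_last = b .- last(ex))
end

# `AsTable` as the destination expands the NamedTuple into columns.
result = transform(df, [:b] => b_offsets => AsTable)
```

Writing the transformation by hand makes the shared scope explicit: `ex` lives inside one function, so both new columns can use it.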


## [Working with column names programmatically with `$`](@id dollar)

DataFramesMeta provides the special syntax `$` for referring to
columns in a data frame via a `Symbol`, string, or column position as either
a literal or a variable.
columns in a data frame via a `Symbol`, string, or column position as either a literal or a variable.
Member

While we are at it given our recent discussion on Discourse, I think it is essential to mention when the $ reference is resolved.
Also, maybe add an example where the macros are used within a function? I think these cases are not trivial. This can be another PR, of course.

Collaborator Author

I will do this as another PR. In summary, you can't use other macros which use $. I will try and sort out if I can carve out an exception.

Member

To be clear why I stress it so much. With DataFrames.jl my answer to users is: if you learn Julia Base then you will know exactly how DataFrames.jl works. With DataFramesMeta.jl unfortunately this is not the case as it is a DSL so we need to be very precise how things work in documentation.


```julia
df = DataFrame(A = 1:3, B = [2, 1, 2])
69 changes: 54 additions & 15 deletions src/macros.jl
@@ -284,7 +284,7 @@ end


"""
passmissing(args...)
@passmissing(args...)

Propagate missing values inside DataFramesMeta.jl macros.

@@ -351,13 +351,14 @@ macro passmissing(args...)
end

"""
astable(args...)
@astable(args...)

Return a `NamedTuple` from a transformation inside DataFramesMeta.jl macros.
Return a `NamedTuple` from a single transformation inside the DataFramesMeta.jl
macros, `@select`, `@transform`, and their mutating and row-wise equivalents.

`@astable` acts on a single block. It works through all top-level expressions
and collects all such expressions of the form `:y = x`, i.e. assignments to a
`Symbol`, which is a syntax error outside of the macro. At the end of the
and collects all such expressions of the form `:y = ...`, i.e. assignments to a
`Symbol`, which is a syntax error outside of DataFramesMeta.jl macros. At the end of the
expression, all assignments are collected into a `NamedTuple` to be used
with the `AsTable` destination in the DataFrames.jl transformation
mini-language.
@@ -374,7 +375,7 @@ df = DataFrame(a = 1)
end
```

becomes the pair
become the pair

```
function f(a)
@@ -388,8 +389,24 @@ end
transform(df, [:a] => ByRow(f) => AsTable)
```

`@astable` is useful when performing intermediate calculations
yet store their results in new columns. For example, the following fails.
`@astable` has two major advantages at the cost of increasing complexity.
First, `@astable` makes it easy to create multiple columns from a single
transformation, which share a scope. For example, `@astable` allows
for the following (where `:x` and `:x2` exist in the data frame already).

```
@transform df @astable begin
m = mean(:x)
:x_demeaned = :x .- m
:x2_demeaned = :x2 .- m
end
```

The creation of `:x_demeaned` and `:x2_demeaned` both share the variable `m`,
which does not need to be calculated twice.

Second, `@astable` is useful when performing intermediate calculations
and storing their results in new columns. For example, the following fails.

```
@rtransform df begin
@@ -406,6 +423,13 @@ transformations. `@astable` solves this problem
:new_col_2 = :new_col_1 + :z
end

Column assignment in `@astable` follows the same rules as
column assignment more generally. Construct a new column
from a string by escaping it with `$DOLLAR`, which can be a
`Symbol` or an `AbstractString`. References to existing
columns may be a `Symbol`, `AbstractString`, or an
integer.

### Examples

```
@@ -427,15 +451,30 @@ julia> d = @rtransform df @astable begin
julia> df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80]);

julia> @by df :a @astable begin
$(DOLLAR)"Mean of b" = mean(:b)
$(DOLLAR)"Standard deviation of b" = std(:b)
ex = extrema(:b)
:min_b = first(ex)
:max_b = last(ex)
end
2×3 DataFrame
Row │ a Mean of b Standard deviation of b
│ Int64 Float64 Float64
─────┼───────────────────────────────────────────
1 │ 1 5.5 0.707107
2 │ 2 75.0 7.07107
Row │ a min_b max_b
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 5 6
2 │ 2 70 80

julia> new_col = "New Column";

julia> @rtransform df @astable begin
f_a = first(:a)
$(DOLLAR)new_col = :a + :b + f_a
:y = :a * :b
end
4×4 DataFrame
Row │ a b New Column y
│ Int64 Int64 Int64 Int64
─────┼─────────────────────────────────
1 │ 1 5 7 5
2 │ 1 6 8 6
3 │ 2 70 74 140
4 │ 2 80 84 160
```

"""
2 changes: 1 addition & 1 deletion src/parsing.jl
@@ -276,7 +276,7 @@ function fun_to_vec(ex::Expr;
return :($src => $fun => AsTable)
end

if no_dest # subet and with
if no_dest # subset and with
src, fun = get_source_fun(ex, exprflags = final_flags)
return quote
$src => $fun
31 changes: 22 additions & 9 deletions src/parsing_astable.jl
@@ -1,4 +1,5 @@
function conditionally_add_symbols!(inputs_to_function, lhs_assignments, col)
function conditionally_add_symbols!(inputs_to_function::AbstractDict,
lhs_assignments::OrderedCollections.OrderedDict, col)
# if it's already been assigned at top-level,
# don't add it to the inputs
if haskey(lhs_assignments, col)
@@ -8,11 +9,14 @@ function conditionally_add_symbols!(inputs_to_function, lhs_assignments, col)
end
end

replace_syms_astable!(inputs_to_function, lhs_assignments, x) = x
replace_syms_astable!(inputs_to_function, lhs_assignments, q::QuoteNode) =
replace_syms_astable!(inputs_to_function::AbstractDict,
lhs_assignments::OrderedCollections.OrderedDict, x) = x
replace_syms_astable!(inputs_to_function::AbstractDict,
lhs_assignments::OrderedCollections.OrderedDict, q::QuoteNode) =
conditionally_add_symbols!(inputs_to_function, lhs_assignments, q)

function replace_syms_astable!(inputs_to_function, lhs_assignments, e::Expr)
function replace_syms_astable!(inputs_to_function::AbstractDict,
lhs_assignments::OrderedCollections.OrderedDict, e::Expr)
if onearg(e, :^)
return e.args[2]
end
@@ -27,11 +31,14 @@ function replace_syms_astable!(inputs_to_function, lhs_assignments, e::Expr)
end
end

protect_replace_syms_astable!(inputs_to_function, lhs_assignments, e) = e
protect_replace_syms_astable!(inputs_to_function, lhs_assignments, e::Expr) =
protect_replace_syms_astable!(inputs_to_function::AbstractDict,
lhs_assignments::OrderedCollections.OrderedDict, e) = e
protect_replace_syms_astable!(inputs_to_function::AbstractDict,
lhs_assignments::OrderedCollections.OrderedDict, e::Expr) =
replace_syms!(inputs_to_function, lhs_assignments, e)

function replace_dotted_astable!(inputs_to_function, lhs_assignments, e)
function replace_dotted_astable!(inputs_to_function::AbstractDict,
lhs_assignments::OrderedCollections.OrderedDict, e)
x_new = replace_syms_astable!(inputs_to_function, lhs_assignments, e.args[1])
y_new = protect_replace_syms_astable!(inputs_to_function, lhs_assignments, e.args[2])
Expr(:., x_new, y_new)
@@ -43,9 +50,15 @@ function is_column_assigment(ex::Expr)
end

# Taken from MacroTools.jl
# No docstring so assumed untable
# No docstring so assumed unstable
block(ex) = isexpr(ex, :block) ? ex : :($ex;)

sym_or_str_to_sym(x::Union{AbstractString, Symbol}) = Symbol(x)
function sym_or_str_to_sym(x)
e = "New columns created inside @astable must be Symbols or AbstractStrings"
throw(ArgumentError(e))
end

function get_source_fun_astable(ex; exprflags = deepcopy(DEFAULT_FLAGS))
inputs_to_function = Dict{Any, Symbol}()
lhs_assignments = OrderedCollections.OrderedDict{Any, Symbol}()
@@ -73,7 +86,7 @@ function get_source_fun_astable(ex; exprflags = deepcopy(DEFAULT_FLAGS))
source = :(DataFramesMeta.make_source_concrete($(Expr(:vect, keys(inputs_to_function)...))))

inputargs = Expr(:tuple, values(inputs_to_function)...)
nt_iterator = (:(Symbol($k) => $v) for (k, v) in lhs_assignments)
nt_iterator = (:(DataFramesMeta.sym_or_str_to_sym($k) => $v) for (k, v) in lhs_assignments)
nt_expr = Expr(:tuple, Expr(:parameters, nt_iterator...))
body = Expr(:block, Expr(:block, exprs...), nt_expr)

45 changes: 45 additions & 0 deletions test/astable_flag.jl
@@ -120,5 +120,50 @@ end
@test d == DataFrame(x = 1, z = 6)
end

@testset "grouping astable flag" begin
df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 7, 8])

gd = groupby(df, :a)

d = @combine gd @astable begin
ex = extrema(:b)
:b_min = ex[1]
:b_max = ex[2]
end

@test sort(d.b_min) == [5, 7]
Member

This kind of test is quite fragile. Better sort the whole data frame and compare it to the reference to make sure that groups and values match.
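
A minimal sketch of what this suggestion could look like, assuming the same `df` and `gd` as in the test set. The name `expected` is chosen here for illustration and does not come from the PR.

```julia
using DataFrames, DataFramesMeta, Test

df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 7, 8])
gd = groupby(df, :a)

d = @combine gd @astable begin
    ex = extrema(:b)
    :b_min = ex[1]
    :b_max = ex[2]
end

# Reference result: one row per group, with the group key and both new columns.
expected = DataFrame(a = [1, 2], b_min = [5, 7], b_max = [6, 8])

# Sorting by the group key and comparing whole data frames checks that each
# value is attached to the right group, not only that the values appear.
@test sort(d, :a) == expected
```

Comparing the full data frame also catches a wrong `b_max` or a mismatched group key, which `sort(d.b_min) == [5, 7]` alone would miss.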


d = @combine gd @astable begin
ex = extrema(:b)
$"b_min" = ex[1]
$"b_max" = ex[2]
end

@test sort(d.b_min) == [5, 7]

d = @by df :a @astable begin
ex = extrema(:b)
:b_min = ex[1]
:b_max = ex[2]
end

@test sort(d.b_min) == [5, 7]

d = @by df :a @astable begin
ex = extrema(:b)
$"b_min" = ex[1]
$"b_max" = ex[2]
end

@test sort(d.b_min) == [5, 7]
end



@testset "bad assignments" begin
@eval df = DataFrame(y = 1)
@test_throws ArgumentError @eval @transform df @astable cols(1) = :y
@test_throws ArgumentError @eval @transform df @astable cols(AsTable) = :y
end

end # module