Another attempt at an astable flag #298

Merged: 29 commits merged on Sep 24, 2021

pdeffebach
Collaborator

This attempt should be a more robust strategy than the current @t flag in DataFrameMacros.jl.

I will outline it in more detail shortly.

@pdeffebach
Collaborator Author

More progress! Not ready for a review yet, as I have not added a complete test set or begun documenting.

@pdeffebach pdeffebach mentioned this pull request Sep 15, 2021
@pdeffebach
Collaborator Author

pdeffebach commented Sep 16, 2021

Ready for a review!

This is a big PR. In the process, I had to finally remove some deprecated functionality for grouped data frames. In particular,

@by df :a (n1 = :a, n2 = :b)

used to work. In the previous release we issued a deprecation warning saying this needed to be written as

@by df :a $AsTable = (n1 = :a, n2 = :b)

and now the original expression throws an error.
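For reference, here is a minimal sketch of the explicit spelling next to its @astable equivalent. It assumes a DataFramesMeta.jl release that includes this PR. The data and the column names n1 and n2 are illustrative only.

```julia
using DataFramesMeta

df = DataFrame(a = [1, 1, 2], b = [10, 20, 30])

# Explicit spelling: the NamedTuple on the right-hand side is spread
# into the columns n1 and n2 through the AsTable destination.
d1 = @by df :a $AsTable = (n1 = sum(:b), n2 = length(:b))

# The same computation with @astable: each top-level `:name = ...`
# assignment becomes one output column.
d2 = @by df :a @astable begin
    :n1 = sum(:b)
    :n2 = length(:b)
end
```

Both calls group the same data frame in the same way, so the two results should be identical.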

Here is the docstring for @astable:


@astable(args...)

Return a NamedTuple from a transformation inside DataFramesMeta.jl macros.

@astable acts on a single block. It works through all top-level expressions
and collects those of the form :y = x, i.e. assignments to a
Symbol, which would be a syntax error outside of the macro. At the end of the
block, all such assignments are collected into a NamedTuple to be used
with the AsTable destination in the DataFrames.jl transformation
mini-language.

Concretely, the expressions

df = DataFrame(a = 1)

@rtransform df @astable begin
    :x = 1
    y = 50
    :z = :x + y + :a
end

become the pair

function f(a)
    x_t = 1
    y = 50
    z_t = x_t + y + a

    (; x = x_t, z = z_t)
end

transform(df, [:a] => ByRow(f) => AsTable)
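Here is a hand-written sketch of that lowering, using DataFrames.jl directly. It wraps f in ByRow because @rtransform works row by row, as pointed out in the review below. The function f only imitates what the macro might generate. The name f and the _t suffixes are illustrative and not the macro's actual output.

```julia
using DataFrames

# Stand-in for the function @astable generates; names are illustrative.
function f(a)
    x_t = 1
    y = 50               # plain local variable: not returned as a column
    z_t = x_t + y + a
    return (; x = x_t, z = z_t)
end

df = DataFrame(a = [1])

# ByRow applies f to each row; AsTable spreads the returned
# NamedTuple into the new columns x and z.
out = transform(df, [:a] => ByRow(f) => AsTable)
```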

@astable is useful when performing intermediate calculations
and storing their results in new columns. For example, the following fails:

@rtransform df begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end

This is because DataFrames.jl does not guarantee sequential evaluation of
transformations. @astable solves this problem:

@rtransform df @astable begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end
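Here is a small runnable version of the example above. It assumes a DataFramesMeta.jl release that includes this PR. The columns x, y and z are made up so that the block has something to read.

```julia
using DataFramesMeta

df = DataFrame(x = [1, 2], y = [10, 20], z = [100, 200])

# The block runs top to bottom for each row, so the second line can
# read :new_col_1, which exists only as the value assigned just above.
d = @rtransform df @astable begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end
```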

Examples

julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);

julia> d = @rtransform df @astable begin
           :x = 1
           y = 5
           :z = :x + y
       end
3×4 DataFrame
 Row │ a      b      x      z
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      4      1      6
   2 │     2      5      1      6
   3 │     3      6      1      6

julia> df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80]);

julia> @by df :a @astable begin
           $"Mean of b" = mean(:b)
           $"Standard deviation of b" = std(:b)
       end
2×3 DataFrame
 Row │ a      Mean of b  Standard deviation of b
     │ Int64  Float64    Float64
─────┼───────────────────────────────────────────
   1 │     1        5.5                 0.707107
   2 │     2       75.0                 7.07107

This implementation is more complicated than that of @t from DataFrameMacros.jl. In DataFrameMacros.jl, the following will fail:

df = DataFrame(a = 1)
@transform df @t begin 
    :x = 1
    b + :x 
end

This is because @t sends :x to src in src => fun => dest, even though no column :x exists in the DataFrame. @astable does not have this problem, at the cost of a more complicated implementation.

cc @bkamins @nalimilan

@bkamins
Member

bkamins commented Sep 16, 2021

transform(df, [:a] => f => AsTable)

I would think it should be transform(df, [:a] => ByRow(f) => AsTable)?

@pdeffebach
Collaborator Author

Good catch. Will update.

Member

@nalimilan nalimilan left a comment


Looks pretty cool! This will require a breaking release, right?

Using @astable to ensure operations are run sequentially is clever. The name is a bit surprising for this, but well... I also hope the compilation overhead isn't too large.

src/macros.jl Outdated
Comment on lines 429 to 432
julia> @by df :a @astable begin
$(DOLLAR)"Mean of b" = mean(:b)
$(DOLLAR)"Standard deviation of b" = std(:b)
end
Member

This example can be achieved without @astable, right? Maybe do m = mean(:b); std(:b, mean=m) to illustrate the power of this function? Or, simpler, call extrema(:b) to create two columns.
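Here is a sketch of the mean/std suggestion. It assumes a DataFramesMeta.jl release with @astable. It also uses the mean keyword of Statistics.std, which accepts a precomputed mean.

```julia
using DataFramesMeta, Statistics

df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80])

d = @by df :a @astable begin
    m = mean(:b)                 # local variable: computed once per group, not a column
    :b_mean = m
    :b_std = std(:b; mean = m)   # reuses the mean instead of recomputing it
end
```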

Also, I wouldn't use long column names with spaces in them: better illustrate a single feature at a time.

Collaborator Author

Great, changed.

src/macros.jl Outdated
Comment on lines 391 to 402
`@astable` is useful when performing intermediate calculations
yet store their results in new columns. For example, the following fails.

```
@rtransform df begin
:new_col_1 = :x + :y
:new_col_2 = :new_col_1 + :z
end
```

This because DataFrames.jl does not guarantee sequential evaluation of
transformations. `@astable` solves this problem
Member

While this is an interesting side-effect, the main goal of AsTable is to allow returning multiple columns from a single "function". Probably worth mentioning? For example it's useful with extrema to compute the minimum and the maximum at the same time.
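Here is a sketch of the extrema case, along the lines of the test fragment quoted later in this thread. It assumes a DataFramesMeta.jl release with @astable.

```julia
using DataFramesMeta

df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80])

# A single call to extrema yields both the minimum and the maximum;
# `ex` has no leading colon, so it stays a local variable.
d = @by df :a @astable begin
    ex = extrema(:b)
    :b_min = ex[1]
    :b_max = ex[2]
end
```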

@pdeffebach
Collaborator Author

Using @astable to ensure operations are run sequentially is clever. The name is a bit surprising for this, but well... I also hope the compilation overhead isn't too large.

This macro does more than that. It also allows local variables to persist across expressions. If I just wanted to force sequential evaluation, I would create a separate transform call for each :y = f(:x) expression.

@pdeffebach
Collaborator Author

This is ready for another round of reviews. I have added docs to the manual as well.

information.

In a single block, all assignments of the form `:y = f(:x)`
or `$y = f(:x)` at the top-level are generate new columns.
Member

Suggested change
or `$y = f(:x)` at the top-level are generate new columns.
or `$y = f(:x)` at the top-level generate new columns.

Member

maybe add what $y has to resolve to (I understand it has to be Symbol, or strings are also accepted?)

Collaborator Author

Good catch. Turns out I was allowing unexpected behavior and patched the code.

src/macros.jl Outdated

Column assignment in `@astable` follows the same rules as
column assignment more generally. Construct a new column
from a string by escaping it with `$DOLLAR`.
Member

can you add an example of this?

Collaborator Author

added.
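Here is a sketch of naming a new column from a string held in a variable and escaped with $. It assumes a DataFramesMeta.jl release that includes this PR. The variable new_col and its value are hypothetical.

```julia
using DataFramesMeta

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

new_col = "New Column"    # hypothetical column name stored as a String

d = @rtransform df @astable begin
    s = :a + :b           # local variable, not returned as a column
    $new_col = s          # $ escapes the variable; its value names the column
    :y = 10 * s
end
```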

@pdeffebach
Collaborator Author

Thanks, I've responded to the most recent round of reviews.

I have added more tests. I can't think of any others to add at the moment; the functionality seems well covered.

@pdeffebach
Collaborator Author

cc @jkrumbiegel, if you want to review.

@nalimilan
Member

This macro does more than that. It allows for local variables to per persistent as well. If I were to just force sequential transform calls, I would just create a transform call for each :y = f(:x) expression.

Yes I know. What I'm saying is that mentioning sequential operations as the main justification for it was a bit weird.

:b_max = ex[2]
end

@test sort(d.b_min) == [5, 7]
Member

This kind of test is quite fragile. Better sort the whole data frame and compare it to the reference to make sure that groups and values match.
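Here is a sketch of the more robust pattern. It assumes DataFramesMeta.jl with @astable and the Test standard library. The data are illustrative, and the reference values are computed by hand for this input.

```julia
using DataFramesMeta, Test

df = DataFrame(a = [2, 1, 2, 1], b = [70, 5, 80, 6])

d = @by df :a @astable begin
    s = sum(:b)
    :b_sum = s
    :b_mean = s / length(:b)
end

# Sort the whole result and compare it with a reference table, so group
# keys and their values must line up, not just a single column.
expected = DataFrame(a = [1, 2], b_sum = [11, 150], b_mean = [5.5, 75.0])
@test sort(d, :a) == expected
```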

@pdeffebach
Collaborator Author

pdeffebach commented Sep 23, 2021

I don't understand the changes to the GitHub review interface, but I think I've addressed the lingering issues.

I just added some checks to make sure you can't use @passmissing and @astable at the same time. I think combining them is intuitive in the @byrow case, but I want to finish #276, which makes @passmissing work on column-wise transformations, before I allow it.

@pdeffebach
Collaborator Author

Tests pass and this can be merged.

end

@testset "@astable with just assignments, mutating" begin
# After finalizing above testset
Member

@pdeffebach - this seems to be WIP?

Collaborator Author

Sorry about that. Thank you, working on it now.


d = @rtransform df @astable begin
:x = 1

Member

The newline is not needed. Also, I am not clear why you add `nothing` here and below. Does it change anything?

Collaborator Author

oh, I remember.

@transform df begin 
    :x = 1
end

is valid code and does the same thing as the same block with @astable. So I wanted a test that makes sure the @astable path is hit and not the vanilla path.

I have deleted the extra new lines.

Collaborator Author

@pdeffebach pdeffebach left a comment


Thanks for the round of reviews. I think I have addressed everything. Sorry about forgetting some of the tests.

Member

@bkamins bkamins left a comment


Looks good. Just please check why Documenter fails before merging. Thank you!

src/parsing.jl Outdated
Comment on lines 229 to 230
println(MacroTools.prettify(fun))

Member

Suggested change
println(MacroTools.prettify(fun))

pdeffebach and others added 2 commits September 24, 2021 05:03
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@pdeffebach
Collaborator Author

Thanks for the review. @nalimilan ready to merge?

@pdeffebach pdeffebach merged commit cc066df into JuliaData:master Sep 24, 2021
@pdeffebach pdeffebach deleted the astable_2 branch September 24, 2021 21:36
@bkamins
Member

bkamins commented Sep 24, 2021

Bravo!
