Order of arguments in data frame methods? #128
Or we do exactly the opposite and put select/exclude at the end:
data |>
  find_columns(starts_with("xy")) |>
  data_cut("mean")
or
data |>
  find_columns(starts_with("xy")) |>
  data_addprefix("NEW_")
We should definitely do something about consistency. Option 1 is the most straightforward; Option 2 is interesting... And maybe Option 2 makes sense, because when you think about it, in most functions the selectors are not the "main" arguments (e.g. in data_addprefix()), and people expect the main arguments to be described first, with the additional ones at the end?
@mattansb @bwiernik @IndrajeetPatil Thoughts?
I like the idea of consistency. I find option 2 sort of confusing--I don't have an intuition of what that pipe chain is supposed to do. I would prefer to hew as close to the tidyverse-style API here as we can. So I would prefer that select/exclude be the second/third arguments.
I second Brenton. Designing an API that mimics the tidyverse API also means that users familiar with that API won't need to change their habits and can easily adapt to the easystats API.
But there's one difference to the tidyverse API, namely that Functions like tidyverse: We could stick to this, or change it to: But for the pipe-workflow,
I’m not sure without some context. Can you give some examples with our functions?
These functions from both packages work in a similar fashion:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(datawizard)
iris |>
select(starts_with("Sep")) |>
summarise(m = mean(Sepal.Length)) # 2nd argument does not select data
#> m
#> 1 5.843333
iris |>
select(starts_with("Sep")) |>
slice(1:5) # 2nd argument does not select data
#> Sepal.Length Sepal.Width
#> 1 5.1 3.5
#> 2 4.9 3.0
#> 3 4.7 3.2
#> 4 4.6 3.1
#> 5 5.0 3.6
iris |>
select(starts_with("Sep")) |>
data_cut("median") |> # 2nd argument does not select data
head()
#> Sepal.Length Sepal.Width
#> 1 1 2
#> 2 1 2
#> 3 1 2
#> 4 1 2
#> 5 1 2
#> 6 1 2
iris |>
select(starts_with("Sep")) |>
data_addprefix("HU") |> # 2nd argument does not select data
head()
#> HUSepal.Length HUSepal.Width
#> 1 5.1 3.5
#> 2 4.9 3.0
#> 3 4.7 3.2
#> 4 4.6 3.1
#> 5 5.0 3.6
#> 6 5.4 3.9
When we would use
I just saw, for most/all(?) functions, where applicable,
I think Dom summarised it well:
Okay, I agree. We should move the secondary select/exclude arguments after the main function arguments.
Closing in favor of #133.
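To make the agreed ordering concrete, here is a minimal sketch in base R. The helper `add_prefix_sketch()` is hypothetical and is not the datawizard implementation; it only illustrates the signature pattern discussed above (data first, then the main argument, then `select`/`exclude`), and it assumes `select` and `exclude` are plain character vectors of column names rather than tidyselect helpers.

```r
# Hypothetical helper (not part of datawizard): adds a prefix to column names.
# Argument order: data, then the main argument (prefix),
# then the secondary selectors (select, exclude).
add_prefix_sketch <- function(data, prefix, select = NULL, exclude = NULL) {
  # Default to all columns when no selection is given
  cols <- if (is.null(select)) names(data) else intersect(names(data), select)
  # Drop any excluded columns from the selection
  cols <- setdiff(cols, exclude)
  # Rename only the selected columns, leaving the others untouched
  idx <- names(data) %in% cols
  names(data)[idx] <- paste0(prefix, names(data)[idx])
  data
}

# The main argument can be passed positionally in a pipe;
# the selectors are named and come last.
out <- iris |>
  add_prefix_sketch("NEW_", select = c("Sepal.Length", "Sepal.Width"))
names(out)
```

With this ordering, a call that comes after a separate column-selection step in a pipe needs only the main argument, while `select`/`exclude` remain available as named arguments for one-step use.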
Should we harmonize the order of arguments for our data frame methods, so that we have data - select - exclude - the others?
E.g.
https://easystats.github.io/datawizard/reference/data_cut.html
https://easystats.github.io/datawizard/reference/data_rename.html
We could maybe make select always the 2nd argument.