Order of arguments in data frame methods? #128

strengejacke · 2022-03-20T13:09:50Z

Should we harmonize the order of arguments for our data frame methods, so that we have data - select - exclude - the others?

E.g.
https://easystats.github.io/datawizard/reference/data_cut.html
https://easystats.github.io/datawizard/reference/data_rename.html

We could maybe make select always the 2nd argument.

strengejacke · 2022-03-20T15:32:43Z

or we make is exact the opposite, and put select/exclude to the end.
Then we could do, without naming arguments,

data |>
  find_colums(starts_with("xy")) |>
  data_cut("mean")

or

data |>
  find_colums(starts_with("xy")) |>
  data_addprefix("NEW_")

DominiqueMakowski · 2022-03-20T22:45:57Z

We should definitely do something about consistency

Option 1 is the most straightforward, Option 2 is interesting...

And maybe option 2 makes sense, because when you think about it in most function the selectors are not the "main" arguments (e.g. in add prefix), and people expect the main arguments to be described first and then the more additional ones at the end?

strengejacke · 2022-03-21T08:45:46Z

@mattansb @bwiernik @IndrajeetPatil Thoughts?

bwiernik · 2022-03-21T08:47:18Z

I like the idea of consistency. I find option 2 sort of confusing--I don't have an intuition of what that pipe chain is supposed to do. I would prefer to hew as close to the tidyverse-style API here as we can. So I would prefer select/exclude be the second/third arguments

IndrajeetPatil · 2022-03-21T08:53:09Z

I second Brenton.

Designing API that mimics tidyverse API also means that the users familiar with that API won't need to change their habits, and can easily adapt to easystats API.

strengejacke · 2022-03-21T09:09:59Z

But there's one difference to the tidyverse API, namely that select and exclude are arguments to select columns, which we have in many of our functions, while tidyverse (or dplyr) functions have their "operational" arguments after the first data argument. tidyverse has no arguments to select columns, it "only" has select().

Functions like slice(), muate(), summarize(), arrange() etc. all have a data argument first, followed by arguments that are specific to the function and their intended operation. We have data argument, then column selection, then the arguments specific to the function:

tidyverse: foo(data, what_to_do)
datawizard: foo(data, select_column, what_to_do)

We could stick to this, or change it to: foo(data, what_to_do, select_column)

But for the pipe-workflow, find_columns() should have an option to return the data (either by having an argument, or we have a separate function, get_columns(), that returns the data instead of the names).

bwiernik · 2022-03-21T09:23:23Z

I’m not sure without some context. Can you give some examples with our functions?

strengejacke · 2022-03-21T09:49:29Z

These functions from both packages work in a similar fashion:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(datawizard)

iris |> 
  select(starts_with("Sep")) |> 
  summarise(m = mean(Sepal.Length)) # 2nd argument does not select data
#>          m
#> 1 5.843333

iris |> 
  select(starts_with("Sep")) |> 
  slice(1:5) # 2nd argument does not select data
#>   Sepal.Length Sepal.Width
#> 1          5.1         3.5
#> 2          4.9         3.0
#> 3          4.7         3.2
#> 4          4.6         3.1
#> 5          5.0         3.6

iris |> 
  select(starts_with("Sep")) |> 
  data_cut("median") |> # 2nd argument does not select data
  head()
#>   Sepal.Length Sepal.Width
#> 1            1           2
#> 2            1           2
#> 3            1           2
#> 4            1           2
#> 5            1           2
#> 6            1           2

iris |> 
  select(starts_with("Sep")) |> 
  data_addprefix("HU") |> # 2nd argument does not select data
  head()
#>   HUSepal.Length HUSepal.Width
#> 1            5.1           3.5
#> 2            4.9           3.0
#> 3            4.7           3.2
#> 4            4.6           3.1
#> 5            5.0           3.6
#> 6            5.4           3.9

When we would use select as 2nd argument in all those functions in datawizard, they would no longer behave similar to dplyr. select and exclude are convenient "bonus" arguments, that are not necessarily related to the core task of the functions. - like all the other "2nd" arguments in the tidyverse functions (and like in some of our datawizard's functions).

strengejacke · 2022-03-21T09:52:59Z

I just saw, for most/all(?) functions, where applicable, select/ exclude are not the 2nd and 3rd argument. There are exceptions, like data_extract() or reshape_longer(), but in these cases, the 2nd argument must select data (like dplyr::pull()). So probably we're already ok with the current function design...

strengejacke · 2022-03-21T10:21:40Z

I think Dom summarised it well:

And maybe option 2 makes sense, because when you think about it in most function the selectors are not the "main" arguments (e.g. in add prefix), and people expect the main arguments to be described first and then the more additional ones at the end?

bwiernik · 2022-03-21T12:12:25Z

Okay, I agree. We should move the secondary select/exclude arguments after the main function arguments

strengejacke · 2022-03-21T17:39:56Z

closing in favor of #133

strengejacke added the Discussion 🦜 label Mar 21, 2022

strengejacke mentioned this issue Mar 21, 2022

Missing select helpers #133

Closed

7 tasks

strengejacke closed this as completed Mar 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Order of arguments in data frame methods? #128

Order of arguments in data frame methods? #128

strengejacke commented Mar 20, 2022

strengejacke commented Mar 20, 2022

DominiqueMakowski commented Mar 20, 2022

strengejacke commented Mar 21, 2022

bwiernik commented Mar 21, 2022 •

edited

Loading

IndrajeetPatil commented Mar 21, 2022

strengejacke commented Mar 21, 2022 •

edited

Loading

bwiernik commented Mar 21, 2022

strengejacke commented Mar 21, 2022

strengejacke commented Mar 21, 2022

strengejacke commented Mar 21, 2022

bwiernik commented Mar 21, 2022

strengejacke commented Mar 21, 2022

Order of arguments in data frame methods? #128

Order of arguments in data frame methods? #128

Comments

strengejacke commented Mar 20, 2022

strengejacke commented Mar 20, 2022

DominiqueMakowski commented Mar 20, 2022

strengejacke commented Mar 21, 2022

bwiernik commented Mar 21, 2022 • edited Loading

IndrajeetPatil commented Mar 21, 2022

strengejacke commented Mar 21, 2022 • edited Loading

bwiernik commented Mar 21, 2022

strengejacke commented Mar 21, 2022

strengejacke commented Mar 21, 2022

strengejacke commented Mar 21, 2022

bwiernik commented Mar 21, 2022

strengejacke commented Mar 21, 2022

bwiernik commented Mar 21, 2022 •

edited

Loading

strengejacke commented Mar 21, 2022 •

edited

Loading