Update .groups message after group_by() |> summarize() #6986

Open
mine-cetinkaya-rundel opened this issue Jan 29, 2024 · 1 comment

@mine-cetinkaya-rundel
Member

Currently this is the message dplyr emits for summarize() after group_by() with multiple variables.

library(dplyr)

mtcars |>
  group_by(vs, am) |>
  summarize(mean_mpg = mean(mpg))
#> `summarise()` has grouped output by 'vs'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 3
#> # Groups:   vs [2]
#>      vs    am mean_mpg
#>   <dbl> <dbl>    <dbl>
#> 1     0     0     15.0
#> 2     0     1     19.8
#> 3     1     0     20.7
#> 4     1     1     28.4

Created on 2024-01-29 with reprex v2.0.2
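
For context, the message above reflects the default behavior when every group yields one summary row: the last grouping variable (`am`) is dropped and the result stays grouped by the rest (`vs`). A minimal sketch of the `.groups` values that control this, using the same `mtcars` example; the object names are only illustrative:

```r
library(dplyr)

# "drop_last" is the default in this situation: `am` is dropped,
# so the result is still grouped by `vs`.
by_default <- mtcars |>
  group_by(vs, am) |>
  summarize(mean_mpg = mean(mpg), .groups = "drop_last")
group_vars(by_default)
#> [1] "vs"

# "drop" removes all grouping from the result.
dropped <- mtcars |>
  group_by(vs, am) |>
  summarize(mean_mpg = mean(mpg), .groups = "drop")
group_vars(dropped)
#> character(0)

# "keep" preserves the input grouping structure.
kept <- mtcars |>
  group_by(vs, am) |>
  summarize(mean_mpg = mean(mpg), .groups = "keep")
group_vars(kept)
#> [1] "vs" "am"
```

Supplying `.groups` explicitly also silences the message.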

I think this message is still confusing. It would be clearer if it described the grouping of the output and explicitly stated that .groups is an argument of summarize(), e.g.,

The output is grouped by `vs`. You can specify grouping structure of the output using the `.groups` argument in `summarize()`.

If going this route, some things to keep in mind:

  • Maybe "result" instead of "output" in two places in the message, or change the description of the .groups argument to say "Grouping structure of the output." Basically, we should match what we're calling the "thing" that the function spits out.
  • It would be a nice-to-have if the US/UK spelling of the function in the message matched the spelling used in the code that generates the message.
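
One possible way to make the spelling follow the user's code is to read the function name from the call itself. The sketch below is base R only and is not how dplyr builds its message; `summarise_demo()` and `summarize_demo()` are hypothetical stand-ins for the two spellings:

```r
# Hypothetical stand-in, not dplyr's implementation.
summarise_demo <- function(...) {
  # sys.call() returns the call as the user wrote it, so its first
  # element is the name that was actually typed.
  # Assumption: the function is called by a bare name; a namespaced
  # call such as pkg::fn() would need extra handling.
  fn <- as.character(sys.call()[[1]])
  sprintf("`%s()` has grouped output by 'vs'.", fn)
}

# The US spelling is an alias of the same function.
summarize_demo <- summarise_demo

msg_uk <- summarise_demo()
msg_us <- summarize_demo()
msg_uk
#> [1] "`summarise_demo()` has grouped output by 'vs'."
msg_us
#> [1] "`summarize_demo()` has grouped output by 'vs'."
```

Whether this carries over to dplyr's real `summarise()` generic and its methods would need checking against the package internals.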

An alternative suggestion by @DavisVaughan was

summarize() has computed your expressions grouped by (foo, bar), and has regrouped the output by (foo).

I think this is an improvement over the current message too, but I'd suggest going with something simpler like the one above.

@janxkoci

janxkoci commented Oct 17, 2024

Also, you grouped by two variables (vs, am) but the message only mentions vs. I see the same thing in my session with dplyr 1.1.4.

I have another problem where my results from summarise() have duplicated rows for no obvious reason, so I'm looking around here to see whether it's been reported...
