`summarise` ignoring .groups argument #245

sbashevkin · 2021-05-19T00:05:13Z

I've noticed that `summarise()` used within a dtplyr pipeline ignores the `.groups` argument and instead creates a new column named `.groups`. When the `.groups` argument is left out, the resulting tibble doesn't retain any grouping, whereas plain dplyr drops only the last grouping level and keeps the result grouped by the first grouping column.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dtplyr)
#> Warning: package 'dtplyr' was built under R version 4.0.5

# Data
data<-tibble(group1=rep(1:2, each=3), group2=rep(3:4, each=3), value=1:6)

# With regular dplyr
data%>%
    group_by(group1, group2)%>%
    summarise(value_mean=mean(value), .groups="drop")
#> # A tibble: 2 x 3
#>   group1 group2 value_mean
#>    <int>  <int>      <dbl>
#> 1      1      3          2
#> 2      2      4          5

# With dtplyr, .groups argument is ignored and turned into a new column
data%>%
    lazy_dt()%>%
    group_by(group1, group2)%>%
    summarise(value_mean=mean(value), .groups="drop")%>%
    as_tibble()
#> # A tibble: 2 x 4
#>   group1 group2 value_mean .groups
#>    <int>  <int>      <dbl> <chr>  
#> 1      1      3          2 drop   
#> 2      2      4          5 drop

# With dtplyr without the .groups argument, grouping is still removed
data%>%
    lazy_dt()%>%
    group_by(group1, group2)%>%
    summarise(value_mean=mean(value))%>%
    as_tibble()
#> # A tibble: 2 x 3
#>   group1 group2 value_mean
#>    <int>  <int>      <dbl>
#> 1      1      3          2
#> 2      2      4          5

# Whereas the same pipeline with just dplyr would preserve grouping
data%>%
    group_by(group1, group2)%>%
    summarise(value_mean=mean(value))
#> `summarise()` has grouped output by 'group1'. You can override using the `.groups` argument.
#> # A tibble: 2 x 3
#> # Groups:   group1 [2]
#>   group1 group2 value_mean
#>    <int>  <int>      <dbl>
#> 1      1      3          2
#> 2      2      4          5

Created on 2021-05-18 by the reprex package (v2.0.0)

Session info

sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 4.0.3 (2020-10-10)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/Los_Angeles         
#>  date     2021-05-18                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.3)
#>  cli           2.5.0   2021-04-26 [1] CRAN (R 4.0.5)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.3)
#>  data.table    1.14.0  2021-02-21 [1] CRAN (R 4.0.5)
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.3)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr       * 1.0.6   2021-05-05 [1] CRAN (R 4.0.3)
#>  dtplyr      * 1.1.0   2021-02-20 [1] CRAN (R 4.0.5)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.5)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.3)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.0.5)
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.0.5)
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.4)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  pillar        1.6.0   2021-04-13 [1] CRAN (R 4.0.5)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.2)
#>  ps            1.6.0   2021-02-28 [1] CRAN (R 4.0.5)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.2)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.0.5)
#>  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.0.5)
#>  rmarkdown     2.8     2021-05-07 [1] CRAN (R 4.0.5)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.3)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi       1.6.1   2021-05-10 [1] CRAN (R 4.0.3)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  styler        1.4.1   2021-03-30 [1] CRAN (R 4.0.5)
#>  tibble        3.1.1   2021-04-18 [1] CRAN (R 4.0.5)
#>  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.0.5)
#>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.5)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.0.5)
#>  xfun          0.22    2021-03-11 [1] CRAN (R 4.0.5)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.2)
#> 
#> [1] C:/Users/sbashevkin/Documents/R/R-4.0.3/library


mgirlich · 2021-07-02T06:57:27Z

I added support for the `.groups` argument in #265.
Note that `as_tibble()` drops the grouping. This was a breaking change in tibble 2.0, see the tibble News. To keep the grouping, use `collect()` instead of `as_tibble()`.
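
A minimal sketch of the difference between the two, reusing the data from the reprex above. It assumes a dtplyr version that includes the `.groups` support from #265; the object names `lazy_summary`, `via_as_tibble` and `via_collect` are made up for illustration.

```r
library(dplyr)
library(dtplyr)

# Same data as in the original report
data <- tibble(group1 = rep(1:2, each = 3), group2 = rep(3:4, each = 3), value = 1:6)

# Lazy summary that asks to keep the first grouping level
# (assumes a dtplyr version in which summarise() honours .groups)
lazy_summary <- data %>%
  lazy_dt() %>%
  group_by(group1, group2) %>%
  summarise(value_mean = mean(value), .groups = "drop_last")

# as_tibble() returns an ungrouped tibble
via_as_tibble <- as_tibble(lazy_summary)

# collect() re-applies the grouping recorded in the lazy pipeline
via_collect <- collect(lazy_summary)

# Inspect the grouping variables of each result
group_vars(via_as_tibble)
group_vars(via_collect)
```

With the fix installed, `group_vars(via_collect)` should report `group1`, while `group_vars(via_as_tibble)` should be empty. Neither result should contain a stray `.groups` column.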

mgirlich mentioned this issue Jul 2, 2021

Support the .groups argument in summarise() #265

Merged

mgirlich closed this as completed in #265 Jul 5, 2021
