dtplyr not recognizing .data$var syntax and sometimes returning 0s after summarise with no warning #138

Closed
sbashevkin opened this issue Dec 20, 2019 · 1 comment
Labels: bug (an unexpected problem or unintended behavior), dplyr-compat (💞 dplyr compatibility issues)

Comments

@sbashevkin

I recently noticed that using dtplyr with summarise and the .data$var syntax produces unexpected results without any warning. I use dtplyr within my package and refer to columns with .data$varname, as recommended. dtplyr does not seem to recognize that syntax: in group_by it throws an error, and in summarise it silently returns sums of 0.

Please see the reprex below.

Thank you,
Sam

library(magrittr) #Normally my package would import just %>%
library(rlang) # Normally my package would import .data
#> 
#> Attaching package: 'rlang'
#> The following object is masked from 'package:magrittr':
#> 
#>     set_names
d<-tibble::tibble(Group=rep(c("A", "B"), 10), Num=1:20)

d
#> # A tibble: 20 x 2
#>    Group   Num
#>    <chr> <int>
#>  1 A         1
#>  2 B         2
#>  3 A         3
#>  4 B         4
#>  5 A         5
#>  6 B         6
#>  7 A         7
#>  8 B         8
#>  9 A         9
#> 10 B        10
#> 11 A        11
#> 12 B        12
#> 13 A        13
#> 14 B        14
#> 15 A        15
#> 16 B        16
#> 17 A        17
#> 18 B        18
#> 19 A        19
#> 20 B        20

# Works without `dtplyr`

d%>%
  dplyr::group_by(.data$Group)%>%
  dplyr::summarise(Num=sum(.data$Num, na.rm=TRUE))%>%
  dplyr::ungroup()%>%
  tibble::as_tibble()
#> # A tibble: 2 x 2
#>   Group   Num
#>   <chr> <int>
#> 1 A       100
#> 2 B       110

# `.data` does not seem to work with `dtplyr`

d%>%
  dtplyr::lazy_dt()%>%
  dplyr::group_by(.data$Group)%>%
  dplyr::summarise(Num=sum(.data$Num, na.rm=TRUE))%>%
  dplyr::ungroup()%>%
  tibble::as_tibble()
#> Error in eval(bysub, x, parent.frame()): object 'Group' not found

# But if you remove the `.data$` from `group_by` and leave it in
# the `summarise` call, it returns 0s, but no warnings or errors

d%>%
  dtplyr::lazy_dt()%>%
  dplyr::group_by(Group)%>%
  dplyr::summarise(Num=sum(.data$Num, na.rm=TRUE))%>%
  dplyr::ungroup()%>%
  tibble::as_tibble()
#> # A tibble: 2 x 2
#>   Group   Num
#>   <chr> <int>
#> 1 A         0
#> 2 B         0

# With `group_by_at` (what I was actually trying to use in my case),
# you can use `.data$` but it again returns 0s with no warnings or errors

d%>%
  dtplyr::lazy_dt()%>%
  dplyr::group_by_at(dplyr::vars(.data$Group))%>%
  dplyr::summarise(Num=sum(.data$Num, na.rm=TRUE))%>%
  dplyr::ungroup()%>%
  tibble::as_tibble()
#> # A tibble: 2 x 2
#>   Group   Num
#>   <chr> <int>
#> 1 A         0
#> 2 B         0

Created on 2019-12-20 by the reprex package (v0.3.0)
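
One possible explanation for the silent 0s, offered as an assumption rather than something this thread confirms: if `.data$Num` reaches data.table untranslated and `.data` resolves there to an object with no `Num` element, then `.data$Num` is `NULL`, and the sum of `NULL` is 0 for every group. The sketch below uses base R only, with a hypothetical stand-in (`stand_in`) in place of rlang's real `.data` object.

```r
# Hypothetical stand-in for whatever `.data` resolves to outside a dplyr
# data mask. This is an assumption for illustration, not rlang's real object.
stand_in <- list()

# `$` on a list with no matching element returns NULL instead of an error.
missing_col <- stand_in$Num

# Summing NULL is the sum of an empty set, which is 0, with no warning.
total <- sum(missing_col, na.rm = TRUE)
total
#> [1] 0
```

If that assumption holds, it would also explain why no warning or error appears: every step is valid R and simply operates on an empty input.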

@hadley
Member

hadley commented Dec 24, 2019

Minimal reprex:

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

df <- tibble(g = rep(c("A", "B"), 10), x = 1:20)

df %>%
  lazy_dt() %>%
  summarise(x = sum(.data$x)) %>% 
  show_query()
#> `_DT1`[, .(x = sum(.data$x))]

Created on 2019-12-24 by the reprex package (v0.3.0)
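
The query above shows `.data$x` being passed through to data.table unchanged, where the generated code would presumably need the bare column name `x`. As a rough sketch of that rewrite (a hypothetical helper named `strip_data_pronoun`, written in base R; it is not dtplyr's code and not necessarily what the fix commit does), an expression can be walked recursively and each `.data$x` or `.data[["x"]]` replaced with the symbol `x`:

```r
# Hypothetical helper, for illustration only (not part of dtplyr or rlang).
# Recursively replaces `.data$x` and `.data[["x"]]` with the bare symbol `x`.
# Limitation: calls with empty arguments, such as `x[, 1]`, are not handled.
strip_data_pronoun <- function(expr) {
  # Symbols and constants are returned unchanged.
  if (!is.call(expr)) {
    return(expr)
  }

  fn <- expr[[1]]
  on_pronoun <- length(expr) == 3 && identical(expr[[2]], quote(.data))

  # `.data$x` becomes `x`.
  if (on_pronoun && identical(fn, quote(`$`)) && is.symbol(expr[[3]])) {
    return(expr[[3]])
  }

  # `.data[["x"]]` becomes `x`.
  if (on_pronoun && identical(fn, quote(`[[`)) && is.character(expr[[3]])) {
    return(as.symbol(expr[[3]]))
  }

  # Any other call: rewrite each element, keeping argument names.
  as.call(lapply(expr, strip_data_pronoun))
}

strip_data_pronoun(quote(sum(.data$x, na.rm = TRUE)))
#> sum(x, na.rm = TRUE)
```

With a rewrite along these lines applied during translation, the query would read `` `_DT1`[, .(x = sum(x))] `` rather than referring to `.data`.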

@hadley added the bug and dplyr-compat labels on Dec 24, 2019
@hadley closed this as completed in 7544de4 on Dec 24, 2019