Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mutate on a grouped tibble drops attributes of columns #3923

Closed
bnicenboim opened this issue Oct 17, 2018 · 14 comments
Closed

mutate on a grouped tibble drops attributes of columns #3923

bnicenboim opened this issue Oct 17, 2018 · 14 comments
Milestone

Comments

@bnicenboim
Copy link

I have columns that belong to a custom class, if I mutate them when the tibble is grouped, the class disappears and I get a warning. It might be related to #390, but I get a different error. And it might be related to issue #1150, but I can't understand if I need to do something on my end or there is a problem in dplyr, since non-grouped tables work just fine.


Example:

This case works fine:


   myclass <- function(x, X){
    	class(x) <- "myclass"
    	attributes(x)$X <- X
    	x
    }
 tx <- tibble(a = rep(c(1, 2), 5),b = myclass(1:10, 100))

 mutate(c = b/2) %>% .$c
    # [1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
    # attr(,"class")
    # [1] "myclass"
    # attr(,"X")
    # [1] 100

I would expect to see myclass if I grouped the tibble, instead I get a warning:

group_by(tx,a) %>% 
        mutate(c=b/2) %>% 
        .$c
   # [1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
    # Warning messages:
    # 1: In mutate_impl(.data, dots) :
    # Vectorizing 'myclass' elements may not preserve their attributes
    # 2: In mutate_impl(.data, dots) :
    # Vectorizing 'myclass' elements may not preserve their attributes
@batpigandme
Copy link
Contributor

I think this may be related to #3429, which is still open.

@romainfrancois
Copy link
Member

Yes. Same issue. There are two things involved:

  • We need an [.myclass to correctly extract the subset, respecting the class. We've added support for that recently, but IIRC the method has to be registered, we might want to roll back on that @lionel-
  • We need to be able to combine the pieces of c together into an myclass object. We need vctrs for that (and so will have to wait for 0.9.0).

@lionel- because of where [.myclass is defined (global env), it's not registered right ? I was expecting to get myclass objects here

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
myclass <- function(x, X){
  structure(x, class = "myclass", X = X)
}
`[.myclass` <- function(x, i, ...) {
  structure(unclass(x)[i, ...], class = "myclass", X = attr(x, "X"))
}

tx <- tibble(a = rep(c(1, 2), 5), b = myclass(1:10, 100))

group_by(tx,a) %>% 
  mutate(c=list(b)) %>% 
  pull()
#> [[1]]
#> [1] 1 3 5 7 9
#> 
#> [[2]]
#> [1]  2  4  6  8 10
#> 
#> [[3]]
#> [1] 1 3 5 7 9
#> 
#> [[4]]
#> [1]  2  4  6  8 10
#> 
#> [[5]]
#> [1] 1 3 5 7 9
#> 
#> [[6]]
#> [1]  2  4  6  8 10
#> 
#> [[7]]
#> [1] 1 3 5 7 9
#> 
#> [[8]]
#> [1]  2  4  6  8 10
#> 
#> [[9]]
#> [1] 1 3 5 7 9
#> 
#> [[10]]
#> [1]  2  4  6  8 10

tx$b[ tx$a == 1] 
#> [1] 1 3 5 7 9
#> attr(,"class")
#> [1] "myclass"
#> attr(,"X")
#> [1] 100

eval( substitute(tx$b[ tx$a == 1], list(tx = tx)), baseenv() )
#> [1] 1 3 5 7 9

Created on 2018-10-17 by the reprex package (v0.2.1.9000)

@lionel-
Copy link
Member

lionel- commented Oct 17, 2018

We could evaluate the [ call in the data mask with all objects inlined in the call.

@romainfrancois
Copy link
Member

Why would we need to inline them ?

@lionel-
Copy link
Member

lionel- commented Oct 17, 2018

Can you point me to the previous discussion / code? I'm not sure what are we talking about.

@romainfrancois
Copy link
Member

I have this gist: https://gist.github.com/romainfrancois/eb5268503abb5dea70464b4092958161

and the background discussion is in this PR: #3854

@romainfrancois
Copy link
Member

Ultimately [ is called here:

SEXP r_column_subset(SEXP x, const Index& index) {

@lionel-
Copy link
Member

lionel- commented Oct 17, 2018

Thanks! As a matter of principle, I would inline the [ primitive in the CAR so that we don't pick up potential parasite definitions in the mask / global env. Or at least call it with a base:: prefix, but inlining is a bit faster and a tiny teeny bit safer (no assumption about ::).

@romainfrancois
Copy link
Member

I have a fix for the first issue in #3926.

The second issue, i.e. reconstructing an object of the correct class is something we'll tackle on 0.9.0

@romainfrancois romainfrancois added this to the 0.9.0 milestone Oct 18, 2018
@bnicenboim
Copy link
Author

Thanks for taking care of this! Is there a workaround that I can use for now?

@romainfrancois
Copy link
Member

This will need I believe an implementation of vec_restore (https://vctrs.r-lib.org/articles/s3-vector.html#cached-sum).

In the meantime, you can always do something like this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
myclass <- function(x, X){
  structure(x, class = "myclass", X = X)
}
`[.myclass` <- function(x, i, ...) {
  structure(unclass(x)[i, ...], class = "myclass", X = attr(x, "X"))
}

tibble(a = rep(c(1, 2), 5), b = myclass(1:10, 100)) %>% 
  group_by(a) %>% 
  mutate(c = b / 2) %>% 
  ungroup() %>% 
  mutate(c = myclass(c, X = attr(b, "X")))
#> Warning in mutate_impl(.data, dots): Vectorizing 'myclass' elements may not
#> preserve their attributes

#> Warning in mutate_impl(.data, dots): Vectorizing 'myclass' elements may not
#> preserve their attributes
#> # A tibble: 10 x 3
#>        a b             c            
#>    <dbl> <S3: myclass> <S3: myclass>
#>  1     1  1            0.5          
#>  2     2  2            1.0          
#>  3     1  3            1.5          
#>  4     2  4            2.0          
#>  5     1  5            2.5          
#>  6     2  6            3.0          
#>  7     1  7            3.5          
#>  8     2  8            4.0          
#>  9     1  9            4.5          
#> 10     2 10            5.0

Created on 2018-10-23 by the reprex package (v0.2.1.9000)

@bnicenboim
Copy link
Author

yeah, I know, but my real case is much more complex. What I'm really doing is working on package that reimplements dplyr functions for a new class that contains tibbles with columns of a new class and attributes.

And so the end user just do:

group_by(myobject, something) %>% mutate(x = c*2)

And my package should take care of everything. I was wondering, if I could just add a new vec_restore or a function like that in my package that would solve this problem for now.

@hadley
Copy link
Member

hadley commented May 27, 2019

Duplicate of #2432

@hadley hadley marked this as a duplicate of #2432 May 27, 2019
@hadley hadley closed this as completed May 27, 2019
@lock
Copy link

lock bot commented Nov 23, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Nov 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants