filter, arrange, slice dropping custom attributes of base vectors #4219
Actually, when a column has a class, since 0.8.0 dplyr falls back to calling R, which drops the attributes:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
d <- tibble(
x = structure(0:1, label = "foo"),
y = 0:1
)
# filter() will keep them in that case
# but I actually believe it should not
filter(d, 1 == 1) %>% str()
#> Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 2 variables:
#> $ x: int 0 1
#> ..- attr(*, "label")= chr "foo"
#> $ y: int 0 1
# but it should perhaps not
str(d$x[1:2])
#> int [1:2] 0 1
str(d$y[1:2])
#> int [1:2] 0 1
# vctrs::vec_slice() agrees
str(vctrs::vec_slice(d$x, 1:2))
#> int [1:2] 0 1
# when the object has a class, dplyr falls back
# to calling R which drops the attributes
d <- tibble(
x = structure(0:1, label = "foo", class = "numeric")
)
filter(d, 1 == 1) %>% str()
#> Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 1 variable:
#> $ x: int 0 1
This is the correct behaviour, because neither dplyr nor base R knows what to do with the attributes. You'd need to define a custom
str(structure(0:1, label = "foo")[1:2])
#> int [1:2] 0 1
str(structure(0:1, label = "foo", class = "numeric")[1:2])
#> int [1:2] 0 1
Follow up:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
d <- tibble(
y = 12:1,
x = structure(1:12, label = "foo", class = "myclass")
)
d %>%
filter(y > 10) %>%
str()
#> Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 2 variables:
#> $ y: int 12 11
#> $ x: int 1 2
`[.myclass` <- function(x, ...) {
structure(unclass(x)[...], class = "myclass", label = "foo")
}
d %>%
filter(y > 10) %>%
str()
#> Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 2 variables:
#> $ y: int 12 11
#> $ x: 'myclass' int 1 2
#> ..- attr(*, "label")= chr "foo"
Created on 2019-03-04 by the reprex package (v0.2.1.9000)
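The `[.myclass` method in the reprex above hard-codes label = "foo". A minimal sketch of a more general variant, assuming plain (dimensionless) vectors: it copies whatever attributes the input carries instead of naming them. The class name "myclass" comes from the reprex; everything else is illustrative and is not how dplyr or vctrs handle attributes internally.

```r
# Sketch only: a `[` method for the hypothetical class "myclass" that
# carries over every custom attribute instead of hard-coding one.
`[.myclass` <- function(x, ...) {
  # Base subsetting keeps names but drops all other attributes.
  out <- unclass(x)[...]
  keep <- attributes(x)
  # Do not copy structural attributes; they would no longer match `out`.
  keep[c("names", "dim", "dimnames")] <- NULL
  attributes(out) <- c(attributes(out), keep)
  out
}

x <- structure(1:12, label = "foo", unit = "cm", class = "myclass")
attributes(x[1:2])
```

Because the method restores the class along with the other attributes, the result can be subset again without losing anything.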
This might change when we use
I'm confused. If this is the correct behavior, then why isn't it documented as a breaking change? It seems odd to switch to using What are the odds that
Reopening this, but I guess the discussion should move to vctrs::vec_slice(). I’ll add some more content here in the morning.
Thank you.
An intermediate
library(dplyr)
library(sjlabelled)
d <- data.frame(
factor = structure(1:2, .Label = c("0", "1"), class = "factor" , label = "foo"),
logical = structure(0:1, class = "logical" , label = "foo"),
numeric = structure(0:1, class = "numeric" , label = "foo"),
integer = structure(0:1, class = "integer" , label = "foo"),
char = structure(c("0", "1"), class = "character", label = "foo")
)
d %>% sapply(attr, "label")
#> $factor
#> [1] "foo"
#>
#> $logical
#> [1] "foo"
#>
#> $numeric
#> [1] "foo"
#>
#> $integer
#> [1] "foo"
#>
#> $char
#> NULL
d %>%
filter(1 == 1) %>%
sjlabelled::copy_labels(d) %>%
sapply(attr, "label")
#> $factor
#> [1] "foo"
#>
#> $logical
#> [1] "foo"
#>
#> $numeric
#> [1] "foo"
#>
#> $integer
#> [1] "foo"
#>
#> $char
#> NULL
Created on 2019-03-08 by the reprex package (v0.2.1)
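The same "subset first, then copy the attributes back" idea can be sketched in base R alone. The helper name restore_attrs() is hypothetical and this is not how sjlabelled::copy_labels() is implemented; it only illustrates the workaround under the assumption that columns are matched by name.

```r
# Hypothetical helper: copy custom attributes from the columns of `old`
# onto the same-named columns of `new` after rows have been subset.
restore_attrs <- function(new, old) {
  for (nm in intersect(names(new), names(old))) {
    a <- attributes(old[[nm]])
    if (is.null(a)) next
    # Skip structural attributes that depend on the vector's length.
    a[c("names", "dim", "dimnames")] <- NULL
    for (k in names(a)) attr(new[[nm]], k) <- a[[k]]
  }
  new
}

d <- data.frame(x = 0:1, y = c(2.5, 3.5))
attr(d$x, "label") <- "foo"

sub <- d[d$y > 3, , drop = FALSE]   # row subsetting with base `[`
fixed <- restore_attrs(sub, d)
attr(fixed$x, "label")
```

In a dplyr pipeline the helper would sit directly after filter(), arrange(), or slice(), in the position copy_labels(d) occupies in the reprex above.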
(value labels, i.e. attribute
Yes and no. While the To me the key issue at hand is if attribute preserving behavior should continue to be exhibited by Also, while base R tools like
library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tibble)
library(data.table)
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#>
#> between, first, last
d <- tibble(
factor = structure(1:2, .Label = c("0", "1"), class = "factor" , label = "foo", attr_1 = "some_value"),
logical = structure(0:1, class = "logical" , label = "foo", attr_2 = "some_value"),
numeric = structure(0:1, class = "numeric" , label = "foo", attr_3 = "some_value"),
integer = structure(0:1, class = "integer" , label = "foo", attr_4 = "some_value"),
char = structure(c("0", "1"), class = "character", label = "foo", attr_5 = "some_value")
)
### While [ doesn't preserve attributes, it's worth noting that other ecosystems like data.table do preserve custom attributes
dt <- as.data.table(d)
dt[numeric == 0] %>% str
#> Classes 'data.table' and 'data.frame': 1 obs. of 5 variables:
#> $ factor : Factor w/ 2 levels "0","1": 1
#> ..- attr(*, "label")= chr "foo"
#> ..- attr(*, "attr_1")= chr "some_value"
#> $ logical:Class 'logical' atomic [1:1] 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_2")= chr "some_value"
#> $ numeric:Class 'numeric' atomic [1:1] 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_3")= chr "some_value"
#> $ integer:Class 'integer' atomic [1:1] 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_4")= chr "some_value"
#> $ char :Class 'character' atomic [1:1] 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_5")= chr "some_value"
#> - attr(*, ".internal.selfref")=<externalptr>
dt[1] %>% str
#> Classes 'data.table' and 'data.frame': 1 obs. of 5 variables:
#> $ factor : Factor w/ 2 levels "0","1": 1
#> ..- attr(*, "label")= chr "foo"
#> ..- attr(*, "attr_1")= chr "some_value"
#> $ logical:Class 'logical' atomic [1:1] 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_2")= chr "some_value"
#> $ numeric:Class 'numeric' atomic [1:1] 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_3")= chr "some_value"
#> $ integer:Class 'integer' atomic [1:1] 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_4")= chr "some_value"
#> $ char :Class 'character' atomic [1:1] 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_5")= chr "some_value"
#> - attr(*, ".internal.selfref")=<externalptr>
dt[order(-numeric)] %>% str
#> Classes 'data.table' and 'data.frame': 2 obs. of 5 variables:
#> $ factor : Factor w/ 2 levels "0","1": 2 1
#> ..- attr(*, "label")= chr "foo"
#> ..- attr(*, "attr_1")= chr "some_value"
#> $ logical:Class 'logical' atomic [1:2] 1 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_2")= chr "some_value"
#> $ numeric:Class 'numeric' atomic [1:2] 1 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_3")= chr "some_value"
#> $ integer:Class 'integer' atomic [1:2] 1 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_4")= chr "some_value"
#> $ char :Class 'character' atomic [1:2] 1 0
#> .. ..- attr(*, "label")= chr "foo"
#> .. ..- attr(*, "attr_5")= chr "some_value"
#> - attr(*, ".internal.selfref")=<externalptr>
Created on 2019-03-08 by the reprex package (v0.2.1)
Duplicate of #3923
filter, arrange, and slice all seem to drop custom attributes of base vectors in dplyr 0.8.0 and later. I've found similar posts about this behavior, but they all seem to concern custom, user-defined classes rather than base vectors (#4079, #3923, #3429).
This wasn't documented as a breaking change, so I am hoping it is a bug; otherwise it's a pretty substantial breaking change.
Created on 2019-03-08 by the reprex package (v0.2.1)