Lead/Lag behavior with nested column #3789
library(dplyr, warn.conflicts = FALSE)
dt <-
  tibble::tribble(
    ~id, ~a, ~b,
    1L, 0L, 0L,
    2L, 0L, 1L,
    3L, 1L, 0L,
    4L, 1L, 1L
  )
# get the nested data
dt_nested <- dt %>%
  tidyr::nest(-id)
first(dt_nested$data) %>% str()
#> Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> $ a: int 0
#> $ b: int 0
first(dt_nested$data) %>% length()
#> [1] 2
identical(
  first(dt_nested$data),
  dt_nested$data[[1]]
)
#> [1] TRUE
lag(dt_nested$data) %>% str()
#> List of 4
#> $ : logi NA
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 0
#> ..$ b: int 0
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 0
#> ..$ b: int 1
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 1
#> ..$ b: int 0
Using dt_nested$data[[1]] as the default:
lag(dt_nested$data, default = dt_nested$data[[1]]) %>% str()
#> List of 5
#> $ : int 0
#> $ : int 0
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 0
#> ..$ b: int 0
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 0
#> ..$ b: int 1
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 1
#> ..$ b: int 0
You get a list of length 5, more than the number of rows in your tibble. That is why the default has to be wrapped in a list:
lag(dt_nested$data, default = dt_nested$data[1]) %>% str()
#> List of 4
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 0
#> ..$ b: int 0
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 0
#> ..$ b: int 0
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 0
#> ..$ b: int 1
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 1
#> ..$ b: int 0
You can use base R:
head(dt_nested$data, n = 1) %>% str()
#> List of 1
#> $ :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> ..$ a: int 0
#> ..$ b: int 0
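The length-5 result above can be reproduced in base R. This is only a sketch: `lag_sketch` is a hypothetical helper, and it assumes that `lag()` builds its result by concatenating the default in front of the shifted values, which is consistent with the output shown but is not taken from the dplyr source.

```r
# Four one-row data frames standing in for the nested `data` column.
x <- list(
  data.frame(a = 0L, b = 0L),
  data.frame(a = 0L, b = 1L),
  data.frame(a = 1L, b = 0L),
  data.frame(a = 1L, b = 1L)
)

# Hypothetical helper: shift by one position and put `default` in front.
# Assumption: the default is concatenated element by element.
lag_sketch <- function(x, default) {
  c(as.list(default), x[seq_len(length(x) - 1L)])
}

# x[[1]] is a data frame, i.e. a list of 2 columns, so 2 + 3 elements.
unwrapped <- lag_sketch(x, default = x[[1]])
# x[1] is a list holding one data frame, so 1 + 3 elements.
wrapped <- lag_sketch(x, default = x[1])

length(unwrapped)
#> [1] 5
length(wrapped)
#> [1] 4
```

Under that assumption, the extra elements come from the two columns of the unwrapped data frame being spliced in individually.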
dt %>%
  tidyr::nest(-id) %>%
  mutate(prior = lag(data, default = head(data, 1)))
#> # A tibble: 4 x 3
#> id data prior
#> <int> <list> <list>
#> 1 1 <tibble [1 x 2]> <tibble [1 x 2]>
#> 2 2 <tibble [1 x 2]> <tibble [1 x 2]>
#> 3 3 <tibble [1 x 2]> <tibble [1 x 2]>
#> 4 4 <tibble [1 x 2]> <tibble [1 x 2]>
So I think you get the error because the result of
This does feel surprising to me. It's possible that I've forgotten some reason that this can't work, but I suspect we've forgotten to think fully about lists in either
Is this correct behavior?
dplyr::first(as.list(1:3))
#> [1] 1
I think the problem is with nth:
> nth
function(x, n, order_by = NULL, default = default_missing(x)) {
  stopifnot(length(n) == 1, is.numeric(n))
  n <- trunc(n)
  if (n == 0 || n > length(x) || n < -length(x)) {
    return(default)
  }
  # Negative values index from RHS
  if (n < 0) {
    n <- length(x) + n + 1
  }
  if (is.null(order_by)) {
    x[[n]]
  } else {
    x[[ order(order_by)[[n]] ]]
  }
}
<environment: namespace:dplyr>
which does not return something of the same type as the input; it should return a length-1 list rather than its content.
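To make the suggestion concrete, here is a minimal sketch of a type-preserving variant. `nth_keep` is a hypothetical name, not a dplyr function, and the `order_by` argument is left out for brevity. The only substantive change from the printed source is `[` in place of `[[`; the default of `x[NA_integer_]` is also an assumption made for this sketch.

```r
# Hypothetical variant of nth() that keeps the container type.
# `[` returns a length-1 slice of `x`, so a list input stays a list.
nth_keep <- function(x, n, default = x[NA_integer_]) {
  stopifnot(length(n) == 1, is.numeric(n))
  n <- trunc(n)
  if (n == 0 || n > length(x) || n < -length(x)) {
    return(default)
  }
  # Negative values index from the right-hand side
  if (n < 0) {
    n <- length(x) + n + 1
  }
  x[n]
}

x <- as.list(1:3)
str(nth_keep(x, 1))    # a list of length 1, not the bare integer
str(nth_keep(1:3, 1))  # atomic input behaves as before
```

For atomic vectors `x[n]` and `x[[n]]` give the same value, so only list inputs would see a difference under this sketch.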
Things appear slightly more consistent in the hybrid case:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
d <- tibble(x = list(1L, 1:2))
# hybrid
d %>% summarise(first = first(x), last = last(x))
#> # A tibble: 1 x 2
#> first last
#> <list> <list>
#> 1 <int [1]> <int [2]>
# standard
d %>% summarise(first = (first(x)))
#> # A tibble: 1 x 1
#> first
#> <int>
#> 1 1
# not working because last element of length 2
d %>% summarise(last = (last(x)))
#> Error: Column `last` must be length 1 (a summary value), not 2
Created on 2018-09-05 by the reprex package (v0.2.0).
This means that
Please do not change the behaviour of
I feel pretty confident that
library(tidyverse)
x <- as.list(1:3)
# Shouldn't unpack
first(x)
#> [1] 1
nth(x, 2)
#> [1] 2
# Should insert NULL aka vctrs::vec_na(list())
lead(x)
#> [[1]]
#> [1] 2
#>
#> [[2]]
#> [1] 3
#>
#> [[3]]
#> [1] NA
lag(x)
#> [[1]]
#> [1] NA
#>
#> [[2]]
#> [1] 1
#>
#> [[3]]
#> [1] 2
Created on 2019-06-27 by the reprex package (v0.3.0)
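The padding behaviour that the comments in the reprex ask for ("Should insert NULL") could look roughly like this in base R. `lag_list` and `lead_list` are hypothetical names for illustration only; this is a sketch of the desired result for list inputs, not how dplyr implements or later resolved it.

```r
# Hypothetical list-aware lag: pad the front with NULL, drop the last element.
lag_list <- function(x) {
  stopifnot(is.list(x))
  if (length(x) == 0) {
    return(x)
  }
  c(list(NULL), x[seq_len(length(x) - 1L)])
}

# Hypothetical list-aware lead: drop the first element, pad the end with NULL.
lead_list <- function(x) {
  stopifnot(is.list(x))
  if (length(x) == 0) {
    return(x)
  }
  c(x[-1], list(NULL))
}

x <- as.list(1:3)
str(lag_list(x))
str(lead_list(x))
```

Both helpers return a list of the same length as the input, so the result could be used directly as a list-column.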
We'll resolve this in tidyverse/funs#35
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
I'm not sure if this is an intentional design, but I believe that lead and lag do not behave appropriately with nested columns.
Why do I need to encapsulate with list?
Edit: On second thought, perhaps this has to do with how first works?