Rework simplification #909

Merged

38 commits merged on Sep 12, 2022
Changes from 20 commits

Commits
bcc5415
Implement list_simplify() and use it in accumulate()
hadley Aug 31, 2022
1ead93b
Try a strict argument
hadley Aug 31, 2022
a927496
Tweak simplify spec
hadley Sep 2, 2022
be3fe11
Finish off list_transpose()
hadley Sep 2, 2022
17bffab
Use vec_unchop()
hadley Sep 2, 2022
8d63020
Remove out of date comment
hadley Sep 2, 2022
02ea731
Merge commit 'a118aeca0768493da48058882598389521fa3558'
hadley Sep 7, 2022
71ba50c
Feedback from code review
hadley Sep 7, 2022
20387d0
Finish off list_simplify() tests
hadley Sep 7, 2022
7805591
Update accumulate tests
hadley Sep 7, 2022
5d6d874
Basic docs for list_transpose()
hadley Sep 7, 2022
d9c76d3
Mildly consider simplification errors
hadley Sep 7, 2022
1980e24
Test list_transpose()
hadley Sep 7, 2022
eb85559
Deprecate transpose()
hadley Sep 7, 2022
5be021d
Merge commit '1cf95cbdf0fdcff27cad77b9a5f791d00107ced4'
hadley Sep 9, 2022
5940cc2
Replace accidental use of base pipe
hadley Sep 9, 2022
675a8a8
Implement user facing list_simplify()
hadley Sep 9, 2022
8b32bfa
Deprecate as_vector(), simplify(), simplify_all()
hadley Sep 9, 2022
a4f864d
Add news bullets
hadley Sep 9, 2022
822d714
Use cli for tests
hadley Sep 9, 2022
5254896
Apply suggestions from code review
hadley Sep 12, 2022
c8b4bc9
Merge commit '95a568c2bb8a5a7c0c7a70b02fc233d0a2c4ca02'
hadley Sep 12, 2022
e4311be
Re-document
hadley Sep 12, 2022
80fa90c
Error tweaking
hadley Sep 12, 2022
41b2039
Simplify simplify errors
hadley Sep 12, 2022
a7a41c8
More code review feedback
hadley Sep 12, 2022
4577473
Let list_transpose() work with numeric templates
hadley Sep 12, 2022
5319949
Add more transpose examples
hadley Sep 12, 2022
fcb92d6
Remove unused error_arg
hadley Sep 12, 2022
2397f21
Avoid offense to the delicate sensibilities of Lionel and Davis
hadley Sep 12, 2022
471f340
Merge commit 'ff4dfcb16d64ed80bbf34e32c1ac4eff002ba11e'
hadley Sep 12, 2022
b16071a
Merge commit '61fb2accc032fab3ff3b2012bab8948194e8d08f'
hadley Sep 12, 2022
da1f7c9
Apply suggestions from code review
hadley Sep 12, 2022
0590ccf
Re-document & update snapshots
hadley Sep 12, 2022
99c925e
Improve list_simplify() errors + docs
hadley Sep 12, 2022
808568b
list_transpose() improvements
hadley Sep 12, 2022
2c5866a
Tweak docs
hadley Sep 12, 2022
78938f7
Move accidental change
hadley Sep 12, 2022
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -135,6 +135,8 @@ export(list_flatten)
export(list_merge)
export(list_modify)
export(list_rbind)
export(list_simplify)
export(list_transpose)
export(list_update)
export(lmap)
export(lmap_at)
16 changes: 16 additions & 0 deletions NEWS.md
@@ -49,8 +49,24 @@
* `*_dfc()` and `*_dfr()` have been deprecated in favour of using the
appropriate map function along with `list_rbind()` or `list_cbind()` (#912).

* `simplify()`, `simplify_all()`, and `as_vector()` have been deprecated in
favour of `list_simplify()`. It provides a more consistent definition of
simplification (#900).

* `transpose()` has been deprecated in favour of `list_transpose()` (#875).
It has built-in simplification.

## Features and fixes

* New `list_simplify()` reduces a list of length-1 vectors to a simpler atomic
or S3 vector (#900).

* New `list_transpose()` which automatically simplifies if possible (#875).

* `accumulate()` and `accumulate2()` now both simplify the output if possible.
New arguments `simplify` and `ptype` allow you to control the details of
simplification (#774, #809).

* New `list_update()` which is similar to `list_modify()` but doesn't work
recursively (#822).

58 changes: 33 additions & 25 deletions R/coercion.R
@@ -1,38 +1,35 @@
#' Coerce a list to a vector
#'
#' `as_vector()` collapses a list of vectors into one vector. It
#' checks that the type of each vector is consistent with
#' `.type`. If the list can not be simplified, it throws an error.
#' `simplify` will simplify a vector if possible; `simplify_all`
#' will apply `simplify` to every element of a list.
#' @description
#' `r lifecycle::badge("deprecated")`
#'
#' `.type` can be a vector mold specifying both the type and the
#' length of the vectors to be concatenated, such as `numeric(1)`
#' or `integer(4)`. Alternatively, it can be a string describing
#' the type, one of: "logical", "integer", "double", "complex",
#' "character" or "raw".
#' These functions are deprecated in favour of `list_simplify()`:
#'
#' * `as_vector(x)` is now `list_simplify(x)`
#' * `simplify(x)` is now `list_simplify(x, strict = FALSE)`
hadley marked this conversation as resolved.
#' * `simplify_all(x)` is now `map(x, list_simplify, strict = FALSE)`
#'
#' @param .x A list of vectors
#' @param .type A vector mold or a string describing the type of the
#' input vectors. The latter can be any of the types returned by
#' [typeof()], or "numeric" as a shorthand for either
#' "double" or "integer".
#' @param .type Can be a vector mold specifying both the type and the
hadley marked this conversation as resolved.
#' length of the vectors to be concatenated, such as `numeric(1)`
#' or `integer(4)`. Alternatively, it can be a string describing
#' the type, one of: "logical", "integer", "double", "complex",
#' "character" or "raw".
#' @export
#' @keywords internal
#' @examples
#' # Supply the type either with a string:
#' # was
#' as.list(letters) %>% as_vector("character")
#' # now
#' as.list(letters) %>% list_simplify(ptype = character())
#'
#' # Or with a vector mold:
#' as.list(letters) %>% as_vector(character(1))
#'
#' # Vector molds are more flexible because they also specify the
#' # length of the concatenated vectors:
#' # was:
#' list(1:2, 3:4, 5:6) %>% as_vector(integer(2))
#'
#' # Note that unlike vapply(), as_vector() never adds dimension
#' # attributes. So when you specify a vector mold of size > 1, you
#' # always get a vector and not a matrix
#' # now:
#' list(1:2, 3:4, 5:6) %>% list_c(ptype = integer())
as_vector <- function(.x, .type = NULL) {
lifecycle::deprecate_warn("0.4.0", "as_vector()", "list_simplify()")

if (can_simplify(.x, .type)) {
unlist(.x)
} else {
@@ -43,6 +40,7 @@ as_vector <- function(.x, .type = NULL) {
#' @export
#' @rdname as_vector
simplify <- function(.x, .type = NULL) {
lifecycle::deprecate_warn("0.4.0", "simplify()", "list_simplify()")
if (can_simplify(.x, .type)) {
unlist(.x)
} else {
@@ -53,7 +51,17 @@ simplify <- function(.x, .type = NULL) {
#' @export
#' @rdname as_vector
simplify_all <- function(.x, .type = NULL) {
map(.x, simplify, .type = .type)
lifecycle::deprecate_warn("0.4.0", "simplify_all()", I("map() + list_simplify()"))

# Inline simplify to avoid double deprecation
simplify <- function(.x) {
if (can_simplify(.x, .type)) {
unlist(.x)
} else {
.x
}
}
map(.x, simplify)
}


89 changes: 89 additions & 0 deletions R/list-simplify.R
@@ -0,0 +1,89 @@
#' Simplify a list to an atomic or S3 vector
#'
#' @details
#' Simplification maintains a one-to-one correspondence between the input
#' and output, implying that each element of `x` must contain a vector of
#' length 1. If you don't want to maintain this correspondence, then you
#' probably want either [list_c()] or [list_flatten()].
Member

No @description? I feel like these details would probably make a decent description instead

Member Author

Oops. No idea why I did that.

#'
#' @param x A list.
#' @param strict What should happen if simplification fails? If `TRUE`,
#' will error. If `FALSE`, will return `x` unchanged.
#' @param ptype An optional prototype to ensure that the output type is always
#' the same.
hadley marked this conversation as resolved.
#' @returns A vector the same length as `x`.
#' @export
#' @examples
#' list_simplify(list(1, 2, 3))
#'
#' try(list_simplify(list(1, 2, "x")))
#' try(list_simplify(list(1, 2, 1:3)))
list_simplify <- function(x, strict = TRUE, ptype = NULL) {
simplify_impl(x, strict = strict, ptype = ptype)
Member

Maybe do some validation of strict?

}

# Wrapper used by purrr functions that do automatic simplification
list_simplify_internal <- function(
hadley marked this conversation as resolved.
x,
simplify = NA,
ptype = NULL,
error_arg = "x",
error_call = caller_env()
) {
if (length(simplify) != 1 || !is.logical(simplify)) {
cli::cli_abort("{.arg simplify} must be `TRUE`, `FALSE`, or `NA`.")
hadley marked this conversation as resolved.
}
if (!is.null(ptype) && isFALSE(simplify)) {
cli::cli_abort("Must not specify {.arg ptype} when `simplify = FALSE`.")
hadley marked this conversation as resolved.
}

if (isFALSE(simplify)) {
hadley marked this conversation as resolved.
return(x)
}

simplify_impl(
x,
strict = !is.na(simplify),
ptype = ptype,
error_arg = error_arg,
error_call = error_call
)
}
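To make the three-valued `simplify` argument concrete, here is a small standalone sketch. `simplify_mode()` is a hypothetical helper, not part of purrr: it applies checks like those in `list_simplify_internal()` and, instead of simplifying anything, returns a label for the behaviour each value selects. It assumes only base R.

```r
# Hypothetical helper, not part of purrr: applies checks similar to those in
# list_simplify_internal() and reports which behaviour `simplify` selects.
simplify_mode <- function(simplify, ptype = NULL) {
  # Require exactly one logical value (TRUE, FALSE or NA); this also
  # rejects a zero-length logical().
  if (!is.logical(simplify) || length(simplify) != 1) {
    stop("`simplify` must be `TRUE`, `FALSE`, or `NA`.", call. = FALSE)
  }
  # A prototype only makes sense if simplification may happen.
  if (!is.null(ptype) && isFALSE(simplify)) {
    stop("Must not specify `ptype` when `simplify = FALSE`.", call. = FALSE)
  }
  if (isFALSE(simplify)) {
    "none"      # return the list unchanged
  } else if (is.na(simplify)) {
    "lenient"   # simplify if possible (strict = FALSE)
  } else {
    "strict"    # simplify or error (strict = TRUE)
  }
}
```

With this sketch, `simplify_mode(NA)` returns `"lenient"`, while `simplify_mode(FALSE, ptype = double())` errors because a prototype has no effect when simplification is switched off.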

simplify_impl <- function(
x,
strict = TRUE,
ptype = NULL,
error_arg = "`x`",
error_call = caller_env()
) {
vec_check_list(x, arg = error_arg, call = error_call)

can_simplify <- every(x, vec_is, size = 1)
Member

How do we feel about list_simplify(x, strict = FALSE) when the elements of x aren't vectors? Like:

list_simplify(list(lm(1 ~ 1)), strict = FALSE)

I feel like this should still error, because purrr functions should only work on vector types?

I feel like strict should only be for:

  • Size incompatibility
  • Vector type incompatibility

i.e. this scalar object issue is out of scope and would still be an error

If you agree, then I'd argue that the current implementation here will be very slow (vec_is() is quite slow) and you could use this instead:

list_check_all_vectors(x, call = error_call)
can_simplify <- all(list_sizes(x) == 1L)

Member

Even if you want to support scalar types, it might be worth using the approach above and wrapping it in tryCatch() that returns FALSE on error, as it will still probably be an order of magnitude or more faster than vec_is()

if (can_simplify) {
tryCatch(
vec_unchop(x, ptype = ptype),
vctrs_error_incompatible_type = function(err) {
if (strict || !is.null(ptype)) {
cli::cli_abort(
hadley marked this conversation as resolved.
"Failed to simplify {error_arg}.",
parent = err,
call = error_call
)
} else {
x
}
}
)
} else {
if (strict) {
cli::cli_abort(
"Failed to simplify {error_arg}: not all elements are vectors of length 1.",
hadley marked this conversation as resolved.
call = error_call
)
} else {
x
}
}
}
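The contract `simplify_impl()` enforces (every element a vector of size 1, and all elements combinable into one type) can be approximated in base R. `simplify_sketch()` below is a hypothetical illustration under stated assumptions: it handles bare atomic vectors only, ignores S3 classes and `ptype`, and hard-codes a simplified compatibility rule in place of vctrs' type coercion, so it will not match `list_simplify()` in every edge case.

```r
# Hypothetical base-R approximation of the rule in simplify_impl(); purrr
# itself uses vctrs (vec_unchop()) and also supports S3 vectors and `ptype`.
simplify_sketch <- function(x, strict = TRUE) {
  stopifnot(is.list(x))
  # On failure: error when strict, otherwise hand back the input unchanged.
  fail <- function(msg) if (strict) stop(msg, call. = FALSE) else x

  # One-to-one correspondence: each element must be an atomic vector of length 1.
  is_scalar <- vapply(x, function(el) is.atomic(el) && length(el) == 1L, logical(1))
  if (!all(is_scalar)) {
    return(fail("Failed to simplify: not all elements are vectors of length 1."))
  }

  # Types must be compatible: logical, integer and double may be combined,
  # but mixing, say, double and character is treated as a failure.
  types <- unique(vapply(x, typeof, character(1)))
  if (length(types) > 1 && !all(types %in% c("logical", "integer", "double"))) {
    return(fail("Failed to simplify: incompatible types."))
  }

  unlist(x)
}
```

With this sketch, `list(1, 2, 3)` simplifies to `c(1, 2, 3)`, while `list(1, 2, "x")` and `list(1, 2, 1:3)` fail, mirroring the roxygen examples for `list_simplify()`; with `strict = FALSE` the input list comes back unchanged.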
98 changes: 98 additions & 0 deletions R/list-transpose.R
@@ -0,0 +1,98 @@
#' Transpose a list
#'
#' @description
#' `list_transpose()` turns a list-of-lists "inside-out"; it turns a pair of
hadley marked this conversation as resolved.
#' lists into a list of pairs, or a list of pairs into a pair of lists. For
hadley marked this conversation as resolved.
#' example, if you had a list of length `n` where each component had values `a`
#' and `b`, `list_transpose()` would make a list with elements `a` and
#' `b` that contained lists of length `n`.
hadley marked this conversation as resolved.
#'
#' It's called transpose because `x[["a"]][["b"]]` is equivalent to
#' `list_transpose(x)[["b"]][["a"]]`, i.e. transposing a list flips the order of
hadley marked this conversation as resolved.
#' indices in a similar way to transposing a matrix.
#'
#' @param x A list of vectors to transpose.
#' @param template A "template" that specifies the names of the output list.
hadley marked this conversation as resolved.
Member

It actually isn't clear to me that this is supposed to be a character vector.

It seems like it is supposed to be a named vector where the names get used as the output names, like c(x = 1, a = 2) or something like that.

Is there any reason not to call it names like in transpose()?

Member Author

Oops forgot to update these docs. Hopefully it's more obvious why it's called template and not names when you realise it can also take positions.

#' Usually taken from the names of the first element of `x`.
Member

I think it is very useful to mention that template can be an integer vector to do positional transposition.

Like, I didn't know that template could ignore names like this until i got to this example

ll <- list(
  list(x = 1, y = "one"),
  list(z = "deux", x = 2)
)
ll %>% list_transpose(template = 1)

#' @param simplify Should the result be simplified?
hadley marked this conversation as resolved.
#' * `TRUE`: simplify or die trying.
#' * `NA`: simplify if possible.
#' * `FALSE`: never try to simplify, always leaving as a list.
#'
#' Alternatively, a named list specifying the simplification by output column.
Member

You say by output column a few times, but there aren't any columns in this operation right?

#' @param ptype An optional vector prototype used to control the simplification.
#' Alternatively, a named list specifying the prototype by output column.
#' @param default A default value to use if a value is absent or `NULL`.
hadley marked this conversation as resolved.
#' Alternatively, a named list specifying the default by output column.
hadley marked this conversation as resolved.
#' @export
Member

Random comment:

I tried this somewhere along the way while exploring this and this error didn't make much sense to me

x <- list(
  a = list(integer(), "x"),
  b = list(2L, "y")
)
list_transpose(x, default = list(a = NA))
#> Error in `match_template()` at purrr/R/list-transpose.R:76:2:
#> ! List `default` must be same length as numeric template

Also it looks like a call needs to be passed through

Member Author

Now: Length of default (1) and template (2) must be the same when transposing by position.

#' @examples
#' # list_transpose() is useful in conjunction with safely()
#' x <- list("a", 1, 2)
#' y <- x %>% map(safely(log))
#' y %>% str()
#' # Put all the errors and results together
#' y %>% list_transpose() %>% str()
#' # Supply a default result to further simplify
#' y %>% list_transpose(default = list(result = NA)) %>% str()
#'
#' # list_transpose() will try to simplify by default:
#' x <- list(list(a = 1, b = 2), list(a = 3, b = 4), list(a = 5, b = 6))
#' x %>% list_transpose()
#' # use simplify = FALSE to always return lists:
#' x %>% list_transpose(simplify = FALSE) %>% str()
hadley marked this conversation as resolved.
#'
#' # Provide an explicit template if you know which elements you want to extract
hadley marked this conversation as resolved.
#' ll <- list(
#' list(x = 1, y = "one"),
#' list(z = "deux", x = 2)
#' )
#' ll %>% list_transpose()
#' ll %>% list_transpose(template = c("x", "y", "z"))
#'
#' # And specify a default if you want to simplify
hadley marked this conversation as resolved.
#' ll %>% list_transpose(c("x", "y", "z"), default = NA)
list_transpose <- function(x, template = NULL, simplify = NA, ptype = NULL, default = NULL) {
vec_check_list(x)
if (length(x) == 0) {
return(list())
}

template <- template %||%
names(x[[1]]) %||%
cli::cli_abort("First element of {.arg x} is unnamed, please supply {.arg template}.")
hadley marked this conversation as resolved.
if (!is.character(template)) {
cli::cli_abort("{.arg template} must be a character vector.")
}

simplify <- match_template(simplify, template)
default <- match_template(default, template)
ptype <- match_template(ptype, template)

out <- rep_named(template, list())
for (nm in template) {
res <- map(x, nm, .default = default[[nm]])
res <- list_simplify_internal(res,
simplify = simplify[[nm]] %||% NA,
ptype = ptype[[nm]],
error_arg = paste0("output `", nm, "`")
)
out[[nm]] <- res
}

out
}
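The core of the loop above (for each name in `template`, pluck that name from every element of `x`, falling back to `default`) can be sketched without purrr. `transpose_sketch()` is hypothetical: it accepts only character templates, uses a single `default` for every output, and skips the simplification step entirely.

```r
# Hypothetical base-R sketch of transposition by name; it handles only
# character templates and does not simplify the outputs.
transpose_sketch <- function(x, template = NULL, default = NULL) {
  stopifnot(is.list(x))
  if (length(x) == 0) return(list())
  if (is.null(template)) template <- names(x[[1]])
  stopifnot(is.character(template))

  # out[[nm]][[i]] is x[[i]][[nm]], or `default` when that value is absent/NULL.
  out <- lapply(template, function(nm) {
    lapply(x, function(el) {
      val <- if (nm %in% names(el)) el[[nm]] else NULL
      if (is.null(val)) default else val
    })
  })
  names(out) <- template
  out
}
```

The defining property is that `x[[i]][[nm]]` equals `transpose_sketch(x)[[nm]][[i]]`. With `ll` from the examples, `transpose_sketch(ll, c("x", "y", "z"), default = NA)$y` is `list("one", NA)`.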

match_template <- function(x, template, error_arg = caller_arg(x), error_call = caller_env()) {
if (is_bare_list(x) && is_named(x)) {
extra_names <- setdiff(names(x), template)
if (length(extra_names)) {
cli::cli_abort(
hadley marked this conversation as resolved.
"{.arg {error_arg}} contains unknown names: {.str {extra_names}}",
call = error_call
)
}
x
} else {
rep_named(template, list(x))
}
}
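`match_template()` lets `simplify`, `ptype` and `default` be supplied either once for all outputs or as a named list per output. `match_template_sketch()` is a hypothetical base-R rendering of that rule; it approximates rlang's `is_bare_list()`/`is_named()` with base checks and `rep_named()` with `rep()` plus `names<-`.

```r
# Hypothetical base-R version of match_template(): a bare, fully named list is
# treated as per-output settings; anything else is recycled to every output.
match_template_sketch <- function(x, template) {
  nms <- names(x)
  per_output <- is.list(x) && !is.object(x) &&
    !is.null(nms) && all(!is.na(nms) & nms != "")
  if (per_output) {
    extra <- setdiff(nms, template)
    if (length(extra) > 0) {
      stop("Unknown names: ", paste(extra, collapse = ", "), call. = FALSE)
    }
    x                                      # outputs not named here get NULL via [[
  } else {
    out <- rep(list(x), length(template))  # same value for every output
    names(out) <- template
    out
  }
}
```

In the `safely()` example, `default = list(result = NA)` is a named list, so under this rule it supplies a default for `result` only, while `error` gets `NULL` from `[[`.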
26 changes: 13 additions & 13 deletions R/reduce.R
@@ -342,11 +342,15 @@ seq_len2 <- function(start, end) {
#' the accumulation, rather than using `.x[[1]]`. This is useful if
#' you want to ensure that `reduce` returns a correct value when `.x`
#' is empty. If missing, and `.x` is empty, will throw an error.
#'
#' @param .dir The direction of accumulation as a string, one of
#' `"forward"` (the default) or `"backward"`. See the section about
#' direction below.
#'
#' @param .simplify If `NA`, the default, the accumulated list of
#' results is simplified to an atomic vector if possible.
#' If `TRUE`, the result is simplified, erroring if not possible.
#' If `FALSE`, the result is not simplified, always returning a list.
#' @param .ptype If `simplify` is `TRUE`, optionally supply a vector prototype
Member

Suggested change
#' @param .ptype If `simplify` is `TRUE`, optionally supply a vector prototype
#' @param .ptype If `.simplify` is `TRUE`, optionally supply a vector prototype

Member

It also works if .simplify is NA right?

#' to enforce the output types.
#' @return A vector the same length as `.x` with the same names as `.x`.
#'
#' If `.init` is supplied, the length is extended by 1. If `.x` has
@@ -454,26 +458,22 @@ seq_len2 <- function(start, end) {
#' ggtitle("Simulations of a random walk with drift")
#' }
#' @export
accumulate <- function(.x, .f, ..., .init, .dir = c("forward", "backward")) {
accumulate <- function(.x, .f, ..., .init, .dir = c("forward", "backward"), .simplify = NA, .ptype = NULL) {
.dir <- arg_match(.dir, c("forward", "backward"))
.f <- as_mapper(.f, ...)

res <- reduce_impl(.x, .f, ..., .init = .init, .dir = .dir, .acc = TRUE)
names(res) <- accumulate_names(names(.x), .init, .dir)

# It would be inappropriate to simplify the result rowwise with
# `accumulate()` because it has invariants defined in terms of
# `length()` rather than `vec_size()`
if (some(res, is.data.frame)) {
res
} else {
vec_simplify(res)
}
res <- list_simplify_internal(res, .simplify, .ptype, error_arg = "accumulated results")
res
}
#' @rdname accumulate
#' @export
accumulate2 <- function(.x, .y, .f, ..., .init) {
reduce2_impl(.x, .y, .f, ..., .init = .init, .acc = TRUE)
accumulate2 <- function(.x, .y, .f, ..., .init, .simplify = NA, .ptype = NULL) {
res <- reduce2_impl(.x, .y, .f, ..., .init = .init, .acc = TRUE)
res <- list_simplify_internal(res, .simplify, .ptype, error_arg = "accumulated results")
res
}
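The new `.simplify` step can be illustrated with a small base-R model of forward accumulation. `accumulate_sketch()` is hypothetical: it has no `.init`, `.dir` or `.ptype`, does not validate `simplify`, and uses a stricter "single shared type" rule than the vctrs-based simplification purrr performs.

```r
# Hypothetical base-R sketch of forward accumulation followed by the
# `simplify` step (TRUE / NA / FALSE); not purrr's implementation.
accumulate_sketch <- function(x, f, simplify = NA) {
  if (length(x) == 0) return(list())
  # acc[[i]] holds f(acc[[i - 1]], x[[i]]), starting from x[[1]].
  acc <- vector("list", length(x))
  acc[1] <- list(x[[1]])
  for (i in seq_along(x)[-1]) {
    acc[i] <- list(f(acc[[i - 1]], x[[i]]))  # list() keeps NULL results
  }
  if (isFALSE(simplify)) return(acc)

  # Simplify only if every result is a length-1 atomic vector of one type
  # (stricter than vctrs, which would also combine integer with double).
  ok <- all(vapply(acc, function(el) is.atomic(el) && length(el) == 1L, logical(1))) &&
    length(unique(vapply(acc, typeof, character(1)))) == 1L
  if (ok) {
    unlist(acc)
  } else if (is.na(simplify)) {
    acc
  } else {
    stop("Failed to simplify accumulated results.", call. = FALSE)
  }
}
```

Summing `1:4` with this sketch gives the integer vector `c(1L, 3L, 6L, 10L)`. When the accumulated results grow in length (for example with `function(a, b) c(a, b)`), the default `NA` returns a list and `TRUE` errors.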

accumulate_names <- function(nms, init, dir) {