-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework flattening #912
Rework flattening #912
Conversation
* Use `modify_if` * Use `vec_unchop()` * Test * Add `name_spec` argument * Simplify `splice_if` using `name_spec`
@@ -129,8 +129,12 @@ export(lift_lv) | |||
export(lift_vd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level thought:
I really like list_c()
, list_cbind()
, and list_rbind()
as the main high level functions here.
Since flat-map and dfc/dfr operations are somewhat rare, I would actually be very happy with:
x |> map(f) |> list_c()
x |> map(f) |> list_rbind()
x |> map(f) |> list_cbind()
i.e. no need for map_c()
, map_cbind()
, and map_rbind()
.
I feel like those may clutter the map API for little benefit, especially considering you need to add map2/pmap variants, which greatly expands the number of options.
I really would love if all map*()
functions had the invariant of length(x) == length(map*(x, f))
, and we'd get there eventually by removing map_dfr/dfc()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list_c/rbind/cbind()
tools just feel semantically separate from map()
, since they are something you do after the map or on a generic list.
Like, list_c()
and friends will be very useful to call on furrr::future_map()
results too, but I'd rather not expand the furrr API with future_map_c()
, future_map_cbind()
, map2/pmap variants, etc if we don't have to, and it doesn't really seem like we need to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the other reason I think these are separate is because map_c()
physically can't be faster than sequential calls to map() + list_c()
, because you need all of the results from the map()
to be able to compute the output size that we'd get from list_c()
, and to me that suggests they shouldn't be merged together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed. I think the length(x) == length(map*(x, f))
invariant is key.
x |> map_vec(f)
feels a bit like this because it'll begin by being implemented as x |> map(f) |> list_simplify()
but I think that's more of a current implementation detail. Especially if you do supply a ptype
we could potentially implement map_vec()
much more efficiently.
R/list-combine.R
Outdated
check_is_list(x) | ||
vctrs::vec_unchop(x, ptype = ptype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the vec_unchop()
error here bad enough that you needed check_is_list()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left it for now for consistency with the others.
Co-authored-by: Davis Vaughan <davis@rstudio.com>
Conflicts: NEWS.md R/map.R man/imap.Rd man/map.Rd man/map2.Rd man/pmap.Rd
R/list-combine.R
Outdated
#' @param ptype An optional prototype to ensure that the output type is always | ||
#' the same. | ||
#' @param id By default, `names(x)` are lost. Alternatively, supply a string | ||
#' and the names will be saved into a column with name `id`. If `id` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes it sound like the column name is always named id
. Perhaps:
Alternatively, supply
id
with a column name into which the names will be saved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll use {id}
since that's a convention we use elsewhere and should be familiar to many readers.
R/list-combine.R
Outdated
vec_check_list(x) | ||
vctrs::vec_cbind(!!!x, .name_repair = name_repair, .size = size, .call = current_env()) | ||
} | ||
|
||
#' @export | ||
#' @rdname list_c | ||
list_rbind <- function(x, id = rlang::zap(), ptype = NULL) { | ||
vec_check_list(x) | ||
vctrs::vec_rbind(!!!x, .names_to = id, .ptype = ptype, .call = current_env()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to require each element to be a data frame, or do we embrace the way vec_rbind()
and vec_cbind()
convert vectors to data frames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try requiring data frames.
x <- modify_if(x, vec_is_list, identity, .else = list) | ||
vec_unchop( | ||
x, | ||
ptype = list(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking this should be ptype = x
(the original x
) to preserve the input type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this ever be something than a list though? x
can't be a data frame, and it seems unlikely to be a list_of
unless it's a list_of<list>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh it has to be a list, which we check with vec_check_list()
. But it could be some kind of vctrs list. I agree that list_of
is unlikely until we start supporting more complex type definitions.
I think my concern is that it might make sense to preserve types in the future if we have more vctrs lists, but it will be difficult to change then, so better do it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assume a list()
can always be coerced to x
, even if x
is a subtype of list? I don't think we can, so this seems risky?
But I tried it anyway, and then I got:
list_flatten(list_of(list(1, 2, 3), list(4), list(4)))
#> Error:
#> ! Can't convert <list> to <list_of<list>>.
So I'm going to leave as is for this PR and we can revisit in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes. I think the proper way would be to initialise the ptype of the input to the full output size, then assign elements into it.
return(map(.x, .f, ...)) | ||
} | ||
|
||
# Should this be replaced with a generic way of figuring out atomic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we now use vec_is_list()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, tracking in #920
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
R/reduce.R
Outdated
#' map_dfr(~ tibble(value = .x, step = 1:100), .id = "simulation") %>% | ||
#' map_rbind(~ tibble(value = .x, step = 1:100), .id = "simulation") %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs an update, not map_rbind()
anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. IMO it reads better now that list_rbind()
is a separate step.
Conflicts: NEWS.md R/map.R man/map.Rd man/map2.Rd man/pmap.Rd
This is a large PR that reworks flattening, removing inconsistencies and switching to vctrs as a backend.
The key idea is to introduce a new family of "combining" functions:
list_c()
,list_rbind()
, andlist_cbind()
, which replaceflatten_lgl()
,flatten_int()
,flatten_dbl()
,flatten_chr()
(nowlist_c()
),flatten_dfc()
(list_cbind()
), andflatten_dfr()
(list_rbind()
). The new functions are straightforward wrappers around vctrs functions, but somehow feel natural in purrr to me.This leaves
flatten()
, which had a rather idiosyncratic interface. It's now been replaced bylist_flatten()
which now always removes a single layer of list hierarchy (and nothing else). While working on this I realised that this was actually whatsplice()
did, so overall this feels like a major improvement in naming consistency.With those functions in place we can deprecate
map_dfr()
andmap_dfc()
which are actually "flat" map functions because they combine, rather than simplify, the results. They have never actually belonged withmap_int()
and friends because they don't have the restriction that.f
needs to return a length-1 results. This also strongly implies thatflat_map()
would just bemap_c()
and is thus not necessary.map_dfc()
map_c()
map_dfr()
list_c()
,list_rbind()
, andlist_cbind()
flatten_dfr()
andflatten_dfc()
list_rbind()
andlist_cbind()
flatten()
,simplify()
and friends #900Updated to reflect that we no longer believe that
_cbind()
,_rbind()
and_c()
are necessary as they're very lightweight wrappers.