mutate(.by_row =), reframe(.by_row =), and possibly filter(.by_row =) #6660
Comments
Sounds good! |
I like this a lot. Reading the first part, I thought about a So, now wandering in another direction, which I know is a bit silly, but what if |
I like the idea of automatically wrapping scalars in a list. This is the sort of thing that vctrs makes possible in a predictable and consistent manner. However, I feel like we should commit to the argument syntax of So in this case I'd like us to consider using an argument. It could be a simple boolean:

df |> mutate(foo(bar), .by = baz) # By group
df |> mutate(foo(bar), .by_rows = TRUE) # By row

We could also add a variant of

# Like `.by_row` but `[` subsetting
df |> mutate(foo(bar), .by_vector = 1:n())
df |> summarise(foo(bar), .by_vector = cut(baz, 3))

In this case we'd end up with a trio of complementary arguments that change the semantics of evaluation: I think using modifiers instead of variants fits the general evolution of the dplyr API, e.g. we've removed the suffixed variants of the verbs in favour of |
I'd be open to I'm also slightly more empathetic to the idea of also adding this to |
Title changed from mutate_row() and reframe_row() to mutate(.by_row =), reframe(.by_row =), and possibly filter(.by_row =)
I see that my suggestion for allowing Anyway, just wanted to voice this. In the end I trust your judgment and will hold my peace regarding this issue forevermore. Thanks for the ongoing dedication to and improvement of |
Maybe Currently if there is a column which is unique I will use that or if I am sure that there are no duplicate rows then |
Had been considering opening a feature request to add Now that I have read through @DavisVaughan's proposal, I like that better. While I would be happy with however it got done, I don't think adding a I would agree that creating new functions would not be good if it meant every function that took I believe both |
Yes but the suggestion was to add |
What I was trying to get at is other tidy select operators (e.g., Adding
etc. |
To me it makes more sense to extend tidy-select than to introduce a bunch of special case functions. The extension can have uses in other contexts too as these examples show. |
IMHO, this is the best argument. There will be defensive programming either way, throwing an error either if As using |
You have to be a bit careful with
|
@ggrothendieck I understand the idea and the appeal, but I think there may be some devil in the details in figuring out
Specifically, adding pseudo columns means you have to deal with the fact that
For example, let's take the algebra portion. How do these operators work
It might be tempting to say it is just like a regular column. Doing this would then mean something like
is now also computing the cumulative sum of the pseudo row number column (as And once it has computed this value, what is it supposed to do with it? It can't write it back to the pseudo column because it doesn't actually exist. Does it create a new column named And then what about things like |
I guess the other question about An answer of no might suggest that really it is more about |
The examples already given show examples of Also Dan's post points out that |
to
|
@twhitehead, I know that. The point of my comment was that since |
Was working on some
This makes me think, with regard to @DavisVaughan's second point about turning something that would be an error into something sensible
maybe it might also make sense for the ones that have to return single rows (the
|
Related to #4723

With the introduction of .by, it seems reasonable to once again reconsider rowwise() as well. I think we are convinced that the idea of rowwise is useful, but the implementation could possibly be improved. A few pain points:

- rowwise() is a form of persistent grouping, but you rarely want it on for more than 1 operation
- ungroup() is an odd verb for turning off rowwise behavior
- summarise(model = list(lm(...))), i.e. the list() wrapping is manual
- rowwise_df class is difficult and error prone for us
- mutate() and reframe().

With that in mind, I'd like to suggest a two-part replacement for rowwise():

- mutate_row() and reframe_row(). These become the only two places in dplyr where rowwise behavior is applicable.
- mutate(), summarise(), reframe(), mutate_row(), and reframe_row() the ability to automatically wrap scalars in a list. i.e. if vec_is(elt) is FALSE, wrap automatically into a list. This means that value could never exist in a data frame column as is, so there is no ambiguity about wrapping and it is fairly easy to explain.

Those two proposals result in the following new patterns:
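A minimal sketch of the kind of pattern involved, assuming a made-up data frame df with a list-column named data. The rowwise() pipeline runs in released dplyr. The mutate_row() call appears only in a comment because it is a proposal, and wrap_if_scalar() is a hypothetical helper that only illustrates the vec_is() rule described above.

```r
library(dplyr)

# Made-up example data: one small data frame per row, stored in a list-column.
df <- tibble(
  id = 1:2,
  data = list(
    data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1)),
    data.frame(x = 1:5, y = c(1.0, 0.8, 1.2, 0.9, 1.1))
  )
)

# Today: persistent rowwise grouping, manual list() wrapping, then ungroup().
today <- df |>
  rowwise() |>
  mutate(model = list(lm(y ~ x, data = data))) |>
  ungroup()

# Proposed (hypothetical, does not run in released dplyr):
# df |> mutate_row(model = lm(y ~ x, data = data))

# Hypothetical helper sketching the auto-wrap rule: a value that is not a
# vctrs vector is wrapped in a list so that it can live in a list-column.
wrap_if_scalar <- function(elt) {
  if (vctrs::vec_is(elt)) elt else list(elt)
}

fit <- lm(y ~ x, data = df$data[[1]])
wrapped <- wrap_if_scalar(fit)    # an lm object is not a vector, so it is wrapped
unchanged <- wrap_if_scalar(1:3)  # an integer vector is returned as is
```

Because an lm object can never be a data frame column by itself, wrapping it is unambiguous, which is the property the proposal relies on.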
This two part proposal has the very nice property that the difference between mutate() and mutate_row() becomes purely about column access:

- mutate() accesses columns using vec_slice() / [
- mutate_row() accesses columns using vec_slice2() / [[

In other words, rowwise has nothing to do with the output type of each column expression, and you still get useful results.
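The column access distinction can be illustrated in base R alone. This sketch uses a made-up list-column col and makes no claim about how dplyr implements the access internally.

```r
# A list-column with one element per row.
col <- list(c(1, 2, 3), c(10, 20))

i <- 2L

# `[` keeps the container: the result is still a list, here of length 1.
by_slice <- col[i]

# `[[` extracts the element itself: the numeric vector stored in row 2.
by_element <- col[[i]]

length(by_slice)    # 1
length(by_element)  # 2

# For an atomic column the two forms agree, so the distinction only
# matters for list-columns.
x <- c(5, 6, 7)
identical(x[i], x[[i]])  # TRUE
```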
In terms of other invariants, there is one related to vec_size():

- mutate_row() requires each expression to return an element of vec_size() == 1
- reframe_row() allows each expression to return an element of any size

Other niceties:

- .by being in the verb)

Extra notes:
- mutate_row() and reframe_row() won't get .by because they operate "by row"
- .by about rowwise behavior, like .by = .row or something. We want .by to be pure tidyselect. Plus this special behavior would only apply for mutate() and reframe() and that would be very confusing.
- summarise_row(). This would have the exact same semantics as mutate_row(), but would just drop unused columns (which can mostly be done with .keep in mutate_row()). In particular summarise_row() and mutate_row() would both have to have the vec_size() == 1 invariant from above, so we really don't need both.
- filter_row(). The only useful thing I can think of is something like filter_row(!is.null(model)) for filtering out NULL list elements. But you can do that way more efficiently with an ungrouped call to filter(!funs::is_na(model)).
- mutate_row() and reframe_row() mostly have the semantics of the wrappers below, but this doesn't do the automatic list-wrapping of scalars:
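One way such wrappers could be sketched with verbs that already exist in dplyr. The function names follow the proposal, but the bodies are an assumption and not necessarily the original code. The example data frame df is made up.

```r
library(dplyr)

# Hypothetical wrappers built from existing verbs. No automatic
# list-wrapping of scalars happens here.
mutate_row <- function(.data, ...) {
  .data |>
    rowwise() |>
    mutate(...) |>
    ungroup()
}

reframe_row <- function(.data, ...) {
  .data |>
    rowwise() |>
    reframe(...)
}

df <- tibble(n = 1:3, x = list(1:2, 3:5, 6L))

# Each expression must return a value of size 1 per row; x is seen as the
# element of the list-column for that row.
totals <- df |> mutate_row(total = sum(x))

# Each expression may return a value of any size per row.
expanded <- df |> reframe_row(i = seq_len(n))
```

With these wrappers the rowwise behavior lasts for a single call, so no separate ungroup() step is left to the caller.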