Add `.ptype` and `.size` arguments, adjust `NULL` and empty handling #80
Only entirely missing rows are updated, which is in line with `vec_equal_na()`. It is entirely possible to update a missing row with a partially missing row.
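To make the row-wise rule concrete, here is a minimal base R sketch. The helpers `row_all_missing()` and `coalesce_rows_sketch()` are hypothetical names invented for illustration and are not part of funs or vctrs; they only approximate the "entirely missing row" idea described above. The by-column variant uses base `Map()` as a stand-in for `map2()`.

```r
# Hypothetical helper (not a funs/vctrs function): TRUE for rows where
# every column is NA, i.e. rows that are "entirely missing".
row_all_missing <- function(df) {
  Reduce(`&`, lapply(df, is.na))
}

# Hypothetical row-wise coalesce for two data frames with identical columns:
# only entirely missing rows of `x` are replaced by the matching rows of `y`.
coalesce_rows_sketch <- function(x, y) {
  missing <- row_all_missing(x)
  x[missing, ] <- y[missing, ]
  x
}

x <- data.frame(a = c(1, NA, NA), b = c("u", NA, "w"), stringsAsFactors = FALSE)
y <- data.frame(a = c(9, 8, 7), b = c("p", NA, "r"), stringsAsFactors = FALSE)

# Row 2 of `x` is entirely missing, so it is replaced by row 2 of `y`,
# which is itself partially missing. Row 3 is only partially missing,
# so it is left untouched.
by_row <- coalesce_rows_sketch(x, y)

# Coalescing by column instead: apply an element-wise fill to each column pair.
by_column <- as.data.frame(
  Map(function(a, b) ifelse(is.na(a), b, a), x, y),
  stringsAsFactors = FALSE
)
```

In this sketch `by_row$a` keeps the `NA` in row 3, while `by_column$a` fills it from `y`, which is the difference between the two approaches.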
If |
They both error, just with different errors:
coalesce()
#> Error in `coalesce()`:
#> ! `...` must contain at least one input.
coalesce(NULL)
#> Error in `coalesce()`:
#> ! `..1` must be a vector, not NULL.
Yea this is what I was alluding to at the end of the second bullet point: "An alternative to this is to drop all Are you ok with |
This is more in line with our standard assumption that `fun(x, NULL, NULL)` is treated like `fun(x, , )`, which is essentially `fun(x)`.
Okay we now stick to this principle:
Giving us:
coalesce()
#> Error in `coalesce()`:
#> ! `...` must contain at least one input.
coalesce(NULL)
#> Error in `coalesce()`:
#> ! `...` must contain at least one input.
coalesce(c(1, NA), NULL, 2)
#> [1] 1 2
I am fairly certain this is what you wanted, so I'll merge this. Feel free to follow up if not.
Yeah, that's what I was thinking, thanks! But reading through #64 again, I do find your arguments there compelling, even if it would make |
It does mean you can't splice in all of the inputs at once anymore:
coalesce2 <- function(x, ...) {
funs::coalesce(x, ...)
}
vecs <- list(
c(1, 2, NA, NA, 5),
c(NA, NA, 3, 4, 5)
)
coalesce2(!!!vecs)
#> Error in !vecs: invalid argument type
funs::coalesce(!!!vecs)
#> [1] 1 2 3 4 5
I think the main question is deciding which of these two is the main use case:
The first favors a signature of
The other way it is described by most of the SQL docs I found is as syntactic sugar for
COALESCE(expression1, expression2, ..., expressionN)
CASE
WHEN (expression1 IS NOT NULL) THEN expression1
WHEN (expression2 IS NOT NULL) THEN expression2
...
ELSE expressionN
END |
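The `CASE` reading above can be sketched in base R as a left-to-right fill: each position takes the first non-missing value among the inputs. This is only an illustration under the assumption that all inputs have the same length; `coalesce_case_sketch()` is a hypothetical name, not the funs implementation.

```r
# Hypothetical sketch of COALESCE as a chain of "WHEN ... IS NOT NULL" checks.
# Assumes every input has the same length; no recycling or type checks.
coalesce_case_sketch <- function(...) {
  inputs <- list(...)
  stopifnot(length(inputs) >= 1)

  # Start from the first input, then fill remaining NA positions from each
  # later input in order, so earlier inputs always win.
  out <- inputs[[1]]
  for (value in inputs[-1]) {
    fill <- is.na(out)
    out[fill] <- value[fill]
  }
  out
}

vecs <- list(
  c(1, 2, NA, NA, 5),
  c(NA, NA, 3, 4, 5)
)

# Base R equivalent of splicing a list of inputs into `...`.
result <- do.call(coalesce_case_sketch, vecs)
```

With no "primary" input in this formulation, a `...`-only signature is the natural fit, and `do.call()` can pass a whole list of inputs at once.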
Assuming you have GitHub code search: Approximately 20-ish distinct uses of |
Oh I forgot about the FWIW I do think in I guess we could still make the first argument of |
Base R's I'd argue that ideally The main difference between those 4 and However, if we stick with computing the common type for the 4 functions mention above, then this comment would indeed feel inconsistent, as you suggested:
|
Ok, that reasoning sounds solid to me. One more thought: we should probably add a |
Re: There are also a number of cases that code search brought up where people are splicing in data frames. Like they used Here is one example: |
Oh bummer 😞 |
* Port tidyverse/funs#80 to `coalesce()`
* NEWS bullet
* Use `vec_case_when()` infrastructure in `coalesce()`
* Backtick `NULL`
* Tweak parameter documentation one more time
Closes #48
Closes #64
A few changes to `coalesce()` as I prep to port this over to dplyr.

For #48, over Slack we decided that we'd rather keep `coalesce()` simple and always use vctrs invariants, so when given data frames as input it should only ever update entirely missing rows (that is what `vec_equal_na()` thinks is "missing"). If you want to coalesce by column then you should really just `map2()` or `pmap()` over the data frames, calling `coalesce()` as the function to use on each column. I have added a few tests to reflect this expectation.

For #64, I had suggested we switch to
`coalesce(x, ...)` and always cast to the type of `x` and recycle to the size of `x`. I did some research, and the SQL standards state that `coalesce()` should cast to the common type of all of its inputs. See this link for one example where that is documented: https://docs.microsoft.com/en-us/sql/t-sql/language-elements/coalesce-transact-sql?view=sql-server-ver15#return-types. So I think we should keep that behavior by default. It sort of makes sense if you have >=2 full vectors and you don't really have a "primary" input, like `coalesce(x, y, z)`. My original use case was something like `coalesce(x, 0)`, where you'd want to retain the type of `x`.

To optionally enforce type and size stability in cases like
`coalesce(x, 0)`, I've added `.ptype` and `.size` arguments. I think this is good enough for what I wanted.

I've added `list_check_all_vectors()` to check that all inputs are vectors. This disallows `NULL` values in `...`. I think this is a good idea because `coalesce(NULL, .size = 5)` would need to return something of size 5 and there is no way to do that without typed inputs. An alternative to this is to drop all `NULL` inputs at the beginning, but I have a feeling a `NULL` input would be a user error in this function, so I prefer an error. We can change to this approach in the future if this proves to be too strict (better to start strict, I think?).

I've required at least 1 input, rather than returning
`NULL`. This is the current dplyr behavior and makes sense because `coalesce(.size = 5)` would have to return something of size 5, and there is no way to do that if no inputs are given.
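
As a rough illustration of what `.ptype` and `.size` are for, here is a base R sketch. `coalesce_ptype_sketch()` is a hypothetical stand-in, not the funs code: it relies on base R coercion and `storage.mode<-` rather than vctrs casting, so it is far more permissive than the real function would be. It mirrors the outputs shown in the conversation (`NULL` inputs are dropped, and at least one input is required).

```r
# Hypothetical sketch only: base R coercion stands in for vctrs common-type
# and cast rules, and only size-1 inputs are recycled.
coalesce_ptype_sketch <- function(..., .ptype = NULL, .size = NULL) {
  # Drop NULL inputs, then require at least one remaining input.
  inputs <- Filter(Negate(is.null), list(...))
  if (length(inputs) == 0) {
    stop("`...` must contain at least one input.")
  }

  # Assumption: without `.size`, use the length of the longest input.
  size <- if (is.null(.size)) max(lengths(inputs)) else .size

  # Recycle size-1 inputs; every other input must already match `size`.
  inputs <- lapply(inputs, function(v) {
    stopifnot(length(v) == 1 || length(v) == size)
    rep_len(v, size)
  })

  # Fill NA positions from later inputs; base R coerces to a common type.
  out <- inputs[[1]]
  for (value in inputs[-1]) {
    fill <- is.na(out)
    out[fill] <- value[fill]
  }

  # With `.ptype`, force the result back to that storage type.
  if (!is.null(.ptype)) {
    storage.mode(out) <- typeof(.ptype)
  }
  out
}

x <- c(1L, NA, 3L)

# Default: the result takes the common type of `x` (integer) and 0 (double).
default <- coalesce_ptype_sketch(x, 0)

# Type-stable: the result keeps the type of `x`.
stable <- coalesce_ptype_sketch(x, 0, .ptype = x)

# NULL inputs are dropped before anything else happens.
dropped <- coalesce_ptype_sketch(c(1, NA), NULL, 2)
```

Here `typeof(default)` is `"double"` while `typeof(stable)` is `"integer"`, which is the type stability that `coalesce(x, 0)` users would be opting into.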