Glue collapse should always return a single string #295
I don't have a strong opinion on this (at least not yet), but it feels like a big-ish decision worth some discussion. (I just kicked off cloud revdep checks for this branch, in case that reveals anything interesting.)

This doesn't feel consistent with the tidyverse recycling rules, specifically the one that says anything combined with something of length 0 becomes length 0:

tibble::tibble(x = integer(), y = 1L)
#> # A tibble: 0 × 2
#> # … with 2 variables: x <int>, y <int>
vctrs::vec_size_common(character())
#> [1] 0

Do we think that collapsing has nothing to do with tidyverse recycling rules and thus this doesn't matter? Or do we think of collapsing as always having a "shadow" empty string |
Revdep results: We checked 614 reverse dependencies (608 from CRAN + 6 from Bioconductor), comparing R CMD check results across CRAN and dev versions of this package.
These are the 3 packages with new problems: cpp11, lvmisc, tinkr. The cpp11 result looks obviously related to this. (Most of the packages that weren't checked were skipped because sf did not get installed.)

cpp11: newly broken
lvmisc: newly broken
tinkr: newly broken
|
Just adding a quick note, not a full review yet: I treat collapsing as a summary operation (i.e. So I feel like there is strong support in favor of this change. The docs also say
Regarding recycling rules, I don't think this is quite the same thing. It is somewhat adjacent, because both are about size stability, and I think always returning a size 1 result regardless of the input is an important part of that size stability. Lastly, we may also consider an |
Noting also that this PR makes glue_collapse() consistent with paste(collapse = ) on zero-length input:

paste(collapse = "|")
#> [1] ""
paste(collapse = "|", recycle0 = TRUE)
#> [1] ""
paste({}, collapse = "|")
#> [1] ""
paste({}, collapse = "|", recycle0 = TRUE)
#> [1] ""
paste(character(), collapse = "|")
#> [1] ""
paste(character(), collapse = "|", recycle0 = TRUE)
#> [1] ""

Created on 2023-03-13 with reprex v2.0.2.9000 |
Thanks @DavisVaughan. I can certainly get on board with this being the right thing to do. In which case, I'll just need to follow up on any associated breakages (which look to be fairly modest). |
In particular, this is the sort of principle I was looking for. I.e., no, the tidyverse recycling rules don't guide us here; rather, this convention around a summary does:
|
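To make the summary convention discussed here concrete, the following is a minimal base R sketch (an illustration, not code from the PR). It shows that common summary functions return a length-1 result even when given zero-length input, which is also what paste() does when collapse is supplied.

```r
# Base R summary functions return a length-1 value for zero-length input.
empty_sum   <- sum(integer())                      # additive identity
empty_prod  <- prod(numeric())                     # multiplicative identity
empty_all   <- all(logical())                      # vacuously TRUE
empty_paste <- paste(character(), collapse = "|")  # empty string

lengths(list(empty_sum, empty_prod, empty_all, empty_paste))
#> [1] 1 1 1 1
```

Under that reading, a collapse of nothing is the empty string in the same way that a sum of nothing is zero.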
In all 3 of cpp11, lvmisc, and tinkr, there is (at least) one function that basically makes a series of calls to Therefore, it was actually kind of handy that length-0 inputs effectively disappeared. With this PR, the fact that

Here's a super simplified version of the type of function I'm describing. Notice the trailing "\n" in the last result.

library(glue)
packageVersion("glue")
#> [1] '1.6.2.9000'
foo <- function(main_stuff, ...) {
  extra_junk <- list(...)
  extra_junk <- glue_collapse(extra_junk, sep = "\n")
  glue_collapse(c(main_stuff, extra_junk), sep = "\n")
}
foo("Hello!", "and", "Goodbye!")
#> Hello!
#> and
#> Goodbye!
unclass(foo("Hello!", "and", "Goodbye!"))
#> [1] "Hello!\nand\nGoodbye!"
foo("Yo!")
#> Yo!
unclass(foo("Yo!"))
#> [1] "Yo!\n"

It makes me wonder if we should offer a way of telling

input <- c("", "hey", "", "", "I herd", "", "you like", "whitespace")
glue_collapse(input, sep = "\n")
#>
#> hey
#>
#>
#> I herd
#>
#> you like
#> whitespace
glue_collapse(input[nzchar(input)], sep = "\n")
#> hey
#> I herd
#> you like
#> whitespace

To @DavisVaughan's point about |
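The nzchar() filtering shown above could be wrapped in a small helper. The sketch below uses base R only; `collapse_nonempty()` is a hypothetical name for illustration and is not part of glue or of this PR. It drops empty strings before collapsing, so a zero-length "extra" piece that has already been collapsed to "" no longer contributes a trailing separator.

```r
# Hypothetical helper (not a glue function): drop "" elements, then collapse.
collapse_nonempty <- function(x, sep = "\n") {
  x <- as.character(unlist(x))
  paste(x[nzchar(x)], collapse = sep)
}

# Same shape as the simplified foo() in this thread, using paste() in place of
# glue_collapse() so that the sketch has no package dependencies.
foo <- function(main_stuff, ...) {
  extra_junk <- paste(unlist(list(...)), collapse = "\n")  # "" when ... is empty
  collapse_nonempty(c(main_stuff, extra_junk))
}

foo("Hello!", "and", "Goodbye!")
#> [1] "Hello!\nand\nGoodbye!"
foo("Yo!")
#> [1] "Yo!"
```

The result is still always a single string, including `collapse_nonempty(character())`, which gives "".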
I also note the extra newlines problem goes away if one does "combine then summarize" instead of "summarize, combine, then summarize again". But in real applications sometimes you really have to do the latter.

library(glue)
packageVersion("glue")
#> [1] '1.6.2.9000'
foo <- function(main_stuff, ...) {
  extra_junk <- list(...)
  extra_junk <- glue_collapse(extra_junk, sep = "\n")
  #glue_collapse(c(main_stuff, extra_junk), sep = "\n")
  glue_collapse(c(main_stuff, ...), sep = "\n")
}
foo("Hello!", "and", "Goodbye!")
#> Hello!
#> and
#> Goodbye!
unclass(foo("Hello!", "and", "Goodbye!"))
#> [1] "Hello!\nand\nGoodbye!"
foo("Yo!")
#> Yo!
unclass(foo("Yo!"))
#> [1] "Yo!" |
This isn't exactly the same, but I think it's in the same ballpark: I see a lot of people write code like this:

paste0(c("Hello!", "and", "Goodbye!"), collapse = "\n")

But that's actually subtly wrong because it puts a newline in between each line, not at the end of each line. In most cases what I think you actually want is this:

paste0(c("Hello!", "and", "Goodbye!"), "\n", collapse = "")

So I don't find that example particularly compelling as a reason to break the very strong convention of "summary functions always return a scalar". |
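The difference between the two paste0() calls above is easiest to see side by side. This is a small base R sketch; the variable names are only for illustration.

```r
lines <- c("Hello!", "and", "Goodbye!")

# Newline used as a separator: it appears between elements only.
between <- paste0(lines, collapse = "\n")

# Newline used as a terminator: appended to every element before collapsing.
terminated <- paste0(lines, "\n", collapse = "")

between
#> [1] "Hello!\nand\nGoodbye!"
terminated
#> [1] "Hello!\nand\nGoodbye!\n"
```

Both calls return a single string; they differ only in whether the last line ends with a newline.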
Somewhat motivating find: we actually have |
So @hadley you're basically arguing the exact opposite of what I was seeing in those revdeps. I think it's because the affected But I'm going to merge this and make the relevant PRs elsewhere. Thanks for the discussion. |
I wasn't proposing to not return a scalar. I was proposing some sort of |
* Respond to changes proposed in tidyverse/glue#295
* Use the glue PR
* Remove glue remote, put back in Suggests (not Imports)
* Skip empty string tests for insufficient glue version
Fixes #88