Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Glue collapse should always return a single string #295

Merged
merged 6 commits into from
Mar 17, 2023
Merged

Conversation

hadley
Copy link
Member

@hadley hadley commented Jan 26, 2023

Fixes #88

@hadley hadley requested a review from jennybc January 26, 2023 23:49
@jennybc
Copy link
Member

jennybc commented Mar 13, 2023

I don't have a strong opinion on this (at least not yet). But it feels like a big-ish decision worth some discussion.

(I just kicked off cloud revdep checks for this branch, in case that reveals anything interesting.)

This doesn't feel consistent with tidyverse recycling rules. Specifically the one that says anything combined with something of length-0 becomes length-0:

tibble::tibble(x = integer(), y = 1L)
#> # A tibble: 0 × 2
#> # … with 2 variables: x <int>, y <int>
vctrs::vec_size_common(character())
#> [1] 0

Do we think that collapsing has nothing to do with tidyverse recycling rules and thus this doesn't matter?

Or do we think of collapsing as always having a "shadow" empty string "" as part of the collapsees?

@jennybc
Copy link
Member

jennybc commented Mar 13, 2023

Revdep results:

We checked 614 reverse dependencies (608 from CRAN + 6 from Bioconductor), comparing R CMD check results across CRAN and dev versions of this package.

  • We saw 3 new problems
  • We failed to check 29 packages

These are the 3 packages with new problems: cpp11, lvmisc, tinkr.

The cpp11 result looks obviously related to this. The other two will require a bit more analysis. lvmisc and tinkr are also related to this PR. I have fixes in local PRs, which I will make into actual PRs once I merge this.

(Most of the packages that weren't checked are due to sf not getting installed.)

cpp11

Run revdepcheck::cloud_details(, "cpp11") for more info

Newly broken

  • checking tests ... ERROR
      Running ‘testthat.R’
    Running the tests in ‘tests/testthat.R’ failed.
    Last 13 lines of output:
       3.   └─base::seq.default(mid, end)
      ── Failure ('test-register.R:222:5'): generate_cpp_functions: returns the empty string if there are no functions ──
      generate_cpp_functions(funs) (`actual`) not equal to character() (`expected`).
      
      `actual`:   ""
      `expected`:   
      ── Failure ('test-register.R:391:5'): generate_r_functions: returns the empty string if there are no functions ──
      generate_r_functions(funs) (`actual`) not equal to character() (`expected`).
      
      `actual`:   ""
      `expected`:   
      
      [ FAIL 3 | WARN 0 | SKIP 2 | PASS 104 ]
      Error: Test failures
      Execution halted
    

lvmisc

Run revdepcheck::cloud_details(, "lvmisc") for more info

Newly broken

  • checking tests ... ERROR
      Running ‘testthat.R’
    Running the tests in ‘tests/testthat.R’ failed.
    Last 13 lines of output:
      err$message (`actual`) not equal to glue::glue("The method `my_fun` is not yet implemented \\\n      for an object of class `my_class`.") (`expected`).
      
      lines(actual) vs lines(expected)
        "The method `my_fun` is not yet implemented for an object of class `my_class`."
      - ""
      ── Failure ('test-abort.R:109:5'): abort_no_method_for_class() works in objects with multiple classes ──
      err$message (`actual`) not equal to glue::glue("The method `my_fun` is not yet implemented \\\n       for an object of classes `my_class_1` and `my_class_2`.") (`expected`).
      
      lines(actual) vs lines(expected)
        "The method `my_fun` is not yet implemented for an object of classes `my_class_1` and `my_class_2`."
      - ""
      
      [ FAIL 2 | WARN 18 | SKIP 7 | PASS 238 ]
      Error: Test failures
      Execution halted
    

tinkr

Run revdepcheck::cloud_details(, "tinkr") for more info

Newly broken

  • checking tests ... ERROR
      Running ‘testthat.R’
    Running the tests in ‘tests/testthat.R’ failed.
    Last 13 lines of output:
      
      `lines(actual[[2]])` is a character vector ('## Dataset', '', 'The data used:', '[data](https://example.com)', '')
      `lines(expected[[2]])` is absent
      
      [ FAIL 14 | WARN 0 | SKIP 12 | PASS 92 ]
      Deleting unused snapshots:
      • class-yarn/yarn-kilroy.md
      • class-yarn/yarn.Rmd
      • class-yarn/yarn.md
      • to_md/new-code-chunk.Rmd
      • to_md/table.md
      • to_md/to_md-works-for-Rmd.Rmd
      • to_md/to_md-works.md
      Error: Test failures
      Execution halted
    

@DavisVaughan
Copy link
Member

DavisVaughan commented Mar 14, 2023

Just adding a quick note, not a full review yet:

I treat collapsing as a summary operation (i.e. ?Summary) like min()/max()/any()/all(), which always return a size 1 result regardless of their inputs. i.e. any(logical()) == FALSE.

So I feel like there is a strong support in favor of this change.

The docs also say

glue_collapse() collapses a character vector of any length into a length 1 vector


Regarding recycling rules, I don't think this is quite the same thing. It is somewhat adjacent, because both are about size stability, and I think always returning a size 1 result regardless of the input is an important part of that size stability.


Lastly, we may also consider an na_rm argument (since summary functions generally always have one) that removes missing values from x before collapsing.

@jennybc
Copy link
Member

jennybc commented Mar 14, 2023

Noting also that this PR makes glue_collapse() work (more) like paste(collapse =).

paste(    collapse="|")
#> [1] ""
paste(    collapse="|", recycle0 = TRUE)
#> [1] ""
paste({}, collapse="|")
#> [1] ""
paste({}, collapse="|", recycle0 = TRUE)
#> [1] ""
paste(character(), collapse="|")
#> [1] ""
paste(character(), collapse="|", recycle0 = TRUE)
#> [1] ""

Created on 2023-03-13 with reprex v2.0.2.9000

@jennybc
Copy link
Member

jennybc commented Mar 14, 2023

Thanks @DavisVaughan.

I can certainly get on board with this being the right thing to do. In which case, I'll just need to follow up on any associated breakages (which look to be fairly modest).

jennybc added a commit to r-lib/cpp11 that referenced this pull request Mar 14, 2023
@jennybc
Copy link
Member

jennybc commented Mar 14, 2023

In particular, this is the sort of principle I was looking for. I.e. no the tidyverse recycling rules don't guide us here, but rather this convention around a summary does:

I treat collapsing as a summary operation (i.e. ?Summary) like min()/max()/any()/all(), which always return a size 1 result regardless of their inputs. i.e. any(logical()) == FALSE.

@jennybc
Copy link
Member

jennybc commented Mar 14, 2023

In all 3 of cpp11, lvmisc, and tinkr, there is (at least) one function that basically makes a series of calls to glue_collapse() or, effectively, nested calls.

Therefore, it was actually kind of handy that length-0 inputs effectively disappeared.

With this PR, the fact that glue_collapse() returns "" for the "no input" case creates new, gratuitous newlines in cpp11, lvmisc, and tinkr.

Here's a super simplified version of the type of function I'm describing. Notice the trailing \n when I call foo() without providing any extra_junk via .... That trailing \n is not present if actual extra_junk is passed in.

library(glue)
packageVersion("glue")
#> [1] '1.6.2.9000'

foo <- function(main_stuff, ...) {
  extra_junk <- list(...)
  extra_junk <- glue_collapse(extra_junk, sep = "\n")
  
  glue_collapse(c(main_stuff, extra_junk), sep = "\n")
}

foo("Hello!", "and", "Goodbye!")
#> Hello!
#> and
#> Goodbye!

unclass(foo("Hello!", "and", "Goodbye!"))
#> [1] "Hello!\nand\nGoodbye!"

foo("Yo!")
#> Yo!

unclass(foo("Yo!"))
#> [1] "Yo!\n"

It makes me wonder if we should offer a way of telling glue_collapse() to drop length-0 inputs. Something along these lines:

input <- c("", "hey", "", "", "I herd", "", "you like", "whitespace")

glue_collapse(input, sep = "\n")
#> 
#> hey
#> 
#> 
#> I herd
#> 
#> you like
#> whitespace

glue_collapse(input[nzchar(input)], sep = "\n")
#> hey
#> I herd
#> you like
#> whitespace

To @DavisVaughan's point about na.rm, maybe this is the more relevant feature for glue_collapse()?

@jennybc
Copy link
Member

jennybc commented Mar 15, 2023

I also note the extra newlines problem goes away if one does "combine then summarize" instead of "summarize, combine, then summarize again". But in real applications sometimes you really have to do the latter.

library(glue)
packageVersion("glue")
#> [1] '1.6.2.9000'

foo <- function(main_stuff, ...) {
  extra_junk <- list(...)
  extra_junk <- glue_collapse(extra_junk, sep = "\n")
  
  #glue_collapse(c(main_stuff, extra_junk), sep = "\n")
  glue_collapse(c(main_stuff, ...), sep = "\n")
}

foo("Hello!", "and", "Goodbye!")
#> Hello!
#> and
#> Goodbye!

unclass(foo("Hello!", "and", "Goodbye!"))
#> [1] "Hello!\nand\nGoodbye!"

foo("Yo!")
#> Yo!

unclass(foo("Yo!"))
#> [1] "Yo!"

@hadley
Copy link
Member Author

hadley commented Mar 15, 2023

This isn't exactly the same, but I think it's in the same ballpark: I see a lot of people write code like this:

paste0(c("Hello!", "and", "Goodbye!"), collapse = "\n")

But that's actually subtly wrong because it puts a new line in between each line, not at the end of each line. In most cases what I think you actually want is this:

paste0(c("Hello!", "and", "Goodbye!"), "\n", collapse = "")

So I don't find that example particularly compelling as a reason to break the very strong convention of "summary functions always return a scalar".

@DavisVaughan
Copy link
Member

Somewhat motivating find: we actually have glue_collapse_data() that post processes the glue_collapse() result to force this size stability by returning "" in the empty case
https://github.com/r-lib/cpp11/blob/2ddc12f6ab268542c88f4aa24a500682ed260b72/R/utils.R#L1-L7

@jennybc
Copy link
Member

jennybc commented Mar 15, 2023

So @hadley you're basically arguing the exact opposite of what I was seeing in those revdeps. I think it's because the affected glue_collapse() usages were mostly generating some sort of message, where that trailing newline doesn't need to be explicit. I.e. we're more likely to want message("howdy") than message("howdy\n").

But I'm going to merge this and make the relevant PRs elsewhere. Thanks for the discussion.

@jennybc
Copy link
Member

jennybc commented Mar 15, 2023

So I don't find that example particularly compelling as a reason to break the very strong convention of "summary functions always return a scalar".

I wasn't proposing to not return a scalar. I was proposing some sort of compact = TRUE type of argument that says to drop empty strings, in the presence of at least 1 nonempty string.

@jennybc jennybc merged commit cbac82a into main Mar 17, 2023
@jennybc jennybc deleted the empty-glue-collapse branch March 17, 2023 05:32
romainfrancois pushed a commit to r-lib/cpp11 that referenced this pull request Mar 17, 2023
* Respond to changes proposed in tidyverse/glue#295

* Use the glue PR

* Remove glue remote, put back in Suggests (not Imports)

* Skip empty string tests for insufficient glue version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

glue_collapse() should always return length 1 vectors
3 participants