Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't "double-print" in output= tests #6188

Open
MichaelChirico opened this issue Jun 18, 2024 · 5 comments
Open

Don't "double-print" in output= tests #6188

MichaelChirico opened this issue Jun 18, 2024 · 5 comments
Labels

Comments

@MichaelChirico
Copy link
Member

Whenever output= is used in test(), we force print(x):

out = capture.output(print(x <- suppressMessages(withCallingHandlers(tryCatch(x, error=eHandler), warning=wHandler, message=mHandler))))

We can consider not doing so if x already contains a call to print(), e.g. these 27 cases:

library(lintr)
l <- make_linter_from_xpath(
  "//SYMBOL_FUNCTION_CALL[text()='test']
    /parent::expr
    /following-sibling::expr[expr/SYMBOL_FUNCTION_CALL[text()='print']]
    /following-sibling::SYMBOL_SUB[text()='output']",
  "xxx"
)
lint_dir("inst/tests", l(), pattern="Rraw") |>
  as.data.frame() |>
  _[, "line"] |>
  writeLines()
test(3, print(DT), output=c("   x             y", "1: 1 <ex_class[3]>",     "2: 2 <ex_class[3]>"))
test(779, print(DT[,sum(a),by=b]), output="   b V1\n1: a  3\n2: b  3$")
test(1006, print(as.data.table(M), nrows=10), output="gear NA.*1: 21.0")
    test(1037.103, print(ans[c(1,1326)]), output="state.*income.*retailprice.*percent_15_19.*beercons.*variable.*value.*1.*9.6.*89.34.*smoking88.*smoking00.*41.6")
  test(1037.301, print(melt(DT, id.vars="month", verbose=TRUE)), output="'measure.vars' is missing.*Assigned 'measure.vars' are [[]Record high, Average high, Daily mean, Average low, ...[]].*1:.*Jan.*Record high.*12.8.*108:.*Sep.*sunshine hours.*174.1")
test(1056, print(DT, digits=2), output="   x       y logy\n1: a      18  2.9\n2: a      79  4.4\n3: a     337  5.8")
test(1057, print(DT, digits=2, big.mark=","), output="   x         y logy\n1: a        18  2.9.*6: b    26,556 10.2\n7: c   113,811 11.6")
test(1636.2, print(dt), output="1: 1 <function[1]>")
test(1636.4, print(dt), output="2: 2 <function[1]>")
test(1636.6, print(dt), output="2: 2 <function[1]>")
test(1765.1, print(IDateTime(x)), output=".*idate.*itime.*1: 2017-03-1[67]",
test(1893.1, print(DT), output=txt<-c("   A    B", "1: 1  FOO", "2: 2     ", "3: 3 <NA>", "4: 4   NA"))
test(1893.2, print(DF), output=txt)
test(1893.3, print(fread(txt,na.strings="")), output="A    B\n1: 109   MT\n2:   7    N\n3:  11   NA\n4:  41   NB\n5:  60   ND\n6:   1     \n7:   2 <NA>\n8:   3   NA\n9:   4   NA")
test(2015.4, print(DT), output="V1.*1:[ ]+<NA>.*2:[ ]+1.*101:[ ]+100.*102:[ ]+<NA>")
test(2015.5, print(DT), output="V1.*1:[ ]+<NA>.*2:[ ]+1.*4:[ ]+3.*5:[ ]+<NA>")
test(2089.3, print(DF), output="<multi-column>")
test(2114.4, print(res), output="date.*-0.830")
test(2130.01, print(DT), output=c("   x                 y", "1: 1 <data.table[1x2]>", "2: 2 <data.table[1x2]>"))
test(2130.02, print(DT), output=c("   x         y", "1: 1 <list[2]>",         "2: 2 <list[2]>"))
test(2130.101, print(DT, timezone=TRUE), output='UTC')
test(2130.11, print(data.table(e = expression(1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13))), output = '1 + 2 + 3')
test(2130.13, print(DT), output="myclass2130:foo.*myclass2130:bar")
test(2130.14, print(DT), output="myclass2130:foo.*myclass2130:bar") # because length 1 from format but needs to be length(x)
test(2130.15, print(DT), output="All hail foo")  # e.g. sf:::format.sfc rather than sf:::format.sfg on each item
test(2130.16, print(DT), output="myclass2130:foo.*myclass2130:bar")
test(2222, print(DT), output="A.*3")

Note that cases like print(DT), output=... could be linted into just DT, output=..., but cases like print(DT, ...) with non-default options need explicit print() invocation.

OTOH, maybe it's overkill / won't make the code any easier to understand to pursue this.

@Nj221102
Copy link
Contributor

Nj221102 commented Jun 18, 2024

Should we remove Print() from the already existing tests or this advice is just for the future use?

@MichaelChirico
Copy link
Member Author

Honestly not sure. I think leaving it in existing tests is fine. This is about if we can avoid running print() "under the hood" in test() when output= is provided and x= is a call to print(). Not sure it will actually improve the implementation at all, just a thought that it's odd to actually wind up running print() twice.

@joshhwuu
Copy link
Member

joshhwuu commented Aug 7, 2024

IMO, it would be healthy to only print once -- I vaguely recall having trouble finding small differences when testing with print and output with large tables, as the test failure would "expect" a double print, which certainly looks worse than it really is... A small example I was able to make here:

test(2279.1, print(data.table(a = 1)), output="   a\n1: 2")

Test 2279.1 did not produce correct output:
  Expected: <<   a\n1: 2>>
  Observed: <<   a\n1: 1\n   a\n1: 1>>

It makes things a little harder to read as content gets bigger, requiring the reader to scroll horizontally a bit more.

Anyway, I believe separating not calling print twice when x already contains a call to print should be pretty easy, I'd be happy to write up a quick PR. (Also there may be a slight speed improvement for R CMD Check by calling print less? Although this is a constant and probably wouldn't matter in the big picture, might do some benchmarking)

@MichaelChirico
Copy link
Member Author

Great motivating example. For now, I have #6266 in mind -- even if we fix the existing test scripts, it will be easy for them to backslide unless we have simple static checks in place. I also have #5830 and #5845 in mind -- while making this change does have the nice benefits you cite, it's not particularly urgent & will generate some git conflict churn in the meantime. So I would still prioritize burning down the PR queue first.

@joshhwuu
Copy link
Member

joshhwuu commented Aug 9, 2024

while making this change does have the nice benefits you cite, it's not particularly urgent & will generate some git conflict churn in the meantime. So I would still prioritize burning down the PR queue first.

Right, this makes sense to me. If someone (probably future me) ever decides to pursue this, here's what I did to evaluate x without printing if it already contains a call to print:

# test.data.table.R
if (any(grepl("^print\\b", deparse(xsub)))) {
      out = capture.output(x <- suppressMessages(withCallingHandlers(tryCatch(x, error=eHandler), warning=wHandler, message=mHandler)))
    } else {
      out = capture.output(print(x <- suppressMessages(withCallingHandlers(tryCatch(x, error=eHandler), warning=wHandler, message=mHandler))))
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants