Print Option: Print columns that fit in single console without wrapping #4074

Merged: 40 commits, Dec 18, 2019
Changes from 30 commits
Commits
c1a7c74
added "trunc.cols" option to print.data.table
Nov 22, 2019
e1c8767
commenting the code for print.data.table dt_width()
Nov 22, 2019
7bccbbd
fixed typo in tunc.cols option and update docs
Nov 22, 2019
7ebf6a7
typo in print.data.table
Nov 22, 2019
a89dda8
using x object to get widths and then using toprint in trunc.cols
Nov 22, 2019
fd9c8d9
fix missing message about vars not printed & small adjustment to dt_w…
Nov 22, 2019
90a9ff1
fixed missing arg to get widths
Nov 22, 2019
a9f47f0
added unit tests
Nov 23, 2019
debbbc6
resolved attribute issue with printing with trunc.cols with nrow(DT) = 1
Nov 23, 2019
22f4c9e
updated print.data.table doc
Nov 23, 2019
c5c0fe0
added classes to trunc.col message
Nov 27, 2019
ea4051f
updated tests for new print trunc.cols
Nov 27, 2019
3408884
added trunc.cols to NEWS.md
Nov 27, 2019
def64c7
Merge pull request #1 from TysonStanley/print_test
Nov 27, 2019
03f0356
expanded NEWS item
MichaelChirico Nov 27, 2019
2bc1544
using toprint with dt_width to avoid coersion (and avoid using the fu…
Nov 28, 2019
f604791
fixed missing arg in dt_width
Nov 28, 2019
510b9a4
added \n to message
Nov 28, 2019
ca6105a
found lingering x intead of print in trunc.cols
Nov 28, 2019
1ef864c
fixed issue of additional space before comma in message
Nov 28, 2019
4631758
small adjustments to trunc.cols message
Nov 28, 2019
4927f69
drop=FALSE!!! and error message for too small of console
Nov 28, 2019
0624921
fix case where class=TRUE and col.names = "none" and update tests
Nov 28, 2019
3d0affa
add more trunc.col=TRUE tests
Nov 28, 2019
e6cc430
integers and Ls
Nov 28, 2019
4c467b2
Revisions to trunc.cols option in print.data.table
Nov 28, 2019
8025bad
added unit test for when console too small to print first column
Nov 28, 2019
bd11f04
simplified dt_width function
Nov 28, 2019
0869c95
faster rowname padding and simplified widths calculation
Nov 28, 2019
687f5ff
replace base::ifelse with pmax in dt_width
Nov 28, 2019
7848984
minor fix to dt_width and prints message only when console too small
Nov 28, 2019
0289bee
update message with more proper ngettext and a few minor adjustments
Nov 28, 2019
32f3b64
fixed typos in tests and added one
Nov 28, 2019
d85ec9a
use brackify to truncate large # of columns
MichaelChirico Nov 28, 2019
037a5c9
update to tests
Nov 28, 2019
f8df273
Merge branch 'master' of https://github.com/TysonStanley/data.table
Nov 28, 2019
13ced05
removed an extra space and fixed tests
Nov 28, 2019
0c002dc
renamed tests so they are in order
Nov 28, 2019
45e7c41
Merge branch 'master' into print_test
Nov 28, 2019
020d56f
renamed tests so they are in order and fixed 2122.10
Nov 28, 2019
2 changes: 2 additions & 0 deletions NEWS.md
@@ -8,6 +8,8 @@

1. `DT[, {...; .(A,B)}]` (i.e. when `.()` is the final item of a multi-statement `{...}`) now auto-names the columns `A` and `B` (just like `DT[, .(A,B)]`) rather than `V1` and `V2`, [#2478](https://github.com/Rdatatable/data.table/issues/2478) [#609](https://github.com/Rdatatable/data.table/issues/609). Similarly, `DT[, if (.N>1) .(B), by=A]` now auto-names the column `B` rather than `V1`. Explicit names are unaffected; e.g. `DT[, {... y= ...; .(A=C+y)}, by=...]` named the column `A` before, and still does. Thanks also to @renkun-ken for his go-first strong testing which caught an issue not caught by the test suite or by revdep testing, related to NULL being the last item, [#4061](https://github.com/Rdatatable/data.table/issues/4061).

2. `print` method for `data.table`s gains `trunc.cols` argument (and corresponding option `datatable.print.trunc.cols`, default `FALSE`), [#1497](https://github.com/Rdatatable/data.table/issues/1497), part of [#1523](https://github.com/Rdatatable/data.table/issues/1523). This argument makes it possible to print only as many columns as fit in the console without wrapping to new lines (e.g., the first 5 of 80 columns). In the case of truncation, a message is printed that states the count and the names of the variables not shown. When `class = TRUE` in `print.data.table()`, the message also contains the classes of the variables. `data.table` has always automatically truncated _rows_ of a table for efficiency (e.g. printing 10 rows instead of 10 million); in the future, we may do the same for _columns_ (e.g., 10 columns instead of 20,000) by changing the default for this argument. Thanks to @nverno for the initial suggestion and to @TysonStanley for the PR.
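A minimal usage sketch of the two entry points described in item 2 (the `trunc.cols` argument and the `datatable.print.trunc.cols` option). It assumes an installed data.table build that already contains this PR, so it first checks whether the print method has the argument; `has_trunc` is a hypothetical helper name, not part of the package.

```r
# Check (assumption-free) whether the installed print method knows trunc.cols.
has_trunc = requireNamespace("data.table", quietly = TRUE) &&
  "trunc.cols" %in% names(formals(getS3method("print", "data.table")))

if (has_trunc) {
  DT = data.table::data.table(a = "aaaaaaaaaaaaa", b = "bbbbbbbbbbbbb",
                              c = "ccccccccccccc", d = "ddddddddddddd")
  # Narrow the console and remember the previous settings.
  old = options(width = 40L, datatable.print.trunc.cols = FALSE)
  print(DT, trunc.cols = TRUE)                # per-call argument
  options(datatable.print.trunc.cols = TRUE)
  print(DT)                                   # session-wide option
  options(old)                                # restore both settings
}
```

The argument suits a one-off wide table; the option suits a whole session.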

## BUG FIXES

## NOTES
5 changes: 3 additions & 2 deletions R/onLoad.R
@@ -71,14 +71,15 @@
# In fread and fwrite we have moved back to using getOption's default argument since it is unlikely fread and fwrite will be called in a loop many times, plus they
# are relatively heavy functions where the overhead in getOption() would not be noticed. It's only really [.data.table where getOption default bit.
# Improvement to base::getOption() now submitted (100x; 5s down to 0.05s): https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17394
opts = c("datatable.verbose"="FALSE", # datatable.<argument name>
"datatable.optimize"="Inf", # datatable.<argument name>
"datatable.print.nrows"="100L", # datatable.<argument name>
"datatable.print.topn"="5L", # datatable.<argument name>
"datatable.print.class"="FALSE", # for print.data.table
"datatable.print.rownames"="TRUE", # for print.data.table
"datatable.print.colnames"="'auto'", # for print.data.table
"datatable.print.keys"="FALSE", # for print.data.table
"datatable.print.trunc.cols"="FALSE", # for print.data.table
"datatable.allow.cartesian"="FALSE", # datatable.<argument name>
"datatable.dfdispatchwarn"="TRUE", # not a function argument
"datatable.warnredundantby"="TRUE", # not a function argument
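The defaults in `opts` are stored as strings, so they presumably get parsed and applied when the package loads, and only for options the user has not already set. The loop below is a sketch of that idea under that assumption, not a copy of `onLoad.R`; the `demo.*` option names are hypothetical.

```r
# Hypothetical defaults, stored as strings like the opts vector above.
defaults = c("demo.print.trunc.cols" = "FALSE",
             "demo.print.topn"       = "5L")

# Simulate a user who set one option before the package was loaded.
options(demo.print.topn = 10L)

# Apply a default only where no value exists yet.
for (nm in setdiff(names(defaults), names(options()))) {
  value = eval(parse(text = defaults[[nm]]))   # "FALSE" -> FALSE, "5L" -> 5L
  do.call(options, setNames(list(value), nm))
}

getOption("demo.print.trunc.cols")  # FALSE, taken from the defaults
getOption("demo.print.topn")        # 10L, the user's value is kept
```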
58 changes: 58 additions & 0 deletions R/print.data.table.R
@@ -6,11 +6,13 @@ print.data.table = function(x, topn=getOption("datatable.print.topn"),
row.names=getOption("datatable.print.rownames"),
col.names=getOption("datatable.print.colnames"),
print.keys=getOption("datatable.print.keys"),
trunc.cols=getOption("datatable.print.trunc.cols"),
quote=FALSE,
timezone=FALSE, ...) {
# topn - print the top topn and bottom topn rows with '---' inbetween (5)
# nrows - under this the whole (small) table is printed, unless topn is provided (100)
# class - should column class be printed underneath column name? (FALSE)
# trunc.cols - should only the columns that fit in the console be printed? (FALSE)
if (!col.names %chin% c("auto", "top", "none"))
stop("Valid options for col.names are 'auto', 'top', and 'none'")
if (col.names == "none" && class)
@@ -87,7 +89,19 @@ print.data.table = function(x, topn=getOption("datatable.print.topn"),
toprint = rbind(abbs, toprint)
rownames(toprint)[1L] = ""
}
if (isFALSE(class) || (isTRUE(class) && col.names == "none")) abbs = ""
if (quote) colnames(toprint) <- paste0('"', old <- colnames(toprint), '"')
if (isTRUE(trunc.cols)) {
# allow truncation of columns to print only what will fit in console PR #4074
widths = dt_width(toprint, class, row.names, col.names)
cons_width = getOption("width")
cols_to_print = widths <= cons_width
not_printed = colnames(toprint)[!cols_to_print]
if (sum(cols_to_print) == 0L) stop("Width of console too small to print a single column when `trunc.cols=TRUE`. Consider increasing the width of the console or use `trunc.cols=FALSE`.", call. = FALSE)
# When nrow(toprint) = 1, attributes get lost in the subset,
# function below adds those back when necessary
toprint = toprint_subset(toprint, cols_to_print)
}
if (printdots) {
toprint = rbind(head(toprint, topn + isTRUE(class)), "---"="", tail(toprint, topn))
rownames(toprint) = format(rownames(toprint), justify="right")
@@ -96,6 +110,10 @@ print.data.table = function(x, topn=getOption("datatable.print.topn"),
} else {
print(toprint, right=TRUE, quote=quote)
}
if (trunc.cols && length(not_printed) > 0L)
# prints names of variables not shown in the print
trunc_cols_message(not_printed, abbs, class)

return(invisible(x))
}
if (nrow(toprint)>20L && col.names == "auto")
@@ -107,6 +125,10 @@ print.data.table = function(x, topn=getOption("datatable.print.topn"),
} else {
print(toprint, right=TRUE, quote=quote)
}
if (trunc.cols && length(not_printed) > 0L)
# prints names of variables not shown in the print
trunc_cols_message(not_printed, abbs, class)

invisible(x)
}

@@ -164,3 +186,39 @@ shouldPrint = function(x) {
# as opposed to printing a blank line, for excluding col.names per PR #1483
cut_top = function(x) cat(capture.output(x)[-1L], sep = '\n')

# to calculate widths of data.table for PR #4074
# gets the width of the data.table at each column
# and compares it to the console width
dt_width = function(x, class, row.names, col.names) {
widths = apply(nchar(x, type='width'), 2L, max)
if (class) widths = pmax(widths, 6L)
if (col.names != "none") names = sapply(colnames(x), nchar, type = "width") else names = 0L
dt_widths = pmax(widths, names)
rownum_width = if (row.names) as.integer(ceiling(log10(nrow(x)))+1) else 0L
cumsum(dt_widths + 1L) + rownum_width + 1L
Member commented:

another micro-optimization -- we can do +2 else 0L. I think if(!row.names) the +1L in the next step is wrong. However I also just noticed this:

#4083

So even if row.names=FALSE, it could still occupy some characters. I think you can safely ignore that in this PR, rather than complicating logic to detect whether --- will be present.

Member Author commented:

I wondered about the --- but hadn't done anything about it yet. Imagine we'll want to wait for that to be resolved before deciding any padding.

I believe the +1L in cumsum(dt_widths + 1L) should account for the space between the row names and the first column instead of needing to add another to the rownum_width.

Member Author commented:

But I'll take off the +1L after the rownum_width bc I think you are right that it is unnecessary.

Member commented:

I think the +1L is right when row.names=TRUE

}
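To make the cumulative-width idea in `dt_width` concrete, here is a base-R sketch on a small character matrix standing in for `toprint`. It simplifies the row-name handling by measuring the row labels directly instead of using the `log10` formula discussed above, so treat it as an illustration of the technique rather than the PR's exact arithmetic.

```r
# Character matrix standing in for the formatted table (2 rows, 4 columns).
m = matrix(c("0", "bbbbbbbbbbbbb", "ccccccccccccc", "ddddddddddddd",
             "0", "bbbbbbbbbbbbb", "ccccccccccccc", "d"),
           nrow = 2L, byrow = TRUE,
           dimnames = list(c("1:", "2:"), c("a", "b", "c", "d")))

# Width of each column: widest cell or the header, whichever is larger.
col_widths = pmax(apply(nchar(m, type = "width"), 2L, max),
                  nchar(colnames(m), type = "width"))

# Each column costs its width plus one separating space; the row labels
# occupy a fixed gutter on the left (simplifying assumption).
gutter = max(nchar(rownames(m), type = "width"))
cum_widths = cumsum(col_widths + 1L) + gutter

# Columns whose right edge still fits in a 40-character console.
fits = cum_widths <= 40L
fits
```

With a 40-character console the right edges fall at 4, 18, 32 and 46 characters, so `a`, `b` and `c` fit and `d` does not.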
# keeps the dim and dimnames attributes
toprint_subset = function(x, cols_to_print) {
if (nrow(x) == 1L){
atts = attributes(x)
atts$dim = c(1L, sum(cols_to_print))
atts$dimnames[[2L]] = atts$dimnames[[2L]][cols_to_print]
x = x[, cols_to_print, drop=FALSE]
attributes(x) = atts
x
} else {
x[, cols_to_print, drop=FALSE]
}
}
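The reason subsetting needs care for a single-row table can be seen in base R alone: without `drop=FALSE`, selecting columns from a one-row matrix collapses it to a plain vector and the `dim` attribute is lost. A small sketch with made-up values:

```r
# One-row character matrix, like toprint when the table has a single row.
m = matrix(c("a", "b", "c"), nrow = 1L,
           dimnames = list("1:", c("x", "y", "z")))
keep = c(TRUE, FALSE, FALSE)

dropped = m[, keep]                # collapses to a vector: no dim attribute
kept    = m[, keep, drop = FALSE]  # stays a 1 x 1 matrix with dimnames

is.null(dim(dropped))
dim(kept)
```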
# message for when trunc.cols=TRUE and some columns are not printed
trunc_cols_message = function(not_printed, abbs, class){
n = length(not_printed)
if (class) classes = paste0(" ", tail(abbs, n)) else classes = ""
not_printed_paste = paste0(not_printed, classes, collapse = ", ")
cat(sprintf(ngettext(n,
paste0("1 variable not shown: %s"),
paste0(n, " variables not shown: %s")),
not_printed_paste),
"\n")
}
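For readers unfamiliar with `ngettext`, the following sketch shows the singular/plural selection on its own. `msg_for` is a hypothetical helper, and it passes the count through `%d` instead of pasting it into the template as the function above does.

```r
# Build the "not shown" message; ngettext picks the template by count.
msg_for = function(not_printed) {
  n = length(not_printed)
  sprintf(ngettext(n, "%d variable not shown: %s",
                      "%d variables not shown: %s"),
          n, paste(not_printed, collapse = ", "))
}

msg_for("d")          # "1 variable not shown: d"
msg_for(c("c", "d"))  # "2 variables not shown: c, d"
```

Keeping the count as a format argument leaves two fixed template strings, which is the shape translation tooling generally expects.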

77 changes: 77 additions & 0 deletions inst/tests/tests.Rraw
@@ -16382,6 +16382,83 @@ test(2121.5, DT[, .(.N=.N), by=a], data.table(a=c(1,2), .N=2:1)) # user supplied
test(2121.6, DT[ , {.(a, b=b+1); NULL}], NULL)


# trunc.cols in print.data.table #4074
old_width = options("width" = 40)
# Single row printing (to check issue with losing attributes)
DT = data.table(a = "aaaaaaaaaaaaa",
b = "bbbbbbbbbbbbb",
c = "ccccccccccccc",
d = "ddddddddddddd")
test(2122.1,
capture.output(print(DT, trunc.cols=TRUE)),
c(" a b",
"1: aaaaaaaaaaaaa bbbbbbbbbbbbb",
"2 variables not shown: c, d " ))
# Printing with dots
DT = data.table(a = vector("integer", 102),
b = "bbbbbbbbbbbbb",
c = "ccccccccccccc",
d = c("ddddddddddddd", "d"))
test(2122.2, capture.output(print(DT, trunc.cols=TRUE)),
c(" a b c",
" 1: 0 bbbbbbbbbbbbb ccccccccccccc",
" 2: 0 bbbbbbbbbbbbb ccccccccccccc",
" 3: 0 bbbbbbbbbbbbb ccccccccccccc",
" 4: 0 bbbbbbbbbbbbb ccccccccccccc",
" 5: 0 bbbbbbbbbbbbb ccccccccccccc",
" --- ",
" 98: 0 bbbbbbbbbbbbb ccccccccccccc",
" 99: 0 bbbbbbbbbbbbb ccccccccccccc",
"100: 0 bbbbbbbbbbbbb ccccccccccccc",
"101: 0 bbbbbbbbbbbbb ccccccccccccc",
"102: 0 bbbbbbbbbbbbb ccccccccccccc",
"1 variable not shown: d "))
test(2122.3, capture.output(print(DT, trunc.cols=TRUE, row.names=FALSE)),
c(" a b c",
" 0 bbbbbbbbbbbbb ccccccccccccc",
" 0 bbbbbbbbbbbbb ccccccccccccc",
" 0 bbbbbbbbbbbbb ccccccccccccc",
" 0 bbbbbbbbbbbbb ccccccccccccc",
" 0 bbbbbbbbbbbbb ccccccccccccc",
"--- ",
" 0 bbbbbbbbbbbbb ccccccccccccc",
" 0 bbbbbbbbbbbbb ccccccccccccc",
" 0 bbbbbbbbbbbbb ccccccccccccc",
" 0 bbbbbbbbbbbbb ccccccccccccc",
" 0 bbbbbbbbbbbbb ccccccccccccc",
"1 variable not shown: d " ))
test(2122.4, capture.output(print(DT, trunc.cols=TRUE, class=TRUE))[14],
"1 variable not shown: d <char> ")
test(2122.5, capture.output(print(DT, trunc.cols=TRUE, class=TRUE, row.names=FALSE))[c(1,14)],
c(" a b c",
"1 variable not shown: d <char> " ))
test(2122.6, capture.output(print(DT, trunc.cols=TRUE, col.names="none"))[c(1,12)],
c(" 1: 0 bbbbbbbbbbbbb ccccccccccccc",
"1 variable not shown: d " ))
test(2122.7, capture.output(print(DT, trunc.cols=TRUE, class=TRUE, col.names="none"))[c(1,13)],
c(" 1: 0 bbbbbbbbbbbbb ccccccccccccc",
"1 variable not shown: d " ),
warning = "Column classes will be suppressed when col.names is 'none'")
options("width" = 20)
DT = data.table(a = vector("integer", 2),
b = "bbbbbbbbbbbbb",
c = "ccccccccccccc",
d = "ddddddddddddd")
test(2122.8, capture.output(print(DT, trunc.cols=TRUE)),
c(" a b",
"1: 0 bbbbbbbbbbbbb",
"2: 0 bbbbbbbbbbbbb",
"2 variables not shown: c, d "))
options("width" = 10)
DT = data.table(a = "aaaaaaaaaaaaa",
b = "bbbbbbbbbbbbb",
c = "ccccccccccccc",
d = "ddddddddddddd")
test(2122.9, print(DT, trunc.cols=TRUE),
error = "Width of console too small to print a single column when `trunc.cols=TRUE`. Consider increasing the width of the console or use `trunc.cols=FALSE`.")
options(old_width)
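The expected strings in these tests end with a trailing space. That follows from how `cat` joins its arguments: the message and `"\n"` are separated by the default `sep = " "`. A base-R sketch of just that behaviour:

```r
# cat() puts a space between the message and the newline argument,
# so the captured line ends with a space.
out = capture.output(cat("2 variables not shown: c, d", "\n"))
out
```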


###################################
# Add new tests above this line #
###################################
26 changes: 20 additions & 6 deletions man/print.data.table.Rd
@@ -8,12 +8,13 @@
}
\usage{
\method{print}{data.table}(x,
topn=getOption("datatable.print.topn"), # default: 5
nrows=getOption("datatable.print.nrows"), # default: 100
class=getOption("datatable.print.class"), # default: FALSE
row.names=getOption("datatable.print.rownames"), # default: TRUE
col.names=getOption("datatable.print.colnames"), # default: "auto"
print.keys=getOption("datatable.print.keys"), # default: FALSE
trunc.cols=getOption("datatable.print.trunc.cols"), # default: FALSE
quote=FALSE,
timezone=FALSE, \dots)
}
@@ -25,6 +26,7 @@
\item{row.names}{ If \code{TRUE}, row indices will be printed alongside \code{x}. }
\item{col.names}{ One of three flavours for controlling the display of column names in output. \code{"auto"} includes column names above the data, as well as below the table if \code{nrow(x) > 20}. \code{"top"} excludes this lower register when applicable, and \code{"none"} suppresses column names altogether (as well as column classes if \code{class = TRUE}). }
\item{print.keys}{ If \code{TRUE}, any \code{\link{key}} and/or \code{\link[=indices]{index}} currently assigned to \code{x} will be printed prior to the preview of the data. }
\item{trunc.cols}{ If \code{TRUE}, only the columns that can be printed in the console without wrapping the columns to new lines will be printed (similar to \code{tibbles}). }
\item{quote}{ If \code{TRUE}, all output will appear in quotes, as in \code{print.default}. }
\item{timezone}{ If \code{TRUE}, time columns of class POSIXct or POSIXlt will be printed with their timezones (if attribute is available). }
\item{\dots}{ Other arguments ultimately passed to \code{format}. }
@@ -58,5 +60,17 @@
setindexv(DT, c("a", "b"))
setindexv(DT, "a")
print(DT, print.keys=TRUE)

# `trunc.cols` will make it so only columns that fit in console will be printed
# with a message that states the variables not shown
old_width = options("width" = 40)
DT <- data.table(thing_11 = vector("integer", 3),
thing_21 = vector("complex", 3),
thing_31 = as.IDate(paste0("2016-02-0", 1:3)),
thing_41 = "aasdfasdfasdfasdfasdfasdfasdfasdfasdfasdf",
thing_51 = vector("integer", 3),
thing_61 = vector("complex", 3))
print(DT, trunc.cols=TRUE)
options(old_width)
}
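Both the tests and the documentation example save the console width, change it, and restore it by hand. One possible way to make the restore automatic is a small wrapper built on `on.exit`; `with_width` is a hypothetical helper, not something this PR adds.

```r
# Temporarily set the console width while evaluating an expression.
with_width = function(w, code) {
  old = options(width = w)   # options() returns the previous values
  on.exit(options(old))      # restored even if `code` raises an error
  force(code)                # evaluated lazily, so it sees the new width
}

width_before = getOption("width")
width_inside = with_width(40L, getOption("width"))
width_after  = getOption("width")
```

The wrapped expression could equally be a `print(DT, trunc.cols=TRUE)` call.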