Progress towards #1648; improvements to metadata function tables() #1804

Merged
merged 1 commit into from
Aug 7, 2017
2 changes: 2 additions & 0 deletions NEWS.md
@@ -93,6 +93,8 @@
 43. `first()` is now exported to return the first element of vectors, data.frames and data.tables.

 44. Added `second` and `minute` extraction functions which, like extant `hour`/`yday`/`week`/etc, always return an integer, [#874](https://github.com/Rdatatable/data.table/issues/874). Thanks to @bthieurmel for the FR and @MichaelChirico for the PR.

+45. `tables` gains `index` argument for supplementary metadata about `data.table`s in memory (or any optionally specified environment), part of [#1648](https://github.com/Rdatatable/data.table/issues/1648). Thanks due variously to @jangorecki, @rsaporta, @MichaelChirico for ideas and work towards PR.
+
 #### BUG FIXES

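Reviewer note: a minimal sketch of what NEWS item 45 describes, assuming a data.table build that includes this change. The names `DT` and `e` are illustrative and do not come from the PR.

```r
library(data.table)

# Illustrative data only; `DT` and `e` are assumed names, not from the PR.
DT <- data.table(A = 1:10, B = letters[1:10])
setindex(DT, B)                 # add a secondary index on column B

# Inspect a dedicated environment so the result does not depend on
# whatever else happens to be in the workspace.
e <- new.env()
assign("DT", DT, envir = e)

info <- tables(env = e, silent = TRUE, index = TRUE)
names(info)                     # includes "INDICES" when index=TRUE
info$INDICES                    # index names recorded for each table
```

With `index=FALSE` (the default) the `INDICES` column is simply absent, so existing callers should see the same column set as before.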
59 changes: 34 additions & 25 deletions R/tables.R
@@ -1,42 +1,51 @@
-# globals to pass NOTE from R CMD check, see http://stackoverflow.com/questions/9439256
-MB = NCOL = NROW = NULL
+
+MB = NCOL = NROW = NULL   # globals to pass NOTE from R CMD check

-tables <- function(mb=TRUE,order.col="NAME",width=80,env=parent.frame(),silent=FALSE)
+tables <- function(mb=TRUE, order.col="NAME", width=80,
+                   env=parent.frame(), silent=FALSE, index=FALSE)
 {
   # Prints name, size and colnames of all data.tables in the calling environment by default
-  tt = objects(envir=env, all.names=TRUE)
-  ss = which(as.logical(sapply(tt, function(x) is.data.table(get(x,envir=env)))))
-  if (!length(ss)) {
+  all_obj = objects(envir=env, all.names=TRUE)
+  is_DT = which(as.logical(sapply(all_obj, function(x) is.data.table(get(x, envir=env)))))
+  if (!length(is_DT)) {
     if (!silent) cat("No objects of class data.table exist in .GlobalEnv\n")
     return(invisible(data.table(NULL)))
   }
-  tab = tt[ss]
-  info = data.table(NAME=tab)
-  for (i in seq_along(tab)) {
-    DT = get(tab[i],envir=env)   # doesn't copy
-    set(info,i,"NROW",nrow(DT))
-    set(info,i,"NCOL",ncol(DT))
-    if (mb) set(info,i,"MB",ceiling(as.numeric(object.size(DT))/1024^2))  # mb is an option because object.size() appears to be slow. TO DO: revisit
-    set(info,i,"COLS",paste(colnames(DT),collapse=","))
-    set(info,i,"KEY",paste(key(DT),collapse=","))
-  }
-  info[,NROW:=format(sprintf("%4s",prettyNum(NROW,big.mark=",")),justify="right")]   # %4s is for minimum width
-  info[,NCOL:=format(sprintf("%4s",prettyNum(NCOL,big.mark=",")),justify="right")]
+  DT_names = all_obj[is_DT]
+  info = rbindlist(lapply(DT_names, function(dt_n){
+    DT = get(dt_n, envir=env)   # doesn't copy
+    info_i =
+      data.table(NAME = dt_n,
+                 NROW = nrow(DT),
+                 NCOL = ncol(DT))
+    if (mb)
+      # mb is an option because object.size() appears to be slow.
+      # **TO DO: revisit**
+      set(info_i, , "MB",
+          #1048576 = 1024^2
+          round(as.numeric(object.size(DT))/1048576))
+    set(info_i, , "COLS", list(list(names(DT))))
+    set(info_i, , "KEY", list(list(key(DT))))
+    if (index) set(info_i, , "INDICES", list(list(indices(DT))))
+    info_i
+  }))
+  info[ , NROW := format(sprintf("%4s", prettyNum(NROW, big.mark=",")), justify="right")]   # %4s is for minimum width
+  info[ , NCOL := format(sprintf("%4s", prettyNum(NCOL, big.mark=",")), justify="right")]
   if (mb) {
     total = sum(info$MB)
-    info[, MB:=format(sprintf("%2s",prettyNum(MB,big.mark=",")),justify="right")]
+    info[ , MB := format(sprintf("%2s", prettyNum(MB, big.mark=",")), justify="right")]
   }
   if (!order.col %in% names(info)) stop("order.col='",order.col,"' not a column name of info")
   info = info[base::order(info[[order.col]])]  # base::order to maintain locale ordering of table names
   m = as.matrix(info)
-  colnames(m)[2] = sprintf(paste("%",nchar(m[1,"NROW"]),"s",sep=""), "NROW")
-  colnames(m)[3] = sprintf(paste("%",nchar(m[1,"NCOL"]),"s",sep=""), "NCOL")
-  if (mb) colnames(m)[4] = sprintf(paste("%",nchar(m[1,"MB"]),"s",sep=""), "MB")
-  m[,"COLS"] = substring(m[,"COLS"],1,width)
-  m[,"KEY"] = substring(m[,"KEY"],1,width)
+  colnames(m)[2] = sprintf(paste("%",nchar(m[1,"NROW"]), "s", sep=""), "NROW")
+  colnames(m)[3] = sprintf(paste("%",nchar(m[1,"NCOL"]), "s", sep=""), "NCOL")
+  if (mb) colnames(m)[4] = sprintf(paste("%", nchar(m[1,"MB"]), "s", sep=""), "MB")
+  m[ , "COLS"] = substring(m[,"COLS"], 1L, width)
+  m[ , "KEY"] = substring(m[,"KEY"], 1L, width)
   if (!silent) {
     print(m, quote=FALSE, right=FALSE)
-    if (mb) cat("Total: ",prettyNum(as.character(total),big.mark=","),"MB\n",sep="")
+    if (mb) cat("Total: ", prettyNum(as.character(total), big.mark=","), "MB\n", sep="")
   }
   invisible(info)
 }
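Reviewer note: the main structural change in `R/tables.R` is replacing the row-by-row `set()` loop with one single-row table per object, stacked by `rbindlist()`, and storing column names as a list column rather than a pasted string. The sketch below illustrates that pattern in isolation. `summarise_tables` is a hypothetical helper written for this note, not the PR's `tables()`, and it deliberately omits the `MB`, `KEY`, `INDICES` and printing logic.

```r
library(data.table)

# `summarise_tables` is a hypothetical helper for illustration only;
# it is not part of data.table and not the code merged in this PR.
summarise_tables <- function(env = parent.frame()) {
  all_obj <- objects(envir = env, all.names = TRUE)
  is_DT <- vapply(all_obj,
                  function(x) is.data.table(get(x, envir = env)),
                  logical(1L))
  # Build one single-row data.table per object, then stack them.
  rbindlist(lapply(all_obj[is_DT], function(dt_n) {
    DT <- get(dt_n, envir = env)          # binds the object, no copy
    data.table(NAME = dt_n,
               NROW = nrow(DT),
               NCOL = ncol(DT),
               COLS = list(names(DT)))    # list column: one character vector per row
  }))
}

e <- new.env()
e$first  <- data.table(a = 1:3, b = letters[1:3])
e$second <- data.table(z = 1)
e$not_dt <- 1:10                          # skipped: not a data.table
meta <- summarise_tables(e)
meta$COLS[[1]]                            # column names of `first`, kept as a vector
```

Keeping `COLS` as a list column means callers can work with the individual column names directly instead of splitting a comma-separated string.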
23 changes: 22 additions & 1 deletion inst/tests/tests.Rraw
@@ -211,7 +211,28 @@ test(59, TESTDT[J(c("g","d","d","d","e","d"),c("b","g","k","b","a","f")),v,roll=
 test(68, "TESTDT" %in% tables(silent=TRUE)[,NAME]) # NAME is returned as a column in which we look for the string
 test(69, "TESTDT" %in% tables(silent=TRUE)[,as.character(NAME)]) # an old test (from when NAME was factor) but no harm in keeping it
 test(69.1, names(tables(silent=TRUE)), c("NAME","NROW","NCOL","MB","COLS","KEY"))
-test(69.2, names(tables(silent=TRUE,mb=FALSE)), c("NAME","NROW","NCOL","COLS","KEY"))
+test(69.2, names(tables(silent=TRUE, mb=FALSE)), c("NAME","NROW","NCOL","COLS","KEY"))
+test(69.3, names(tables(silent=TRUE, mb=FALSE, index=TRUE)),
+     c("NAME", "NROW", "NCOL", "COLS", "KEY", "INDICES"))
+
+#clear data.tables from current environment
+#  to maintain stricter control over output of tables()
+xenv <- new.env()
+xenv$TESTDT <- TESTDT
+rm(TESTDT)
+DT <- data.table(a = 1)
+setnames(DT, paste(rev(LETTERS), collapse=""))
+test(69.4, capture.output(tables(width = 10L)),
+     c(" NAME NROW NCOL MB COLS KEY ",
+       "[1,] \"DT\" \" 1\" \" 1\" \" 0\" \"ZYXWVUTSRQ\" \"NULL\"",
+       "Total: 0MB"))
+
+nenv <- new.env()
+nenv$DT <- data.table(a = 1)
+test(69.5, tables(silent=TRUE, env=nenv)$NAME, "DT")
+
+#revert
+TESTDT <- xenv$TESTDT

 a = "d"
 # Variable Twister. a in this scope has same name as a inside DT scope.
34 changes: 20 additions & 14 deletions man/tables.Rd
@@ -1,29 +1,35 @@
 \name{tables}
 \alias{tables}
-\title{Display all objects of class 'data.table' }
+\title{Display 'data.table' metadata }
 \description{
-Lists all data.table's in memory, including number of rows, column names and any keys.
+Convenience function for concisely summarizing some metadata of all \code{data.table}s in memory (or an optionally specified environment).
 }
 \usage{
-tables(mb = TRUE, order.col = "NAME", width = 80, env=parent.frame(), silent=FALSE)
+tables(mb=TRUE, order.col="NAME", width=80,
+       env=parent.frame(), silent=FALSE, index=FALSE)
 }
 \arguments{
-\item{mb}{ TRUE adds size of the data.table in MB to the output (slow
-in older versions of R). }
-\item{order.col}{ Quoted column name to sort the output by }
-\item{width}{ Number of characters to truncate the COLS output }
-\item{env}{ Usually tables() is executed at the prompt where parent.frame() returns .GlobalEnv. tables() may also be useful inside functions where parent.frame() is the local scope of the function, or set it to .GlobalEnv }
-\item{silent}{ By default tables() is expected to be called at the prompt for its compact print output. silent=TRUE prints nothing. The data statistics are returned as a data.table, silently, whether silent is TRUE or FALSE }
+\item{mb}{ \code{logical}; \code{TRUE} adds the rough size of each \code{data.table} in megabytes to the output under column \code{MB}. }
+\item{order.col}{ Column name (\code{character}) by which to sort the output. }
+\item{width}{ \code{integer}; number of characters beyond which the output for each of the columns \code{COLS}, \code{KEY}, and \code{INDICES} is truncated. }
+\item{env}{ An \code{environment}, typically the \code{.GlobalEnv} by default, see Details. }
+\item{silent}{ \code{logical}; should the output be printed? }
+\item{index}{ \code{logical}; if \code{TRUE}, the column \code{INDICES} is added to indicate the indices associated with each object, see \code{\link{indices}}. }
 }
+\details{
+Usually \code{tables()} is executed at the prompt, where \code{parent.frame()} returns \code{.GlobalEnv}. \code{tables()} may also be useful inside functions where \code{parent.frame()} is the local scope of the function; in such a scenario, simply set it to \code{.GlobalEnv} to get the same behavior as at the prompt.
+
+Note that on older versions of \R, \code{object.size} may be slow, so setting \code{mb=FALSE} may speed up execution of \code{tables} significantly.
+
+Setting \code{silent=TRUE} prints nothing; the metadata are returned as a \code{data.table}, invisibly, whether silent is \code{TRUE} or \code{FALSE}.
+}
-% \details{
-% }
 \value{
-A data.table containing the information printed.
+A \code{data.table} containing the information printed.
 }
 \seealso{ \code{\link{data.table}}, \code{\link{setkey}}, \code{\link{ls}}, \code{\link{objects}}, \code{\link{object.size}} }
 \examples{
-DT = data.table(A=1:10,B=letters[1:10])
-DT2 = data.table(A=1:10000,ColB=10000:1)
+DT = data.table(A=1:10, B=letters[1:10])
+DT2 = data.table(A=1:10000, ColB=10000:1)
 setkey(DT,B)
 tables()
 }
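Reviewer note: the new Details text says that inside a function `parent.frame()` is the function's own scope. A small sketch of that behavior follows, assuming the documented default `env=parent.frame()`. The function `local_tables` and the table `scratch` are hypothetical names used only here.

```r
library(data.table)

# `local_tables` and `scratch` are hypothetical names for illustration.
local_tables <- function() {
  scratch <- data.table(x = 1:3)   # exists only in this function's frame
  # env is left at its default, parent.frame(), which here resolves to
  # the frame of local_tables() because that is where tables() is called.
  as.character(tables(silent = TRUE)$NAME)
}

res <- local_tables()
res                                # names of the tables found in that frame
```

To list the workspace tables from inside such a function instead, pass `env=.GlobalEnv` explicitly, as the Details section suggests.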