Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow a single column to be used as rownames in as.matrix #2702

Merged
merged 22 commits into from
Apr 7, 2018
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,8 @@ The response has been positive ([this tweet](https://twitter.com/MattDowle/statu
```
Thus, `with=` should no longer be needed in any cases. Please change to using the `..` prefix and in a few years we will start to formally deprecate and remove the `with=` parameter. If this is well received, the `..` prefix could be expanded to symbols appearing in `i=` and `by=`, too.

19. `as.matrix.data.table` method now has an additional `rownames` argument allowing for a single column to be used as the `rownames` after conversion to a `matrix`. Thanks to @sritchie73 for the suggestion, use cases, [#2692](https://github.com/Rdatatable/data.table/issues/2692) and implementation [PR#2702](https://github.com/Rdatatable/data.table/pull/2702) and @MichaelChirico for additional use cases.

#### BUG FIXES

1. The new quote rules handles this single field `"Our Stock Screen Delivers an Israeli Software Company (MNDO, CTCH)<\/a> SmallCapInvestor.com - Thu, May 19, 2011 10:02 AM EDT<\/cite><\/div>Yesterday in \""Google, But for Finding
Expand Down
57 changes: 49 additions & 8 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -1881,17 +1881,58 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
# x
#}


as.matrix.data.table <- function(x,...)
{
dm <- dim(x)
cn <- names(x)
as.matrix.data.table <- function(x, rownames, ...) {
rn <- NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we have a style guide on this, but I note that the corresponding CRAN cheat for [.data.table symbols are defined in the package environment rather than the function body:

https://github.com/Rdatatable/data.table/blob/master/R/data.table.R#L11

  • side is cutting that tiny bit of overhead for use cases that might repeatedly call this method; - side is increasing potential for unintentional collisions (so if moving outside the body, perhaps use some more obscure name)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are defined there because they are exported. rn won't be used by user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap - rn is internal here, it will contain the vector of rownames to put in the matrix (after all the processing in if (!missing(rownames)) {}. rnc will contain the index of the column in x to be dropped.

rnc <- NULL
if (!missing(rownames)) { # Convert rownames to a column index if possible
if (is.null(rownames)) {
warning("rownames is NULL, ignoring rownames")
} else if (length(rownames) != 1) {
stop("rownames must be a single column in x")
} else if (is.na(rownames)) {
warning("rownames is NA, ignoring rownames")
} else if (is.logical(rownames) && !isTRUE(rownames)) {
warning("rownames is FALSE, ignoring rownames")
} else if (!(is.logical(rownames) || is.character(rownames) || is.numeric(rownames))) {
# E.g. because rownames is some sort of object that cant be converted to a column index
stop("rownames must be TRUE, a column index, or a column name in x")
} else {
if (is.logical(rownames) && isTRUE(rownames)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isTRUE(rownames) is sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I've changed this statement to identical(rownames, TRUE) which I think is clearer (and is used elsewhere in data.table.R).

if (haskey(x)) {
rownames <- key(x)
if (length(rownames) > 1) {
warning("rownames is TRUE but multiple keys found in key(x), using first column instead")
rownames <- 1
}
} else {
rownames <- 1
}
}
if (is.character(rownames)) { # Handles cases where rownames is a column name, or key(x) from TRUE
rnc <- chmatch(rownames, names(x))
if (is.na(rnc)) stop(rownames, " is not a column of x")
} else { # rownames is an index already
if (rownames < 1 || rownames > ncol(x))
stop("rownames is ", rownames, " which is outside the column number range [1,ncol=", ncol(x), "]")
rnc <- rownames
}
}
}
if (!is.null(rnc)) { # If there are rownames, extract and drop that column
rn <- x[[rnc]]
dm <- dim(x) - c(0, 1)
cn <- names(x)[-rnc]
X <- x[, -rnc, with = FALSE]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think x[, .SD, .SDcols = c(cn)] or x[, (rn) := NULL] could work -- not 100% on what the variables are, or the best way to do this within data.table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @HughParsonage - I like the .SDcols approach. I had also thought about using the new .. syntax, but then that raises the problem with R CMD check --as-cran complaining that ..rnc is a global variable with no visible binding.

} else {
dm <- dim(x)
cn <- names(x)
X <- x
}
if (any(dm == 0L))
return(array(NA, dim = dm, dimnames = list(NULL, cn)))
return(array(NA, dim = dm, dimnames = list(rn, cn)))
p <- dm[2L]
n <- dm[1L]
collabs <- as.list(cn)
X <- x
class(X) <- NULL
non.numeric <- non.atomic <- FALSE
all.logical <- TRUE
Expand Down Expand Up @@ -1936,7 +1977,7 @@ as.matrix.data.table <- function(x,...)
}
X <- unlist(X, recursive = FALSE, use.names = FALSE)
dim(X) <- c(n, length(X)/n)
dimnames(X) <- list(NULL, unlist(collabs, use.names = FALSE))
dimnames(X) <- list(rn, unlist(collabs, use.names = FALSE))
X
}

Expand Down
21 changes: 21 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -11544,6 +11544,27 @@ test(1894.11, DT[, sum(z)*..z], 72L)
setnames(DT, "z", "..z")
test(1894.12, DT[, sum(y)*..z], error="..z in j is looking for z in calling scope, but a column '..z' exists. Column names should not start with ..")

# Allow column to be used as rownames when converting to matrix # 1895
DT = data.table(id = letters[1:4], X = 1:4, Y = 5:8)
mat <- matrix(1:8, ncol = 2, dimnames=list(letters[1:4], c("X", "Y")))
test(1895.01, as.matrix(DT, 1), mat)
test(1895.02, as.matrix(DT, "id"), mat)
test(1895.03, as.matrix(DT, TRUE), mat)
setkey(DT, id)
test(1895.04, as.matrix(DT, TRUE), mat)
# errors
test(1895.05, as.matrix(DT, -1), error="rownames is -1 which is outside the column number range")
test(1895.06, as.matrix(DT, "Z"), error="Z is not a column of x")
test(1895.07, as.matrix(DT, c(1,2)), error="rownames must be a single column in x")
test(1895.08, as.matrix(DT, complex(1)), error="rownames must be TRUE, a column index, or a column name in x")
# warningsi
mat2 <- matrix(c(letters[1:4], 1:8), ncol=3, dimnames=list(NULL, c("id", "X", "Y")))
test(1895.09, as.matrix(DT, NA), mat2, warning="rownames is NA, ignoring rownames")
test(1895.10, as.matrix(DT, NULL), mat2, warning="rownames is NULL, ignoring rownames")
test(1895.11, as.matrix(DT, FALSE), mat2, warning="rownames is FALSE, ignoring rownames")
setkey(DT, id, X)
test(1895.12, as.matrix(DT, TRUE), mat, warning="rownames is TRUE but multiple keys found in key(x), using first column instead")


###################################
# Add new tests above this line #
Expand Down
62 changes: 62 additions & 0 deletions man/as.matrix.Rd
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
\name{as.matrix}
\alias{as.matrix}
\alias{as.matrix.data.table}
\title{Convert a data.table to a matrix}
\description{
Converts a \code{data.table} into a \code{matrix}, optionally using one
of the columns in the \code{data.table} as the \code{matrix} \code{rownames}.
}
\usage{
\method{as.matrix}{data.table}(x, rownames, ...)}

\arguments{
\item{x}{a \code{data.table}}
\item{rownames}{optional, a single column name or column index to use as
the \code{rownames} inthe returned \code{matrix}. If \code{TRUE} the
\code{\link{key}} of the \code{data.table} will be used if it is a
single column, otherwise the first column in the \code{data.table} will
be used.}
\item{\dots}{additional arguments to be passed to or from methods.}
}

\details{
\code{\link{as.matrix}} is a generic function in base R. It dispatches to
\code{as.matrix.data.table} if its \code{x} argument is a \code{data.table}.

The method for \code{data.table}s will return a character matrix if there
are only atomic columns and any non-(numeric/logical/complex) column,
applying \code{\link{as.vector}} to factors and \code{\link{format}} to other
non-character columns. Otherwise, the usual coercion hierarchy (logical <
integer < double < complex) will be used, e.g., all-logical data frames
will be coerced to a logical matrix, mixed logical-integer will give an
integer matrix, etc.

An additional argument \code{rownames} is provided for \code{as.matrix.data.table}
to facilitate conversions to matrices where the \code{\link{rownames}} are stored
in a single column of \code{x}, e.g. the first column after using
\code{\link{dcast.data.table}}.
}

\value{
A new \code{matrix} containing the contents of \code{x}.
}

\seealso{
\code{\link{data.table}}, \code{\link{as.matrix}}, \code{\link{data.matrix}}
\code{\link{array}}
}

\examples{
(dt1 <- data.table(A = letters[1:10], X = 1:10, Y = 11:20))
as.matrix(dt1) # character matrix
as.matrix(dt1, rownames = "A")
as.matrix(dt1, rownames = 1)
as.matrix(dt1, rownames = TRUE)

(dt1 <- data.table(A = letters[1:10], X = 1:10, Y = 11:20))
setkey(dt1, A)
as.matrix(dt1, rownames = TRUE)
}

\keyword{ array }