Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #2930 -- bugfix to as.matrix.data.table() #2938

Closed
wants to merge 26 commits into from
Closed

Closes #2930 -- bugfix to as.matrix.data.table() #2938

wants to merge 26 commits into from

Conversation

sritchie73
Copy link
Contributor

Supplying a column to use as the rownames in as.matrix.data.table() now also works when the input data.table has only one row. Previously the rownames argument was treated as a vector of user supplied rownames because the length of rownames was the same as the number of rows in x.

The user can now still supply there own length 1 vector of rownames to use when the input data.table has one row, provided it is not a column in the input data.table.

This makes the case when the data.table has one row consistent with the functionality where the data.table has > 1 rows.

@sritchie73
Copy link
Contributor Author

The branch i've submitted the pull request from is the same branch which I used to submit the original PR #2702 - the final commit in the history is the one pertinent to this PR.

Let me know if I should create a new branch to submit this PR from instead.

R/data.table.R Outdated
# rownames argument is a vector of row names, no column in x to drop.
rn <- rownames
rnc <- NULL
} else if (length(rownames) == nrow(x) && nrow(x) == 1 && is.character(rownames) && !(rownames %in% colnames(x))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an extra argument for the length-1 case is going to be necessary. Something like rownames.literal = TRUE.

as.matrix(data.table(x = 1, z = 2), "z")
#>   x z
#> z 1 2
as.matrix(data.table(x = 1, z = 2), "z", rownames.literal = FALSE)
#>   x z
#> 2 1 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I'm not sure having an extra argument is really necessary, since it will only apply to data.tables with 1 row, unless we want to allow vector recycling for the rownames when the data.table has > 1 rows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this PR arose because of confusion about 1-row data.tables. And there may be other cases where the overloaded argument of rownames might cause problems.

rownames.literal=TRUE allows the user to specify a vector of rownames (of length 1) when the input data.table has only 1 row.
Argument added to method section of man page
@@ -1887,14 +1887,19 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
# x
#}

as.matrix.data.table <- function(x, rownames, ...) {
as.matrix.data.table <- function(x, rownames, rownames.literal=FALSE, ...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In such unwanted edge case we don't have to be consistent to data.frame by default, it is important to give option for consistency to data.frame, which new arg already provides, but we could tune its default.
Maybe new arg could have smart default like nrow(x)==1L? probably more correct would be nrow(x)==1L && !missing(rownames) && is.character(rownames) but it is a little bit messy, maybe nrow(x)==1L is sufficient in this case.

# rownames argument is a vector of row names, no column in x to drop.
rn <- rownames
rnc <- NULL
} else if (nrow(x) == 1 && isTRUE(rownames.literal)) {
Copy link
Member

@jangorecki jangorecki Jun 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need length(rownames) == 1L here?

@@ -17,6 +17,11 @@ the \code{rownames} in the returned \code{matrix}. If \code{TRUE} the
single column, otherwise the first column in the \code{data.table} will
be used. Alternative a vector of length \code{nrow(x)} to assign as the
row names of the returned \code{matrix}.}
\item{rownames.literal}{optional, when x is a \code{data.table} that has
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before word "optional" I would put "logical" to clearly state argument type in front, so {logical, optional, when ....

x alone should be wrapped in \code{x}

... when x is ...

to

... when \code{x} is ...

@jangorecki
Copy link
Member

there are definitely too many commits in this branch, git reset --soft your changes, git stash, git rebase master (updated master), and git stash pop.

@sritchie73
Copy link
Contributor Author

Thanks @jangorecki , i tried the above series of commands but git wouldnt let me push the changes. I've created a new branch instead containing just the bugfix and submitted a new PR #2939

@sritchie73 sritchie73 closed this Jun 17, 2018
@sritchie73 sritchie73 deleted the as_matrix_rownames branch June 17, 2018 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants