Allowing rowname specification in as.matrix.data.table #2692

sritchie73 · 2018-03-21T03:36:48Z

Particularly after performing dcast(), I frequently find myself writing and using the following function to convert a data.table to a matrix:

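dt.to.matrix <- function(x) {
  # Go through a data.frame, which (unlike a data.table) supports rownames
  x <- as.data.frame(x)
  rownames(x) <- x[, 1]
  # drop = FALSE keeps a single remaining column from collapsing to a vector
  x <- as.matrix(x[, -1, drop = FALSE])
  x
}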

data.tables do not have a rownames attribute so this information is typically stored as the first column of the data.table. When converting to a matrix it is typically desirable to make this column the rownames() on the matrix. Currently, you have to jump through several hoops to make this conversion following the code above.
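As a rough illustration of the hoops described above, the following sketch casts a small long-format table with dcast() and then moves the first column into the rownames by hand. The table `long` and its column names (`sample`, `gene`, `value`) are made up for the example and are not from the original post.

```r
library(data.table)

# Hypothetical long-format data: one row per (sample, gene) pair
long <- data.table(
  sample = rep(c("s1", "s2", "s3"), each = 2),
  gene   = rep(c("g1", "g2"), times = 3),
  value  = c(1.5, 2.0, 0.3, 4.1, 2.2, 3.3)
)

# dcast() puts the row identifier in the first column of the result
wide <- dcast(long, sample ~ gene, value.var = "value")

# Manual conversion: go through a data.frame, which supports rownames
df <- as.data.frame(wide)
rownames(df) <- df[, 1]
mat <- as.matrix(df[, -1, drop = FALSE])
```

Here `mat` is a numeric matrix whose rownames come from the `sample` column and whose columns are the cast `gene` values.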

This could be taken care of by as.matrix.data.table() itself, e.g. through an additional argument, something like as.matrix(dt, rownames = 1), analogous to the keep.rownames argument in as.data.table.

If there isn't an obvious reason why this is a bad idea (can additional arguments be added to S3 methods?), I'm happy to put together and submit a pull request.


franknarf1 · 2018-03-21T19:01:10Z

And/or maybe port reshape2::acast as was done for dcast .. ?

mattdowle · 2018-03-21T22:20:03Z

I see where you're coming from, from an abstract point of view. But data.tables don't have rownames because this information is typically stored in a multi-column multi-type key, which doesn't have to be the first column. Keys are so superior to rownames that I don't see why you'd want to convert to a matrix really. Also, a matrix is of a single type for all columns. So the fact that a data.table can be converted to a matrix at all means that all the columns in the data.table are the same type. It should either have been a matrix in the first place, or a tall and skinny data.table rather than short and fat, perhaps.

What do you do with the matrix once you've got it?

I don't have any objection to as.matrix.data.table being extended like that, but I wonder if it would make the wrong solution easier to do. Would it paste together the key into one longer string to go in the rownames? The horror of that approach was one reason I created data.table without rownames, but multi-column multi-type keys instead.

sritchie73 · 2018-03-21T22:42:10Z

I generally prefer to work with long skinny data.tables even with matrix data, then convert back to matrices as needed.

A common workflow for me is to:

1. Convert a matrix into a data.table so that I can:
2. Melt it to a tall and skinny data.table to:
3. Run calculations over each column using by rather than apply(mat, 2, ...)
4. Plot many columns against each other using ggplot2 by treating them as groups
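The first three steps of this workflow might look roughly like the sketch below. The matrix `mat` and its dimnames are invented for illustration, and the ggplot2 step is left out to keep the example dependency-free.

```r
library(data.table)

# Hypothetical measurement matrix: rows are samples, columns are variables
mat <- matrix(
  c(1, 2, 3, 10, 20, 30), nrow = 3,
  dimnames = list(c("s1", "s2", "s3"), c("a", "b"))
)

# 1. Matrix to data.table; keep.rownames = TRUE stores the rownames in column "rn"
dt <- as.data.table(mat, keep.rownames = TRUE)

# 2. Melt to a tall and skinny table: one row per (rn, variable) pair
long <- melt(dt, id.vars = "rn", variable.name = "variable", value.name = "value")

# 3. Per-column calculation using by rather than apply(mat, 2, ...)
col_means <- long[, .(avg = mean(value)), by = variable]
```

The grouped result in `col_means` holds one row per original matrix column, which is also the shape ggplot2 expects when treating columns as groups.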

I often also have multiple columns of values in a tall skinny data.table (e.g. a raw data column and a normalised data column) that I might want to split off into individual matrices using dcast.
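A minimal sketch of that splitting step, assuming a tall table with hypothetical columns `sample`, `gene`, `raw`, and `norm`. The helper `to_matrix()` is made up for this example and only uses calls that predate the proposed rownames argument.

```r
library(data.table)

# Hypothetical tall table with two value columns per (sample, gene) pair
long <- data.table(
  sample = rep(c("s1", "s2"), each = 2),
  gene   = rep(c("g1", "g2"), times = 2),
  raw    = c(10, 20, 30, 40),
  norm   = c(0.1, 0.2, 0.3, 0.4)
)

# Hypothetical helper: cast one value column wide, then move the
# sample column into the rownames of the resulting matrix
to_matrix <- function(dt, value_col) {
  wide <- dcast(dt, sample ~ gene, value.var = value_col)
  m <- as.matrix(wide[, !"sample"])
  rownames(m) <- wide$sample
  m
}

raw_mat  <- to_matrix(long, "raw")
norm_mat <- to_matrix(long, "norm")
```

Each call yields one samples-by-genes matrix per value column, with matching dimnames across the two.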

Another scenario is where the raw data comes in a mixed matrix / data.table format, where there are several columns of information, and many columns of measurements – this I would split off into a data.table of information and a matrix of measurements.
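One way this split could look, as a sketch only. The table `dt` and the column names (`id`, `group`, `m1`, `m2`) are assumptions made for the example.

```r
library(data.table)

# Hypothetical mixed table: information columns followed by measurement columns
dt <- data.table(
  id    = c("s1", "s2", "s3"),
  group = c("ctrl", "ctrl", "case"),
  m1    = c(1, 2, 3),
  m2    = c(4, 5, 6)
)

info_cols    <- c("id", "group")
measure_cols <- setdiff(names(dt), info_cols)

# The information columns stay in a data.table
info <- dt[, info_cols, with = FALSE]

# The measurement columns become a numeric matrix with the ids as rownames
measurements <- as.matrix(dt[, measure_cols, with = FALSE])
rownames(measurements) <- dt$id
```

Keeping `id` in both objects is what allows results computed on the matrix to be joined back to `info` later.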

I therefore wouldn't go so far as allowing as.matrix to work on multi-key data.tables, since it would be difficult to split those apart again later. Rather I was thinking the following behaviour:

- as.matrix(dt, rownames=TRUE): take the first column as the rownames (or maybe key(dt) if there is a single key column).
- as.matrix(dt, rownames=3): take the column index specified as the rownames.
- as.matrix(dt, rownames="colname"): take the named column as the rownames.
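The three call forms could be exercised as in the sketch below. It assumes an installed data.table version whose as.matrix() method accepts a rownames argument; the released behaviour may differ in detail from this proposal, so ?as.matrix.data.table is the authority. The table `dt` is made up for the example and has no key set.

```r
library(data.table)

# Hypothetical table: identifier in the first column, then numeric columns
dt <- data.table(
  id = c("r1", "r2", "r3"),
  x  = c(1, 2, 3),
  y  = c(4, 5, 6)
)

m_first <- as.matrix(dt, rownames = TRUE)  # first column (no key is set here)
m_index <- as.matrix(dt, rownames = 1)     # column chosen by position
m_name  <- as.matrix(dt, rownames = "id")  # column chosen by name
```

With this input all three calls select the same column, so the resulting matrices should agree.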

We might also consider adding the same argument to as.data.frame(), although I cannot think of a scenario where that would be useful off the top of my head.

MichaelChirico · 2018-03-22T00:21:38Z

i often convert to matrix for plotting. especially dcast -> as.matrix -> matplot
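A small sketch of that dcast -> as.matrix -> matplot chain, using an invented table `long` with columns `time`, `series`, and `value`:

```r
library(data.table)

# Hypothetical long-format series data
long <- data.table(
  time   = rep(1:4, times = 2),
  series = rep(c("a", "b"), each = 4),
  value  = c(1, 2, 3, 4, 4, 3, 2, 1)
)

# One column per series, then drop the time column before converting
wide <- dcast(long, time ~ series, value.var = "value")
m <- as.matrix(wide[, !"time"])

# matplot() draws one line per matrix column
matplot(wide$time, m, type = "l", lty = 1, xlab = "time", ylab = "value")
legend("top", legend = colnames(m), col = seq_len(ncol(m)), lty = 1)
```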


mattdowle · 2018-03-22T01:06:43Z

Thanks for info. I see now. Sounds good, then!

jangorecki · 2018-03-22T04:40:05Z

@sritchie73 A minimal example of your workflow (splitting dimensions into a data.table and measures into a matrix, then calculating and joining back) could be added as a test.

mattdowle · 2018-04-10T02:02:17Z

Merged PR should have auto-closed this one but didn't because the PR had "Implements" at the top not "Closes". Closing manually now.

sritchie73 added the feature request label Mar 21, 2018

sritchie73 mentioned this issue Mar 23, 2018

Allow a single column to be used as rownames in as.matrix #2702

Merged

mattdowle added this to the v1.10.6 milestone Apr 7, 2018

mattdowle closed this as completed Apr 10, 2018
