Allowing rowname specification in as.matrix.data.table #2692
Comments
And/or maybe port
I see where you're coming from, from an abstract point of view. But data.tables don't have What do you do with the matrix once you've got it? I don't have any objection to
I often convert to a matrix for plotting, especially dcast -> as.matrix -> matplot.
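A minimal sketch of the dcast -> as.matrix -> matplot pipeline. It assumes the data.table package is installed and uses toy data; the column names sample, time and value are made up for illustration. Only the measurement columns go into the matrix, and the label column supplies the x values, so rownames are not strictly needed for plotting.

```r
library(data.table)

# Toy long-format data (illustrative): one value per (sample, time) pair
long <- data.table(
  sample = rep(c("a", "b", "c"), each = 4),
  time   = rep(1:4, times = 3),
  value  = c(1, 2, 3, 4, 2, 4, 6, 8, 3, 3, 3, 3)
)

# Wide format: one row per time point, one column per sample
wide <- dcast(long, time ~ sample, value.var = "value")

# Keep only the measurement columns, as a numeric matrix
mat <- as.matrix(wide[, setdiff(names(wide), "time"), with = FALSE])

# One line per matrix column (sample), drawn against the time column
matplot(wide$time, mat, type = "l", lty = 1, xlab = "time", ylab = "value")
```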
On Thu, Mar 22, 2018, 6:42 AM Scott Ritchie wrote:
I generally prefer to work with long skinny data.tables even with matrix data, then convert back to matrices as needed.
A common workflow for me is to:
1. Convert a matrix into a data.table so that I can:
2. Melt it to a tall and skinny data.table to:
3. Run calculations over each column using by rather than apply(mat, 2, ...)
4. Plot many columns against each other using ggplot2 by treating them as groups
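The numbered workflow above can be sketched as follows, assuming data.table is installed; the matrix and its dimnames are toy values. keep.rownames = TRUE stores the matrix rownames in a column named rn, which then serves as the id column for melt. The ggplot2 step is shown only as a comment, since ggplot2 may not be installed.

```r
library(data.table)

# Toy matrix with row and column names (illustrative)
mat <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), nrow = 4,
              dimnames = list(paste0("r", 1:4), c("x", "y")))

# 1. Matrix -> data.table, keeping the rownames as an "rn" column
dt <- as.data.table(mat, keep.rownames = TRUE)

# 2. Melt to tall-and-skinny: one row per (rn, column) pair
long <- melt(dt, id.vars = "rn", variable.name = "column", value.name = "value")

# 3. Per-column calculation with `by` instead of apply(mat, 2, ...)
col_means <- long[, .(mean = mean(value)), by = column]

# 4. `long` can go straight into ggplot2 with the columns as groups, e.g.
#    ggplot(long, aes(rn, value, group = column, colour = column)) + geom_line()
```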
I often also have multiple columns of values in a tall skinny data.table (e.g. a raw data column and a normalised data column) that I might want to split off into individual matrices using dcast.
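A sketch of splitting several value columns (here a raw and a normalised one) into one matrix each with dcast. The gene/sample layout is toy data, and cast_to_matrix() is a hypothetical helper name, not part of data.table.

```r
library(data.table)

# Toy tall table with two value columns per (gene, sample) pair
long <- data.table(
  gene   = rep(c("g1", "g2"), times = 3),
  sample = rep(c("s1", "s2", "s3"), each = 2),
  raw    = c(10, 20, 30, 40, 50, 60),
  norm   = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

# Hypothetical helper: cast one value column to a gene x sample matrix,
# moving the gene column into the rownames
cast_to_matrix <- function(dt, value_col) {
  wide <- dcast(dt, gene ~ sample, value.var = value_col)
  mat <- as.matrix(wide[, setdiff(names(wide), "gene"), with = FALSE])
  rownames(mat) <- wide$gene
  mat
}

# One matrix per value column, returned as a named list
mats <- lapply(c(raw = "raw", norm = "norm"), cast_to_matrix, dt = long)
```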
Another scenario is where the raw data comes in a mixed matrix / data.table format, where there are several columns of information, and many columns of measurements – this I would split off into a data.table of information and a matrix of measurements.
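One way to do this split, sketched under the assumption that the annotation column names are known in advance; the table and its column names (id, group, m1 to m3) are illustrative.

```r
library(data.table)

# Toy mixed table (illustrative): annotation columns, then measurement columns
mixed <- data.table(
  id    = c("p1", "p2", "p3"),
  group = c("ctrl", "case", "case"),
  m1    = c(1, 2, 3),
  m2    = c(4, 5, 6),
  m3    = c(7, 8, 9)
)

# Assumption: the annotation columns are known by name up front
info_cols <- c("id", "group")
meas_cols <- setdiff(names(mixed), info_cols)

# Annotation stays a data.table ...
info <- mixed[, info_cols, with = FALSE]

# ... and the measurements become a numeric matrix with one row per id
meas <- as.matrix(mixed[, meas_cols, with = FALSE])
rownames(meas) <- mixed$id
```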
I therefore wouldn't go so far as allowing as.matrix to work on multi-key data.tables, since it would be difficult to split those apart again later. Rather, I was thinking of the following behaviour:
- as.matrix(dt, rownames=TRUE): take the first column as the rownames (or maybe key(dt) if there is a single key column).
- as.matrix(dt, rownames=3): take the column index specified as the rownames.
- as.matrix(dt, rownames="colname"): take the named column as the rownames.
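The proposed behaviour can be sketched as a standalone helper. as_matrix_rn() is a hypothetical name used only for illustration and is not data.table's API. It reads rownames = TRUE as "the single key column if there is exactly one, otherwise the first column", which is one interpretation of the first bullet. Whether your installed data.table already offers something similar is worth checking in ?as.matrix.data.table.

```r
library(data.table)

# Hypothetical helper (not data.table API) sketching the proposed argument:
#   rownames = TRUE     -> single key column if there is one, else column 1
#   rownames = <number> -> column at that index
#   rownames = <string> -> column with that name
as_matrix_rn <- function(x, rownames = NULL) {
  if (is.null(rownames) || identical(rownames, FALSE)) return(as.matrix(x))
  if (isTRUE(rownames)) {
    k <- key(x)
    rownames <- if (length(k) == 1L) k else 1L
  }
  if (is.numeric(rownames)) rownames <- names(x)[rownames]
  stopifnot(is.character(rownames), length(rownames) == 1L,
            rownames %in% names(x))
  rn <- x[[rownames]]
  # Remaining columns form the matrix; the chosen column becomes its rownames
  mat <- as.matrix(x[, setdiff(names(x), rownames), with = FALSE])
  rownames(mat) <- rn
  mat
}

# Usage with toy data
DT <- data.table(id = c("a", "b"), g = c(1L, 2L), v = c(1.5, 2.5))
as_matrix_rn(DT, TRUE)   # rownames from id; columns g and v
as_matrix_rn(DT, "g")    # rownames from g; columns id and v
```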
We might also consider adding the same argument to as.data.frame(), although I cannot think of a scenario where that would be useful off the top of my head.
Thanks for the info. I see now. Sounds good, then!
@sritchie73 A minimal example of your workflow (splitting dimensions into a data.table and measures into a matrix, then calculating and joining back) could be added as a test.
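A test along these lines might look like the following sketch, using toy data with illustrative column names. It splits the table, computes row means on the matrix, joins them back by id, and cross-checks the result against the same calculation done with melt and by.

```r
library(data.table)

# Toy input: annotation columns plus measurement columns (names illustrative)
DT <- data.table(id = c("p1", "p2", "p3"), group = c("ctrl", "case", "case"),
                 m1 = c(1, 2, 3), m2 = c(4, 5, 6))

# Split: annotation data.table + measurement matrix with rownames = id
info <- DT[, .(id, group)]
meas <- as.matrix(DT[, .(m1, m2)])
rownames(meas) <- DT$id

# Calculate on the matrix, then join the result back onto the annotation by id
res <- data.table(id = rownames(meas), row_mean = unname(rowMeans(meas)))
joined <- info[res, on = "id"]

# Cross-check against the same calculation done directly in data.table
expected <- melt(DT, id.vars = c("id", "group"))[, .(expected_mean = mean(value)), by = id]
chk <- merge(joined, expected, by = "id")
stopifnot(isTRUE(all.equal(chk$row_mean, chk$expected_mean)))
```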
The merged PR should have auto-closed this issue but didn't, because the PR had "Implements" at the top rather than "Closes". Closing manually now.
Particularly after performing dcast(), I frequently find myself writing and using the following function to convert a data.table to a matrix:
data.tables do not have a rownames attribute, so this information is typically stored as the first column of the data.table. When converting to a matrix, it is typically desirable to make this column the rownames() of the matrix. Currently, you have to jump through several hoops to make this conversion, following the code above.
This could be taken care of by as.matrix.data.table() itself, e.g. through an additional argument like as.matrix(dt, rownames = 1), analogous to the keep.rownames argument in as.data.table.
If there isn't an obvious reason why this is a bad idea (can an additional argument be added to an S3 method?), I'm happy to put together and submit a pull request.