Melt on multiple columns: undocumented behavior #4047
Comments
I have the same question. It would be nice for the variable column to have factor levels "a" and "b" instead of 1 and 2. |
Look at this example, which has data.table, tidyr and base reshape versions of the same problem for comparison. The data.table version is also shown below.
Also it would be nice if
|
Hi, this is solved by a new function in the nc package, which uses melt.data.table internally:
library(data.table)
dt <- data.table(
i = 1:2, na = rnorm(2), nb=rnorm(2),
ua=runif(2), ub=runif(2))
nc::capture_melt_multiple(dt, column="[un]", letter="[ab]")
#> i letter n u
#> 1: 1 a 0.5509765 0.7095506
#> 2: 2 a 0.7278650 0.2971809
#> 3: 1 b -0.4690630 0.9605627
#> 4: 2 b -1.4568312 0.3414062
nc::capture_melt_multiple(dt, letter="[un]", column="[ab]")
#> i letter a b
#> 1: 1 n 0.5509765 -0.4690630
#> 2: 2 n 0.7278650 -1.4568312
#> 3: 1 u 0.7095506 0.9605627
#> 4: 2 u 0.2971809 0.3414062 |
Again, we should first discuss if that functionality is going to be in scope of DT before closing. |
the original question: "what is the relationship between the numeric label and the original column name?" it is true that we should add documentation about what values go into the AFAICT the closest documentation is that |
Even better, just avoid the variable column altogether by using the new functionality in #4731:
remotes::install_github("Rdatatable/data.table@melt-custom-variable")
#> Skipping install of 'data.table' from a github remote, the SHA1 (c02fa9e8) has not changed since last install.
#> Use `force = TRUE` to force installation
library(data.table)
dt <- data.table(
i = 1:2, na = rnorm(2), nb=rnorm(2),
ua=runif(2), ub=runif(2))
melt(dt, measure.vars=measure(value.name, letter, pattern="([un])([ab])"))
#> i letter n u
#> 1: 1 a -0.6042333 0.3756086
#> 2: 2 a 0.4125218 0.2719224
#> 3: 1 b 0.2163859 0.5793461
#> 4: 2 b -0.6725394 0.1945757
melt(dt, measure.vars=measure(letter, value.name, pattern="([un])([ab])"))
#> i letter a b
#> 1: 1 n -0.6042333 0.2163859
#> 2: 2 n 0.4125218 -0.6725394
#> 3: 1 u 0.3756086 0.5793461
#> 4: 2 u 0.2719224 0.1945757 |
Just a reminder here: the issue is about missing documentation of the current behavior. Myself, I would prefer the new functionality, but I can work around with the existing one. But without details laid out in documentation, it feels somewhat unsafe to assume how the current approach works. In particular, to assume that the numeric values correspond to the original values in alphabetic order. |
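For anyone relying on the existing behavior in the meantime, here is a minimal sketch of the renaming workaround. It assumes that level k of variable refers to the k-th column of each measure.vars group, and it makes the column order explicit instead of trusting alphabetic order. The table, the column names and the suffix helper are made up for illustration.

```r
library(data.table)

# Hypothetical table: two groups of columns (n*, u*) sharing the suffixes a, b.
dt <- data.table(i = 1:2,
                 na = c(0.1, 0.2), nb = c(0.3, 0.4),
                 ua = c(0.5, 0.6), ub = c(0.7, 0.8))

# Build the column groups explicitly so their order is known.
n_cols <- grep("^n", names(dt), value = TRUE)  # "na" "nb"
u_cols <- grep("^u", names(dt), value = TRUE)  # "ua" "ub"

# Guard: both groups must list the suffixes in the same order.
suffix <- sub("^n", "", n_cols)
stopifnot(identical(suffix, sub("^u", "", u_cols)))

long <- melt(dt, id.vars = "i",
             measure.vars = list(n_cols, u_cols),
             value.name = c("n", "u"))

# Assumption: variable == k means "k-th column of each group".
long[, letter := suffix[as.integer(as.character(variable))]]
```

The stopifnot guard is the important part: if the two groups ever list their suffixes in a different order, the relabeling would be silently wrong, so it seems safer to fail early.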
I have proposed a doc fix in #4723 |
Hi again, the doc fix in #4723 was recently merged into master, so if this is good enough for you, @otoomet, can you please close this issue?
|
closed since I believe the doc changes address the issue |
Consider a simple data table:
When melting it into multiple columns we get:
In particular, variable is a factor with levels "1" and "2". This behavior seems to be undocumented. ?melt tells
and
However, I cannot find anything about
I know there are related feature requests (#2551 and #3396). I am also aware of related solutions (e.g. on SO) that revolve around renaming the corresponding factor levels. However, for such solutions to be considered safe, the behavior of numeric levels should be documented and considered part of the API.
data.table 1.12.6; R 3.4, 3.6.
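The behavior in question can be reproduced with a small stand-in table. This is not the example from the original report (that code is not shown here), only a sketch with made-up column names that melts two column groups selected with patterns().

```r
library(data.table)

# Hypothetical stand-in table, not the one from the original report.
dt <- data.table(i = 1:2,
                 na = c(0.1, 0.2), nb = c(0.3, 0.4),
                 ua = c(0.5, 0.6), ub = c(0.7, 0.8))

# Melt two groups of columns at once; each pattern selects one group.
long <- melt(dt, id.vars = "i",
             measure.vars = patterns("^n", "^u"),
             value.name = c("n", "u"))

# variable holds numeric labels rather than the original column names.
print(long)
print(levels(long$variable))
```

Nothing in the output records that "1" came from na/ua and "2" from nb/ub, which is why the mapping needs to be spelled out in the documentation.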