Possibility to provide group labels in melt `variable` column when melting a list of columns simultaneously #2551

matthiasgomolka · 2018-01-05T08:46:10Z

Right now, there is a loss of information when one melts a list of columns simultaneously since the variable column only gives the number of the group (as a factor by default), not a name.

It would be great if the functionality of melting a list of columns simultaneously would be extended such that it is possible to provide labels for the factor in the variable ###column. I would suggest an additional argument, e.g. variable.labels which makes it possible to provide custom labels.

# Minimal reproducible example

library(data.table)

DT <- data.table(val_loss = 1:2,
                 val_acc = 3:4,
                 loss = 5:6,
                 acc = 7:8,
                 id = c("A", "B"))

DT_melted <- 
  melt(DT,
       measure.vars = list(c("loss", "acc"), c("val_loss", "val_acc")),
# the following line shows what I think would be ideal
#      variable.labels = c("group 1", "group 2"),
       value = c("training", "validation"))

# the following step should become obsolete by providing the labels during the melt call above
DT_melted[, variable := factor(variable, levels = 1:2, labels = c("group 1", "group 2"))]

# Output of sessionInfo()

R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.4-3

loaded via a namespace (and not attached):
[1] tools_3.3.3

I know that the R Version is not up to date but at my Company it takes a while until we get the most recent version.

Sorry, for not labeling the issue. Somehow I could not figure out, how to set them.

The text was updated successfully, but these errors were encountered:

MichaelChirico · 2018-01-05T16:16:46Z

I don't understand your comment about "information loss." There's a 1-1 mapping between your desired output and the actual output?

I don't see much wrong with:

levels(DT_melted$variable) = paste('group', levels(DT_melted$variable))

matthiasgomolka · 2018-01-05T17:49:07Z

What I mean by "information loss" is that you need to attach the labels afterwards as you did above and as I did in the last line of the minimal example. I feel that this increases the risk of mixing up the labels.

But you are right, it can be done afterwards using the same vector of group names. Therefore, the proposed feature would be rather "nice to have" than adding something which cannot be done right now.

Henrik-P · 2018-01-05T20:19:19Z

Possibly related to FR: expansion of melt functionality for handling names of output

MichaelChirico · 2018-01-08T16:54:52Z

@Henrik-P yes appears quite related. In fact playing around with OP's example, I inadvertently tried some of the fixes for #1547 that don't work yet...

mrdwab · 2018-01-18T14:55:37Z

@MichaelChirico , Glad to hear this is being worked on. I don't know how much of a priority it is for you guys, but it would be great if you also considered how to deal with "unbalanced" data that is in a panel form.

Building on #2564, let's imagine we were missing a few columns:

library(data.table)
set.seed(2334)
DT <- data.table(
  a_alpha = rnorm(3), a_gamma = rnorm(3),
  b_beta = rnorm(3), b_gamma = rnorm(3),
  id = c(1:3)
)

It would still be fantastic to be able to use the measure.vars = patterns() approach, but that does not give the right result, and you can't actually "restore" labels as is demonstrated in this issue.

melt(DT, measure.vars = patterns("a_", "b_"))
#    id variable     value1       value2
# 1:  1        1 -0.1183107  1.682018318
# 2:  2        1  1.2370906 -0.573611132
# 3:  3        1  0.8088209 -0.057320032
# 4:  1        2 -0.7656264 -0.706428227
# 5:  2        2 -0.5919939  0.001899857
# 6:  3        2  0.5279071  1.063851211

To get the correct output, you would have to do the more traditional melt + strsplit + dcast routine:

melt(DT, "id")[, c("stub", "var") := tstrsplit(variable, "_")][, dcast(.SD, id + var ~ stub)]
#    id   var          a            b
# 1:  1 alpha -0.1183107           NA
# 2:  1  beta         NA  1.682018318
# 3:  1 gamma -0.7656264 -0.706428227
# 4:  2 alpha  1.2370906           NA
# 5:  2  beta         NA -0.573611132
# 6:  2 gamma -0.5919939  0.001899857
# 7:  3 alpha  0.8088209           NA
# 8:  3  beta         NA -0.057320032
# 9:  3 gamma  0.5279071  1.063851211

I had previously written merged.stack as part of "splitstackshape" that melted columns separately and then merged them together. Needless to say, not a very efficient approach.

library(splitstackshape)
merged.stack(DT, var.stubs = c("a_", "b_"), sep = "var.stubs")
#    id .time_1         a_           b_
# 1:  1   alpha -0.1183107           NA
# 2:  1    beta         NA  1.682018318
# 3:  1   gamma -0.7656264 -0.706428227
# 4:  2   alpha  1.2370906           NA
# 5:  2    beta         NA -0.573611132
# 6:  2   gamma -0.5919939  0.001899857
# 7:  3   alpha  0.8088209           NA
# 8:  3    beta         NA -0.057320032
# 9:  3   gamma  0.5279071  1.063851211

More recently, working on a replacement for merged.stack I used a different approach where I added in the missing columns before melting.

ReshapeLong_(DT, c(a = "a_", b = "b_"))
#    id variable          a            b
# 1:  1    alpha -0.1183107           NA
# 2:  2    alpha  1.2370906           NA
# 3:  3    alpha  0.8088209           NA
# 4:  1     beta         NA  1.682018318
# 5:  2     beta         NA -0.573611132
# 6:  3     beta         NA -0.057320032
# 7:  1    gamma -0.7656264 -0.706428227
# 8:  2    gamma -0.5919939  0.001899857
# 9:  3    gamma  0.5279071  1.063851211

This works, but:

I'm sure there's a better way
Since column types aren't guessed before adding the empty column, data.table complains about the molten values not being all the same type, so you have lots of warnings

Anyway--as I mentioned, great to hear that this is being worked on. Just thought I would throw this out as a possible test you would want to look into.

mrdwab · 2018-01-18T15:03:14Z

Also, I don't know whether Stata handles unbalanced wide to long transformation, but from what I can tell, pandas handles it with wide_to_long:

import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"},
                   "A1980" : {0 : "d", 1 : "e", 2 : "f"},
                   "A1990" : {0 : "g", 1 : "h", 2 : "i"},
                   "B1970" : {0 : 2.5, 1 : 1.2, 2 : .7},
                   "B1980" : {0 : 3.2, 1 : 1.3, 2 : .1},
                   "X"     : dict(zip(range(3), np.random.randn(3)))
                  })
df["id"] = df.index

print(df)

  A1970 A1980 A1990  B1970  B1980         X  id
0     a     d     g    2.5    3.2 -1.085631   0
1     b     e     h    1.2    1.3  0.997345   1
2     c     f     i    0.7    0.1  0.282978   2

print(pd.wide_to_long(df, ["A", "B"], i="id", j="year"))

                X  A    B
id year                  
0  1970 -1.085631  a  2.5
   1980 -1.085631  d  3.2
   1990 -1.085631  g  NaN
1  1970  0.997345  b  1.2
   1980  0.997345  e  1.3
   1990  0.997345  h  NaN
2  1970  0.282978  c  0.7
   1980  0.282978  f  0.1
   1990  0.282978  i  NaN

MichaelChirico · 2018-01-18T15:09:23Z

@mrdwab Could you file a separate issue with this? Looks very useful

tdhock · 2020-09-29T03:40:41Z

hi the original issue is solved via

library(data.table)
DT <- data.table(
  val_loss = 1:2,
  val_acc = 3:4,
  loss = 5:6,
  acc = 7:8,
  id = c("A", "B"))
nc::capture_melt_multiple(
  DT,
  column="val_|", function(x)ifelse(x=="", "training", "validation"),
  variable="loss|acc")
#>    id variable training validation
#> 1:  A      acc        7          3
#> 2:  B      acc        8          4
#> 3:  A     loss        5          1
#> 4:  B     loss        6          2
nc::capture_melt_multiple(
  DT,
  set="val_|", function(x)ifelse(x=="", "training", "validation"),
  column="loss|acc")
#>    id        set acc loss
#> 1:  A   training   7    5
#> 2:  B   training   8    6
#> 3:  A validation   3    1
#> 4:  B validation   4    2

jangorecki · 2020-09-29T12:36:33Z

Code is not reproducible due to unkown namespace nc. Anyway before closing it is good to merge unit test to ensure requested functionality will keep working in future.

tdhock · 2020-09-29T16:39:41Z

install.packages('nc') ;)

tdhock · 2020-10-07T03:47:54Z

pure data.table solution using #4731

remotes::install_github("Rdatatable/data.table@melt-custom-variable")
#> Skipping install of 'data.table' from a github remote, the SHA1 (c02fa9e8) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(data.table)
DT <- data.table(
  val_loss = 1:2,
  val_acc = 3:4,
  loss = 5:6,
  acc = 7:8,
  id = c("A", "B"))
melt(DT, measure.vars=measure(
  value.name=function(x)ifelse(x=="", "training", "validation"),
  variable, pattern="(val_|)(loss|acc)"))
#>    id variable validation training
#> 1:  A     loss          1        5
#> 2:  B     loss          2        6
#> 3:  A      acc          3        7
#> 4:  B      acc          4        8
melt(DT, measure.vars=measure(
  set=function(x)ifelse(x=="", "training", "validation"),
  value.name, pattern="(val_|)(loss|acc)"))
#>    id        set loss acc
#> 1:  A validation    1   3
#> 2:  B validation    2   4
#> 3:  A   training    5   7
#> 4:  B   training    6   8

matthiasgomolka changed the title ~~Possibility to provide group labels in variable column when melting a list of columns simultaneously #Enhancement #melt~~ Possibility to provide group labels in variable column when melting a list of columns simultaneously @Enhancement @melt Jan 5, 2018

matthiasgomolka changed the title ~~Possibility to provide group labels in variable column when melting a list of columns simultaneously @Enhancement @melt~~ Possibility to provide group labels in variable column when melting a list of columns simultaneously label:Enhancement #melt Jan 5, 2018

matthiasgomolka changed the title ~~Possibility to provide group labels in variable column when melting a list of columns simultaneously label:Enhancement #melt~~ Possibility to provide group labels in variable column when melting a list of columns simultaneously #Enhancement #melt Jan 5, 2018

MichaelChirico mentioned this issue Jan 18, 2018

FR: expansion of melt functionality for handling names of output #1547

Closed

MichaelChirico added this to the v1.10.6 milestone Jan 18, 2018

mrdwab mentioned this issue Jan 18, 2018

Properly handling the "variable" column after melting unbalanced wide data #2575

Closed

MichaelChirico self-assigned this Feb 2, 2018

mattdowle modified the milestones: v1.10.6, Candidate Mar 30, 2018

mattdowle removed this from the Candidate milestone May 10, 2018

MichaelChirico mentioned this issue Feb 13, 2019

melt.data.table should offer variable to match on the name, rather than the number #3396

Closed

franknarf1 mentioned this issue Apr 2, 2019

Melt corrupts data by using column position instead of name #3487

Closed

Henrik-P mentioned this issue Nov 4, 2019

Ability to explicitly pass index names in melt.data.table measure.vars argument #4027

Closed

MichaelChirico changed the title ~~Possibility to provide group labels in variable column when melting a list of columns simultaneously #Enhancement #melt~~ Possibility to provide group labels in melt variable column when melting a list of columns simultaneously Nov 20, 2019

MichaelChirico added the feature request label Nov 20, 2019

MichaelChirico mentioned this issue Nov 20, 2019

melt converts the variable content to numbers (from its factor levels). #4056

Closed

jangorecki added the reshape dcast melt label Apr 5, 2020

tdhock closed this as completed Sep 29, 2020

jangorecki reopened this Sep 29, 2020

tdhock mentioned this issue Oct 1, 2020

melt with custom variable columns using variable_table attribute #4731

Merged

mattdowle added this to the 1.14.1 milestone May 9, 2021

mattdowle closed this as completed in #4731 May 9, 2021

jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possibility to provide group labels in melt `variable` column when melting a list of columns simultaneously #2551

Possibility to provide group labels in melt `variable` column when melting a list of columns simultaneously #2551

matthiasgomolka commented Jan 5, 2018 •

edited

Loading

MichaelChirico commented Jan 5, 2018

matthiasgomolka commented Jan 5, 2018

Henrik-P commented Jan 5, 2018

MichaelChirico commented Jan 8, 2018

mrdwab commented Jan 18, 2018

mrdwab commented Jan 18, 2018

MichaelChirico commented Jan 18, 2018

tdhock commented Sep 29, 2020

jangorecki commented Sep 29, 2020 •

edited

Loading

tdhock commented Sep 29, 2020

tdhock commented Oct 7, 2020

Possibility to provide group labels in melt variable column when melting a list of columns simultaneously #2551

Possibility to provide group labels in melt variable column when melting a list of columns simultaneously #2551

Comments

matthiasgomolka commented Jan 5, 2018 • edited Loading

MichaelChirico commented Jan 5, 2018

matthiasgomolka commented Jan 5, 2018

Henrik-P commented Jan 5, 2018

MichaelChirico commented Jan 8, 2018

mrdwab commented Jan 18, 2018

mrdwab commented Jan 18, 2018

MichaelChirico commented Jan 18, 2018

tdhock commented Sep 29, 2020

jangorecki commented Sep 29, 2020 • edited Loading

tdhock commented Sep 29, 2020

tdhock commented Oct 7, 2020

Possibility to provide group labels in melt `variable` column when melting a list of columns simultaneously #2551

Possibility to provide group labels in melt `variable` column when melting a list of columns simultaneously #2551

matthiasgomolka commented Jan 5, 2018 •

edited

Loading

jangorecki commented Sep 29, 2020 •

edited

Loading