Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

measure() multiple.keyword not used with single groups #5065

Closed
keatingw opened this issue Jul 6, 2021 · 7 comments · Fixed by #5112
Closed

measure() multiple.keyword not used with single groups #5065

keatingw opened this issue Jul 6, 2021 · 7 comments · Fixed by #5112
Assignees
Labels
dev reshape dcast melt
Milestone

Comments

@keatingw
Copy link

keatingw commented Jul 6, 2021

First up, thank you very much for implementing measure() - it's a fantastic addition to the reshaping tools.

When using the multiple keyword (value.name by default) it's ignored in the case where only one value is matched in that capture group. This might be intended behaviour, so feel free to close this issue if so.

The documentation for measure() doesn't seem to suggest this behaviour is deliberate, but it might just be a point of clarification to avoid surprises (e.g. describing when measure overrides the value name argument and vice versa):

multiple.keyword | A string, if used as a group name, then measure returns a list and melt returns multiple value columns (with names defined by the unique values in that group). Otherwise if the string not used as a group name, then measure returns a vector and melt returns a single value column.

# Minimal reproducible example

library(data.table)
(x = as.data.table(iris)[1])
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species                                                                                                                        
# 1:          5.1         3.5          1.4         0.2  setosa

# Melt with multiple groups and value.name keyword, behaves as expected
melt(
  x,
  id.vars = "Species",
  measure.vars = measure(value.name, measurement, pattern = "(Petal|Sepal)\\.(Length|Width)")
)
#    Species measurement Sepal Petal                                                                                                                                                  
# 1:  setosa      Length   5.1   1.4                                                                                                                                                  
# 2:  setosa       Width   3.5   0.2 

# Melt with keyword matching only Sepal columns, and melt reverts to using the value.name argument
melt(
  x,
  id.vars = "Species",
  measure.vars = measure(value.name, measurement, pattern = "(Petal)\\.(Length|Width)")
)
#    Species measurement value                                                                                                                                                        
# 1:  setosa      Length   1.4                                                                                                                                                        
# 2:  setosa       Width   0.2 

# Output of sessionInfo()

r$> sessionInfo()                                                                                                                                                                   
R version 4.1.0 (2021-05-18)                                                                                                                                                        
Platform: x86_64-w64-mingw32/x64 (64-bit)                                                                                                                                           
Running under: Windows 10 x64 (build 19042)                                                                                                                                         
                                                                                                                                                                                    
Matrix products: default                                                                                                                                                            
                                                                                                                                                                                    
locale:                                                                                                                                                                             
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                       LC_TIME=English_Australia.1252      
                                                                                                                                                                                    
attached base packages:                                                                                                                                                             
[1] stats     graphics  grDevices utils     datasets  methods   base                                                                                                                
                                                                                                                                                                                    
other attached packages:                                                                                                                                                            
[1] data.table_1.14.1                                                                                                                                                               
                                                                                                                                                                                    
loaded via a namespace (and not attached):                                                                                                                                          
[1] compiler_4.1.0    
@tdhock tdhock added the reshape dcast melt label Aug 23, 2021
@tdhock tdhock self-assigned this Aug 23, 2021
@tdhock
Copy link
Member

tdhock commented Aug 23, 2021

hi @keatingw thanks for the report. I would think that your case should cause an error. The documentation says that when the multiple.keyword is used there should be "multiple value columns" and in your example there would be only one (Petal) so I would propose an error, something like "multiple.keyword (value.name) used as a group name, but only one value captured in that group (Petal), so only one value column would be created; fix by changing group name from value.name to something else, or by changing pattern/sep so that there are more values captured in that group."

@keatingw
Copy link
Author

Thanks for the response - I think either supporting the single value case or raising an error are both sensible. The former would potentially be convenient for users but I'm unsure how difficult it would be.

That error message sounds good and clear - it may be good to direct the user to using the 'value.name' arg in the outer melt() call as well, since that would get them to the desired output.

Just reading your reply I worked through the functionality that I think is used for this - the list construction of measure variables. I think it's possible the error (if that's the route taken) should be here too?

library(data.table)
x = as.data.table(iris)
melt(x[1:2], measure.vars = list(Petal = c("Petal.Length", "Petal.Width"), Sepal = c("Sepal.Length", "Sepal.Width")))
## works as intended (notwithstanding the variable conversion to numbers)
#   Species variable Petal Sepal                                                                                                                                                     
#    <fctr>   <fctr> <num> <num>                                                                                                                                                     
#1:  setosa        1   1.4   5.1                                                                                                                                                     
#2:  setosa        1   1.4   4.9                                                                                                                                                     
#3:  setosa        2   0.2   3.5                                                                                                                                                     
#4:  setosa        2   0.2   3.0 
melt(x[1:2, c("Petal.Length", "Petal.Width")], measure.vars = list(Petal = c("Petal.Length", "Petal.Width")))
## single group case ignores list names
#         variable value                                                                                                                                                               
#        <fctr> <num>                                                                                                                                                               
#1: Petal.Length   1.4                                                                                                                                                               
#2: Petal.Length   1.4                                                                                                                                                               
#3:  Petal.Width   0.2                                                                                                                                                               
#4:  Petal.Width   0.2

@tdhock
Copy link
Member

tdhock commented Aug 24, 2021

About "either supporting the single value case or raising an error are both sensible. The former would potentially be convenient for users but I'm unsure how difficult it would be." It would probably not be difficult, but I would argue that it would be quite confusing, since then there would be two different ways to get a single value column (one with value.name/multiple.keyword, one without). I would argue that the error message should be preferred to avoid confusion, and to make sure there is only one way to get a single value column (DONT use value.name/multiple.keyword)

About the behavior when measure() is not used. The ?melt docs suggest that it is OK to have a measure.vars list with one element,

measure.vars: Measure variables for 'melt'ing. Can be missing, vector,
          list, or pattern-based.
...
 'list' is a generalization of the vector version - each
              element of the list (which should be 'integer' or
              'character' as above) will become a 'melt'ed column.

so I would propose only adding the error when measure is used.

The docs also say that the names in the measure.vars list should take precedence:

value.name: name for the molten data values column(s). The default name
          is ''value''. Multiple names can be provided here for the
          case when 'measure.vars' is a 'list', though note well that
          the names provided in 'measure.vars' take precedence.

but your example shows that the value.name arg takes precedence over the name in the measure.vars list, in the case of only one molten data values column. I think this is a bug. We could fix by either changing the docs or the functionality. What do you think?

@keatingw
Copy link
Author

keatingw commented Aug 25, 2021

On measure():
I see your point on the confusion - my initial view (docs notwithstanding) was that the behaviour of the value.name keyword should be independent of how many unique values there are?

What's your thinking on potential problems from allowing a single unique match?

In the regular (multiple values) case, trying to specify both (i.e. capture group and a value.name in the melt call) yields a helpful warning and gives precedence to the measure() part - this feels like it may be more consistent with allowing a single value.name

(x = as.data.table(iris)[1]) #(as before)
melt(
  x,
  id.vars = "Species",
  measure.vars = measure(value.name, measurement, pattern = "(Petal|Sepal)\\.(Length|Width)"),
  value.name = c("a", "b")
)
#    Species measurement Sepal Petal
#     <fctr>      <char> <num> <num>
# 1:  setosa      Length   5.1   1.4
# 2:  setosa       Width   3.5   0.2
# Warning message:
# 'value.name' provided in both 'measure.vars' and 'value.name argument'; value provided in 'measure.vars' is given precedence.

On a named list in measure.vars:
Great pickup - I hadn't noticed that in the value.name doc. I think the way it is currently documented makes sense (i.e. names from measure.vars list take precedence), so may be best to change functionality (seems less likely to break prior uses given the docs).

@tdhock
Copy link
Member

tdhock commented Aug 25, 2021

hey @keatingw see the PR linked above for a fix (which was very simple).
About potential problems from allowing a single unique match/value column, there is potential confusion now that there are two ways of using measure() to get the same result, #5112 (comment) but I guess that is OK (no need to add an error if that functionality may be convenient for users sometimes).

@keatingw
Copy link
Author

That PR is exactly the behaviour I would've expected naively - thanks so much for all your work.

@mattdowle
Copy link
Member

Yes we've come up against the convenience factor a few times before. It's because user code might pass the value as a variable: sometimes the variable may contain multiple and sometimes it could contain single. The user having to add a branch to their code to call melt in a different way for the single case is what would be inconvenient. Yes the downside is there might be two ways to do the same thing for the single case. But then presumably there is an error for the argument that is supposed to only accept a single if it receives multiple, which should be appreciated by users when intending to pass a single. In other words, the user intention as to whether to intend to be passing single or single/multiple is conveyed by which argument they use.

@mattdowle mattdowle added this to the 1.14.1 milestone Aug 26, 2021
@mattdowle mattdowle added the dev label Aug 26, 2021
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev reshape dcast melt
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants