measure() multiple.keyword not used with single groups #5065
Comments
Hi @keatingw, thanks for the report. I think your case should raise an error. The documentation says that when the multiple.keyword is used there should be "multiple value columns", and in your example there would be only one (Petal), so I would propose an error, something like "multiple.keyword (value.name) used as a group name, but only one value captured in that group (Petal), so only one value column would be created; fix by changing group name from value.name to something else, or by changing pattern/sep so that there are more values captured in that group."
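The check proposed in this comment could look roughly like the sketch below. It is only an illustration in base R: `check_multiple_keyword` and its `group_values` argument are hypothetical names, not data.table internals, and the message is a shortened version of the wording suggested above.

```r
# Hypothetical helper, NOT part of data.table: illustrates the proposed check.
# group_values: the strings captured by the group named after multiple.keyword.
check_multiple_keyword = function(group_values, multiple.keyword = "value.name") {
  captured = unique(group_values)
  if (length(captured) < 2L) {
    stop(
      "multiple.keyword (", multiple.keyword, ") used as a group name, ",
      "but only one value captured in that group (",
      paste(captured, collapse = ", "),
      "), so only one value column would be created",
      call. = FALSE
    )
  }
  invisible(captured)
}

# Two distinct captures pass silently.
check_multiple_keyword(c("Petal", "Petal", "Sepal", "Sepal"))

# A single distinct capture raises the error; keep the message for inspection.
msg = tryCatch(
  check_multiple_keyword(c("Petal", "Petal")),
  error = function(e) conditionMessage(e)
)
```

Here `msg` holds the error text, so a caller can see which group name and which captured value triggered the check.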
Thanks for the response - I think either supporting the single value case or raising an error are both sensible. The former would potentially be convenient for users but I'm unsure how difficult it would be. That error message sounds good and clear - it may be good to direct the user to using the 'value.name' arg in the outer melt() call as well, since that would get them to the desired output.
Just reading your reply I worked through the functionality that I think is used for this - the list construction of measure variables. I think it's possible the error (if that's the route taken) should be here too?
library(data.table)
x = as.data.table(iris)
melt(x[1:2], measure.vars = list(Petal = c("Petal.Length", "Petal.Width"), Sepal = c("Sepal.Length", "Sepal.Width")))
## works as intended (notwithstanding the variable conversion to numbers)
# Species variable Petal Sepal
# <fctr> <fctr> <num> <num>
#1: setosa 1 1.4 5.1
#2: setosa 1 1.4 4.9
#3: setosa 2 0.2 3.5
#4: setosa 2 0.2 3.0
melt(x[1:2, c("Petal.Length", "Petal.Width")], measure.vars = list(Petal = c("Petal.Length", "Petal.Width")))
## single group case ignores list names
# variable value
# <fctr> <num>
#1: Petal.Length 1.4
#2: Petal.Length 1.4
#3: Petal.Width 0.2
#4: Petal.Width 0.2
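The 'value.name' route mentioned at the start of this comment can be sketched as follows. This is a minimal example that assumes the goal is a single value column called Petal; it passes the measure columns as a plain character vector and names the value column through the `value.name` argument of melt().

```r
library(data.table)

x = as.data.table(iris)[1:2]

# Name the single value column via melt()'s own value.name argument
# instead of relying on a list name or on measure().
out = melt(
  x,
  id.vars = "Species",
  measure.vars = c("Petal.Length", "Petal.Width"),
  value.name = "Petal"
)
names(out)
```

The Sepal columns are dropped here because they are listed in neither `id.vars` nor `measure.vars`.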
Regarding "either supporting the single value case or raising an error are both sensible. The former would potentially be convenient for users but I'm unsure how difficult it would be.": it would probably not be difficult, but I would argue that it would be quite confusing, since there would then be two different ways to get a single value column (one with value.name/multiple.keyword, one without). I would prefer the error message, to avoid confusion and to make sure there is only one way to get a single value column (don't use value.name/multiple.keyword).
About the behavior when measure() is not used: the ?melt docs suggest that it is OK to have a measure.vars list with one element,
so I would propose only adding the error when measure is used. The docs also say that the names in the measure.vars list should take precedence:
but your example shows that the value.name arg takes precedence over the name in the measure.vars list, in the case of only one molten data values column. I think this is a bug. We could fix it by changing either the docs or the functionality. What do you think?
On measure(): What's your thinking on potential problems from allowing a single unique match? In the regular (multiple values) case, trying to specify both (i.e. capture group and a value.name in the melt call) yields a helpful warning and gives precedence to the measure() part - this feels like it may be more consistent with allowing a single value.name.
(x = as.data.table(iris)[1]) #(as before)
melt(
x,
id.vars = "Species",
measure.vars = measure(value.name, measurement, pattern = "(Petal|Sepal)\\.(Length|Width)"),
value.name = c("a", "b")
)
# Species measurement Sepal Petal
# <fctr> <char> <num> <num>
# 1: setosa Length 5.1 1.4
# 2: setosa Width 3.5 0.2
# Warning message:
# 'value.name' provided in both 'measure.vars' and 'value.name argument'; value provided in 'measure.vars' is given precedence.
On a named list in measure.vars:
Hey @keatingw, see the PR linked above for a fix (which was very simple).
That PR is exactly the behaviour I would've expected naively - thanks so much for all your work.
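For readers who want to try the single-group case discussed in this thread, a sketch along these lines should exercise it. It is an illustration written for this purpose, not the reporter's original example, and it assumes a data.table version that provides measure(). The name of the value column depends on the installed version: per the thread, versions without the fix ignore the keyword, while versions with it are expected to use the captured name.

```r
library(data.table)

# Only Petal columns, so the value.name group captures a single distinct value.
x = as.data.table(iris)[1:2, c("Species", "Petal.Length", "Petal.Width")]

out = melt(
  x,
  id.vars = "Species",
  measure.vars = measure(
    value.name, measurement,
    pattern = "(Petal)\\.(Length|Width)"
  )
)

# Inspect which name the single value column received on this version.
names(out)
```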
Yes, we've come up against the convenience factor a few times before. It's because user code might pass the value as a variable: sometimes the variable may contain multiple values and sometimes a single one. The user having to add a branch to their code to call
First up, thank you very much for implementing measure() - it's a fantastic addition to the reshaping tools.
When using the multiple keyword (value.name by default) it's ignored in the case where only one value is matched in that capture group. This might be intended behaviour, so feel free to close this issue if so.
The documentation for measure() doesn't seem to suggest this behaviour is deliberate, but it might just be a point of clarification to avoid surprises (e.g. describing when measure overrides the value name argument and vice versa):
# Minimal reproducible example
# Output of sessionInfo()