When a column name is also the name of a variable, get(variable) generates an error and eval(variable) returns incorrect values #4878
Comments
Thank you for reporting. As you noticed, the issue is caused by overlapping names in the parent scope and the data.table scope. It is fragile to rely on any particular behavior in such situations. I proposed an alternative interface for parameterizing queries, in which such problems are eliminated by providing parameters to a non-lazily evaluated argument. Using the interface proposed in #4304, all queries run as expected because the substitution is handled as intended.
library(data.table)
xx = data.table(A = seq(1,100,1), B = seq(101,200,1))
A = "B"
xx[, sum(A), env=list(A=A)]
xx[, A, env=list(A=A)]
xx[, .(A), env=list(A=A)]
I usually prefer dot-prefix for variables that should be substituted, so here it would be
@jangorecki Thanks for the reply. I flipped through #4304 and I have run into some of those issues. Would it be possible to extend eval() and get() (and mget()) with an env option for cases with ambiguous environment references? An example would be data[, ":=" (eval(A) = get(A))]. Also, I'd like to note that when using .SD the example works as intended.
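The remark about .SD can be illustrated with a minimal sketch. It reuses the xx table and the A variable from the reply above; the names total_sd and total_idx are illustrative only. It assumes that .SDcols is evaluated in the calling scope rather than among the columns, which matches the observation that the .SD form works; the second form relies only on base R list indexing.

```r
library(data.table)

xx <- data.table(A = seq(1, 100, 1), B = seq(101, 200, 1))
A <- "B"  # external variable that shares its name with column A

# Assumption: .SDcols is resolved in the calling scope, so A here is
# the string "B" and .SD contains only column B.
total_sd <- xx[, sum(.SD[[1L]]), .SDcols = A]

# Plain list indexing happens outside the data.table scope altogether,
# so A again refers to the external variable.
total_idx <- sum(xx[[A]])

print(total_sd)
print(total_idx)
```

Both forms avoid get() and eval() inside j, which is where the two scopes collide.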
Maybe off topic, but personally I recommend avoiding such cases by using different naming for external variables and data.table columns. More specifically, use lowercase names for R variables and capitalized names for data.table columns. I'm not saying this is not an issue, but beyond the programming problem, it always confuses people who read the code if column names can't easily be distinguished from external variables. Of course, this is just my personal opinion, but I do consider it a kind of best practice when using data.table.
@shrektan No doubt it's good practice to name things properly. However, I've exposed a lot of functions in my package for others to use, and in an effort to ensure they don't run into issues like the above, it would be nice to have a way to resolve them properly without burdening the user.
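For package code that receives a column name as a string, one way to sidestep the clash is to keep the lookup out of j entirely. The following is a minimal sketch, not an existing data.table function: sum_column, dt, and col are hypothetical names chosen for illustration.

```r
library(data.table)

# Hypothetical helper (not part of data.table): sums the column whose
# name is passed as a string.
sum_column <- function(dt, col) {
  stopifnot(is.data.table(dt), is.character(col), length(col) == 1L,
            col %in% names(dt))
  # dt[[col]] is ordinary list indexing, so `col` is looked up in the
  # function's own frame, never among the columns of dt.
  sum(dt[[col]])
}

# A table that deliberately contains a column called `col`
dt <- data.table(col = 1:3, val = 4:6)
result <- sum_column(dt, "val")
print(result)
```

Because the argument is never evaluated inside the data.table scope, users of such a function can name their columns freely.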
Fair enough
Not sure whether this is an intended consequence of how data.table was designed. The issue is that if I have a variable "variable_name" set to a value and I want data.table to reference that value, I would typically use get(variable_name) or eval(variable_name) to operate on the value stored in the variable. However, if "variable_name" is also the name of a column in the data.table, then data.table throws an error.
I think it would be ideal for data.table to use the value stored in the variable when it is accessed through get() or eval().
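The situation described can be sketched as follows. This is an illustrative reconstruction using the xx table and A variable from the maintainer's reply, not the original reproduction; res is a name introduced here. The call is wrapped in tryCatch so that the script runs to completion whatever the installed data.table version does with it.

```r
library(data.table)

xx <- data.table(A = seq(1, 100, 1), B = seq(101, 200, 1))
A <- "B"  # intended meaning: "use column B"

# Inside j, the symbol A is found among the columns first, so get()
# receives the numeric column A rather than the string "B".
res <- tryCatch(
  xx[, sum(get(A))],
  error = function(e) e
)

# Shows either the captured error condition or the computed value
print(res)
```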
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3