Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When column name is name of variable get(variable) generates an error and eval(variable) returnes the incorrect values #4878

Open
AdrianAntico opened this issue Jan 19, 2021 · 6 comments
Labels
programming parameterizing queries: get, mget, eval, env

Comments

@AdrianAntico
Copy link

Not sure if this is an intended consequence of how data.table was designed or not. The issue is that if I have a variable "variable_name" set to a value and I want data.table to reference that value I would typically use get(variable_name) or eval(variable_name) to operate on the value stored in the variable. However, if the variable "variable_name" is a name of a column in the data.table, then data.table throws an error.

I think it would be ideal to have data.table use the stored value in the variable when called by get() or eval().

xx <- data.table::data.table(A = seq(1,100,1), B = seq(101,200,1))
# error

A <- "B"

xx[, sum(get(A))]
# Error in get(A) : invalid first argument

xx[, get(A)]
# Error in get(A) : invalid first argument

xx[, get(eval(A))]
# Error in get(eval(A)) : invalid first argument

xx[, eval(A)]
# [1]   1   2   3   4   5   6   7   8   9  10  11 ... (values from column name A instead of column name B)

packageVersion("data.table")
[1] ‘1.13.2’

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3

@jangorecki jangorecki added the programming parameterizing queries: get, mget, eval, env label Jan 19, 2021
@jangorecki
Copy link
Member

jangorecki commented Jan 19, 2021

Thank you for reporting. As you noticed the issue is caused by overlapping names of parent scope and data.table scope. It is fragile to rely on a behavior in such situations. I proposed alternative interface for parameterizing queries where such problems are eliminated by providing parameters to a non-lazily evaluated argument. Using proposed interfaces from #4304 all queries runs as expected as the substitution is handled as expected.

library(data.table)
xx = data.table(A = seq(1,100,1), B = seq(101,200,1))
A = "B"
xx[, sum(A), env=list(A=A)]
xx[, A, env=list(A=A)]
xx[, .(A), env=list(A=A)]

I usually prefer dot-prefix for variables that should be substituted, so here it would be xx[, sum(.A), env=list(.A=A)], but I kept it like this to show that overlapping names are not a problem using proposed interface.

@AdrianAntico
Copy link
Author

@jangorecki Thanks for the reply. I flipped through 4304 and I have run into some of those issues. Is it possible to extend the usage of eval() and get() (and mget()). For cases with ambiguous environment references, add the env option?

Example being, data[, ":=" (eval(A) = get(A))]

Also, I'd like to note that when using .SD the example works as intended.

library(data.table)
xx = data.table(A = seq(1,100,1), B = seq(101,200,1))
A = "B"
xx[, sum(A), env=list(A=A)]
xx[, A, env=list(A=A)]
xx[, .(A), env=list(A=A)]

xx[, .SD,  .SDcols = eval(A)]
xx[, sum(.SD),  .SDcols = eval(A)] # 15050

@shrektan
Copy link
Member

shrektan commented Jan 19, 2021

Maybe off the topic but personally I recommend to avoid such cases by using different naming for external variables and the data.table columns. More specifically, use smaller cases for R variables and use capital cases for data.table column names.

I'm not saying this is not an issue but besides of the programming issue, it always introduces confusion for people who read the code, if the column names can't be distinguished from the external variables, easily.

Of course, this is just my personal option, but I do consider it as kind of best practices when using data.table.

@AdrianAntico
Copy link
Author

@shrektan No doubt it's good practice to name things properly. However, I've exposed a lot of functions in my package for others to use and in an effort to ensure they don't run into issues like the above it would be nice for there to be a way to resolve them properly without burdening the user.

@jangorecki
Copy link
Member

get, mget and eval could be extended, but I am not sure if they should. Maintenance effort related to this variable substitution is bigger comparing to proposed alternative.
Moreover, supporting data[, ":=" (eval(A) = get(A))] would be even more difficult as functionality to substitute LHS of = is not easily available. Normally user should do setNames(eval(A), A) although I don't know if that will work in general, and quite likely it will not work in this example or .(eval(A)=eval(A)).
On the other hand, solution proposed in #4304 supports substitution on LHS of =. Its logic could be re-used to make it work for get but proposed solution is meant to not only help users to more easily and precisely express complex queries but also reduce maintenance effort caused by substitution methods used so far.

@AdrianAntico
Copy link
Author

Fair enough

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
programming parameterizing queries: get, mget, eval, env
Projects
None yet
Development

No branches or pull requests

3 participants