using get on a variable column breaks when variable is already a column name #3052

Closed
nolanp2 opened this issue Sep 19, 2018 · 4 comments

@nolanp2

nolanp2 commented Sep 19, 2018

I've seen get suggested as a standard way of referring to a column whose name is stored in a variable. However, when the variable's own name matches a column name in the dt, this breaks. Here's a simple example that works fine:

dt <- data.table(a = 1:10,b = 1:10)

col1 = names(dt)[1]
col2 = names(dt)[2]

dt[,newcol:=get(col1)/get(col2)]

Now I make a slight change to dt, setting the first column name to match the variable col1:

dt <- data.table(col1 = 1:10,b = 1:10)

col1 = names(dt)[1]
col2 = names(dt)[2]

dt[,newcol:=get(col1)/get(col2)]

This returns the error:

Error in get(col1) : invalid first argument

@franknarf1
Contributor

franknarf1 commented Sep 19, 2018

Are you expecting different behavior? Currently, you can do:

dt[,newcol:=get(..col1)/get(..col2)]

with the .. prefix meaning "look up one level", i.e. in the calling scope rather than among the columns of dt. It's a recent feature (added within the last year) and part of a broader ongoing discussion in #2655.

Side note: if you assign names(DT) on its own to a variable, that variable will be modified along with DT, because setnames() renames by reference:

DT = data.table(A = 1, B = 2)
nn = names(DT)
setnames(DT, c("C", "D"))
nn
# [1] "C" "D"

See #512 and https://stackoverflow.com/questions/15913417/why-does-data-table-update-namesdt-by-reference-even-if-i-assign-to-another-v?noredirect=1&lq=1
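A minimal sketch of one way to avoid that surprise, assuming data.table's copy(), which duplicates plain vectors as well as tables: take an explicit copy of the names before renaming. Subsetting with [, as in col1 = names(dt)[1] in the original report, also allocates a new vector, so that code was not affected by this.

```r
library(data.table)

DT <- data.table(A = 1, B = 2)
nn_copy <- copy(names(DT))   # independent copy of the names vector
first   <- names(DT)[1]      # subsetting with [ also creates a new vector
setnames(DT, c("C", "D"))    # renames DT's columns by reference
print(nn_copy)               # still the old names
print(first)
print(names(DT))             # the new names
```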

@nolanp2
Author

nolanp2 commented Sep 20, 2018

That names behavior is bizarre; good to know.

As for the issue raised, it seems to me that this makes get() a potentially dangerous way of referencing columns. get(var) on its own is often suggested on Stack Overflow as a solution, but it could easily make a function break when a DT that has the variable's name among its column names is passed in. Your alternative using get(..var) is safer, but I've never seen it mentioned. Maybe some documentation on the safest way of handling these situations would save people some pain in the future?
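A small sketch of the failure mode described above, using two hypothetical helpers (ratio_naive and ratio_dotdot, not part of data.table) and a table that happens to have a column named num, the same name as the helpers' argument. It assumes a data.table version that supports the .. prefix inside j, as in the get(..col1) suggestion earlier in the thread.

```r
library(data.table)

# Hypothetical helpers, written only to illustrate the point above.
ratio_naive <- function(DT, num, den) {
  # Inside j, num resolves to the column "num" when DT has one, so get()
  # receives that column's values instead of the string held in the argument.
  DT[, newcol := get(num) / get(den)]
}
ratio_dotdot <- function(DT, num, den) {
  # The .. prefix makes data.table look num and den up in the calling scope
  # (this function's frame) instead of among the columns.
  DT[, newcol := get(..num) / get(..den)]
}

dt <- data.table(num = 1:10, b = 1:10)

# copy() so each call works on its own table (:= modifies by reference).
res_naive <- tryCatch(ratio_naive(copy(dt), "num", "b"),
                      error = function(e) conditionMessage(e))
print(res_naive)

res_dotdot <- ratio_dotdot(copy(dt), "num", "b")
print(res_dotdot$newcol)
```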

@jangorecki
Member

jangorecki commented Sep 20, 2018

@nolanp2 There is another way, using a base R feature called computing on the language. Personally I prefer this method, as you are using a feature of the language you are coding in.

library(data.table)
dt <- data.table(col1 = 1:10,b = 1:10)
col1 = names(dt)[1]
col2 = names(dt)[2]
qj = as.call(list(
  as.name(":="),
  as.name("newcol"),
  call("/",
       as.name(col1),
       as.name(col2))
))
print(qj) # qj stands for quoted `j` argument; let's print it to see the expression we built
# `:=`(newcol, col1/b)
dt[,eval(qj)]

It is worth mentioning that this is quite an exceptional feature of the R language; you can read more about it in the "Computing on the language" chapter of the official R Language Definition manual.
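For comparison, a sketch that builds the same quoted j with base R's bquote(), which splices the values wrapped in .() into an otherwise quoted expression. This is only an alternative spelling of the same computing-on-the-language technique; data.table does not require it.

```r
library(data.table)

dt   <- data.table(col1 = 1:10, b = 1:10)
col1 <- names(dt)[1]
col2 <- names(dt)[2]

# .(as.name(...)) turns each stored string into a symbol before data.table
# ever sees j, so the column names are fixed at construction time.
qj <- bquote(newcol := .(as.name(col1)) / .(as.name(col2)))
print(qj)

# The same call built explicitly with as.call()/call(), as above.
qj2 <- as.call(list(as.name(":="), as.name("newcol"),
                    call("/", as.name(col1), as.name(col2))))
print(identical(qj, qj2))

dt[, eval(qj)]
print(dt$newcol)
```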

@MichaelChirico
Member

Basically, data.table evaluates j with the columns in scope, so when col1 is also a column name, that column (its values, not its name) is what gets passed to get, and get doesn't know what to do when passed a vector of values; hence "invalid first argument".
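A base R analogue of that scoping rule, for illustration only: eval() with a list as its environment stands in for data.table's j evaluation (the real mechanism inside [.data.table is more involved). The second half shows that resolving the name before the data is in scope, here with bquote(), sidesteps the clash.

```r
col1 <- "col1"                        # variable holding the column name
cols <- list(col1 = 1:10, b = 1:10)   # stands in for the table's columns

# Evaluated with the data in scope, the symbol col1 finds the column first,
# so get() receives 1:10 rather than the string "col1" and fails.
msg <- tryCatch(eval(quote(get(col1)), cols),
                error = function(e) conditionMessage(e))
print(msg)

# bquote() substitutes the string before evaluation, giving get("col1"),
# which is then looked up among the columns and succeeds.
vals <- eval(bquote(get(.(col1))), cols)
print(vals)
```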

In addition to Jan's & Frank's recommendations, you should also be able to use .SDcols:

dt[ , newcol := .SD[[1L]]/.SD[[2L]], .SDcols = c(col1, col2)]

Depends on what makes the most sense to you.
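A sketch of the .SDcols approach wrapped in a hypothetical helper, add_ratio() (not part of data.table), that keeps working when a column shares the argument's name. Indexing .SD by position (.SD[[1L]]) matters here: writing .SD[[num]] inside j could resolve num to the column rather than the argument and hit the same problem, whereas .SDcols is evaluated in the calling scope.

```r
library(data.table)

# Hypothetical helper: divide one column by another, chosen via .SDcols.
add_ratio <- function(DT, num, den) {
  # .SDcols = c(num, den) uses the argument values ("num", "den"), and
  # .SD[[1L]] / .SD[[2L]] pick those columns by position, so no stored
  # name is ever looked up inside j.
  DT[, newcol := .SD[[1L]] / .SD[[2L]], .SDcols = c(num, den)]
}

dt <- data.table(num = c(2, 4, 6), den = c(1, 2, 3))
add_ratio(dt, "num", "den")
print(dt$newcol)
```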

4 participants