weird exception when by contains get #1985

Open
valentas-kurauskas opened this issue Jan 11, 2017 · 5 comments
Labels
programming parameterizing queries: get, mget, eval, env

valentas-kurauskas commented Jan 11, 2017

I copied this from my question on Stack Overflow:

The get() function usually works without problems in data.table, but I cannot work out the cause of this bug.

library(data.table)
tb<-data.table(x=c(1,2), y=c(3,4), z=c(5,6), w=c("a","b"))
tb[w != "b", .(x=sum(x)), by=.(y, zz=z)] #OK
#    y zz x
# 1: 3  5 1
tb[, .(x=sum(x)), by=.(y, zz=get("z"))] #OK
#    y zz x
# 1: 3  5 1
# 2: 4  6 2
tb[w != "b", .(x=sum(x)), by=.(y, zz=get("z"))] #not OK?!

Error in get("z") : object 'z' not found

(I use R version 3.3.2 and data.table version 1.9.6.)

@valentas-kurauskas valentas-kurauskas changed the title weird R data.table bug when by contains get weird R data.table exception when by contains get Jan 11, 2017
@valentas-kurauskas valentas-kurauskas changed the title weird R data.table exception when by contains get weird exception when by contains get Jan 11, 2017
valentas-kurauskas (Author) commented

This arose for me in a real but more complex setting, where I wanted to update and aggregate in one go, and pass a column name as a parameter, e.g. by=.(y=y+1, zz=get(arg)). In my opinion, it would be nice and more consistent if an exception didn't arise here.

mhdann commented Mar 18, 2020

This bug occurred for me in a real-world situation today.

I would like to add the minimal reproducible example that I wrote up before finding your post.

library(data.table)
dtIris <- as.data.table(iris)

speciesVar <- "Species"

# Working:
head(dtIris[, .N, by = .(var = get(speciesVar), Petal.Width)])

# Add a condition to the i clause and it stops working:
dtIris[Sepal.Length > 4, .N, by = .(var = get(speciesVar), Petal.Width)]

# Remove one element of the by list and it works again:
dtIris[Sepal.Length > 4, .N, by = .(var = get(speciesVar))]

jangorecki (Member) commented Mar 19, 2020

@mhdann thank you for providing your example. As a workaround, you can currently use:

ans = dtIris[Sepal.Length > 4, .N, by = c(speciesVar, "Petal.Width")]
setnames(ans, speciesVar, "var")
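
For reference, here is a minimal sketch of the same workaround applied to the tb example from the original report. The variable arg is a hypothetical stand-in for the column name passed as a parameter; passing by as a character vector avoids get() entirely, and setnames() then renames the grouping column.

```r
library(data.table)

tb <- data.table(x = c(1, 2), y = c(3, 4), z = c(5, 6), w = c("a", "b"))

# Hypothetical parameter holding the name of the grouping column
arg <- "z"

# Group by a character vector of column names instead of calling get() in by
ans <- tb[w != "b", .(x = sum(x)), by = c("y", arg)]

# Rename the parameterized column to the desired output name (by reference)
setnames(ans, arg, "zz")
ans
```

This form cannot express computed groupings such as y + 1 directly in by; those would need to be added as columns first.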

This kind of parameter substitution is well addressed by the pending PR #4304:

dtIris[Sepal.Length > 4, .N, by = .(var = .speciesVar, Petal.Width),
       env = list(.speciesVar = speciesVar)]

@jangorecki jangorecki added the programming parameterizing queries: get, mget, eval, env label Apr 5, 2020
dshilane commented Jan 9, 2024

Thank you for raising this issue. I can offer a few responses based on the example that @mhdann provided. First, let's start with the portion of code that leads to the error message:

library(data.table)
dtIris <- as.data.table(iris)

speciesVar <- "Species"

# Working:
head(dtIris[, .N, by = .(var = get(speciesVar), Petal.Width)])

# Add a condition to the i clause and it stops working:
dtIris[Sepal.Length > 4, .N, by = .(var = get(speciesVar), Petal.Width)]

The last line produces the following error message:

Error in get(speciesVar) : object 'Species' not found

I think this error is triggered because data.table is looking for the object named by speciesVar (that is, Species) in the outside environment rather than among the columns. You would trigger the same kind of error message with:

list(get(speciesVar), dtIris[, Petal.Width])
Error in get(speciesVar) : object 'Species' not found

In addition to the suggestions above, I can offer a few other workarounds:

dtIris[Sepal.Length > 4, .N, by = .(var = get(speciesVar), pw = get("Petal.Width"))]

In this example, using get() statements for each of the variables in the by step ensures that both sub-objects of the list are the same length.

Alternatively, the getDTeval package can be used to translate the get() statements before evaluation.

library(getDTeval)
getDTeval(the.statement = expression(dtIris[Sepal.Length > 4, .N, by = .(var = get(speciesVar), Petal.Width)]), return.as = "all")

Finally, you could also work around the issue with:

dtIris[Sepal.Length > 4, .N, by = .(var = dtIris[, get(speciesVar)], Petal.Width)]

In this case, the by step makes a separate outside call to dtIris, which is not efficient.

jangorecki (Member) commented Jan 9, 2024

Check out the new env argument, which addresses such problems in a much cleaner way:

dtIris[Sepal.Length > 4, .N, by = .(var = speciesVar, Petal.Width), env=list(speciesVar="Species")]

You can even pass the column name var as a parameter as well.
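
A minimal sketch of passing both names as parameters, assuming data.table 1.15.0 or later (where the env argument is available). The function countBy and its arguments groupCol and outName are hypothetical names introduced for illustration only. Character values supplied through env are substituted into the query as column names, including the name of the by element.

```r
library(data.table)  # the env argument is assumed to need data.table >= 1.15.0

dtIris <- as.data.table(iris)

# Hypothetical helper: groupCol is the column to group by,
# outName is the name that grouping column gets in the result.
countBy <- function(dt, groupCol, outName) {
  dt[Sepal.Length > 4, .N,
     by = .(outName = groupCol, Petal.Width),
     env = list(outName = outName, groupCol = groupCol)]
}

res <- countBy(dtIris, "Species", "var")
names(res)
```

Because the substitution happens before the query is evaluated, the by expression contains the column symbol itself and get() is not involved.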


4 participants