Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overbroad issue about .. being defined in calling scope? #2988

Open
MichaelChirico opened this issue Jul 31, 2018 · 14 comments
Open

Overbroad issue about .. being defined in calling scope? #2988

MichaelChirico opened this issue Jul 31, 2018 · 14 comments
Labels
programming parameterizing queries: get, mget, eval, env
Milestone

Comments

@MichaelChirico
Copy link
Member

MichaelChirico commented Jul 31, 2018

Just encountered this...

idx = 5
DT = data.table(
  ID = rep(1:3,each = 4),
  flag = as.logical(rep(0:1, 6)),
  N = 1:12,
  count = 12:1
)

DT[ , dcast(.SD, ID ~ flag + N, value.var = 'count')
    ][ , {
      idx = grep('^TRUE', names(.SD))
      y1 = .SD[ , ..idx]
      y2 = .SD[ , -..idx]
      NULL
    }]

Warning messages:
1: In [.data.table(.SD, , ..idx) :
Both 'idx' and '..idx' exist in calling scope. Please remove the '..idx' variable in calling scope for clarity.
2: In [.data.table(.SD, , -..idx) :
Both 'idx' and '..idx' exist in calling scope. Please remove the '..idx' variable in calling scope for clarity.

Yes, idx and ..idx exist in calling scope, but ..idx appears "down a level" (reverting to the analogy of .. as "up a level"), so the code as written seems pretty natural to me.

I get where the message comes from, and I'm not sure how much effort it's worth to fully account for this type of possible nesting, but wanted to raise attention anyway.

To get rid of this warning, I'm resorting to with = FALSE, so this is related to #2826

This becomes an error if there's no idx in globalenv:

rm(idx)
DT[ , dcast(.SD, ID ~ flag + N, value.var = 'count')
    ][ , {
      idx = grep('^TRUE', names(.SD))
      y1 = .SD[ , ..idx]
      y2 = .SD[ , -..idx]
      NULL
    }]

Error in [.data.table(DT[, dcast(.SD, ID ~ flag + N, value.var = "count")], :
Variable 'idx' is not found in calling scope. Looking in calling scope because this symbol was prefixed with .. in the j= parameter.

@jangorecki

This comment has been minimized.

@MichaelChirico

This comment has been minimized.

@liqg
Copy link

liqg commented Sep 14, 2018

I wrote a .. function as an alternative to .. prefix that can/may explicitly get variable outside the data.table scope.

.. <- function (x, env = parent.frame()) 
{
    stopifnot(inherits(x, "character"))
    stopifnot(length(x) == 1)
    get(x, envir = parent.env(env))
}

It is useful when the variable names are duplicate with data.table col names.
For example, after we fread a file, the col names or number could be not fixed and changeable,
with .. function, we do not have to worry about it may be same with a col name when we create a new variable.

for example.

library(data.table)
a <- data.table(x=1:2,y=2:1,z=c("x","y"))
x <- "y"
# works in `i`
a[get(..("x")) == 1] #  a subset when `y==1`
a[..x==1] # Error in eval(.massage i(isub), x, parent.frame()) :   object '..x' not found
a[get(..x)==1] # Error in get(..x) : object '..x' not found

# works in `j`
a[, get(..("x"))] # a vector of the col `y` 
a[, ..x] # a data.table with col `y`

Although we need use the additional get to obtain a col of data.table, it has advantages when we want to use it alone:

a[z==..("x")] # a subset data.table when `z=="y"`
a[, z==..("x")] # a logical vector for `z=="y"`

Back to the question of @MichaelChirico, if we replace .. prefix with the .. function, the conflict will be eliminated.

I also found that the .. function can not be use in the LHS of := assignment, but we can use the get , for example.

a[, ..("x"):=9] # Error in get(x, envir = parent.env(env)) : object 'x' not found
a[, get("x"):=9] # the `y` will be replaced with 9

And it does not work in by , for example,

a[, .N, by=..("x")]# Error

Last year, I put this solution in SO, The old version of .. function is:

`..old` <- function(x){
    stopifnot(inherits(x, "character"))
    stopifnot(length(x)==1)
    get(x, parent.frame(4))
}

The old version worked fine in data.table 1.10.4, but when I updated to 1.11.4, it failed in i, the I wrote the new one, which is correct in both versions. However, I am not a skilled R programer, I am not sure this approach is a reasonable way for such jobs.

In my opinion, maybe the .. prefix is too complicated, however, .. function explicitly tell R that it is a function calling but not a data.table col , which also allows the col names of data.table to start with "..".

Could we have an official support for .. function @mattdowle, @arunsrinivasan , and support it in i, j, and by?

Thanks!

@mattdowle
Copy link
Member

mattdowle commented Sep 22, 2018

..() function is very interesting idea, thanks.
The reason to choose .. prefix was imagine a situation where an expression is passed to ..(some expression) and that can make the internal output of all.vars() harder to deal with, and harder to know how to optimize. Whenever get is used (or something like it, like ..()) we have to turn off optimizations because get might be getting something that varies for each group. In contrast, a .. prefix on a variable means that it's for sure a symbol (baked into the syntax) which seems a bit more robust to me, and we know it's a constant. Although, yes, harder to implement. And the current situation where .. prefix is only available in j is of course confusing.
It will be difficult to put the toothpaste back in the tube on column names starting ... Since .. prefix on symbols seems to be liked and now used in the wild. We just need to implement .. prefix everywhere basically so it works for all the cases you showed.
We could have ..() too as well. Especially since it solves Michael's issue here. I haven't grasped that yet.
Maybe .. prefix needs a fix for compound queries.

@mattdowle mattdowle added this to the 1.12.0 milestone Sep 22, 2018
@Henrik-P
Copy link

Henrik-P commented Oct 26, 2018

I stumbled over the same warning message as Michael. I doubt my example contains any additional information to this issue, but since it was so simple I thought I just as well could add it here for illustration.

I wanted to update certain rows in certain columns with values from other rows in the same columns. Stripped down version:

d <- data.table(x = 1:3, y = 4:6, z = c("a", "b", "c"))
cols <- c("x", "z")
d[2, (cols) := d[3, ..cols]] 

Warning message:
In [.data.table(d, 3, ..cols) :
Both 'cols' and '..cols' exist in calling scope. Please remove the '..cols' variable in calling scope for clarity.

@franknarf1
Copy link
Contributor

franknarf1 commented Nov 6, 2018

SO post to update (exactly the same as Henrik's above): https://stackoverflow.com/a/53180736

Another: https://stackoverflow.com/q/53227216

@mattdowle mattdowle modified the milestones: 1.12.0, 1.12.2 Jan 6, 2019
@jangorecki jangorecki modified the milestones: 1.12.2, 1.12.4 Jan 24, 2019
@sanjmeh
Copy link

sanjmeh commented Mar 26, 2019

Any progress on this issue? I also get the warning Both 'colname' and '..colname' exist in calling scope. Please remove the '..colname' variable in calling scope for clarity.
While I cannot see any use of colname anywhere else. ``

@andrewrech
Copy link

I will add my two cents having also run into these issues...

@liqg, .. as function makes a lot of sense to me and I would use it in i if available, and it additionally seems to solve the issue astutely raised by @MichaelChirico and others here.

However, it seems to me that .. and ..() existing is confusing. They would be used in non-overlapping places to non-overlapping effect. Given that .. is liked and used, it does seem that the path forward is implementing .. elsewhere vs. adding ..(). I think additional complexity to variable scoping in data.table would be a burden.

@jangorecki
Copy link
Member

jangorecki commented Sep 2, 2019

related: https://stackoverflow.com/questions/57737697/how-to-use-data-table-a-reactive-expression-in-shiny
trying to use .. not on a symbol but on a call $(input, var)

DT[, ..input$var]

@arvindkumarc
Copy link

arvindkumarc commented Nov 1, 2019

Although, there is an alternate.

person[, {
   cols = names(.SD)
   .SD[, cols, with=F]
   }, .(age)]```

@mattdowle mattdowle modified the milestones: 1.12.7, 1.12.9 Dec 8, 2019
@dracodoc
Copy link
Contributor

dracodoc commented Apr 1, 2020

I feel .. has its limitations as it didn't specify the right side boundary, thus the call like ..input$var could have problem.

I'm very familiar with data.table and use it every day, still I often need to check reference or my own notes when I need to use variable for column names.

The i usage (like dt[(col_name), ]) and j usage are different, .. sometimes doesn't work, in the end I often need to use dt[[col_name]] which is quite ugly.

Recently I saw glue package and was wondering if we can use something like {col_name}. {} may not work but some symbol to wrap around an expression should be fine. Then make it work universally across i, j will be really great.

@jangorecki
Copy link
Member

@dracodoc there is a new feature coming to address what ..prefix is meant to help with. You write variable name as is, and then just redefine it via new arg env = list(var = "mpg").

@dracodoc
Copy link
Contributor

dracodoc commented Apr 2, 2020

Thanks, do you mean something like this?
dt[, sum(var), env = list(var = "mpg")]

I may have the variable defined somewhere already, for example I have a function to process a column in dt

convert_col <- function(dt, col_name) {
  # how should I write this?
  dt[, col_name := xxxx, env = list(col_name = col_name)]
}

All we need is just to recognize that some variables are from outside of dt scope...

@jangorecki
Copy link
Member

If you are passing character scalar into col_name, then yes, it will work exactly like this. Of course in this example you could easily wrap variable into ( and it would work as well.

@jangorecki jangorecki added the programming parameterizing queries: get, mget, eval, env label Apr 5, 2020
@mattdowle mattdowle modified the milestones: 1.13.1, 1.13.3 Oct 17, 2020
@jangorecki jangorecki removed this from the 1.14.3 milestone Jul 19, 2022
@jangorecki jangorecki added this to the 1.14.5 milestone Jul 19, 2022
@jangorecki jangorecki modified the milestones: 1.14.11, 1.15.1 Oct 29, 2023
@MichaelChirico MichaelChirico modified the milestones: 1.16.0, 1.17.0 Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
programming parameterizing queries: get, mget, eval, env
Projects
None yet
Development

No branches or pull requests

10 participants