Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregation fails for list columns with closures that don't have class "function" #4814

Closed
mb706 opened this issue Nov 18, 2020 · 8 comments · Fixed by #4815
Closed

Aggregation fails for list columns with closures that don't have class "function" #4814

mb706 opened this issue Nov 18, 2020 · 8 comments · Fixed by #4815
Labels
non-atomic column e.g. list columns, S4 vector columns
Milestone

Comments

@mb706
Copy link

mb706 commented Nov 18, 2020

f <- function(x) x
f2 <- f
class(f2) <- "x"
dt <- data.table(f, f2)
f <- function(x) x
f2 <- f
class(f2) <- "x"
dt <- data.table(id = 1, f, f2)
dt[, .(f), by = id]
#>    id             f
#> 1:  1 <function[1]>
dt[, .(f2), by = id]
#> Error in as.vector(x, "list") : 
#>   cannot coerce type 'closure' to vector of type 'list'

I would expect the result of .(f2) aggregation to be similar to the .(f) aggregation.

(Maybe this is related to the fact that as.list(f) gives a result and as.list(f2) fails? But I don't think it should fail since the content of the f column is not the result of as.list(f).)

> sessionInfo()
R Under development (unstable) (2020-11-14 r79432)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora 32 (Thirty Two)

Matrix products: default
BLAS:   /home/user/R-devel/lib64/R/lib/libRblas.so
LAPACK: /home/user/R-devel/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.2

loaded via a namespace (and not attached):
[1] compiler_4.1.0
@ecoRoland2
Copy link

ecoRoland2 commented Nov 18, 2020

> traceback()
10: as.list.default(X)
9: as.list(X)
8: lapply(x, runlock, current_depth = current_depth + 1L)
7: FUN(X[[i]], ...)
6: lapply(x, runlock, current_depth = current_depth + 1L)
5: FUN(X[[i]], ...)
4: lapply(x, runlock, current_depth = current_depth + 1L)
3: runlock(ans)
2: `[.data.table`(dt, , .(f2), by = id)
1: dt[, .(f2), by = id]

The runlock function:

  MAX_DEPTH = 5L
  runlock = function(x, current_depth = 1L) {
    if (is.recursive(x) && current_depth <= MAX_DEPTH) {
      if (inherits(x, 'data.table')) .Call(C_unlock, x)
      else return(lapply(x, runlock, current_depth = current_depth + 1L))
    }
    return(invisible())
  }
  runlock(ans)

Now this illustrates the root cause:

lapply(f, print)
#x
#$x
#
#
#[[2]]
#x
lapply(f2, print)
#Error in as.vector(x, "list") : 
 # cannot coerce type 'closure' to vector of type 'list'

Closures are recursive. I think you need a !is.function(x) in the if condition.

@ecoRoland2
Copy link

Or unclass(x)?

@jangorecki jangorecki added the non-atomic column e.g. list columns, S4 vector columns label Nov 18, 2020
@MichaelChirico
Copy link
Member

MichaelChirico commented Nov 19, 2020

MRE:

as.list(f)
# $x
#  [[2]]
# x
as.list(f2)
# Error in as.vector(x, "list") : 
#   cannot coerce type 'closure' to vector of type 'list'

This issue is that you've erased function from the classes of your object so S3 dispatch has failed:

class(f2) = c("x", "function")
dt <- data.table(id = 1, f, f2)
dt[, .(f2), by = id]
#       id     f2
#    <num> <list>
# 1:     1 <x[1]>

as.list.function was being applied to f while as.list.default was being applied to f2. as.list is being applied via lapply which is how we descend recursive objects in the j return looking for data.tables that need to be unlocked. Chalk this up as another case where re-writing runlock in C might make sense 🤔

Still I think it is low overhead for us to add !is.function(x) to runlock so I'll do that.

@Coorsaa
Copy link

Coorsaa commented Dec 1, 2020

e = environment()
class(e) = "foo"

dt = data.table(id = 1, funs = list(e))
dt[, list(funs), by = "id"]

throws a similar error:

Error in as.vector(x, "list") : 
  cannot coerce type 'environment' to vector of type 'list'

@ecoRoland2
Copy link

I think these issues can all be avoided by using a for loop instead of lapply.

@jangorecki
Copy link
Member

@Coorsaa could you please check if related PR fixed your use case?

@Coorsaa
Copy link

Coorsaa commented Dec 1, 2020

hi @jangorecki ,
I installed from the PR's underlying branch runlock-function which didn't work, so I don't think so.

@mattdowle
Copy link
Member

mattdowle commented Dec 1, 2020

Yes agree a loop instead of lapply would be better, especially at C level. For another day.
I added @Coorsaa's environment example, thanks, to the test, and fixed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-atomic column e.g. list columns, S4 vector columns
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants