Input error with sapply of anonymous function #155

Closed
lbenz-mdsol opened this issue Apr 8, 2020 · 5 comments · Fixed by #163
Labels
reprex needs a minimal reproducible example

Comments

@lbenz-mdsol

I have been running into issues when using dtplyr to mutate columns with custom functions, sometimes named and sometimes anonymous. The error I get is usually a non-descriptive Error: Invalid input. I've created one toy example where I get this error using an anonymous function, but simply naming the function makes the error go away. I'm not sure whether the error is necessarily related to anonymous functions, because I

  • have gotten this error when calling sapply() on custom, named functions
  • don't always get this error when calling sapply() on anonymous functions

Any thoughts?

library(data.table)
library(dplyr)
library(dtplyr)

set.seed(100)
df <- data.frame("x" = rnorm(100),
                 "y" = rnorm(100))

dt <- lazy_dt(df)

### This works for df but doesn't work for dt
df %>%
  mutate("filtered_mean" = sapply(x, function(x, y) {mean(y[y <= abs(x)])},  y)) %>%
  head()

            x          y filtered_mean
1 -0.50219235 -0.3329234    -0.3115360
2  0.13153117  1.3631137    -0.5083479
3 -0.07891709 -0.4691473    -0.5310105
4  0.88678481  0.8428756    -0.1957230
5  0.11697127 -1.4579937    -0.5310105
6  0.31863009 -0.4003059    -0.3851345

dt %>%
  mutate("filtered_mean" = sapply(x, function(x, y) {mean(y[y <= abs(x)])},  y)) %>%
  as_tibble() %>%
  head()

Error: Invalid input

### But this works for dt
my_function <- function(x, y) {
  return(mean(y[y <= abs(x)]))
}

df %>%
  mutate("filtered_mean" = sapply(x, my_function, y)) %>%
  head()

            x          y filtered_mean
1 -0.50219235 -0.3329234    -0.3115360
2  0.13153117  1.3631137    -0.5083479
3 -0.07891709 -0.4691473    -0.5310105
4  0.88678481  0.8428756    -0.1957230
5  0.11697127 -1.4579937    -0.5310105
6  0.31863009 -0.4003059    -0.3851345

dt %>%
  mutate("filtered_mean" = sapply(x, my_function, y)) %>%
  as_tibble() %>%
  head()

# A tibble: 6 x 3
        x      y filtered_mean
    <dbl>  <dbl>         <dbl>
1 -0.502  -0.333        -0.312
2  0.132   1.36         -0.508
3 -0.0789 -0.469        -0.531
4  0.887   0.843        -0.196
5  0.117  -1.46         -0.531
6  0.319  -0.400        -0.385
@hadley
Member

hadley commented Apr 21, 2020

Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run it in a local session.
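For readers unfamiliar with reprex, a minimal sketch of how such an example can be produced is shown below. It assumes the reprex, dtplyr and dplyr packages are installed; the code inside the braces is taken from the example in this thread, and the variable name out is only illustrative.

```r
library(reprex)

# reprex() runs the expression in a fresh R session and returns the
# rendered input and output (invisibly) as a character vector.
# venue = "gh" formats the result as GitHub-flavoured Markdown.
out <- reprex({
  library(dtplyr)
  library(dplyr, warn.conflicts = FALSE)

  set.seed(100)
  df <- data.frame(x = rnorm(100), y = rnorm(100))
  dt <- lazy_dt(df)

  dt %>%
    mutate(filtered_mean = sapply(x, function(x, y) mean(y[y <= abs(x)]), y)) %>%
    as_tibble() %>%
    head()
}, venue = "gh")
```

In an interactive session the rendered result is typically also placed on the clipboard, ready to paste into an issue comment.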

@hadley hadley added the reprex needs a minimal reproducible example label Apr 21, 2020
@lbenz-mdsol
Author

library(data.table)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#> 
#>     between, first, last
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dtplyr)

set.seed(100)
df <- data.frame("x" = rnorm(100),
                 "y" = rnorm(100))

dt <- lazy_dt(df)

### This works for df but doesn't work for dt
df %>%
  mutate("filtered_mean" = sapply(x, function(x, y) {mean(y[y <= abs(x)])},  y)) %>%
  head()
#>             x          y filtered_mean
#> 1 -0.50219235 -0.3329234    -0.3115360
#> 2  0.13153117  1.3631137    -0.5083479
#> 3 -0.07891709 -0.4691473    -0.5310105
#> 4  0.88678481  0.8428756    -0.1957230
#> 5  0.11697127 -1.4579937    -0.5310105
#> 6  0.31863009 -0.4003059    -0.3851345

dt %>%
  mutate("filtered_mean" = sapply(x, function(x, y) {mean(y[y <= abs(x)])},  y)) %>%
  as_tibble() %>%
  head()
#> Error: Invalid input


### But this works for dt
my_function <- function(x, y) {
  return(mean(y[y <= abs(x)]))
}

df %>%
  mutate("filtered_mean" = sapply(x, my_function, y)) %>%
  head()
#>             x          y filtered_mean
#> 1 -0.50219235 -0.3329234    -0.3115360
#> 2  0.13153117  1.3631137    -0.5083479
#> 3 -0.07891709 -0.4691473    -0.5310105
#> 4  0.88678481  0.8428756    -0.1957230
#> 5  0.11697127 -1.4579937    -0.5310105
#> 6  0.31863009 -0.4003059    -0.3851345

dt %>%
  mutate("filtered_mean" = sapply(x, my_function, y)) %>%
  as_tibble() %>%
  head()
#> # A tibble: 6 x 3
#>         x      y filtered_mean
#>     <dbl>  <dbl>         <dbl>
#> 1 -0.502  -0.333        -0.312
#> 2  0.132   1.36         -0.508
#> 3 -0.0789 -0.469        -0.531
#> 4  0.887   0.843        -0.196
#> 5  0.117  -1.46         -0.531
#> 6  0.319  -0.400        -0.385

Created on 2020-04-21 by the reprex package (v0.3.0)

@smingerson
Contributor

@lbenz-mdsol, can you give reprexes of these cases:

  • have gotten this error when calling sapply() on custom, named functions
  • don't always get this error when calling sapply() on anonymous functions

I have a patch that fixes your original example, but would like to address the other situations you mention.

@lbenz-mdsol
Author

@smingerson Sorry, I can't seem to come up with examples of what I meant there.

@hadley
Member

hadley commented Jan 26, 2021

Minimal reprex:

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
df <- data.frame(x = 1:5)
dt <- lazy_dt(df)

dt %>%
  mutate(f = sapply(x, function(x) x + runif(1))) %>%
  collect()
#> Error: Invalid input

Created on 2021-01-26 by the reprex package (v0.3.0.9001)
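
A possible workaround for dtplyr versions that still show this error, based on the observation earlier in this thread that naming the function makes the error go away: define the function outside the pipeline and pass it to sapply() by name. This is only a sketch. add_noise is a hypothetical helper name, and it is an assumption that this minimal case behaves like the named-function example above.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(x = 1:5)
dt <- lazy_dt(df)

# Hypothetical named helper replacing the inline function(x) x + runif(1)
add_noise <- function(x) x + runif(1)

# Pass the function by name instead of defining it inside mutate()
res <- dt %>%
  mutate(f = sapply(x, add_noise)) %>%
  collect()

res
```

Each value of f is the matching x plus a uniform draw between 0 and 1, so the result has the same five rows as df with one extra column.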

hadley pushed a commit that referenced this issue Jan 26, 2021