Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pmap, anonymous functions, and same-named variables #280

Closed
r2evans opened this issue Jan 6, 2017 · 4 comments
Closed

pmap, anonymous functions, and same-named variables #280

r2evans opened this issue Jan 6, 2017 · 4 comments

Comments

@r2evans
Copy link

r2evans commented Jan 6, 2017

Using same-named variables in a encompassing data.frame appears to be problematic. As a contrived example:

library(dplyr)
library(purrr)

data_frame(cyl = c(4, 6, 8), mpg = c(30, 25, 20)) %>%
  mutate(x = pmap(.l = list(cyl, mpg),
                  function(cc, mm) filter(mtcars, cyl == cc, mpg < mm)))
# Error: incorrect length (3), expecting: 32

This can be remedied by using a named function:

f <- function(cc, mm) filter(mtcars, cyl == cc, mpg < mm)

data_frame(cyl = c(4, 6, 8), mpg = c(30, 25, 20)) %>%
  mutate(x = pmap(.l = list(cyl, mpg), f))
# # A tibble: 3 × 3
#     cyl   mpg                      x
#   <dbl> <dbl>                 <list>
# 1     4    30  <data.frame [7 × 11]>
# 2     6    25  <data.frame [7 × 11]>
# 3     8    20 <data.frame [14 × 11]>

Please discard the stylistic preference of using better better variable naming conventions to avoid collisions (there are times when I am using the same original data.frame on the mid-pmap filter, so the names will be identical).

I know it gets complex, but is there a better way to understand how the internal pipe (filter(mtcars, ...)) is confusing the variable names of its data.frame with the names of the encompassing environment? I thought that scope would bias finding the variables more-closely stored.

@jennybc
Copy link
Member

jennybc commented Jan 6, 2017

Your first example works for me. I think there are some improvements to NSE in the development version of dplyr that come into play here. As a sidebar, it seems like a good idea to use names in your on-the-fly list construction and not rely on resolution by position. That's kind of a tangent, but I did so below and switched argument order, just for kicks.

library(dplyr)
library(purrr)

data_frame(cyl = c(4, 6, 8), mpg = c(30, 25, 20)) %>%
  mutate(x = pmap(.l = list(cyl, mpg),
                  function(cc, mm) filter(mtcars, cyl == cc, mpg < mm)))
#> # A tibble: 3 × 3
#>     cyl   mpg                      x
#>   <dbl> <dbl>                 <list>
#> 1     4    30  <data.frame [7 × 11]>
#> 2     6    25  <data.frame [7 × 11]>
#> 3     8    20 <data.frame [14 × 11]>

data_frame(cyl = c(4, 6, 8), mpg = c(30, 25, 20)) %>%
  mutate(x = pmap(.l = list(cc = cyl, mm = mpg),
                  function(mm, cc) filter(mtcars, cyl == cc, mpg < mm)))
#> # A tibble: 3 × 3
#>     cyl   mpg                      x
#>   <dbl> <dbl>                 <list>
#> 1     4    30  <data.frame [7 × 11]>
#> 2     6    25  <data.frame [7 × 11]>
#> 3     8    20 <data.frame [14 × 11]>

@r2evans
Copy link
Author

r2evans commented Jan 6, 2017

Thanks for your response. Based on your comment, I rebased my local fork of dplyr on the current upstream master, and that seems to have resolved the issue. (I need to keep a fork until they merge tidyverse/dplyr#2289). NSE is such a devil sometimes!

As to your sidebar, please help me to understand: looking briefly at the code (purrr/src/pmap.c), it appears that it assigns arguments positionally, not by-name. The list names may technically be visible in the scope (not verified), but since they don't completely mask the names of the parent data.frame/environment, I don't see how that would ameliorate the name issue.

Regardless, this problem is resolved, thanks!

@r2evans r2evans closed this as completed Jan 6, 2017
@jennybc
Copy link
Member

jennybc commented Jan 6, 2017

I'm no expert at the C side of purrr, but I believe here is where list names, if they exist, are applied to the .f(.l[[i,1]], .l[[i,2]], ...) calls:

https://github.com/hadley/purrr/blob/a7afcca128ab9e44bb544086b1465d44c2931fb1/src/map.c#L164-L165

@r2evans
Copy link
Author

r2evans commented Jan 6, 2017

Gotcha, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants