Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

magrittr dot (.) always shows as a missing import #320

Closed
AlexAxthelm opened this issue Mar 13, 2018 · 6 comments
Closed

magrittr dot (.) always shows as a missing import #320

AlexAxthelm opened this issue Mar 13, 2018 · 6 comments

Comments

@AlexAxthelm
Copy link
Collaborator

whenever I use magrittr pipe %>% to chain functions in a command, if I explicitly use the dot argument, either for clarity or because it is necessary, drake always shows . as a missing import.

Exampl;e commands where this has happened:

raw_data = read_csv("data.csv") %>% mutate(.data = ., foo = bar + baz)
complete_data = raw_data %>% filter(complete.cases(.))
model = complete_data %>% lm("x ~ y", data = .)

suggested action

Since drake imports the pipe anyway, it should be on the lookout for the dot argument, and should recognize that it is a fictitious target. Any command which uses this will already have the real dependency already, so we can safely ignore the dot.

@wlandau
Copy link
Member

wlandau commented Mar 13, 2018

Good point. What we really need is the right code analysis to separate the tidyverse dot from a possible user-defined dot. Fortunately, drake uses CodeDepends, which already excludes NSE variables.

library(CodeDepends)
x <- getInputs(quote({
  library(ggplot2)
  ggplot(mtcars) + geom_point(aes(x = wt, y = mpg))
}))
x@inputs
#> [1] "mtcars"
x@nsevalVars
#> [1] "wt"  "mpg"

Since you brought up this issue, I assumed CodeDepends was not catching the dot, but apparently it is.

library(CodeDepends)
x <- getInputs(quote({
  read_csv("data.csv") %>% mutate(.data = ., foo = bar + baz)
}))
x@inputs
#> character(0)
x@nsevalVars
#> [1] "."   "bar" "baz"

And so is drake.

library(drake)
drake:::code_dependencies(quote(
  read_csv("data.csv") %>% mutate(.data = ., foo = bar + baz)
))
#> $globals
#> [1] "read_csv" "mutate"

So I am surprised this is a problem. Are you using the latest commit? Could you post a full reprex?

@AlexAxthelm
Copy link
Collaborator Author

Coincidentally, I reinstalled both drake and CodeDepends today (related to #268)

library("drake")
#> Warning: replacing previous import 'graph::addNode' by 'XML::addNode' when
#> loading 'CodeDepends'
#> Warning: replacing previous import 'graph::plot' by 'graphics::plot' when
#> loading 'CodeDepends'
library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:drake':
#> 
#>     contains, ends_with, everything, matches, num_range, one_of,
#>     starts_with
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
setwd(tempdir())
installed.packages()[c("drake", "CodeDepends"), "Version"]
#>        drake  CodeDepends 
#> "5.0.1.9002"    "0.5-5.2"
keep_complete = function(.data){.data %>% filter(complete.cases(.))}
myplan <- drake_plan(strings_in_dots = "literals",
  data_seq = seq(1, 10) %>%
    c(NA, .) %>%
    sample(x = ., replace = TRUE, size = 50),
  raw_data = data_seq %>%
    data.frame(x = ., y = rev(.)),
  complete_data = raw_data %>%
    filter(complete.cases(.)),
  model = complete_data %>%
    lm(formula = "x ~ y", data = .)
) %>%
  print()
#> # A tibble: 4 x 2
#>   target        command                                                   
#>   <chr>         <chr>                                                     
#> 1 data_seq      seq(1, 10) %>% c(NA, .) %>% sample(x = ., replace = TRUE,~
#> 2 raw_data      data_seq %>% data.frame(x = ., y = rev(.))                
#> 3 complete_data raw_data %>% filter(complete.cases(.))                    
#> 4 model         "complete_data %>% lm(formula = \"x ~ y\", data = .)"
#vis_drake_graph(drake_config(myplan))
make(myplan)
#> All targets are already up to date.
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.3 (2017-11-30)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  tz       America/New_York            
#>  date     2018-03-13
#> Packages -----------------------------------------------------------------
#>  package      * version    date      
#>  assertthat     0.2.0      2017-04-11
#>  backports      1.1.2      2017-12-13
#>  base         * 3.4.3      2017-12-06
#>  bindr          0.1.1      2018-03-13
#>  bindrcpp       0.2        2017-06-17
#>  BiocGenerics   0.24.0     2017-10-31
#>  cli            1.0.0      2017-11-05
#>  CodeDepends    0.5-5.2    2018-03-13
#>  codetools      0.2-15     2016-10-05
#>  compiler       3.4.3      2017-12-06
#>  crayon         1.3.4      2017-09-16
#>  datasets     * 3.4.3      2017-12-06
#>  devtools       1.13.5     2018-02-18
#>  digest         0.6.15     2018-01-28
#>  dplyr        * 0.7.4      2017-09-28
#>  drake        * 5.0.1.9002 2018-03-13
#>  evaluate       0.10.1     2017-06-24
#>  formatR        1.5        2017-04-25
#>  future         1.7.0      2018-02-11
#>  future.apply   0.1.0      2018-01-15
#>  globals        0.11.0     2018-01-10
#>  glue           1.2.0      2017-10-29
#>  graph          1.56.0     2017-10-31
#>  graphics     * 3.4.3      2017-12-06
#>  grDevices    * 3.4.3      2017-12-06
#>  htmltools      0.3.6      2017-04-28
#>  htmlwidgets    1.0        2018-01-20
#>  igraph         1.2.1      2018-03-10
#>  jsonlite       1.5        2017-06-01
#>  knitr          1.20       2018-02-20
#>  listenv        0.7.0      2018-01-21
#>  lubridate      1.7.3      2018-02-27
#>  magrittr       1.5        2014-11-22
#>  memoise        1.1.0      2017-04-21
#>  methods      * 3.4.3      2017-12-06
#>  parallel       3.4.3      2017-12-06
#>  pillar         1.2.1      2018-02-27
#>  pkgconfig      2.0.1      2017-03-21
#>  plyr           1.8.4      2016-06-08
#>  purrr          0.2.4      2017-10-18
#>  R.methodsS3    1.7.1      2016-02-16
#>  R.oo           1.21.0     2016-11-01
#>  R.utils        2.6.0      2017-11-05
#>  R6             2.2.2      2017-06-17
#>  Rcpp           0.12.15    2018-01-20
#>  rlang          0.2.0      2018-02-20
#>  rmarkdown      1.9        2018-03-01
#>  rprojroot      1.3-2      2018-01-03
#>  stats        * 3.4.3      2017-12-06
#>  stats4         3.4.3      2017-12-06
#>  storr          1.1.3      2017-12-15
#>  stringi        1.1.7      2018-03-12
#>  stringr        1.3.0      2018-02-19
#>  testthat       2.0.0      2017-12-13
#>  tibble         1.4.2      2018-01-22
#>  tidyselect     0.2.4      2018-02-26
#>  tools          3.4.3      2017-12-06
#>  utf8           1.1.3      2018-01-03
#>  utils        * 3.4.3      2017-12-06
#>  visNetwork     2.0.3      2018-01-09
#>  withr          2.1.1      2017-12-19
#>  XML            3.98-1.10  2018-02-19
#>  yaml           2.1.18     2018-03-08
#>  source                               
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  local                                
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  Bioconductor                         
#>  CRAN (R 3.4.3)                       
#>  Github (duncantl/CodeDepends@fe50582)
#>  CRAN (R 3.4.3)                       
#>  local                                
#>  CRAN (R 3.4.3)                       
#>  local                                
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  Github (ropensci/drake@f438cfa)      
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  Bioconductor                         
#>  local                                
#>  local                                
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  local                                
#>  local                                
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.1)                       
#>  CRAN (R 3.4.1)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  local                                
#>  local                                
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  local                                
#>  CRAN (R 3.4.3)                       
#>  local                                
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)                       
#>  CRAN (R 3.4.3)

image

@AlexAxthelm
Copy link
Collaborator Author

It's worth noting that the missing dot doesn't prevent a successful make, but it does make me take a look at the purple import when it's there, to make sure it's not something more important.

@wlandau
Copy link
Member

wlandau commented Mar 13, 2018

Hmm... I think I can trace this back to CodeDepends after all. I think CodeDepends is the right place for a patch. It seems like there may still be a little friction with filter(). cc @gmbecker.

CodeDepends::getInputs(quote(
  read_csv("data.csv") %>% mutate(.data = ., foo = bar + baz)
))@inputs
#> Warning: replacing previous import 'graph::addNode' by 'XML::addNode' when
#> loading 'CodeDepends'
#> Warning: replacing previous import 'graph::plot' by 'graphics::plot' when
#> loading 'CodeDepends'
#> character(0)

CodeDepends::getInputs(quote(
  raw_data %>% select(., starts_with("xyz"))
))@inputs
#> [1] "raw_data"

CodeDepends::getInputs(quote(
  raw_data %>% filter(.)
))@inputs
#> [1] "."        "raw_data"

CodeDepends::getInputs(quote(
  raw_data %>% filter(complete.cases(.))
))@inputs
#> [1] "."        "raw_data"

@wlandau
Copy link
Member

wlandau commented Mar 14, 2018

Reopening, re: duncantl/CodeDepends#26 (comment). I just learned that the detection of the dot as an NSE variable can depend on whether dplyr is loaded, so targets could become invalidated unpredictably. Since the dot is almost never a true dependency, I think we can forcibly remove it from the dependency graph and warn people in the caution vignette. Any objections?

@wlandau wlandau added this to the CRAN release 5.1.0 milestone Mar 14, 2018
@wlandau wlandau reopened this Mar 14, 2018
@wlandau
Copy link
Member

wlandau commented Mar 14, 2018

Correction: detection does not depend on whether dplyr is loaded, but on the heuristic that draws on a whole script for context. Thanks for clarifying, Gabe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants