Accommodation of script-based imperative workflows #994

Closed
2 tasks done
wlandau opened this issue Aug 22, 2019 · 45 comments

Comments

@wlandau
Member

wlandau commented Aug 22, 2019

Prework

Idea

Suggested by @thebioengineer. May improve the migration of old projects to drake and contribute to ropensci-books/drake#41.

script_in("file.R") could be shorthand for source("file.R")$value at runtime. At code analysis time, script_in() could tell drake to analyze the code in file.R for dependencies in the usual way.

drake_plan(
  data = script_in("01-data.R"),
  munge = script_in("02-munge.R"),
  model = script_in("03-model.R"),
  results = script_in("04-results.R")
)

Concerns

This way of doing things goes against drake's function-oriented style, and it makes more room for suboptimal programming practices. Plus, users can already achieve script-based behavior like this:

drake_plan(
  data = source(file_in("01-data.R"))$value,
  munge = {
    data # mentioned as an explicit dependency
    source(file_in("02-munge.R"))$value
  },
  model = {
    munge
    source(file_in("03-model.R"))$value
  },
  results = {
    model
    source(file_in("04-results.R"))$value
  }
)

I am eager to discuss, and my mind could be changed. However, my current opinion is that we should not make script-based imperative workflows easier. I think we should keep nudging users to write functions.

@thebioengineer
Contributor

I agree that promoting the use of scripts rather than functions should be avoided. My thought was to allow users to bring their pre-existing workflows into drake. In addition, using source(file_in("my_script.R"))$value only allows the last value that is generated to be captured.
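To make the last-value point concrete, here is a base-R sketch (the script contents are made up for illustration). source() runs every line, but $value only holds the value of the final top-level expression; the intermediate objects end up in whatever environment source() evaluated into.

```r
# A throwaway three-line "script" written to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "raw <- c(3, 1, 2)",
  "sorted <- sort(raw)",
  "length(sorted)"
), script)

# Evaluate the script into its own environment.
env <- new.env()
result <- source(script, local = env)$value

result   # 3: only the value of the last expression
ls(env)  # "raw" "sorted": intermediate objects live in env, not in result
```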

If source_in() were used, and it promoted the use of the other "traditional" drake functions (i.e. file_in(), file_out(), loadd(), readd()) to find and document dependencies, I think there would be less friction. In addition, loud warnings and complaints about using external scripts could be added.

@wlandau
Member Author

wlandau commented Aug 22, 2019

> I agree that promoting the use of scripts rather than functions should be avoided. My thought was to allow users to bring their pre-existing workflows into drake. In addition, using source(file_in("my_script.R"))$value only allows the last value that is generated to be captured.

I totally agree.

> If source_in() were used, and it promoted the use of the other "traditional" drake functions (i.e. file_in(), file_out(), loadd(), readd()) to find and document dependencies, I think there would be less friction. In addition, loud warnings and complaints about using external scripts could be added.

It seems odd to make things smoother with script_in()/source_in() only to make them rougher again with warnings, all for something we would rather avoid anyway.

If the issue is converting old imperative workflows to drake, maybe we should focus on the conversion itself.

library(drake)

parse_script <- function(file) {
  lines <- paste(readLines(file), sep = "\n")
  code <- parse(text = lines, keep.source = FALSE)
  out <- call("{")
  for (i in seq_along(code)) {
    out[[i + 1]] <- code[[i]]
  }
  out
}

script1 <- tempfile()
script2 <- tempfile()
writeLines(c("data <- my_data()", "munge(data)"), script1)
writeLines("analyze(munged)", script2)

plan <- drake_plan(
  munged = !!parse_script(script1),
  analysis = !!parse_script(script2)
)

drake_plan_source(plan)
#> drake_plan(
#>   munged = {
#>     data <- my_data()
#>     munge(data)
#>   },
#>   analysis = {
#>     analyze(munged)
#>   }
#> )

config <- drake_config(plan)
vis_drake_graph(config)

Created on 2019-08-22 by the reprex package (v0.3.0)

@wlandau wlandau reopened this Aug 22, 2019
@wlandau
Member Author

wlandau commented Aug 22, 2019

All we have to do is add this new parse_script() function into drake. (Hopefully with a better name. Thoughts?)

I consider this approach far lighter than script_in().

@wlandau
Member Author

wlandau commented Aug 22, 2019

To emphasize: drake_plan_source() completes the conversion. It gives you code to create the plan in a drake-approved way. Glad we already have that part.

@thebioengineer
Contributor

thebioengineer commented Aug 22, 2019

I was thinking along the same lines. My working version is:

script_in<-function(path){
  str2expression(c("{",readLines(path),"}"))[[1]]
}

I was also looking for ways to add the file tracking that you have with file_in(). Is that possible here too?

@wlandau
Member Author

wlandau commented Aug 23, 2019

Speed

This feature will probably not cause a bottleneck, but speed is still worth a look. It seems like the speed gains of str2expression() come from skipping the source ref info, which drake skips anyway in safe_parse(). So maybe str2expression() is not necessary in drake after all?

library(microbenchmark)
lines <- readLines(url("http://bioconductor.org/biocLite.R"))

# Used throughout drake's internals:
drake:::safe_parse("# 123")
#> expression()

microbenchmark(
  parse_src = parse(text = lines, keep.source = TRUE),
  parse = parse(text = lines, keep.source = FALSE),
  str = str2expression(lines),
  drake = drake:::safe_parse(lines)
)
#> Unit: microseconds
#>       expr     min      lq     mean   median       uq      max neval
#>  parse_src 862.956 907.467 969.0071 918.4455 992.3515 1484.220   100
#>      parse 568.086 596.110 622.2085 602.0705 629.5830  844.246   100
#>        str 567.813 596.388 626.6053 601.8305 646.4530  881.024   100
#>      drake 579.209 604.267 636.5746 612.3220 655.9230  845.641   100

Created on 2019-08-23 by the reprex package (v0.3.0)

Implementation

Your one-liner is way better than my parse_script() from #994 (comment). We have to deal with the file text anyway, so we might as well put the curly braces on there. My only suggestion is to use drake's safe_parse() instead of str2expression(). It skips the source ref, and it avoids taking [[1]] on objects of length 0.

script_in <- function(path){
  safe_parse(c("{", readLines(path), "}"))
}
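A base-R sketch of this idea, assuming a stand-in for drake's internal safe_parse() (the names safe_parse_sketch() and script_in_sketch() are hypothetical, and the behavior is inferred from the description above rather than copied from drake): parse without source references, take the first expression only when one exists, and wrap the script's lines in braces so the result is a single `{` call.

```r
# Hypothetical stand-in for drake's internal safe_parse(); not drake's code.
safe_parse_sketch <- function(text) {
  out <- parse(text = text, keep.source = FALSE)  # skip srcref overhead
  if (length(out)) {
    out <- out[[1]]  # only subset when there is something to subset
  }
  out
}

# Wrap a script's lines in braces so they parse to one `{` call.
script_in_sketch <- function(path) {
  safe_parse_sketch(c("{", readLines(path), "}"))
}

empty <- safe_parse_sketch("# 123")  # comments only: a length-0 expression

path <- tempfile(fileext = ".R")
writeLines(c("x <- 2", "x * 21"), path)
block <- script_in_sketch(path)
class(block)                    # "{"
eval(block, envir = new.env())  # 42
```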

More thoughts on names

This is the hardest part.

script_in()

On its own, this would be an excellent name. I just worry that it is too much like the existing file_in() and knitr_in(), which behave differently.

source_script()

You brought up a great point yesterday that from the user's perspective, the script will ultimately be sourced for all intents and purposes when the user calls make(). However, I still think it misrepresents what is actually happening when the function is called. There is a real separation between parsing and evaluation, and I hope we can reflect this in the name.
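The split between parsing and evaluation is easy to see in plain R. In this sketch (the one-line script is hypothetical), building the `{` call has no side effects, static analysis such as all.vars() can inspect it, and the script only runs when the call is passed to eval().

```r
path <- tempfile(fileext = ".R")
writeLines("counter <- counter + 1", path)

# Parsing: the script becomes one `{` call. Nothing has run yet.
code <- parse(text = c("{", readLines(path), "}"), keep.source = FALSE)[[1]]
all.vars(code)  # "counter": the code can be analyzed without running it

env <- new.env()
env$counter <- 0
before <- env$counter

# Evaluation: the script's effects happen only here.
eval(code, envir = env)
after <- env$counter
c(before, after)  # 0 1
```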

inline_script()

Inline functions are already an established concept in C/C++, and the Rcpp and inline packages have planted the seed in at least some parts of the R community. More importantly, I think it better reflects what the function is doing. On the other hand, the term "inline" is more esoteric than "source".

insert_script()

This one is my favorite. It evokes what is really going on, and I think people will know what we mean by insertion.

include_script()

I thought about it, but I don't like it that much. Inclusion does not necessarily mean we literally insert the code (e.g. # include <c++_header.h>).

file_in()-like tracking

The code will already be tracked in the plan, so I believe this will not be necessary.

@wlandau
Member Author

wlandau commented Aug 23, 2019

A big problem with this whole approach is that we will end up with an enormous plan that users will ultimately need to refactor. How about script_to_function()? Details shortly.

@wlandau
Member Author

wlandau commented Aug 23, 2019

library(drake)

script_to_function <- function(path) {
  lines <- readLines(path)
  lines <- c("function(...) {", lines, "}")
  text <- paste(lines, sep = "\n")
  drake:::safe_parse(text)
}

munge_script <- tempfile()
analysis_script <- tempfile()
writeLines(c("data <- my_data()", "munge(data)"), munge_script)
writeLines("analyze(munged)", analysis_script)

do_munging <- script_to_function(munge_script)
do_analysis <- script_to_function(analysis_script)

do_munging
#> function(...) {
#>     data <- my_data()
#>     munge(data)
#> }

do_analysis
#> function(...) {
#>     analyze(munged)
#> }

drake_plan(
  munged_value = do_munging(),
  analysis_value = do_analysis(munged_value)
)
#> # A tibble: 2 x 2
#>   target         command                  
#>   <chr>          <expr>                   
#> 1 munged_value   do_munging()             
#> 2 analysis_value do_analysis(munged_value)

Created on 2019-08-23 by the reprex package (v0.3.0)

@wlandau
Member Author

wlandau commented Aug 23, 2019

Benefits

  • Shorter plans.
  • An extra nudge toward functions.
  • Users can learn how function call arguments control dependency relationships in the plan.
  • As before, users can leave their original code alone. If they don’t like drake, they have egress.
  • No need for tidy eval (!!).

Concerns

  • Arguments to those functions don’t do anything except connect targets together for drake. Not terrible, but a bit awkward. Can we get people closer to something more ideal for drake?
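One way to see why these arguments feel awkward: as far as R itself is concerned, anything passed through ... to a function that never uses it is a lazy promise that is never evaluated. A minimal sketch, where do_analysis() is a hypothetical stand-in for a generated function:

```r
# Hypothetical function generated from a script that ignores its arguments.
do_analysis <- function(...) {
  "analysis ran"
}

# Arguments in `...` are lazy promises. The body never touches them, so
# even an argument that would throw an error is never evaluated.
do_analysis(stop("this argument is never evaluated"))  # "analysis ran"
```

drake can still see the symbols in the command when it analyzes the plan; they just never reach the script's code at runtime.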

@brendanf
Contributor

brendanf commented Aug 23, 2019

> Arguments to those functions don’t do anything except connect targets together for drake.

It looks like your example wouldn't run, because the body of do_analysis() calls analyze(munged) but the target name is munged_value. This is exactly the sort of error that I expect many people (myself included) would make if the functions produced by script_to_function() accept only dummy arguments that don't actually pass data to the computation.

What about:

script_to_function <- function(path, args) {
  # Need to check that args is an unnamed character vector of valid names
  lines <- readLines(path)
  lines <- c("function(", paste(args, collapse = ", "), ") {", lines, "}")
  text <- paste(lines, collapse = "\n")
  eval(drake:::safe_parse(text))
}

?
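A runnable base-R version of this idea might look like the following. It is a sketch under a few assumptions: the name script_to_function_args() is hypothetical, parse()/eval() stand in for drake's internal safe_parse(), and the generated function's enclosing environment defaults to the caller's environment so that the script's other free variables resolve there rather than inside the helper.

```r
# Hypothetical helper; parse()/eval() stand in for drake:::safe_parse().
script_to_function_args <- function(path, args = character(0),
                                    envir = parent.frame()) {
  # Arguments must be an unnamed character vector of syntactic names.
  stopifnot(
    is.character(args),
    is.null(names(args)),
    identical(make.names(args), args)
  )
  lines <- c(
    paste0("function(", paste(args, collapse = ", "), ") {"),
    readLines(path),
    "}"
  )
  # Evaluate the `function` call so we return a real closure, not a call.
  # Its enclosure is `envir`, so other free variables resolve there.
  eval(parse(text = lines, keep.source = FALSE)[[1]], envir = envir)
}

path <- tempfile(fileext = ".R")
writeLines("munged * 2", path)
do_analysis <- script_to_function_args(path, args = "munged")
do_analysis(21)  # 42
```

Because munged is now a formal argument, the data actually flows into the computation, which avoids the name mismatch between the script body and the target.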

@brendanf
Contributor

Another issue that I expect would bite new users of this method is that file dependencies in the script would be untracked by drake unless the user adds a call to file_in.

@wlandau
Member Author

wlandau commented Aug 23, 2019

In this scenario, the target names are unrelated to the preexisting scripts of a non-drake project. From what I have seen, these scripts usually save intermediate results to files, and those files establish the dependency relationships. The scripts do not use file_in() or file_out(), so drake cannot detect file-induced dependency relationships automatically. I believe the ... argument solves this problem because it lets people set those relationships with ad hoc dummy arguments: not only file_in() and file_out() files, but also target names. It's a quick way to get a project working, and it's a quick way to learn how drake thinks about dependencies. That's the goal, and I do not think we need formal arguments to achieve it.

In other words, script_to_function() just does the simplest thing, and it is just a starting point. It is the hint I think users need to start refactoring their code. A new chapter in the manual will walk through the post-script_to_function() refactoring process, re ropensci-books/drake#41.

@wlandau
Member Author

wlandau commented Aug 23, 2019

Just occurred to me: people use R Markdown for script-oriented workflows: 01_data.Rmd, 02_munge.Rmd, 03_analysis.Rmd, etc. So either

  1. We make script_to_function() understand R Markdown (my vote) or
  2. We need a new rmd_to_function().

code_to_plan() already does (1):

code_to_plan <- function(path) {
  stopifnot(file.exists(path))
  txt <- readLines(path)
  # From CodeDepends: https://github.com/duncantl/CodeDepends/blob/7c9cf7eceffaea1d26fe25836c7a455f059e13c1/R/frags.R#L74 # nolint
  # Checks if the file is a knitr report.
  if (any(grepl("^(### chunk number|<<[^>]*>>=|```\\{r.*\\})", txt))) { # nolint
    txt <- get_tangled_text(path)
  }
  nodes <- parse(text = txt)
  out <- lapply(nodes, node_plan)
  out <- do.call(rbind, out)
  out <- parse_custom_plan_columns(out)
  sanitize_plan(out)
}

where

drake/R/analyze_code.R

Lines 301 to 312 in 6fca3fd

# From https://github.com/duncantl/CodeDepends/blob/master/R/sweave.R#L15
get_tangled_text <- function(doc) {
  assert_pkg("knitr")
  id <- make.names(tempfile(), unique = FALSE, allow_ = TRUE)
  con <- textConnection(id, "w", local = TRUE)
  on.exit(close(con))
  with_options(
    new = list(knitr.purl.inline = TRUE),
    code = knitr::knit(doc, output = con, tangle = TRUE, quiet = TRUE)
  )
  textConnectionValue(con)
}
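For readers without knitr at hand, the detection step is easy to try in isolation. The sketch below reuses the knitr pattern from code_to_plan() and pairs it with a deliberately naive chunk extractor. extract_r_chunks() is a hypothetical stand-in for the knitr::knit(tangle = TRUE) call; unlike knitr, it ignores inline code, chunk options such as eval = FALSE, and non-R engines.

```r
# The knitr detection pattern from code_to_plan().
knitr_pattern <- "^(### chunk number|<<[^>]*>>=|```\\{r.*\\})"

is_knitr_source <- function(lines) any(grepl(knitr_pattern, lines))

# Hypothetical, naive replacement for knitr tangling: keep the lines between
# each "```{r ...}" header and the next bare closing fence.
extract_r_chunks <- function(lines) {
  starts <- grep("^```\\{r.*\\}", lines)
  ends <- grep("^```[[:space:]]*$", lines)
  keep <- logical(length(lines))
  for (s in starts) {
    e <- ends[ends > s][1]
    if (!is.na(e) && e > s + 1) {
      keep[(s + 1):(e - 1)] <- TRUE
    }
  }
  lines[keep]
}

rmd <- c(
  "# Analysis",
  "```{r data}",
  "x <- c(1, 2, 3)",
  "```",
  "Some narrative text.",
  "```{r model}",
  "mean(x)",
  "```"
)
is_knitr_source(rmd)                   # TRUE
is_knitr_source(c("x <- 1", "x + 1"))  # FALSE
extract_r_chunks(rmd)                  # "x <- c(1, 2, 3)" "mean(x)"
```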

If we go with (1), should we go with code_to_function() to be more consistent with code_to_plan()?

We could add an argument to get_tangled_frags() to suppress parsing.

@thebioengineer
Contributor

We might want to make the code more similar to code_to_plan(). Are you thinking that the Rmd would still be rendered at each step too?

@wlandau
Member Author

wlandau commented Aug 23, 2019

I was thinking we could just extract the code from the active chunks and stick it in a function. No rendering required. Sound good?

@wlandau
Member Author

wlandau commented Aug 23, 2019

Another thing: we need to think about the return values of the functions. For example, if all the functions generated by code_to_function() return NULL (which could easily happen) then updates to downstream targets will not trigger. Quick-and-dirty catch-all solution: return a hash of the function's own body. Sketch:

code_to_function <- function(path) {
  lines <- readLines(path)
  knitr_pattern <- "^(### chunk number|<<[^>]*>>=|```\\{r.*\\})"
  if (any(grepl(knitr_pattern, lines))) {
    lines <- get_tangled_text(path) 
  }
  lines <- c(
    "function(...) {",
    lines,
    "standardize_function(sys.function())", # From drake. Calls deparse(), but this shouldn't be a bottleneck...
    "}"
  )
  text <- paste(lines, sep = "\n")
  eval(safe_parse(text))
}
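Since standardize_function() is internal to drake, here is a base-R approximation of the "return a fingerprint of your own code" trick. code_to_function_sketch() is a hypothetical name, and deparse() of the body stands in for drake's standardization and hashing. Two functions built from different code return different values, while repeated calls of the same function agree, even when the script itself ends in NULL.

```r
# Hypothetical base-R approximation: append lines that return a
# fingerprint of the function's own body (drake would hash it instead).
code_to_function_sketch <- function(lines, envir = parent.frame()) {
  text <- c(
    "function(...) {",
    lines,
    "self <- sys.function()",
    "paste(deparse(body(self)), collapse = ' ')",
    "}"
  )
  eval(parse(text = text, keep.source = FALSE)[[1]], envir = envir)
}

# Both "scripts" end in NULL, the case that would otherwise hide changes.
f1 <- code_to_function_sketch(c("x <- 1", "NULL"))
f2 <- code_to_function_sketch(c("x <- 2", "NULL"))

identical(f1(), f1())  # TRUE: same code, same return value
identical(f1(), f2())  # FALSE: different code, different return value
```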

@wlandau
Member Author

wlandau commented Aug 23, 2019

So then user-side refactoring might not be simple. We really need to talk about the sophisticated dependency tracking you get when you start with functions. But at least we can slap drake onto an arbitrary project and get people through the door.

@thebioengineer
Contributor

What do you think about adding code_to_function() to the functions that drake_plan() exempts from tidy evaluation (like target() and transform() are now)? This would allow users to use the function inline in the drake plan. Not sure if this goes against your theory that we don't want to explicitly promote this technique, but it would reduce friction, since users wouldn't need to remember !! if they use it directly in the plan.

@wlandau
Member Author

wlandau commented Aug 24, 2019

Would you sketch what you are thinking? Not 100% sure I follow exactly.

I could be convinced otherwise, but my current (and strong) preference is to connect the concept of a script to the concept of a function that gets defined outside the plan, and then all the plan has to do is connect the predefined pieces together (#994 (comment)). This is when drake workflows are cleanest and most manageable.

@wlandau
Member Author

wlandau commented Aug 24, 2019

Also cc @pat-s, re #193. Do you think this would help get script-loving users on board? To be honest, I have trouble relating to this personally because I have always preferred functions.

@thebioengineer
Contributor

thebioengineer commented Aug 24, 2019

I see. My thought is this:

code_to_function <- function(path,...) {
  stopifnot(file.exists(path))
  path<-gsub("\\","\\\\",path,fixed=TRUE) # needed to allow for using path in a string to eval in windows systems

  dependson<- match.call(expand.dots = FALSE)$...

  expr<-paste0(c("{",
                 paste0("file_in(\"",path, "\")"),
                 dependson,
                 paste0("source_function<-eval(parse_source(\"",path,"\"))"),
                 "source_function()",
                 "}"),collapse = "\n")
  safe_parse(expr)
}

parse_source <- function(path) {
  lines <- readLines(path)
  knitr_pattern <- "^(### chunk number|<<[^>]*>>=|```\\{r.*\\})"
  if (any(grepl(knitr_pattern, lines))) {
    lines <- get_tangled_text(path)
  }
  lines <- c(
    "function(){",
    lines,
    "digest::digest(lines, algo = config$cache$hash_algorithm,serialize = FALSE)",
    "}"
  )
  text <- paste(lines, sep = "\n")
  safe_parse(text)
}

munge_script <- file.path(tempdir(),"01-munge.R")
analysis_script <- file.path(tempdir(),"02-analyze.R")
vis_script <- file.path(tempdir(),"03-plot.R")
writeLines(c("data <- my_data()", "munge(data)"), munge_script)
writeLines("analyze(munged)", analysis_script)
writeLines("plot(analyzed)", vis_script)

# currently we need the !! to allow the evaluation and return of the parsed script. My proposal is to remove that
plan<-drake_plan(
  munged_value = !!code_to_function(munge_script),
  analysis_value = !!code_to_function(analysis_script,munged_value),
  plot_value = !!code_to_function(vis_script,analysis_value)
)

drake_plan_source(plan)

plan
config <- drake_config(plan)
vis_drake_graph(config)
 
# The next steps are theoretical, but should work
make(plan)
vis_drake_graph(config)

writeLines(c("analyze(munged)", "newthing(data)"), analysis_script)

#analysis and vis scripts are now outdated 
vis_drake_graph(config)

#reruns analysis and vis scripts only
make(plan)
vis_drake_graph(config)

Is this totally against how you want to incorporate external R scripts? This was another shower thought, and I wanted to explore the functionality.

In addition, this setup triggers a rebuild whenever the file changes, so updates are captured. This function setup does not need to be done within the plan, but it was a thought.

@pat-s
Member

pat-s commented Aug 24, 2019

> Also cc @pat-s, re #193. Do you think this would help get script-loving users on board? To be honest, I have trouble relating to this personally because I have always preferred functions.

TL;DR: it also seems you're still deep in discussion.
Q: What is the difference between script_in() and code_to_plan()? It was not really obvious to me from reading the first few comments.
(HTH if I can.)

@wlandau
Member Author

wlandau commented Aug 24, 2019

@thebioengineer, clever, but I would rather not go that route because it tracks scripts as files instead of totally relying on the parsed code in the R session. I think code_to_function() should be like source("R/functions.R"). The goal is to get people closer to using drake properly, and I also think we should keep the implementation simple.

@pat-s, I am curious if #994 (comment) + #994 (comment) + a walkthrough in the manual would have helped you better understand drake back when you posted #193.

@thebioengineer
Contributor

@wlandau Gotcha. I was approaching it as if the user would want to just update their script and then run make(plan), without having to think about regenerating the plan object. Thinking more, the file_in() call didn't really need to be there; it just added tracking for me.

The steps to add drake to a pre-existing workflow would be as follows then:

  1. Make a 00-plan.R file.
  2. Generate a function for each of the scripts/Rmd files that need to be run, using code_to_function().
  3. Generate the plan object using drake_plan(), where each script dependency is linked to the prior script by using its output object (need to work on the wording here).
  4. To visualize the network of dependencies, generate a config object via drake_config(), and then plot it via vis_drake_graph().
  5. To incorporate updates to your scripts/Rmd files, regenerate the functions by rerunning code_to_function(), and rerun make(plan) to have drake rerun all dependencies.

Does that workflow jive with your mental model?

@ooo

ooo bot commented Aug 25, 2019

👋 Hey @thebioengineer...

Letting you know, @wlandau is currently OOO until Thursday, September 12th 2019. ❤️

@wlandau
Member Author

wlandau commented Aug 26, 2019

Yeah, that's basically it! Here's how I would explain it. Suppose we have a workflow with traditional scripts.

run_everything.R
R/
├─ 01_data.R
├─ 02_munge.R
└─ 03_analyze.R

where run_everything.R looks like this:

source("R/01_data.R")
source("R/02_munge.R")
source("R/03_analyze.R")

I propose we change run_everything.R to this:

library(drake)

do_data <- code_to_function("R/01_data.R")
do_munge <- code_to_function("R/02_munge.R")
do_analyze <- code_to_function("R/03_analyze.R")

plan <- drake_plan(
  data = do_data(),
  munged = do_munge(data),
  analysis = do_analyze(munged)
)

make(plan)

Take-home messages:

  1. Functions replace scripts.
  2. The return values are important.
  3. The commands in the plan reference the functions and targets.
  4. The symbols in the plan determine the runtime order of the targets.
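Point 4 can be illustrated with a toy scheduler in base R. This is not how drake is implemented; it only assumes that a target depends on every other target name that appears as a symbol in its command (found here with all.vars()), then orders targets so each runs after its dependencies, regardless of the order they are written in.

```r
# Toy plan: target names mapped to commands (quoted R calls), deliberately
# written out of order.
plan <- list(
  analysis = quote(do_analyze(munged)),
  data     = quote(do_data()),
  munged   = quote(do_munge(data))
)

# A target's dependencies are the other target names its command mentions.
deps <- lapply(plan, function(cmd) intersect(all.vars(cmd), names(plan)))

# Repeatedly schedule targets whose dependencies have already been scheduled.
run_order <- character(0)
while (length(run_order) < length(plan)) {
  ready <- names(plan)[vapply(names(plan), function(target) {
    !(target %in% run_order) && all(deps[[target]] %in% run_order)
  }, logical(1))]
  if (length(ready) == 0) stop("cycle detected")
  run_order <- c(run_order, ready)
}
run_order  # "data" "munged" "analysis"
```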

This is what I am planning to flesh out for ropensci-books/drake#41.

@wlandau
Member Author

wlandau commented Aug 26, 2019

The goal is to make the conceptual leap from imperative (script-oriented) workflows to function-oriented workflows. Does #994 (comment) accomplish this? I have trouble understanding why function-oriented workflows are so confusing to people, so I do not always know how to help.

@thebioengineer
Contributor

Gotcha. I think most people are trained to approach workflows as a series of steps, and use functions more as a "DRY" principle, rather than seeing each step as an opportunity to write a function. At least that has been my typical approach.

I think the key is working in tandem with the manual to develop the idea of how to translate workflow steps into functions.

@mik3y64

mik3y64 commented Aug 26, 2019

I think #994 (comment) is good. It's a clean approach for users to try out drake.

Additionally, when users finally decide to adopt the drake approach, code_to_function() could be extended further, perhaps as write_code_to_function(), to convert existing scripts and write them out to a new folder (keeping the old scripts as a backup). Many extra arguments/features could be implemented to help smooth the transition, for example, allowing users to specify function names, otherwise defaulting to the names of the existing scripts.

# existing script: analysis.R
analyze(munged)

write_code_to_function(path = 'analysis.R')

# new script
analysis <- function(...) {
  analyze(munged)
}
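A minimal sketch of the proposed write_code_to_function(), assuming it wraps each script in a function named after the file and writes the result to a separate folder so the originals stay untouched. The out_dir argument and the whole helper are hypothetical, following the proposal above rather than any existing drake function.

```r
# Hypothetical helper sketched from the proposal above; not part of drake.
write_code_to_function <- function(path, out_dir) {
  # Default the function name to the script's file name.
  name <- make.names(tools::file_path_sans_ext(basename(path)))
  text <- c(
    paste0(name, " <- function(...) {"),
    paste0("  ", readLines(path)),
    "}"
  )
  # Write to a separate folder so the original script stays as a backup.
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out <- file.path(out_dir, basename(path))
  writeLines(text, out)
  out
}

old_dir <- file.path(tempdir(), "scripts")
dir.create(old_dir, showWarnings = FALSE)
old <- file.path(old_dir, "analysis.R")
writeLines("munged * 2", old)

new <- write_code_to_function(old, file.path(tempdir(), "functions"))
readLines(new)
# "analysis <- function(...) {" "  munged * 2" "}"

env <- new.env()
env$munged <- 21  # stands in for an upstream result
sys.source(new, envir = env)
env$analysis()    # 42
```

One pitfall with defaulting to the script name: if a script called analyze.R itself calls analyze(), the generated analyze() would shadow that function and call itself recursively, so a real implementation would need a naming rule or a check.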

From there on, users can start using the drake function approach.

@wlandau
Member Author

wlandau commented Aug 26, 2019

Yeah, let's see if there are other parts of the transition we can automate. write_code_to_function() certainly helps get the idea across, but duplicated code in the same workspace can be difficult to maintain, especially if users need to go back and forth between the drake and non-drake stuff. I will mull it over.

@pat-s
Member

pat-s commented Aug 26, 2019

> @pat-s, I am curious if #994 (comment) + #994 (comment) + a walkthrough in the manual would have helped you better understand drake back when you posted #193

#994 (comment) is certainly a nice start that makes things more clear.

From my side, I would recommend making drake's function-based approach much clearer right from the start and linking to the appropriate examples in the manual.
Maybe put them at the top level with something like "Are you a script-based person? -> read here; are you a function-based person? -> read here".

Leaving aside how drake deals with script-based workflows in the end, users need to be told right from the start that their view on building a workflow might need to change (or that drake might even softly push them to change).

In my field (ecological modelling) almost all people come from a script-based workflow, simply because they lack the skill of writing functions (or just the experience).
When telling them about drake, the script-vs-function question comes up a lot ("what is it?", "where is the advantage?", etc.).

@thebioengineer
Contributor

@pat-s Good points. Getting drake to "accept" script-based workflows might not be the difficult part; the harder part may be convincing people of the value of workflow management. I am going to pore over the current manual to see how we can make the value drake adds to script-based workflows more apparent, and how to move to a script-based flow!

@wlandau
Member Author

wlandau commented Aug 29, 2019

I recommend chapter 5 ("drake projects") which explains how best to organize code into files for drake.

@thebioengineer, are you saying you want traditional imperative script-based workflows to be a final solution/destination for drake use? Do you think we can do that in a way that stays true to drake's core values?

@thebioengineer
Contributor

thebioengineer commented Aug 29, 2019

My comment was based on @pat-s's comment that a number of people in his field have difficulty seeing the value in function-based workflows. Making drake handle script-based workflows, but with a little friction like we have now, might be an opportunity to then espouse the value of converting to a more function-based workflow.

In the manual we could provide some advice on how to perform the conversion, and on what exactly the added value is: not just that it is easier for drake, but that it is easier on the person maintaining the workflow to have discrete functions that perform specific tasks.

@wlandau
Member Author

wlandau commented Aug 29, 2019

Ah, got it, thanks for clarifying. I agree that we should argue for a function-based approach in and of itself, drake or no drake.

@wlandau wlandau mentioned this issue Aug 30, 2019
@wlandau
Member Author

wlandau commented Sep 11, 2019

@thebioengineer, do you still want to submit a PR with code_to_function() + docs, or should I? I realize you may have been waiting for me to return to the office.

@thebioengineer
Contributor

Hey @wlandau, yes, sorry, this has been in my queue to submit. I will do it tonight!

I have been thinking about how best to incorporate drake into my existing script-based workflows, so that I can speak from experience in the manual about the purpose of this function and how to use it. Hopefully what I write ends up coherent :)

@ooo

ooo bot commented Sep 11, 2019

👋 Hey @thebioengineer...

Letting you know, @wlandau is currently OOO until Thursday, September 12th 2019. ❤️

@thebioengineer thebioengineer mentioned this issue Sep 12, 2019
@wlandau
Member Author

wlandau commented Sep 13, 2019

Re #1007 (comment), I propose a different code_to_function() that returns a timestamp and tempfile instead of a hash.

code_to_function <- function(path) {
  lines <- readLines(path)
  knitr_pattern <- "^(### chunk number|<<[^>]*>>=|```\\{r.*\\})"
  if (any(grepl(knitr_pattern, lines))) {
    lines <- get_tangled_text(path)
  }
  lines <- c(
    "function(...) {",
    lines,
    "c(format(Sys.time(), \"%Y-%m-%d %H:%M:%OS9 %z GMT\"), tempfile())",
    "}"
  )
  text <- paste(lines, sep = "\n")
  eval(parse(text = text))
}
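The invalidation strategy can be checked without drake. In this base-R sketch (code_to_function_ts() is a hypothetical name, and parse()/eval() replace drake's internals), every call returns a fresh timestamp plus a fresh tempfile() name, so no two runs produce the same value and anything downstream would always be considered outdated. The sketch formats fractional seconds with %OS6.

```r
# Hypothetical sketch; parse()/eval() replace drake's internal helpers.
code_to_function_ts <- function(lines, envir = parent.frame()) {
  text <- c(
    "function(...) {",
    lines,
    # Always return something new: a high-resolution timestamp plus a
    # fresh temporary file name, which covers coarse clocks.
    "c(format(Sys.time(), '%Y-%m-%d %H:%M:%OS6 %z'), tempfile())",
    "}"
  )
  eval(parse(text = text, keep.source = FALSE)[[1]], envir = envir)
}

do_step <- code_to_function_ts("x <- 1")
a <- do_step()
b <- do_step()
length(a)  # 2
identical(a, b)
```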

@wlandau wlandau changed the title script_in()? Accommodation of script-based imperative workflows Sep 13, 2019
@thebioengineer
Contributor

thebioengineer commented Sep 14, 2019

I like that solution; as you said over in #1007, it should then act like a basic make system. I was flying yesterday, so I came up with this solution before I saw your response:

code_to_function <- function(path,...) {
  lines <- readLines(path)
  knitr_pattern <- "^(### chunk number|<<[^>]*>>=|```\\{r.*\\})"
  if (any(grepl(knitr_pattern, lines))) {
    lines <- get_tangled_text(path)
  }
  lines <- c(
    "function(...){",
    # This may look a little funky, but this is grabbing the resulting environment that generated the output
    "script<-function(){",lines,"sys.frame(sys.nframe())","}",
    "evalenv<-script()",
    # Then I take a digest of the objects found in the script to identify if things changed
    "digest::digest(as.list(evalenv),algo = \"xxhash64\")",
    "}"
  )
  text <- paste(lines, sep = "\n")

  eval(safe_parse(text))
}

It captures the environment created when the script runs and returns a hash of the objects in it. If the environment changes, the output changes and triggers downstream builds.

@wlandau
Member Author

wlandau commented Sep 14, 2019

Interesting. Unfortunately, though, I think #994 (comment) is likely to get us into trouble.

We are bypassing drake's elaborate tracking system here, so it is safer to always invalidate downstream targets whenever something gets run. Timestamps are good at accomplishing this, and tempfile() makes up for the platform-dependent imprecision we often find in timestamps (see #4).

@thebioengineer
Contributor

I see. That makes sense; I had forgotten about the cases where raw pointers can change. Like I said, I came up with that solution before I had seen yours and wanted to propose it.

I will incorporate the proposed solution and add additional tests!

@thebioengineer
Contributor

thebioengineer commented Sep 23, 2019

@wlandau, I have added a test that I think checks the idea. I have resolved a number of the lintr comments and added tests for using Rmd files as scripts as well.

lintr is complaining about the line of code that identifies whether the input is an Rmd file so it can run get_tangled_text() on the input. Do you have any suggestions for how to handle that?

I resolved this by adding #nolint to the end of the line, based on the "Project Configuration" section of the lintr documentation.

@wlandau
Member Author

wlandau commented Sep 23, 2019

Awesome! I will look at your tests. #nolint is totally fine for that long grep pattern.

@wlandau
Member Author

wlandau commented Sep 30, 2019

Thank you @thebioengineer for implementing this in #1007!

5 participants