Meta-make #304

krlmlr · 2018-03-06T23:54:02Z

I'd like to propose three very simple verbs that might get us halfway towards the goal of a DSL (#233). This proposal addresses the low-level technical part, which I think is required for any DSL.

The proposal is too long for an issue (and issues aren't version-controlled), I've added it to the wiki: https://github.com/ropensci/drake/wiki/Meta-make. Perhaps a 5 to 10 minute read.

I haven't closely followed previous related discussions, there might be overlap I'm not aware of. In particular, #79 might be a similar approach. Sharing this early for feedback and discussion, CC @AlexAxthelm @dapperjapper @rkrug @kendonB.

rkrug · 2018-03-07T07:46:11Z

I like the idea of having a wiki page to discuss possible verbs.

I am not to clear about your verbs - could you please add the tibble / plan you would expect at the end?

krlmlr · 2018-03-07T08:11:56Z

Thanks for your feedback. I updated the wiki and pinged you in the commit message, would be interested to learn if you received a notification. Please let me know if things are clearer now.

rkrug · 2018-03-07T08:47:04Z

Thanks. Makes things much easier.

The pack() and unpack() sound extremely useful.

Looking at meta_plan(), it would address one question I raised at the workshop: the iterative plans. But now there might be one important issue coming in: the uniqueness of target names. Unique to cache, directory, or plan?

meta_plan() would be extremely powerful, but also very error prone due to the uniqueness of the target names. Don't get me wrong - I would love a feature like that, but there needs to be some mechanism to deal with the same target names in the meta_plan - maybe adding a suffix to the target names of the meta-plans?

rkrug · 2018-03-07T08:48:36Z

In the classical make, the target name has to be unique to the makefile - and I think drake should really aim at achieving the same.

krlmlr · 2018-03-07T08:49:14Z

Done adding reprexes. Switching to rmarkdown would have been smarter ;-)

I'd suggest that drake just refuses to run plans with duplicate names, just the way it works currently. That feels like the safest option.

Iterative plans will be covered by the DSL. This is just about low-level tooling to make the DSL possible.

rkrug · 2018-03-07T08:59:31Z

@krlmlr OK - so meta_plan() does not know anything about the main plan - makes perfect sense.

But than the creation of the plan should include the check, not the make(). If make() gets a plan with duplicate target names, it should refuse to run it - that is obvious. B

But the error message should come during creation of the plan in the drake_plan() step. It is a helper function anyway, to populate a simple tibble / data.frame.

krlmlr · 2018-03-07T09:33:16Z

Yes, meta_plan() will usually receive a tracked target, or something computed from tracked targets, as argument.

wlandau-lilly · 2018-03-07T13:41:27Z

Thanks so much, Kirill! I read your proposal, but it will take me more time to come up with feedback.

AlexAxthelm · 2018-03-07T15:00:35Z

I'm really liking the modularity of the design here, and I want to see if I'm understanding correctly. If I have a problem along the lines of "take a big data set, split by an unknown set of factors, and run a arbitrart code on each" (something for which I would use dplyr::do), woudl that be something along the lines of:

drake_plan(
  data = generic_data_importing_function(file_in("data.csv")),
  factor_levels = unique(pull(data, "factor_column") ),
  meta_plan = drake_plan(
    this_factor = unpack(factor_levels),
    data_subset = filter(data, factor_column == this_factor),
    analysis = analysis_function(data_subset)
  ),
  results = meta_make(meta_plan)
)

or would this require some sort of 4th verb (along the lines of a drake_foreach)?

krlmlr · 2018-03-07T16:19:19Z

Thanks. I don't intend to change the semantics of drake_plan() for this proposal. Your analysis might work with the following plan:

drake_plan(
  data = generic_data_importing_function(file_in("data.csv")),
  splits = data %>% nest(-factor_column) %>% deframe(),
  meta_plan = tibble(
    target = paste0("analyzed_", names(splits)),
    command = paste0("analysis_function(splits[[\"", names(splits), "\"]])")
  ),
  results = meta_make(meta_plan)
)

This will create targets of the form analyzed_xxx in the meta_plan, where xxx come from the factor levels. If you'd like to test the plan on a real example (iris?), you can double-check the embedded meta_plan. Does this make sense?

The targets in meta_plan will only be available as components of results, they are only brought into the main plan with unpack(), if necessary.

The command to create meta_plan is utterly awkward, that's where a DSL might help.

wlandau-lilly · 2018-03-08T17:06:25Z

Should make() just be "meta" natively? I have played around with

plan <- drake::drake_plan(
  x = 1,
  y = 2,
  meta_plan = drake_plan(
    a = x,
    b = y
  ),
  results = drake::make(meta_plan)
)
make(plan)

It fails because target a could not find its dependency x. Solving this problem might be simpler and cleaner than implementing new meta_make(), especially since meta_make() may need to deal with self-reference anyway.

plan1 <- drake_plan(
  x1 = 1,
  results1 = make(plan1),
  plan1 = drake_plan(
    x2 = x1,
    results2 = make(plan2),
    plan2 = drake_plan(
      x3 = x2,
      results3 = make(plan3),
      plan3 = drake_plan(
        ...
      )
    )
  )
)

In any case, I do like where this is headed. The "meta" piece will make the hardest parts of the DSL possible.

wlandau-lilly · 2018-03-08T17:14:58Z

I think pack() is similar to gather_plan(). pack() and unpack() are great choices of words, but I also wonder if the standard gather() and spread() would be more recognizable.

krlmlr · 2018-03-08T17:28:23Z

Thanks. I'd rather give an informative error message if make() is called from within a plan. We can see later if we can let make() converge towards meta_make(), or the other way round.

meta_make() is declarative only, it returns a data structure that contains the new plan but doesn't execute anything. Therefore, nesting shouldn't be a problem. There should be only one scheduler, even when some plans are constructed dynamically as part of the execution of the main plan.

The way spread() and gather() are defined in tidyr seems to be fairly different from the intended semantics of pack() and unpack(). I'm entirely open to other names.

wlandau · 2018-03-08T17:42:55Z

meta_make() is declarative only, it returns a data structure that contains the new plan but doesn't execute anything. Therefore, nesting shouldn't be a problem. There should be only one scheduler, even when some plans are constructed dynamically as part of the execution of the main plan.

Ah, so make() and meta_make() are fundamentally different anyway.

The way spread() and gather() are defined in tidyr seems to be fairly different from the intended semantics of pack() and unpack(). I'm entirely open to other names.

Sure, that's fine. If spread() and gather() are not the right words, I think pack() and unpack() are great.

violetcereza · 2018-03-08T17:44:33Z

I think it would be nice to unify the making interface under one function make, but I agree with krlmlr that they serve different purposes for now...

I'm confused about the return value of meta_make(). What does the data structure look like? When do the meta-targets (?) get made? And how might one loadd a meta-target?

Maybe I need some clarity around what declarative means and how this affects working with drake targets both within plans and outside of them.

Going off of your analogy in the wiki, meta_make targets return a list of made targets when loadded. Thus you would access meta-targets like drake::readd(results)$a. However, with the targets I use, it would use wayyy too much memory to load every target within a meta_make just to access one of them.

krlmlr · 2018-03-08T18:47:31Z

@dapperjapper: Thanks. I've added an implementation sketch to the wiki (https://github.com/ropensci/drake/wiki/Meta-make#implementation-sketch), and also tweaked a bit -- see differences. Does this help answer your questions?

krlmlr · 2018-03-08T18:55:40Z

If we follow the implementation sketch, it feels easiest to start with pack(), because meta_make() will apply very similar ideas (or maybe even call a version of pack() under the hood), and unpack() alone doesn't feel that useful without meta_make().

rkrug · 2018-03-09T07:47:14Z

@krlmlr I am confused a little bit now - if meta_make() does not execute anything and returns a plan, what is than the difference to drake_plan()? If there is a difference (I assume the execution in an isolated environment), why not simply add an argument subplan = TRUE to drake_plan() which is FALSE by default and can be set when used in this context here?

krlmlr · 2018-03-09T07:50:21Z

@rkrug: Great question. I think we want to give users a way to create "sub-plans" from home-grown tibbles created outside of drake_plan(), as in

meta_make(tibble(
  target = ...
  command = ...
))

rkrug · 2018-03-09T08:06:58Z

So meta_make() would take a tibble (which might be homegrown or created by drake_plan() and turns it into a sub-plan - correct? But in then why is it called _make, as it is doing the same as drake_plan() - wouldn't drake_subplan() than be the better word?

Actually - do we really need this? Isnt't it possible to simply use drake_plan() inside drake_plan() which than automatically creates a sub-plan? I mean a n argument subplan could be set automatically when the function is called iteratively?

krlmlr · 2018-03-09T08:17:06Z

I want to implement this in a transparent way, using as little magic as possible. I don't see how meta_make() does the same as drake_plan() does, it takes a drake plan as input.?

I'm open to other names for the meta_make() operation, but it should match the effect, which is that it effectively executes commands defined dynamically when used inside make(). (At least this is the perceived effect; the real effect is that it creates a proxy object that is interpreted differently from normal targets by make(). It feels that we need to start writing documentation or tutorials before we can really come up with a good way to teach/communicate this idea, and the naming choice could be based on that.)

rkrug · 2018-03-09T08:36:00Z

I don't see how meta_make() does the same as drake_plan() does, it takes a drake plan as input?

True - but the output is identical to a plan, only that it has an attribute sub-plan?

....[quite a bit of thinking in the background ....]

Ahhhh - I think I see what you are aiming at. This would make the assembling of complex plans much easier, as they can be split into sub-plans and assembled at a later stage. OK - that is a different thing (and you are right - we have to talk about the naming at the later stage).

Using these sub-plans is very powerful, as you say, but one question needs to be addressed, which is the naming of the targets.

target names have to be unique within each plan, within each sub-plan (which is obvious), but I think a sub-plan in a plan should be able to have the same names.

plan <- drake::drake_plan(
  x = 1,
  y = 2,
  meta_plan = drake_plan(
    x = x * 2,
    y = y / x,
    z = p(x)
  ),
  results = meta_make(meta_plan)
)

should be possible. The targets would be x, y, results$x, results$y, result$z.
In addition, the x in the subplan should refer to the x in the sub-plan.
While p(x) would refer to the target x in the parent plan.

rkrug · 2018-03-09T11:59:27Z

It seems easiest if we impose the restriction that the meta-plan can only access outside targets.

I think that would limit the usefulness quite a bit and lead to the combination of many targets into a single one in the meta-plan. To have be able to have inside the meta-plan a tree structure of dependent targets (inside as well as parent plan) would be, as I see it, the main advantage of a meta-plan. One could encapsulate e.g. in one meta-plan the simulation, cleaning and different analysis of the results, while the plan contains the generation of the parameter, which are used by the meta-plan.
So changing the parameter, would only require changing the (relatively simple) plan, and not the (possibly complex) meta-plan.

krlmlr · 2018-03-09T12:59:10Z

I hear you, but I'd prefer to keep complexity as low as possible at this point -- even the simplest solution will take some time and careful thought to implement.

Could you please share a concrete example, maybe a plan that would work for one particular parameter value? We can use this to discuss how this can be implemented based on the current proposal.

rkrug · 2018-03-09T13:14:39Z

OK - here comes one (I hope this is what you are looking for - if yes, I could throw in the missing function definitions):

Assuming this one function to create a parameter set:

parms <- function(n0,  r) {
  return(
    list(
      n0 = n0,
      r = r,
      K = 100
    )
  )
}

to simulate one has to do:

all1 <- drake_plan(
  parameter = parms(10, 1.5),
  sim = sim(parameter),
  analysis1 = analysis1(sim),
  analysis2 = analysis2(sim),
  compare = comp(analysis1, analysis2)    
)

So if one want's to chage the parameter, one has to edit a longish plan.

In contrast:

simAndAnalysis <- drake_plan(
  sim = sim(parameter),
  analysis1 = analysis1(sim),
  analysis2 = analysis2(sim),
  compare = comp(analysis1, analysis2)    
)

all2 <- drake_plan(
  parameter = parms(10, 1.5),
  result = meta_make(simAndAnalysis)
)

here one only has to edit the all2 plan, while the logic of the simulation and analysis stays untouched, therefore much less errorprone.

But the requirement is to have access to the all the targets.

krlmlr · 2018-03-09T13:28:52Z

No need to provide implementations, we can also use functions from cooking ;-)

With the current proposal, implementing this plan with multiple parameters is likely to involve multiple meta_make() targets constructed in a very ugly way:

drake_plan(
  parameters = list(paramA = parms(...), paramB = parms(...), ...),
  sim = meta_make(sim_from_parameters(parameters)),
  analysis1 = meta_make(analysis1_from_sim(sim)),
  analysis2 = meta_make(analysis2_from_sim(sim)),
  compare = meta_make(compare_from_analyses(analysis1, analysis2))
)

sim_from_parameters <- function(parameters) {
  tibble(
    target = paste0("sim_", names(parameters)),
    command = paste0("sim(parameters[[\"", names(parameters), "\"]])")
  )
}

analysis1_from_sim <- function(sim) {
  tibble(
    target = paste0("analysis1_", names(sim)),
    command = paste0("analysis1(sim[[\"", names(sim), "\"]])")
  )
}

...

Only the DSL (#233) aims at providing a nice syntax.

krlmlr · 2018-03-22T08:05:49Z

We could also use meta-plans to handle knitr_in()! Exploring the dependencies becomes an intermediate step:

drake_plan(
  rmarkdown::render(knitr_in("report.Rmd"), ...)
)

then becomes (through an internal transformation)

  `_knitr_deps_report.Rmd` = analyze_knitr_in("report.Rmd"),
  { pack(`_knitr_deps_report.Rmd`); rmarkdown::render(file_in("report.Rmd"), ...) }

Would that simplify or complicate things?

wlandau · 2018-03-22T17:34:34Z

Yes, I think this is one of the problems the DSL is meant to solve. One edge case I have been thinking about is if the user has commands to write the Rmd source.

drake_plan(
  write_my_report(); file_out("report.Rmd"),
  rmarkdown::render(knitr_in("report.Rmd"), file_out("report.html"))
)

If report.Rmd does not already exist before make() is called, drake cannot analyze the report for dependencies. And once the first make() creates the report and those dependencies are found, the target "report.html" is invalidated. In other words, both the first and second make()s do stuff. If the DSL's internal transformation is delayed, this should no longer be a problem.

wlandau · 2019-01-17T06:14:52Z

I wonder, do we still need this nested plan structure? Could a lazy application of the transformation/grouping interface from #233 (comment) accomplish the dynamic branching we want?

wlandau · 2019-01-18T20:57:00Z

We may come back to meta-make eventually, but I think #233 (comment) brings us closer to dynamic branching and job groups.

wlandau · 2019-01-18T21:38:48Z

See #685.

ropensci deleted a comment from rkrug Mar 7, 2018

This was referenced Mar 7, 2018

output file as target #297

Closed

drake_plan() should look for duplicate targets #305

Closed

wlandau-lilly added type: new feature difficulty: advanced topic: api labels Mar 7, 2018

wlandau added the status: priority label Mar 8, 2018

wlandau mentioned this issue Mar 13, 2018

Offload wildcard templating to the wildcard package #240

Closed

krlmlr mentioned this issue Mar 15, 2018

New function reduce_plan() #326

Closed

wlandau added status: priority and removed type: new feature status: priority labels Mar 20, 2018

This was referenced Mar 26, 2018

Allow multiple output files for each command #283

Closed

Implicit and dynamic file dependencies #347

Closed

wlandau mentioned this issue Apr 3, 2018

evaluate file.path and variables in file_out and friends #353

Closed

krlmlr mentioned this issue Apr 3, 2018

Consider expanding on the target() pseudo-function in drake_plan() #351

Closed

2 tasks

This was referenced May 1, 2018

How to add .R files to drake_plan() #193

Closed

Wildcard alternative to gather/reduce_plan #376

Closed

wlandau mentioned this issue May 31, 2018

Pass parameters to evaluate_plan through a grid, rather than a series of vectors #235

Closed

wlandau removed the status: priority label Jun 11, 2018

wlandau mentioned this issue Oct 12, 2018

implement exponential backoff in mc_master? #537

Closed

wlandau mentioned this issue Oct 27, 2018

A quicker end to staged parallelism #369

Closed

13 tasks

krlmlr mentioned this issue Nov 7, 2018

Bare-bones scheduling and caching #575

Closed

wlandau mentioned this issue Dec 24, 2018

Remove all non-clustermq parallel backends? #561

Closed

wlandau closed this as completed Jan 18, 2019

wlandau mentioned this issue Feb 20, 2019

Dynamic branching #685

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Meta-make #304

Meta-make #304

krlmlr commented Mar 6, 2018

rkrug commented Mar 7, 2018 •

edited

Loading

krlmlr commented Mar 7, 2018

rkrug commented Mar 7, 2018 •

edited

Loading

rkrug commented Mar 7, 2018

krlmlr commented Mar 7, 2018

rkrug commented Mar 7, 2018

krlmlr commented Mar 7, 2018

wlandau-lilly commented Mar 7, 2018

AlexAxthelm commented Mar 7, 2018

krlmlr commented Mar 7, 2018 •

edited

Loading

wlandau-lilly commented Mar 8, 2018

wlandau-lilly commented Mar 8, 2018

krlmlr commented Mar 8, 2018

wlandau commented Mar 8, 2018

violetcereza commented Mar 8, 2018

krlmlr commented Mar 8, 2018 •

edited

Loading

krlmlr commented Mar 8, 2018 •

edited

Loading

rkrug commented Mar 9, 2018

krlmlr commented Mar 9, 2018

rkrug commented Mar 9, 2018

krlmlr commented Mar 9, 2018 •

edited

Loading

rkrug commented Mar 9, 2018

rkrug commented Mar 9, 2018

krlmlr commented Mar 9, 2018

rkrug commented Mar 9, 2018

krlmlr commented Mar 9, 2018

krlmlr commented Mar 22, 2018

wlandau commented Mar 22, 2018

wlandau commented Jan 17, 2019

wlandau commented Jan 18, 2019

wlandau commented Jan 18, 2019

Meta-make #304

Meta-make #304

Comments

krlmlr commented Mar 6, 2018

rkrug commented Mar 7, 2018 • edited Loading

krlmlr commented Mar 7, 2018

rkrug commented Mar 7, 2018 • edited Loading

rkrug commented Mar 7, 2018

krlmlr commented Mar 7, 2018

rkrug commented Mar 7, 2018

krlmlr commented Mar 7, 2018

wlandau-lilly commented Mar 7, 2018

AlexAxthelm commented Mar 7, 2018

krlmlr commented Mar 7, 2018 • edited Loading

wlandau-lilly commented Mar 8, 2018

wlandau-lilly commented Mar 8, 2018

krlmlr commented Mar 8, 2018

wlandau commented Mar 8, 2018

violetcereza commented Mar 8, 2018

krlmlr commented Mar 8, 2018 • edited Loading

krlmlr commented Mar 8, 2018 • edited Loading

rkrug commented Mar 9, 2018

krlmlr commented Mar 9, 2018

rkrug commented Mar 9, 2018

krlmlr commented Mar 9, 2018 • edited Loading

rkrug commented Mar 9, 2018

rkrug commented Mar 9, 2018

krlmlr commented Mar 9, 2018

rkrug commented Mar 9, 2018

krlmlr commented Mar 9, 2018

krlmlr commented Mar 22, 2018

wlandau commented Mar 22, 2018

wlandau commented Jan 17, 2019

wlandau commented Jan 18, 2019

wlandau commented Jan 18, 2019

rkrug commented Mar 7, 2018 •

edited

Loading

rkrug commented Mar 7, 2018 •

edited

Loading

krlmlr commented Mar 7, 2018 •

edited

Loading

krlmlr commented Mar 8, 2018 •

edited

Loading

krlmlr commented Mar 8, 2018 •

edited

Loading

krlmlr commented Mar 9, 2018 •

edited

Loading