Meta-make #304
Comments
I like the idea of having a wiki page to discuss possible verbs. I am not too clear about your verbs. Could you please add the tibble / plan you would expect at the end?
Thanks for your feedback. I updated the wiki and pinged you in the commit message, would be interested to learn if you received a notification. Please let me know if things are clearer now. |
Thanks. Makes things much easier. The Looking at
|
In the classical |
Done adding reprexes. Switching to rmarkdown would have been smarter ;-) I'd suggest that drake just refuses to run plans with duplicate names, just the way it works currently. That feels like the safest option. Iterative plans will be covered by the DSL. This is just about low-level tooling to make the DSL possible. |
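The duplicate-name check suggested above could look roughly like the sketch below. `assert_unique_targets()` is a hypothetical helper written for this example and is not part of drake; the only assumption taken from the thread is that a plan is a data frame with `target` and `command` columns.

```r
# Hypothetical helper, not part of drake: stop early if a plan repeats a target name.
assert_unique_targets <- function(plan) {
  dups <- unique(plan$target[duplicated(plan$target)])
  if (length(dups) > 0) {
    stop(
      "Duplicate target names in plan: ",
      paste(dups, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(plan)
}

# A plan is a data frame with `target` and `command` columns.
plan <- data.frame(
  target = c("x", "y", "x"),
  command = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Capture the error message instead of aborting the script.
msg <- tryCatch(
  {
    assert_unique_targets(plan)
    "no duplicates"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Running the check when the plan is created, rather than when it is made, would surface the error as early as possible.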
@krlmlr OK - so But then the creation of the plan should include the check, not the But the error message should come during creation of the plan in the
Yes, |
Thanks so much, Kirill! I read your proposal, but it will take me more time to come up with feedback. |
I'm really liking the modularity of the design here, and I want to see if I'm understanding correctly. If I have a problem along the lines of "take a big data set, split by an unknown set of factors, and run arbitrary code on each" (something for which I would use dplyr::do), would that be something along the lines of: drake_plan(
data = generic_data_importing_function(file_in("data.csv")),
factor_levels = unique(pull(data, "factor_column") ),
meta_plan = drake_plan(
this_factor = unpack(factor_levels),
data_subset = filter(data, factor_column == this_factor),
analysis = analysis_function(data_subset)
),
results = meta_make(meta_plan)
) or would this require some sort of 4th verb (along the lines of a |
Thanks. I don't intend to change the semantics of drake_plan(
data = generic_data_importing_function(file_in("data.csv")),
splits = data %>% nest(-factor_column) %>% deframe(),
meta_plan = tibble(
target = paste0("analyzed_", names(splits)),
command = paste0("analysis_function(splits[[\"", names(splits), "\"]])")
),
results = meta_make(meta_plan)
) This will create targets of the form The targets in The command to create |
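To make the shape of such a sub-plan concrete, here is a minimal sketch that builds the `meta_plan` table by hand. It uses a base R data frame instead of a tibble so that it runs without extra packages, and the `splits` list is a stand-in invented for this example. `meta_make()` itself is only a proposal in this thread, so it is not called here.

```r
# Stand-in for the `splits` target: a named list with one element per factor level.
splits <- list(
  a = data.frame(value = 1:3),
  b = data.frame(value = 4:6)
)

# Build the sub-plan: one target per element, with the command stored as text.
meta_plan <- data.frame(
  target = paste0("analyzed_", names(splits)),
  command = paste0("analysis_function(splits[[\"", names(splits), "\"]])"),
  stringsAsFactors = FALSE
)

print(meta_plan)
```

Each row pairs a target name with the text of the command that would build it, which is the same layout a hand-written plan has.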
Should plan <- drake::drake_plan(
x = 1,
y = 2,
meta_plan = drake_plan(
a = x,
b = y
),
results = drake::make(meta_plan)
)
make(plan) It fails because target plan1 <- drake_plan(
x1 = 1,
results1 = make(plan1),
plan1 = drake_plan(
x2 = x1,
results2 = make(plan2),
plan2 = drake_plan(
x3 = x2,
results3 = make(plan3),
plan3 = drake_plan(
...
)
)
)
) In any case, I do like where this is headed. The "meta" piece will make the hardest parts of the DSL possible. |
I think |
Thanks. I'd rather give an informative error message if
The way |
Ah, so
Sure, that's fine. If |
I think it would be nice to unify the making interface under one function I'm confused about the return value of Maybe I need some clarity around what declarative means and how this affects working with drake targets both within plans and outside of them. Going off of your analogy in the wiki, |
@dapperjapper: Thanks. I've added an implementation sketch to the wiki (https://github.com/ropensci/drake/wiki/Meta-make#implementation-sketch), and also tweaked a bit -- see differences. Does this help answer your questions? |
If we follow the implementation sketch, it feels easiest to start with |
@krlmlr I am confused a little bit now - if |
@rkrug: Great question. I think we want to give users a way to create "sub-plans" from home-grown tibbles created outside of meta_make(tibble(
target = ...
command = ...
)) |
So Actually - do we really need this? Isn't it possible to simply use
I want to implement this in a transparent way, using as little magic as possible. I don't see how I'm open to other names for the |
True - but the output is identical to a plan, only that it has an attribute sub-plan? ....[quite a bit of thinking in the background ....] Ahhhh - I think I see what you are aiming at. This would make the assembling of complex plans much easier, as they can be split into sub-plans and assembled at a later stage. OK - that is a different thing (and you are right - we have to talk about the naming at a later stage). Using these sub-plans is very powerful, as you say, but one question needs to be addressed, which is the naming of the targets. Target names have to be unique within each plan and within each sub-plan (which is obvious), but I think a sub-plan inside a plan should be able to use the same names as the parent plan.
should be possible. The targets would be x, y, results$x, results$y, results$z.
I think that would limit the usefulness quite a bit and lead to the combination of many targets into a single one in the meta-plan. Being able to have a tree structure of dependent targets inside the meta-plan (depending on targets inside the meta-plan as well as in the parent plan) would be, as I see it, the main advantage of a meta-plan. One could encapsulate in one meta-plan, for example, the simulation, the cleaning, and the different analyses of the results, while the parent plan contains the generation of the parameters that the meta-plan uses.
I hear you, but I'd prefer to keep complexity as low as possible at this point -- even the simplest solution will take some time and careful thought to implement. Could you please share a concrete example, maybe a plan that would work for one particular parameter value? We can use this to discuss how this can be implemented based on the current proposal. |
OK - here comes one (I hope this is what you are looking for - if yes, I could throw in the missing function definitions): Assuming this one function to create a parameter set:
to simulate one has to do:
So if one wants to change the parameter, one has to edit a longish plan. In contrast:
here one only has to edit the But the requirement is to have access to all the targets.
No need to provide implementations, we can also use functions from cooking ;-) With the current proposal, implementing this plan with multiple parameters is likely to involve multiple drake_plan(
parameters = list(paramA = parms(...), paramB = parms(...), ...),
sim = meta_make(sim_from_parameters(parameters)),
analysis1 = meta_make(analysis1_from_sim(sim)),
analysis2 = meta_make(analysis2_from_sim(sim)),
compare = meta_make(compare_from_analyses(analysis1, analysis2))
)
sim_from_parameters <- function(parameters) {
tibble(
target = paste0("sim_", names(parameters)),
command = paste0("sim(parameters[[\"", names(parameters), "\"]])")
)
}
analysis1_from_sim <- function(sim) {
tibble(
target = paste0("analysis1_", names(sim)),
command = paste0("analysis1(sim[[\"", names(sim), "\"]])")
)
}
... Only the DSL (#233) aims at providing a nice syntax. |
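As a rough illustration of what running such a generated sub-plan means, the sketch below evaluates each command string and collects the results in a list named by target. `toy_meta_make()` is a hypothetical stand-in written for this example: it is not the proposed `meta_make()` and has no caching or dependency tracking. The `sim()` function and the parameter sets are placeholders as well.

```r
# Sub-plan generator in the style of the comment above, using a base R data frame.
sim_from_parameters <- function(parameters) {
  data.frame(
    target = paste0("sim_", names(parameters)),
    command = paste0("sim(parameters[[\"", names(parameters), "\"]])"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in for meta_make(): evaluate each command as R code in
# `envir` and return the results as a list named by target.
toy_meta_make <- function(plan, envir = parent.frame()) {
  force(envir)
  results <- lapply(plan$command, function(cmd) {
    eval(parse(text = cmd), envir = envir)
  })
  names(results) <- plan$target
  results
}

# Placeholder simulation and parameter sets, invented for this example.
sim <- function(p) p$n * p$rate
parameters <- list(
  paramA = list(n = 10, rate = 0.5),
  paramB = list(n = 20, rate = 0.5)
)

sims <- toy_meta_make(sim_from_parameters(parameters))
str(sims)
```

Because the result is a named list, a follow-up generator such as `analysis1_from_sim()` can build its own targets from `names(sims)` in the same way.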
We could also use meta-plans to handle drake_plan(
rmarkdown::render(knitr_in("report.Rmd"), ...)
) then becomes (through an internal transformation) `_knitr_deps_report.Rmd` = analyze_knitr_in("report.Rmd"),
{ pack(`_knitr_deps_report.Rmd`); rmarkdown::render(file_in("report.Rmd"), ...) } Would that simplify or complicate things? |
Yes, I think this is one of the problems the DSL is meant to solve. One edge case I have been thinking about is if the user has commands to write the drake_plan(
write_my_report(); file_out("report.Rmd"),
rmarkdown::render(knitr_in("report.Rmd"), file_out("report.html"))
) If |
I wonder, do we still need this nested plan structure? Could a lazy application of the transformation/grouping interface from #233 (comment) accomplish the dynamic branching we want? |
We may come back to meta-make eventually, but I think #233 (comment) brings us closer to dynamic branching and job groups. |
See #685. |
I'd like to propose three very simple verbs that might get us halfway towards the goal of a DSL (#233). This proposal addresses the low-level technical part, which I think is required for any DSL.
The proposal is too long for an issue (and issues aren't version-controlled), so I've added it to the wiki: https://github.com/ropensci/drake/wiki/Meta-make. Perhaps a 5 to 10 minute read.
I haven't closely followed previous related discussions, so there might be overlap I'm not aware of. In particular, #79 might be a similar approach. Sharing this early for feedback and discussion, CC @AlexAxthelm @dapperjapper @rkrug @kendonB.