How to add .R files to drake_plan() #193
Great question, @pat-s! Questions like these are starting to come up a lot, and yours is the first in an FAQ I am starting. EDIT: The following examples show how to set up the files for drake projects:
Get the code with drake_example("basic")
drake_example("gsp")
drake_example("packages")
Each of the above writes a folder with code files. To make sure [...]
Original response: The feature you want is designed for [...] Why not detect [...]
ls()
## character(0)
source("my_functions.R")
ls() # The simulate() function is defined in my_functions.R and treated as an import.
## [1] "simulate"
simulate
## function(n) {
## data.frame(x = stats::rnorm(n), y = rpois(n, 1))
## }
deps(simulate)
## [1] "data.frame" "rpois" "stats::rnorm"
Now that you loaded your imports with [...]
analyze <- data.frame(
target = c("test_dataset", "'test.md'"),
command = c(
"simulate(5)",
"knit('03_scripts/01_test.Rmd', output = \"test.md\", quiet = TRUE)"
)
)
analyze
## target command
## 1 test_dataset simulate(5)
## 2 'test.md' knit('03_scripts/01_test.Rmd', output = "test.md", quiet = TRUE)
make(analyze)
where [...]
A second [...]
Besides [...] Does that help? Does this way of doing things meet your needs, or does your use case require you to [...] |
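Why `source()`-ing a plain `.R` script inside a command hides its dependencies can be sketched in base R. This is a simplified model, not drake's actual code analysis: it assumes that a command's dependencies are just the names appearing in its code, listed with `all.names()`. The helper `code_symbols()` and the script path are hypothetical.

```r
# Simplified model (assumption): dependencies = names that appear in the code.
# drake's real static analysis is more careful; this only shows the idea.
code_symbols <- function(expr) {
  unique(all.names(expr))
}

# A function loaded into the session is visible, so its body can be inspected.
simulate <- function(n) {
  data.frame(x = stats::rnorm(n), y = rpois(n, 1))
}
fun_deps <- code_symbols(body(simulate))

# A command that runs a script exposes only `source`: the code inside the
# file is never part of the expression being analyzed.
script_deps <- code_symbols(quote(source("03_scripts/test2.R")))

print(fun_deps)
print(script_deps)
```

As described in this thread, drake makes a special accommodation for knitr reports and looks inside them; nothing comparable happens for `source()`, which is why the suggestion is to move script code into functions.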
Hi @wlandau-lilly, thanks for the quick and extensive answer! So if I understand correctly, all analysis scripts should be Rmarkdown files while .R scripts should only contain functions that are used as imports and then analyzed for their dependencies?
Right, in my case, I have a mixture of [...] However, the modeling stuff is stored in [...] As I said, these files contain modeling code and therefore depend on the preprocessing [...] Afterwards, there are again additional [...]
analyze <- drake_plan(
preprocessing.md = knit('03_scripts/01_preprocessing.Rmd', quiet = TRUE), # 1
EDA.md = knit('03_scripts/02_EDA.Rmd', quiet = TRUE), # 2
study_area.md = knit('03_scripts/03_study_area.Rmd', quiet = TRUE), # 2
brt_sp_non = source('03_scripts/server/brt_sp_non.R'), # 3
[...]
brt_nsp_nsp = source('03_scripts/server/brt_nsp_nsp.R'), # 3
cv_vis.md = knit('03_scripts/04_spcv_vis.Rmd', quiet = TRUE) # 4
)
where [...]
Hope that does not confuse you too much 😄. |
Just wanted to chime in with my support for accommodating a mixture of [...] Similar to @pat-s, the distinction for my projects is usually:
|
@pat-s and @tiernanmartin, thank you for clarifying. I think I can do a better job of explaining now.
You don't need any analysis scripts. R Markdown files are just a special accommodation, and [...] For example, instead of
analyze <- drake_plan(
...
preprocessing.md = knit('03_scripts/01_preprocessing.Rmd', quiet = TRUE),
brt_sp_non = source('03_scripts/server/brt_sp_non.R'),
...
)
make(analyze)
you might try something like
source("03_scripts/server/brt_sp_non.R")
analyze <- drake_plan(
...,
preprocessed_data = preprocess_data(read_my_data('data.csv')),
brt_sp_non = build_brt_sp_non(preprocessed_data),
...
)
make(analyze)
and define functions [...] The commands in your workflow plan data frame are just arbitrary chunks of R code that return values. Other than the special accommodations for [...] Here is an example of a workflow that juggles a bunch of numbers.
my_plan <- drake_plan(
a = 1 + 1,
b = {
x <- pi + a
y <- sqrt(x)
rand <- rnorm(10, sd = y)
mean(rand)
},
c = a - 5,
d = c(b, c)
)
config <- drake_config(my_plan)
vis_drake_graph(config)
make(my_plan)
readd(d)
## [1] 1.886363 -3.000000
Now, if you rely on R Markdown reports for sharing results, you have the option to create one and then [...] |
I forgot to mention: [...]
library(drake)
my_plan <- drake_plan(
a = f(1 + 1)
)
f <- function(x){
g(x + 1)
}
g <- function(x){
x + sqrt(4)
}
config <- drake_config(my_plan)
vis_drake_graph(config)
make(my_plan)
## ...
## target a
readd(a)
## [1] 5
make(my_plan)
## ...
## All targets are already up to date.
# Let's change function g().
g <- function(x){
x + 3
}
# f() depends on g(), and target `a` depends on f().
# So target `a` is out of date, and `make()` recomputes it.
make(my_plan)
## ...
## target a
# The value of `a` changed.
readd(a)
## [1] 6 |
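The chain above (target `a` depends on `f()`, which depends on `g()`) can be modeled with a recursive fingerprint. This is a base-R sketch of the idea only, not drake's implementation: `fingerprint()` is a hypothetical helper, and comparing deparsed code stands in for the hashing drake does internally.

```r
# Toy model (assumption): a function's fingerprint is its deparsed code plus
# the fingerprints of the functions it calls from the same environment.
fingerprint <- function(name, env, seen = character()) {
  fun <- get(name, envir = env)
  code <- paste(deparse(fun), collapse = "\n")
  # Names used in the body that are also defined as functions in `env`.
  callees <- setdiff(intersect(all.names(body(fun)), ls(env)), c(name, seen))
  callees <- Filter(function(d) is.function(get(d, envir = env)), callees)
  parts <- vapply(callees, fingerprint, character(1),
                  env = env, seen = c(seen, name))
  paste(c(code, parts), collapse = "\n")
}

env <- new.env()
env$g <- function(x) x + sqrt(4)
env$f <- function(x) g(x + 1)
before <- fingerprint("f", env)

# Changing g() changes f()'s fingerprint, so a target built from f() is stale.
env$g <- function(x) x + 3
after_g_change <- fingerprint("f", env)

# Adding an unrelated function leaves f()'s fingerprint untouched.
env$h <- function(x) x * 100
after_h_added <- fingerprint("f", env)

identical(before, after_g_change)
identical(after_g_change, after_h_added)
```

The recursion is what makes a change to `g()` propagate to every target that uses `f()`, mirroring the behavior shown in the `make()` output above.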
Thanks @wlandau-lilly, I tried your suggestion with the following setup now:
source("03_scripts/01_preprocessing.R")
source("03_scripts/server/modeling_functions.R")
methods <- drake_plan(
preprocessed_data = preprocess(
pathogens = "/data/patrick/raw/survey_data/diseases240112_mod.csv",
points = "/data/patrick/mod/survey_data/points_mod.shp",
slope = "/data/patrick/mod/DEM/slope/slope_5m.tif",
ph = new("GDALReadOnlyDataset", "/data/patrick/raw/ph_europe/ph_cacl2"),
lithology = "/data/patrick/raw/lithology/CT_LITOLOGICO_25000_ETRS89.shp",
hail = "/data/patrick/raw/hail/Prob_GAM_square_area.tif",
elevation = "/data/patrick/mod/DEM/dem_5m.tif",
soil = "/data/patrick/raw/soil/ISRIC_world_soil_information/TAXNWRB_250m_ll.tif",
study_area = "/data/patrick/raw/boundaries/basque-country/Study_area.shp"),
brt_sp_non = brt_sp_nsp(data = preprocessed_data, iterations = 200),
preprocess.md = knit("03_scripts/01_preprocessing.Rmd"),
strings_in_dots = "literals"
)
I put the [...] I generate [...] However, when following your quick example, to generate a report for all the preprocessing I have to add something like [...] So currently I would need an [...] Hm, this really gets kinda complicated now, and it seems that I have to modify all my scripts...
drake_plan(
1 = report("01_preprocessing.Rmd", depends_on = NULL),
2 = report("02_EDA.Rmd", depends_on = 1),
3 = report("03_study_area.Rmd", depends_on = 1),
4 = script(list("04_scripts/20 files here.R"), depends_on = 1),
5 = report("05_spcvis.Rmd", depends_on = c(1, 4))
) |
I think we're making progress here.
No additional [...]
Yes, [...]
drake_plan(
preprocessed_data = preprocessing_01(),
EDA = EDA_02(preprocessed_data),
study_area = study_area_03(preprocessed_data),
other_work = other_work_04(preprocessed_data),
spcvis = spcvis_05(preprocessed_data, other_work)
)
You don't need anything like [...] |
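The benefit of the function-based plan is that every dependency shows up as a name in a command. The toy runner below illustrates this with base R only. It is a stand-in for `make()`, not drake's scheduler, and the function bodies are hypothetical placeholders for the real preprocessing and modeling code. The plan rows are deliberately listed out of order.

```r
# Hypothetical stand-ins for the functions in the plan above.
preprocessing_01 <- function() data.frame(x = 1:10)
EDA_02 <- function(data) summary(data$x)
study_area_03 <- function(data) nrow(data)
other_work_04 <- function(data) mean(data$x)
spcvis_05 <- function(data, other) data$x - other

plan <- data.frame(
  target = c("spcvis", "EDA", "other_work", "study_area", "preprocessed_data"),
  command = c(
    "spcvis_05(preprocessed_data, other_work)",
    "EDA_02(preprocessed_data)",
    "other_work_04(preprocessed_data)",
    "study_area_03(preprocessed_data)",
    "preprocessing_01()"
  ),
  stringsAsFactors = FALSE
)

# Toy runner (not drake::make()): repeatedly build any target whose upstream
# targets, i.e. target names mentioned in its command, are already built.
run_plan <- function(plan, env = new.env(parent = parent.frame())) {
  built <- character()
  while (length(built) < nrow(plan)) {
    progress <- FALSE
    for (i in seq_len(nrow(plan))) {
      target <- plan$target[i]
      if (target %in% built) next
      expr <- parse(text = plan$command[i])[[1]]
      upstream <- intersect(all.names(expr), plan$target)
      if (all(upstream %in% built)) {
        assign(target, eval(expr, envir = env), envir = env)
        built <- c(built, target)
        progress <- TRUE
      }
    }
    if (!progress) stop("circular or missing dependency in plan")
  }
  list(order = built, env = env)
}

result <- run_plan(plan)
result$order
```

Because `spcvis` mentions both `preprocessed_data` and `other_work`, it only runs after both exist. drake additionally caches results and skips targets that are already up to date, which this sketch does not attempt.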
There was some confusion about the role of R Markdown reports in drake workflows. They are not actually necessary, and the only targets they generate are report files such as `.md` and `.html` files. I have changed the report in the basic example to try to explain. @pat-s, I hope this helps.
Something else I forgot to clarify: if one of your commands is [...] |
@pat-s, please see the new best practices guide. I try to lay out the issue there as best I can, and I will likely add more in the future. I think I have done enough to close here, but let's keep talking on the thread. |
I have been thinking a lot more about this thread, and I have restructured [...] |
Always good to hear when a discussion ends fruitfully! I haven't had time in the last few days to take a detailed look; I will do it later and report my thoughts! |
And I look forward to learning what you think. This is such an important piece for [...] |
I read it and briefly want to say that it's clearer to me now than before. Maybe you could still add a section focusing only on the differences between the function-oriented and the script-oriented approaches. IMO this would fit into the "Get started" page so that new readers understand the difference right from the start. Congrats on being accepted at rOpenSci 👍 |
Thanks, @pat-s! And I am glad this is making more sense. I would like to keep the "Get started" and [...] |
For me it seems nice to be able to have the output of .Rmd files as possible targets. I often use .Rmd files to query databases via the {sql} engine. It would also be nice to have .Rmd reports in the middle of a pipeline, maybe as a more streamlined "report" on some intermediate results. With the current functionality and design, how would I best incorporate existing R Markdown scripts into the [...] I just started experimenting with the package, though (and so far I am often shy to put my scripts into functions anyway, which will probably change :)) |
For existing R Markdown scripts, make sure those files exist, and then use [...]
plan <- drake_plan(
rmarkdown::render(
knitr_in("report.Rmd"),
output_file = file_out("report.html"),
quiet = TRUE
),
...
)
make(plan)
Inside the active code chunks in your report, you can use [...]
You can generate [...] |
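The special accommodation for R Markdown is that drake looks inside a `knitr_in()` report's active code chunks for `loadd()` and `readd()` calls and treats those targets as dependencies of the report. The sketch below approximates that scan in base R with a regular expression. `report_deps()` and the report text are hypothetical, and a regex is cruder than real code analysis.

```r
# Hypothetical report text. The fence is built with strrep() so this example
# does not contain a literal code fence.
fence <- strrep("`", 3)
report_lines <- c(
  "---",
  "title: \"Report\"",
  "---",
  "Text outside chunks is ignored, even readd(not_a_dep).",
  paste0(fence, "{r}"),
  "loadd(preprocessed_data)",
  "fit <- readd(brt_sp_non)",
  "summary(fit)",
  fence
)

# Heuristic sketch (assumption): collect the first argument of the first
# loadd()/readd() call on each line inside {r} chunks.
report_deps <- function(rmd_lines, fence = strrep("`", 3)) {
  in_chunk <- FALSE
  code <- character()
  for (line in rmd_lines) {
    if (!in_chunk && startsWith(line, paste0(fence, "{r"))) {
      in_chunk <- TRUE
    } else if (in_chunk && trimws(line) == fence) {
      in_chunk <- FALSE
    } else if (in_chunk) {
      code <- c(code, line)
    }
  }
  pattern <- "(loadd|readd)\\(\\s*([A-Za-z.][A-Za-z0-9._]*)"
  hits <- regmatches(code, regexec(pattern, code))
  unique(unlist(lapply(hits, function(m) if (length(m)) m[3] else character())))
}

deps <- report_deps(report_lines)
deps
```

Only code inside chunks counts, so the `readd()` mentioned in the prose line is not treated as a dependency.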
Hi @wlandau, |
@ablack3, great question. In the latest release (version 6.1.0), you can actually start with scratch work in a script or notebook and then use [...] We fix our code (the variable names should have capital letters). And maybe we add some more code. You can put multi-line commands in curly braces. And then, when we are ready, we can convert it to a [...]
install.packages("BiocManager")
BiocManager::install("CodeDepends")
Then we call [...]
library(drake)
plan <- code_to_plan("scratch.Rmd")
config <- drake_config(plan)
vis_drake_graph(config)
make(plan)
#> target model
#> target coef
#> target pos_coef
Created on 2018-10-28 by the reprex package (v0.2.1)
If we find a mistake later, we can convert the plan into a notebook or a script to go back and tinker with stuff again.
# install.packages("styler")
plan_to_notebook(plan, "my_notebook.Rmd")
plan_to_code(plan, "my_script.R") |
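Roughly speaking, `code_to_plan()` turns a script's top-level assignments into targets. To make that idea concrete without CodeDepends, here is a base-R sketch of the conversion. `script_to_plan()` is hypothetical and deliberately simpler than drake's function: it ignores any top-level code that is not a `name <- value` assignment.

```r
# Hypothetical helper: each top-level `name <- expression` becomes a target
# named `name` whose command is `expression`; other top-level code is skipped.
script_to_plan <- function(text) {
  exprs <- as.list(parse(text = text))
  is_assignment <- vapply(exprs, function(e) {
    is.call(e) && identical(e[[1]], as.name("<-")) && is.name(e[[2]])
  }, logical(1))
  assignments <- exprs[is_assignment]
  data.frame(
    target = vapply(assignments, function(e) as.character(e[[2]]), character(1)),
    command = vapply(assignments, function(e) {
      paste(deparse(e[[3]]), collapse = "\n")
    }, character(1)),
    stringsAsFactors = FALSE
  )
}

script <- "
library(stats)
model <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(model)
pos_coef <- {
  keep <- coefs > 0
  coefs[keep]
}
"
plan <- script_to_plan(script)
plan
```

The braced assignment becomes a single multi-line command, which matches the advice above to wrap multi-line commands in curly braces.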
A couple of alternatives to this approach are [...] |
I finally converted my projects to use [...] I think the [...] I think it would be worth having a dedicated section (in the manual?) describing its behavior, because people might be scared when first reading this long issue. |
So glad to hear [...]
Even so, part of my intent is to nudge people to use functions because it makes data analysis code cleaner and easier to maintain.
Is this section sufficient?
I tagged this issue with the "frequently asked question" label. Whenever the manual is deployed, the build script scrapes the issue tracker and lists all these labeled issues in this FAQ. Details here. |
Hi there,
maybe I overlooked something, but I fail to properly add .R files to drake_plan so that the dependencies are detected.
Reprex: [...]
And in test2.R I have the following:
loadd(test.md)
However, when visualizing with [...], dependencies are not detected for the .R file.
If I use a .Rmd file, everything works as expected.
Tried a few different approaches but could not successfully add a .R script. Help 😄