Convention to skip codes in Rmarkdown when running in DSC #93
I'm inclined to say we might want to be able to do the opposite: go from DSC code to an interactive R Markdown file that contains the code to run all the modules in a pipeline. This could be very useful for debugging (in a pure R setting).
On Fri, Mar 2, 2018 at 11:50 PM, gaow ***@***.***> wrote:
Previously we implemented support for Rmd files as DSC executable modules. Thinking about it a bit more, I think we can bridge interactive analyses and DSC runs if we invent some conventions for Rmd files intended to be used in DSC. Since I'm not very sure how Rmd is typically used, I hope you can suggest what to do.
Basically, one possibility could be something like this in DSC:
normal: normal.Rmd
  n: 1000
  ...
And the Rmd file might look like:
Now let's prototype this method. First set parameters:
```{r}
n = 1000
```
Then run this method:
```{r chunck_name}
x = rnorm(n)
```
where (explain what everything is)
Now let's make a diagnostic plot:
```{r}
plot(x)
```
You see, DSC only needs the second code chunk. Would it be useful to have something along the lines of:
normal: normal.Rmd:name
  n: 1000
Or even, e.g.,
trunk_normal: normal.Rmd:simulate + normal.Rmd:truncate
  n: 1000
This would in some way tie DSC to exploratory analysis. If this proposal looks useful in practice, could we decide on a syntax for it?
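To make the proposed `file.Rmd:chunk` syntax concrete, here is a minimal Python sketch of how a runner might resolve a spec such as `normal.Rmd:simulate + normal.Rmd:truncate`: pull out each named R chunk and concatenate the chunks in the order given. The function names (`extract_chunks`, `resolve_module`) are hypothetical, this is not DSC's implementation, and header parsing is deliberately simplified (only R chunks with headers of the form `{r label, options}`, the label being the first option without `=`). The plot chunk is given a hypothetical label `plot` here so that it can be selected.

```python
import re

FENCE = "`" * 3  # an R Markdown chunk fence, built this way so it can sit inside this page's fence

# Header of an R chunk: the fence followed by {r}, {r simulate}, {r fit, echo=FALSE}, ...
HEADER_RE = re.compile("^" + re.escape(FENCE) + r"\{r(.*)\}\s*$")


def chunk_label(options: str):
    """Return the chunk label from the header options, or None if the chunk is unnamed."""
    first = options.strip().lstrip(",").split(",")[0].strip()
    # A first option such as echo=FALSE is a chunk option, not a label.
    if not first or "=" in first:
        return None
    return first


def extract_chunks(rmd_text: str) -> dict:
    """Map each named R chunk to its code; unnamed chunks are skipped."""
    chunks, label, body = {}, None, None
    for line in rmd_text.splitlines():
        if body is None:
            match = HEADER_RE.match(line)
            if match:  # opening fence of an R chunk
                label, body = chunk_label(match.group(1)), []
        elif line.strip() == FENCE:  # closing fence
            if label is not None:
                chunks[label] = "\n".join(body)
            label, body = None, None
        else:
            body.append(line)
    return chunks


def resolve_module(spec: str, read_file) -> str:
    """Turn a hypothetical spec like 'normal.Rmd:simulate + normal.Rmd:truncate'
    into one R script by concatenating the named chunks in the order given."""
    parts = []
    for item in spec.split("+"):
        path, _, name = item.strip().partition(":")
        if not name:
            raise ValueError(f"{item.strip()!r} does not name a chunk")
        chunks = extract_chunks(read_file(path))
        if name not in chunks:
            raise KeyError(f"no chunk named {name!r} in {path}")
        parts.append(chunks[name])
    return "\n".join(parts)


# Usage with the example Rmd from the proposal, held in memory instead of on disk.
rmd = "\n".join([
    "Now let's prototype this method. First set parameters:",
    FENCE + "{r}", "n = 1000", FENCE,
    "Then run this method:",
    FENCE + "{r chunck_name}", "x = rnorm(n)", FENCE,
    "Now let's make a diagnostic plot:",
    FENCE + "{r plot, fig.width=5}", "plot(x)", FENCE,
])
files = {"normal.Rmd": rmd}
print(resolve_module("normal.Rmd:chunck_name + normal.Rmd:plot", files.__getitem__))
```

Passing `read_file` in as a function keeps the sketch testable without touching the file system; a real runner would read the Rmd from disk.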
This is provided in plain HTML form as part of the debug tutorial; see this example. It would be easy to provide it in Rmd format, but I provide HTML because pipelines can mix languages. I would argue that making sure code works in an interactive environment before sending it to DSC is still very important. The mechanism proposed in my first post at least saves users from having to manually extract (and maintain) code from interactive R Markdown files into R files for use with DSC.
@gaow I don't see R Markdown files being useful for DSC, or certainly they do not need any special attention beyond regular R scripts. I agree with Matthew that they could be useful for summarizing or visualizing the results of a DSC experiment.
Currently DSC produces three types of summaries, all in HTML format, giving an overview of the scripts and an outline of the results:
The goal of my proposal regarding R Markdown chunks is to make it possible to turn a vignette into a DSC with hardly any effort. That way, all DSC modules can live in the same Rmd file in which they were developed interactively, while each module uses only a specific chunk of that file, i.e., the core method.
In an R Markdown file, the chunks are intended to be run sequentially, but in DSC they will not be. Plus, the variables defined in one chunk are automatically available in subsequent chunks, whereas in DSC they will not be. So I would say that this is not a good approach to defining modules in DSC.
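The scoping point can be made concrete with the `n = 1000` / `x = rnorm(n)` example from the original proposal. Lifted out on its own, the `chunck_name` chunk no longer sees `n`, so whatever earlier chunks set up has to be supplied from outside, for instance as module parameters. The Python sketch below (hypothetical helper names, not DSC's actual code generation) simply prepends R assignments for the given parameters; it does not analyze the R code, so the caller must already know which inputs the chunk reads.

```python
def r_literal(value) -> str:
    """Render a simple Python value as an R literal (bool, int, float, str only)."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)  # finite numbers only; inf/nan are not handled
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def standalone_script(chunk_code: str, params: dict) -> str:
    """Make an isolated chunk runnable by assigning its inputs up front.

    In the Rmd, `n` is set by an earlier chunk; once the chunk runs on its own,
    that value has to be supplied explicitly, e.g. as a module parameter.
    """
    assignments = [f"{name} <- {r_literal(value)}" for name, value in params.items()]
    return "\n".join(assignments + [chunk_code])


print(standalone_script("x = rnorm(n)", {"n": 1000}))
# n <- 1000
# x = rnorm(n)
```

Anything else a chunk relies on from earlier chunks, such as a `library(mashr)` or `set.seed(1)` call, is lost in the same way, which is why chunk order matters in this discussion.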
Perhaps it is easier to illustrate with an example. I implemented it and made an example here. Compared with the `mashr` vignette, the Rmd differs as follows:
````diff
diff --git a/vignettes/mash/intro_mash.Rmd b/home/gaow/GIT/software/mashr/vignettes/intro_mash.Rmd
index f025add..f95f756 100644
--- a/vignettes/mash/intro_mash.Rmd
+++ b/home/gaow/GIT/software/mashr/vignettes/intro_mash.Rmd
@@ -36,16 +36,12 @@ There are essentially four steps to a mash analysis
Here we do each of these step-by-step. However, first we simulate
some data for illustration.
```{r}
library(ashr)
library(mashr)
set.seed(1)
+n_effects = 500
+n_cond = 5
+```
+
+```{r simulate_1}
+simdata = simple_sims(n_effects,n_cond,1)
-simdata = simple_sims(500,5,1)
```
This simulation routine creates a dataset with 5 conditions, and four
@@ -68,7 +64,7 @@ The simulation above created both these matrices for us (in
`mash` you must first use `mash_set_data` to create a data object
witho those two pieces of information:
+```{r simulate_2}
-```{r}
data = mash_set_data(simdata$Bhat, simdata$Shat)
```
@@ -86,7 +82,7 @@ The function to set up canonical covariance matries is
(we used `.c` to indicate canonical), which is a named list of
matrices.
+```{r cov}
-```{r}
U.c = cov_canonical(data)
print(names(U.c))
```
@@ -96,7 +92,7 @@ print(names(U.c))
Having set up the data and covariance matrices you are ready to fit
the model using the `mash` function:
+```{r fit}
-```{r}
m.c = mash(data, U.c)
```
````
This is a trivial example, but the point is that one can do a "quick and dirty" DSC this way, although, for best-practice reasons, we discourage it in finalized benchmarks (DSC prints warnings when it is used).
@gaow In this particular example, when the
@gaow My general worry here is that if we provide this as an option in DSC, we are implicitly encouraging it. This will inevitably lead to confusion and to users sending you emails with questions such as, "Why do I get an error when I run DSC on my R Markdown file?" Also, my personal opinion is that this is Bad Practice, and I'm hesitant to provide options that would allow for Bad Practice.
@pcarbo on the DSC interface it specifies
No, because the executable definition reads:
@gaow Also, do you think it is worthwhile adding more complexity to the module syntax for this special case?
I acknowledge this, as I have explained in the documentation and with warning messages in the runs. It is perhaps a judgment call, but in general I tend to be liberal where requiring best practice too rigorously sacrifices productivity (drawing the line is tricky, though). Beyond what we are discussing here, I can already see ways people might abuse DSC that we may not have the energy to support. My only motivation is that this might be convenient for people who know what they are doing, e.g., lab members, who are the most important target users of DSC. One can also make feature requests to support
I agree, too. My argument is that this is a case (and syntax) that some people may never use if they do not like it, or may decide to adopt after ten minutes of reading. I think it is acceptable as long as it does not make the existing simple cases complicated or confusing. Plus, we can always work on the syntax (within the limits of what I can properly handle), which is part of the goal of this ticket. I agree the main issue is perhaps still best practice, on which I shared my view above.
Of course, it increases the engineering burden (in this one I did not use
@gaow I would say just the opposite: rigorous coding requirements, when appropriate, improve productivity.
Now moved to unsupported features #107