Convention to skip codes in Rmarkdown when running in DSC #93

gaow · 2018-03-03T05:50:00Z

Previously we have implemented support to RMD file as DSC executable modules. Thinking about it a bit more, I think we can make some efforts to bridge between interactive analyses and running in DSC if we invent some conventions for RMD indented to be used in DSC. Since I'm not very sure how RMD is typically used I hope you could suggest me what to do.

Basically, one possibility could be something like this, eg, in DSC:

normal: normal.Rmd
  n: 1000
  ...

And the Rmd file might look like:

Now let's prototype this method. First set parameters:

` ` `{r}
n = 1000
` ` ` 
Then run this method

` ` `{r chunck_name}
x = rnorm(n)
` ` `

where (explain what everything is)

Now let's make a diagnostic plot:

` ` `{r}
plot(x)
` ` `

You see DSC will only need the 2nd code chunk. Does it seem to be useful to have something along the lines of:

normal: normal.Rmd:name
  n:1000

Or even eg,

trunk_normal: normal.Rmd:simulate + normal.Rmd:truncate
  n: 1000

This way will in some way tie DSC with exploratory analysis. If this propose looks useful in practice, could we decide on a syntax for it?

stephens999 · 2018-03-03T21:05:55Z

I'm inclined to say we might want to be able to do the opposite - go from DSC code to an interactive R markdown file that contains the code to runs all the modules in a pipeline. This could be very useful for debugging... (in a pure R setting)

…

On Fri, Mar 2, 2018 at 11:50 PM, gaow ***@***.***> wrote: Previously we have implemented support to RMD file as DSC executable modules. Thinking about it a bit more, I think we can make some efforts to bridge between interactive analyses and running in DSC if we invent some conventions for RMD indented to be used in DSC. Since I'm not very sure how RMD is typically used I hope you could suggest me what to do. Basically, one possibility could be something like this, eg, in DSC: normal: normal.Rmd n: 1000 ... And the Rmd file might look like: Now let's prototype this method. First set parameters: ` ` `{r}n = 1000` ` ` Then run this method ` ` `{r chunck_name}x = rnorm(n)` ` ` where (explain what everything is) Now let's make a diagnostic plot: ` ` `{r}plot(x)` ` ` You see DSC will only need the 2nd code chunk. Does it seem to be useful to have something along the lines of: normal: normal.Rmd:name n:1000 Or even eg, trunk_normal: normal.Rmd:simulate + normal.Rmd:truncate n: 1000 This way will in some way tie DSC with exploratory analysis. If this propose looks useful in practice, could we decide on a syntax for it? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#93>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABt4xe6zEzT7nrcioJMwcxC9kaISeyjRks5tai8JgaJpZM4Sayzm> .

gaow · 2018-03-03T21:46:53Z

This is provided in plain HTML form as part of the debug tutorial, see this example. It is easy to provide as Rmd format. However, i provide HTML due to the nature of mixed language.

I would argue that making sure codes work in interactive environment before sending to DSC is still very important. The proposed mechanism in my first post at least prevents users to have to manually extract (and maintain) codes from interactive R markdown into R files for use with DSC.

pcarbo · 2018-03-04T19:58:38Z

@gaow I don't see R Markdown files being useful for DSC, or certainly they do not need any special attention beyond regular R scripts. I agree with Matthew that they could be useful for summarizing or visualizing the results of a DSC experiment.

gaow · 2018-03-04T21:22:28Z

Currently there are 3 types of HTML summaries DSC produces for overview of scripts / results outline, all in HTML format:

The script browser
The output meta info and dsc-query summary
The complete transcript for debug, which to my understanding is what Matthew suggested, but in HTML format not specific to any scripting language. I assume it is acceptable because it is straightforward to copy relevant chunk into Rmarkdown file to run?

The goal of my proposal RE taking Rmarkdown chunks is to allow the possibility to turn a vignette into DSC with hardly any effort. That way, all DSC modules can live in the same Rmd file as results of interactively developing them. But with Rmd file it is possible to specify a module to only use a chunk in the Rmd file, ie, the core methods.

pcarbo · 2018-03-04T21:26:36Z

All DSC modules can live in the same Rmd file as results of interactively developing them.

In an R Markdown file, the chunks are intended to be run sequentially, but in DSC they will not be. Plus, the variables defined in one chunk are automatically available in subsequent chunks, whereas in DSC they will not be. So I would say that this is not a good approach to defining modules in DSC.

gaow · 2018-03-05T17:17:06Z

Perhaps it is easier to illustrate with an example. I implemented it and made an example here. The Rmd is mashr intro but I had to do the following to make it work with DSC:

diff --git a/vignettes/mash/intro_mash.Rmd b/home/gaow/GIT/software/mashr/vignettes/intro_mash.Rmd
index f025add..f95f756 100644
--- a/vignettes/mash/intro_mash.Rmd
+++ b/home/gaow/GIT/software/mashr/vignettes/intro_mash.Rmd
@@ -36,16 +36,12 @@ There are essentially four steps to a mash analysis
 
 Here we do each of these step-by-step. However, first we simulate
 some data for illustration.

 ` ``{r}
 library(ashr)
 library(mashr)
 set.seed(1)
+n_effects = 500
+n_cond = 5
+` ``
+
+` ``{r simulate_1}
+simdata = simple_sims(n_effects,n_cond,1)
-simdata = simple_sims(500,5,1)
 ` ``
 
 This simulation routine creates a dataset with 5 conditions, and four
@@ -68,7 +64,7 @@ The simulation above created both these matrices for us (in
 `mash` you must first use `mash_set_data` to create a data object
 witho those two pieces of information:
 
+` ``{r simulate_2}
-` ``{r}
 data = mash_set_data(simdata$Bhat, simdata$Shat)
 ` ``
 
@@ -86,7 +82,7 @@ The function to set up canonical covariance matries is
 (we used `.c` to indicate canonical), which is a named list of
 matrices.
 
+` ``{r cov}
-` ``{r}
 U.c = cov_canonical(data)  
 print(names(U.c))
 ` ``
@@ -96,7 +92,7 @@ print(names(U.c))
 Having set up the data and covariance matrices you are ready to fit
 the model using the `mash` function:
 
+` ``{r fit}
-` ``{r}
 m.c = mash(data, U.c)
 ` ``

This is a trivial example but the point is that one can do a "quick and dirty" DSC this way, although we discourage it (by print warnings from DSC) for use in finalized benchmark for best practice considerations.

pcarbo · 2018-03-05T18:03:20Z

@gaow In this particular example, when the simulate_2 code chunk is run from DSC, does it give an error that mash_set_data is not defined? How would DSC know to load the mashr library when running this code chunk?

pcarbo · 2018-03-05T18:43:01Z

@gaow My general worry here is that if we provide this as an option in DSC, we are implicitly encouraging this. This will inevitably lead to confusion and users sending you emails with questions such as, "why do I get an error when I run DSC on my R Markdown file?" Also, my personal opinion is that this is Bad Practice, and I'm hesistant to provide options that would allow for Bad Practice.

gaow · 2018-03-05T18:43:33Z

@pcarbo on the DSC interface it specifies simulate: simulate_*@intro.Rmd. So simulate_1 and simulate_2 will both be executed according to the sequence they are discovered by this wildcard pattern in the Rmd file. If it says simulate: simulate_2@intro.Rmd then users will run into an error, and only at run time. I acknowledge this is a more fragile interface but if users are careful about what they do, I imaging it could be handy for prototyping for Rmd users.

does it give an error that mash_set_data is not defined?

No, because the executable definition reads: R(library(mashr)) + simulate_*@intro.Rmd, although if I write the document I'd use mashr:: at all times rather than calling library().

pcarbo · 2018-03-05T18:50:22Z

@gaow Also, do you think it is worthwhile adding more complexity to the module syntax for this special case?

gaow · 2018-03-05T19:01:46Z

Also, my personal opinion is that this is Bad Practice, and I'm hesistant to provide options that would allow for Bad Practice.

I acknowledge this, as have explained in the documentation, with warning messages in the runs:

It is perhaps a judgement call, but in general I tend to be liberal at the point where too rigorous requirement of best practice sacrifices productivity (drawing a line is tricky though). Other than what we are discussing here, I can already see ways people might abuse DSC that we may not have the energy to support. My only motivation is that it might be convenient to people who know what they are doing, eg, to lab members the most important target users of DSC. One can also make feature requests to support ipynb which, given the mechanism we already have, is not hard, but I'd not spend the energy unless the request comes from the lab.

Also, do you think it is worthwhile adding more complexity to the module syntax for this special case?

I agree, too. My argument is that this could be a case / syntax some people never use, if they do not like it; or decide to like and use it after 10min reading. I guess it is acceptable if it did not make existing simple case complicated and confusing. Plus we can always work on syntax (to the limit I can properly handle), which is part of the goal of this ticket. The main issue I agree is perhaps still on best practice, which I shared my view above.

gaow · 2018-03-05T19:04:31Z

Of course it increases burdens in engineering (in this one I did not use knitr but parsed Rmd with python codes. ) and test support ... I expect catching up with comprehensive test suite take time.

pcarbo · 2018-03-05T19:12:16Z

It is perhaps a judgement call, but in general I tend to be liberal at the point where too rigorous requirement of best practice sacrifices productivity (drawing a line is tricky though).

@gaow I would say just the opposite---rigorous coding requirements, when appropriate, improve productivity.

gaow · 2018-04-01T03:57:38Z

Now moved to unsupported features #107

gaow added the discussion label Mar 30, 2018

gaow closed this as completed Apr 1, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convention to skip codes in Rmarkdown when running in DSC #93

Convention to skip codes in Rmarkdown when running in DSC #93

gaow commented Mar 3, 2018

stephens999 commented Mar 3, 2018 via email

gaow commented Mar 3, 2018

pcarbo commented Mar 4, 2018

gaow commented Mar 4, 2018

pcarbo commented Mar 4, 2018

gaow commented Mar 5, 2018

pcarbo commented Mar 5, 2018 •

edited

Loading

pcarbo commented Mar 5, 2018

gaow commented Mar 5, 2018

pcarbo commented Mar 5, 2018

gaow commented Mar 5, 2018 •

edited

Loading

gaow commented Mar 5, 2018

pcarbo commented Mar 5, 2018

gaow commented Apr 1, 2018

Convention to skip codes in Rmarkdown when running in DSC #93

Convention to skip codes in Rmarkdown when running in DSC #93

Comments

gaow commented Mar 3, 2018

stephens999 commented Mar 3, 2018 via email

gaow commented Mar 3, 2018

pcarbo commented Mar 4, 2018

gaow commented Mar 4, 2018

pcarbo commented Mar 4, 2018

gaow commented Mar 5, 2018

pcarbo commented Mar 5, 2018 • edited Loading

pcarbo commented Mar 5, 2018

gaow commented Mar 5, 2018

pcarbo commented Mar 5, 2018

gaow commented Mar 5, 2018 • edited Loading

gaow commented Mar 5, 2018

pcarbo commented Mar 5, 2018

gaow commented Apr 1, 2018

pcarbo commented Mar 5, 2018 •

edited

Loading

gaow commented Mar 5, 2018 •

edited

Loading