Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convention to skip codes in Rmarkdown when running in DSC #93

Closed
gaow opened this issue Mar 3, 2018 · 14 comments
Closed

Convention to skip codes in Rmarkdown when running in DSC #93

gaow opened this issue Mar 3, 2018 · 14 comments

Comments

@gaow
Copy link
Member

gaow commented Mar 3, 2018

Previously we have implemented support to RMD file as DSC executable modules. Thinking about it a bit more, I think we can make some efforts to bridge between interactive analyses and running in DSC if we invent some conventions for RMD indented to be used in DSC. Since I'm not very sure how RMD is typically used I hope you could suggest me what to do.

Basically, one possibility could be something like this, eg, in DSC:

normal: normal.Rmd
  n: 1000
  ...

And the Rmd file might look like:

Now let's prototype this method. First set parameters:

` ` `{r}
n = 1000
` ` ` 
Then run this method

` ` `{r chunck_name}
x = rnorm(n)
` ` `

where (explain what everything is)

Now let's make a diagnostic plot:

` ` `{r}
plot(x)
` ` `

You see DSC will only need the 2nd code chunk. Does it seem to be useful to have something along the lines of:

normal: normal.Rmd:name
  n:1000

Or even eg,

trunk_normal: normal.Rmd:simulate + normal.Rmd:truncate
  n: 1000

This way will in some way tie DSC with exploratory analysis. If this propose looks useful in practice, could we decide on a syntax for it?

@stephens999
Copy link
Contributor

stephens999 commented Mar 3, 2018 via email

@gaow
Copy link
Member Author

gaow commented Mar 3, 2018

This is provided in plain HTML form as part of the debug tutorial, see this example. It is easy to provide as Rmd format. However, i provide HTML due to the nature of mixed language.

I would argue that making sure codes work in interactive environment before sending to DSC is still very important. The proposed mechanism in my first post at least prevents users to have to manually extract (and maintain) codes from interactive R markdown into R files for use with DSC.

@pcarbo
Copy link
Member

pcarbo commented Mar 4, 2018

@gaow I don't see R Markdown files being useful for DSC, or certainly they do not need any special attention beyond regular R scripts. I agree with Matthew that they could be useful for summarizing or visualizing the results of a DSC experiment.

@gaow
Copy link
Member Author

gaow commented Mar 4, 2018

Currently there are 3 types of HTML summaries DSC produces for overview of scripts / results outline, all in HTML format:

  1. The script browser
  2. The output meta info and dsc-query summary
  3. The complete transcript for debug, which to my understanding is what Matthew suggested, but in HTML format not specific to any scripting language. I assume it is acceptable because it is straightforward to copy relevant chunk into Rmarkdown file to run?

The goal of my proposal RE taking Rmarkdown chunks is to allow the possibility to turn a vignette into DSC with hardly any effort. That way, all DSC modules can live in the same Rmd file as results of interactively developing them. But with Rmd file it is possible to specify a module to only use a chunk in the Rmd file, ie, the core methods.

@pcarbo
Copy link
Member

pcarbo commented Mar 4, 2018

All DSC modules can live in the same Rmd file as results of interactively developing them.

In an R Markdown file, the chunks are intended to be run sequentially, but in DSC they will not be. Plus, the variables defined in one chunk are automatically available in subsequent chunks, whereas in DSC they will not be. So I would say that this is not a good approach to defining modules in DSC.

@gaow
Copy link
Member Author

gaow commented Mar 5, 2018

Perhaps it is easier to illustrate with an example. I implemented it and made an example here. The Rmd is mashr intro but I had to do the following to make it work with DSC:

diff --git a/vignettes/mash/intro_mash.Rmd b/home/gaow/GIT/software/mashr/vignettes/intro_mash.Rmd
index f025add..f95f756 100644
--- a/vignettes/mash/intro_mash.Rmd
+++ b/home/gaow/GIT/software/mashr/vignettes/intro_mash.Rmd
@@ -36,16 +36,12 @@ There are essentially four steps to a mash analysis
 
 Here we do each of these step-by-step. However, first we simulate
 some data for illustration.

 ` ``{r}
 library(ashr)
 library(mashr)
 set.seed(1)
+n_effects = 500
+n_cond = 5
+` ``
+
+` ``{r simulate_1}
+simdata = simple_sims(n_effects,n_cond,1)
-simdata = simple_sims(500,5,1)
 ` ``
 
 This simulation routine creates a dataset with 5 conditions, and four
@@ -68,7 +64,7 @@ The simulation above created both these matrices for us (in
 `mash` you must first use `mash_set_data` to create a data object
 witho those two pieces of information:
 
+` ``{r simulate_2}
-` ``{r}
 data = mash_set_data(simdata$Bhat, simdata$Shat)
 ` ``
 
@@ -86,7 +82,7 @@ The function to set up canonical covariance matries is
 (we used `.c` to indicate canonical), which is a named list of
 matrices.
 
+` ``{r cov}
-` ``{r}
 U.c = cov_canonical(data)  
 print(names(U.c))
 ` ``
@@ -96,7 +92,7 @@ print(names(U.c))
 Having set up the data and covariance matrices you are ready to fit
 the model using the `mash` function:
 
+` ``{r fit}
-` ``{r}
 m.c = mash(data, U.c)
 ` ``

This is a trivial example but the point is that one can do a "quick and dirty" DSC this way, although we discourage it (by print warnings from DSC) for use in finalized benchmark for best practice considerations.

@pcarbo
Copy link
Member

pcarbo commented Mar 5, 2018

@gaow In this particular example, when the simulate_2 code chunk is run from DSC, does it give an error that mash_set_data is not defined? How would DSC know to load the mashr library when running this code chunk?

@pcarbo
Copy link
Member

pcarbo commented Mar 5, 2018

@gaow My general worry here is that if we provide this as an option in DSC, we are implicitly encouraging this. This will inevitably lead to confusion and users sending you emails with questions such as, "why do I get an error when I run DSC on my R Markdown file?" Also, my personal opinion is that this is Bad Practice, and I'm hesistant to provide options that would allow for Bad Practice.

@gaow
Copy link
Member Author

gaow commented Mar 5, 2018

@pcarbo on the DSC interface it specifies simulate: simulate_*@intro.Rmd. So simulate_1 and simulate_2 will both be executed according to the sequence they are discovered by this wildcard pattern in the Rmd file. If it says simulate: simulate_2@intro.Rmd then users will run into an error, and only at run time. I acknowledge this is a more fragile interface but if users are careful about what they do, I imaging it could be handy for prototyping for Rmd users.

does it give an error that mash_set_data is not defined?

No, because the executable definition reads: R(library(mashr)) + simulate_*@intro.Rmd, although if I write the document I'd use mashr:: at all times rather than calling library().

@pcarbo
Copy link
Member

pcarbo commented Mar 5, 2018

@gaow Also, do you think it is worthwhile adding more complexity to the module syntax for this special case?

@gaow
Copy link
Member Author

gaow commented Mar 5, 2018

Also, my personal opinion is that this is Bad Practice, and I'm hesistant to provide options that would allow for Bad Practice.

I acknowledge this, as have explained in the documentation, with warning messages in the runs:

2018-03-05-12-44-47_scrot

It is perhaps a judgement call, but in general I tend to be liberal at the point where too rigorous requirement of best practice sacrifices productivity (drawing a line is tricky though). Other than what we are discussing here, I can already see ways people might abuse DSC that we may not have the energy to support. My only motivation is that it might be convenient to people who know what they are doing, eg, to lab members the most important target users of DSC. One can also make feature requests to support ipynb which, given the mechanism we already have, is not hard, but I'd not spend the energy unless the request comes from the lab.

Also, do you think it is worthwhile adding more complexity to the module syntax for this special case?

I agree, too. My argument is that this could be a case / syntax some people never use, if they do not like it; or decide to like and use it after 10min reading. I guess it is acceptable if it did not make existing simple case complicated and confusing. Plus we can always work on syntax (to the limit I can properly handle), which is part of the goal of this ticket. The main issue I agree is perhaps still on best practice, which I shared my view above.

@gaow
Copy link
Member Author

gaow commented Mar 5, 2018

Of course it increases burdens in engineering (in this one I did not use knitr but parsed Rmd with python codes. ) and test support ... I expect catching up with comprehensive test suite take time.

@pcarbo
Copy link
Member

pcarbo commented Mar 5, 2018

It is perhaps a judgement call, but in general I tend to be liberal at the point where too rigorous requirement of best practice sacrifices productivity (drawing a line is tricky though).

@gaow I would say just the opposite---rigorous coding requirements, when appropriate, improve productivity.

@gaow
Copy link
Member Author

gaow commented Apr 1, 2018

Now moved to unsupported features #107

@gaow gaow closed this as completed Apr 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants