From 3184d072927753b9a2883140682cc23f18569e45 Mon Sep 17 00:00:00 2001 From: unknown Date: Mon, 12 Mar 2018 20:16:39 +0000 Subject: [PATCH 1/4] merge package management doc --- samples/package_management/README.md | 68 +--------------------------- 1 file changed, 1 insertion(+), 67 deletions(-) diff --git a/samples/package_management/README.md b/samples/package_management/README.md index b8d478f2..fe05ea91 100644 --- a/samples/package_management/README.md +++ b/samples/package_management/README.md @@ -1,69 +1,3 @@ # Using package management -doAzureParallel supports installing packages at either the cluster level or during the execution of the foreach loop. Packages installed at the cluster level benefit from only needing to be installed once per node. Each iteration of the foreach can load the library without needing to install them again. Packages installed in the foreach benefit from specifying any specific dependencies required only for that instance of the loop. - -## Cluster level packages - -Cluster level packages support CRAN, GitHub and BioConductor packages. The packages are installed in a shared directory on the node. It is important to note that it is required to explicitly load any packages installed at the cluster level within the foreach loop. For example, if you installed xml2 on the cluster, you must explicityly load it before using it. - -```R -foreach (i = 1:4) %dopar% { - # Load the libraries you want to use. - library(xml2) - xml2::as_list(...) -} -``` - -### CRAN - -CRAN packages can be insatlled on the cluster by adding them to the collection of _cran_ packages in the cluster specification. - -```json -"rPackages": { - "cran": ["package1", "package2", "..."], - "github": [], - "bioconductor": [] - } -``` - -### GitHub - -GitHub packages can be insatlled on the cluster by adding them to the collection of _github_ packages in the cluster specification. - -```json -"rPackages": { - "cran": [], - "github": ["repo1/name1", "repo1/name2", "repo2/name1", "..."], - "bioconductor": [] - } -``` - -**NOTE** When using packages from a private GitHub repository, you must add your GitHub authentication token to your credentials.json file. - -### BioConductor - -Installing bioconductor packages is now supported via the cluster configuration. Simply add the list of packages you want to have installed in the cluster configuration file and they will get automatically applied - -```json -"rPackages": { - "cran": [], - "github": [], - "bioconductor": ["IRanges", "GenomeInofDb"] - } -``` - -**IMPORTANT** doAzureParallel uses the rocker/tidyverse Docker images by default, which comes with BioConductor pre-installed. If you use a different container image, make sure that bioconductor is installed on it. - - -## Foreach level packages - -Foreach level packages currently only support CRAN packages. Unlike cluster level pacakges, when specifying packages on the foreach loop, packages will be automatically installed _and loaded_ for use. - -### CRAN - -```R -foreach(i = 1:4, .packages = c("xml2")) %dopar% { - # xml2 is automatically loaded an can be used without calling library(xml2) - xml2::as_list(...) -} -``` +Please see documentation[(link)](../../docs/20-package-management.md) for more details on packagement management. \ No newline at end of file From be7efa5b549090d65f31e132cdd1477c547fc5b8 Mon Sep 17 00:00:00 2001 From: unknown Date: Mon, 12 Mar 2018 20:33:30 +0000 Subject: [PATCH 2/4] merge packagement docs --- docs/20-package-management.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/docs/20-package-management.md b/docs/20-package-management.md index 8ad43401..c25c06fe 100644 --- a/docs/20-package-management.md +++ b/docs/20-package-management.md @@ -4,7 +4,19 @@ The doAzureParallel package allows you to install packages to your pool in two w - Installing on pool creation - Installing per-*foreach* loop +Packages installed at the pool level benefit from only needing to be installed once per node. Each iteration of the foreach can load the library without needing to install them again. Packages installed in the foreach benefit from specifying any specific dependencies required only for that instance of the loop. + ## Installing Packages on Pool Creation + +Pool level packages support CRAN, GitHub and BioConductor packages. The packages are installed in a shared directory on the node. It is important to note that it is required to explicitly load any packages installed at the cluster level within the foreach loop. For example, if you installed xml2 on the cluster, you must explicityly load it before using it. + +```R +foreach (i = 1:4) %dopar% { + # Load the libraries you want to use. + library(xml2) + xml2::as_list(...) +} +``` You can install packages by specifying the package(s) in your JSON pool configuration file. This will then install the specified packages at the time of pool creation. ```R From d12b5dbaf455ed17eb6ecfdb72d1b875236d2141 Mon Sep 17 00:00:00 2001 From: zfengms Date: Tue, 13 Mar 2018 17:02:01 -0700 Subject: [PATCH 3/4] address review feedback (#232) * address review feedback * add reference to github and bioconductor packages in worker * update package management sample --- docs/20-package-management.md | 2 +- inst/startup/worker.R | 8 ++++++++ samples/package_management/README.md | 3 --- samples/package_management/bioconductor.r | 16 ++++++++++++---- 4 files changed, 21 insertions(+), 8 deletions(-) delete mode 100644 samples/package_management/README.md diff --git a/docs/20-package-management.md b/docs/20-package-management.md index c25c06fe..70f7d45c 100644 --- a/docs/20-package-management.md +++ b/docs/20-package-management.md @@ -8,7 +8,7 @@ Packages installed at the pool level benefit from only needing to be installed o ## Installing Packages on Pool Creation -Pool level packages support CRAN, GitHub and BioConductor packages. The packages are installed in a shared directory on the node. It is important to note that it is required to explicitly load any packages installed at the cluster level within the foreach loop. For example, if you installed xml2 on the cluster, you must explicityly load it before using it. +Pool level packages support CRAN, GitHub and BioConductor packages. The packages are installed in a shared directory on the node. It is important to note that it is required to explicitly load any packages installed at the cluster level within the foreach loop. For example, if you installed xml2 on the cluster, you must explicitly load it before using it. ```R foreach (i = 1:4) %dopar% { diff --git a/inst/startup/worker.R b/inst/startup/worker.R index cabe1ccb..d3726b72 100644 --- a/inst/startup/worker.R +++ b/inst/startup/worker.R @@ -83,6 +83,14 @@ for (package in azbatchenv$packages) { library(package, character.only = TRUE) } +for (package in azbatchenv$github) { + library(package, character.only = TRUE) +} + +for (package in azbatchenv$bioconductor) { + library(package, character.only = TRUE) +} + ls(azbatchenv) parent.env(azbatchenv$exportenv) <- getparentenv(azbatchenv$pkgName) diff --git a/samples/package_management/README.md b/samples/package_management/README.md deleted file mode 100644 index fe05ea91..00000000 --- a/samples/package_management/README.md +++ /dev/null @@ -1,3 +0,0 @@ -# Using package management - -Please see documentation[(link)](../../docs/20-package-management.md) for more details on packagement management. \ No newline at end of file diff --git a/samples/package_management/bioconductor.r b/samples/package_management/bioconductor.r index f364ef6a..d74fe422 100755 --- a/samples/package_management/bioconductor.r +++ b/samples/package_management/bioconductor.r @@ -1,3 +1,5 @@ +#Please see documentation at docs/20-package-management.md for more details on packagement management. + # install packages library(devtools) install_github("azure/doazureparallel") @@ -6,16 +8,16 @@ install_github("azure/doazureparallel") library(doAzureParallel) # set your credentials -setCredentials("credentials.json") +doAzureParallel::setCredentials("credentials.json") # Create your cluster if not exist -cluster <- makeCluster("bioconductor_cluster.json") +cluster <- doAzureParallel::makeCluster("bioconductor_cluster.json") # register your parallel backend -registerDoAzureParallel(cluster) +doAzureParallel::registerDoAzureParallel(cluster) # check that your workers are up -getDoParWorkers() +doAzureParallel::getDoParWorkers() summary <- foreach(i = 1:1) %dopar% { library(GenomeInfoDb) # Already installed as part of the cluster configuration @@ -23,7 +25,13 @@ summary <- foreach(i = 1:1) %dopar% { sessionInfo() # Your algorithm +} + +summary +summary <- foreach(i = 1:1, bioconductor=c('GenomeInfoDb', 'IRanges')) %dopar% { + sessionInfo() + # Your algorithm } summary From da656e85e909456e4a3d2505f955c8503347ab05 Mon Sep 17 00:00:00 2001 From: zfengms Date: Thu, 15 Mar 2018 12:18:41 -0700 Subject: [PATCH 4/4] update package management doc (#233) * address review feedback * add reference to github and bioconductor packages in worker * update package management sample * update package management doc * remove cluster.json * remove package installation --- docs/20-package-management.md | 11 +++++++++-- samples/package_management/bioconductor.r | 4 ---- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/docs/20-package-management.md b/docs/20-package-management.md index 70f7d45c..43a54dff 100644 --- a/docs/20-package-management.md +++ b/docs/20-package-management.md @@ -4,11 +4,11 @@ The doAzureParallel package allows you to install packages to your pool in two w - Installing on pool creation - Installing per-*foreach* loop -Packages installed at the pool level benefit from only needing to be installed once per node. Each iteration of the foreach can load the library without needing to install them again. Packages installed in the foreach benefit from specifying any specific dependencies required only for that instance of the loop. +Packages installed at the pool level benefit from only needing to be installed once per node. Each iteration of the foreach can load the library without needing to install them again. Packages installed in the foreach benefit from specifying any dependencies required only for that instance of the loop. ## Installing Packages on Pool Creation -Pool level packages support CRAN, GitHub and BioConductor packages. The packages are installed in a shared directory on the node. It is important to note that it is required to explicitly load any packages installed at the cluster level within the foreach loop. For example, if you installed xml2 on the cluster, you must explicitly load it before using it. +Pool level packages support CRAN, GitHub and BioConductor packages. The packages are installed in a shared directory on the node. It is important to note that it is required to add it to .packages parameter (or github or bioconductor for github or bioconductor packages), or explicitly load any packages installed at the pool level within the foreach loop. For example, if you installed xml2 on the cluster, you must explicitly load it or add it to .packages before using it. ```R foreach (i = 1:4) %dopar% { @@ -17,6 +17,13 @@ foreach (i = 1:4) %dopar% { xml2::as_list(...) } ``` +or +```R +foreach (i = 1:4, .packages=c('xml2')) %dopar% { + xml2::as_list(...) +} +``` + You can install packages by specifying the package(s) in your JSON pool configuration file. This will then install the specified packages at the time of pool creation. ```R diff --git a/samples/package_management/bioconductor.r b/samples/package_management/bioconductor.r index d74fe422..a5074fdf 100755 --- a/samples/package_management/bioconductor.r +++ b/samples/package_management/bioconductor.r @@ -1,9 +1,5 @@ #Please see documentation at docs/20-package-management.md for more details on packagement management. -# install packages -library(devtools) -install_github("azure/doazureparallel") - # import the doAzureParallel library and its dependencies library(doAzureParallel)