This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

Feature/pkgmgmtdoc (#231)
* merge package management doc

* merge packagement docs

* address review feedback (#232)

* address review feedback

* add reference to github and bioconductor packages in worker

* update package management sample

* update package management doc (#233)

* address review feedback

* add reference to github and bioconductor packages in worker

* update package management sample

* update package management doc

* remove cluster.json

* remove package installation
zfengms authored Apr 26, 2018
1 parent fa75afb commit 0fbfd4c
Showing 4 changed files with 38 additions and 76 deletions.
19 changes: 19 additions & 0 deletions docs/20-package-management.md
@@ -4,7 +4,26 @@ The doAzureParallel package allows you to install packages to your pool in two w
- Installing on pool creation
- Installing per-*foreach* loop

Packages installed at the pool level only need to be installed once per node, so each iteration of the foreach loop can load them without reinstalling them. Packages installed in the foreach loop let you declare dependencies that only that loop needs.

## Installing Packages on Pool Creation

Pool-level installation supports CRAN, GitHub, and Bioconductor packages. The packages are installed in a shared directory on the node. Installing a package at the pool level does not load it: inside the foreach loop you must either load it explicitly with library() or list it in the .packages parameter (or in the github or bioconductor parameter for GitHub or Bioconductor packages). For example, if you installed xml2 on the cluster, you must load it explicitly or add it to .packages before using it.

```R
foreach (i = 1:4) %dopar% {
  # Load the libraries you want to use.
  library(xml2)
  xml2::as_list(...)
}
```
or
```R
foreach (i = 1:4, .packages = c('xml2')) %dopar% {
  xml2::as_list(...)
}
```
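
To see why naming a package in .packages, github, or bioconductor makes it usable without a library() call, here is a minimal local sketch modeled on the loops in inst/startup/worker.R. It is an illustration under assumptions, not the package's implementation: azbatchenv is built by hand here, attach_requested is a hypothetical helper, and the base R packages tools and splines stand in for real CRAN and Bioconductor packages.

```R
# Hypothetical stand-in for the environment the worker receives. In
# doAzureParallel the real object is built by the package, not by hand.
azbatchenv <- new.env()
azbatchenv$packages <- c("tools")        # stand-in for names from .packages
azbatchenv$github <- character(0)        # no GitHub packages in this sketch
azbatchenv$bioconductor <- c("splines")  # stand-in for names from bioconductor

# Attach every requested package by name, mirroring the worker's loops.
# attach_requested is a hypothetical helper introduced for this example.
attach_requested <- function(env) {
  requested <- c(env$packages, env$github, env$bioconductor)
  for (package in requested) {
    library(package, character.only = TRUE)
  }
  invisible(requested)
}

attached <- attach_requested(azbatchenv)

# Once attached, functions can be called without a namespace prefix.
file_ext("cluster.json")
```

On a real pool the names would come from the foreach call rather than a hand-built environment, and each package has to be installed on the node by the time the worker script runs for library() to succeed.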

You can install packages by specifying the package(s) in your JSON pool configuration file. This will then install the specified packages at the time of pool creation.

```R
8 changes: 8 additions & 0 deletions inst/startup/worker.R
@@ -92,6 +92,14 @@ for (package in azbatchenv$packages) {
  library(package, character.only = TRUE)
}

for (package in azbatchenv$github) {
  library(package, character.only = TRUE)
}

for (package in azbatchenv$bioconductor) {
  library(package, character.only = TRUE)
}

ls(azbatchenv)
parent.env(azbatchenv$exportenv) <- getparentenv(azbatchenv$pkgName)

69 changes: 0 additions & 69 deletions samples/package_management/README.md

This file was deleted.

18 changes: 11 additions & 7 deletions samples/package_management/bioconductor.r
@@ -1,29 +1,33 @@
-# install packages
-library(devtools)
-install_github("azure/doazureparallel")
+# Please see documentation at docs/20-package-management.md for more details on package management.

 # import the doAzureParallel library and its dependencies
 library(doAzureParallel)

 # set your credentials
-setCredentials("credentials.json")
+doAzureParallel::setCredentials("credentials.json")

 # Create your cluster if not exist
-cluster <- makeCluster("bioconductor_cluster.json")
+cluster <- doAzureParallel::makeCluster("bioconductor_cluster.json")

 # register your parallel backend
-registerDoAzureParallel(cluster)
+doAzureParallel::registerDoAzureParallel(cluster)

 # check that your workers are up
-getDoParWorkers()
+doAzureParallel::getDoParWorkers()

 summary <- foreach(i = 1:1) %dopar% {
   library(GenomeInfoDb) # Already installed as part of the cluster configuration
   library(IRanges) # Already installed as part of the cluster configuration

   sessionInfo()
   # Your algorithm
 }

 summary

+summary <- foreach(i = 1:1, bioconductor=c('GenomeInfoDb', 'IRanges')) %dopar% {
+  sessionInfo()
+  # Your algorithm
+}
+
+summary
