docs/20-package-management.md

You can install packages by specifying the package(s) in your JSON pool configuration file.
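The relevant section of that file looks roughly like the following sketch (the CRAN package names here are placeholders, not from the original example):

```json
{
  ...
  "rPackages": {
    "cran": ["xml2", "data.table"],
    "github": [],
    "bioconductor": []
  },
  ...
}
```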
## Installing Packages per-*foreach* Loop
You can also install CRAN packages by using the **.packages** option in the *foreach* loop, and GitHub or Bioconductor packages by using the **github** and **bioconductor** options in the loop. Instead of installing packages during pool creation, packages (and their dependencies) can be installed before each iteration of the loop runs on your Azure cluster.
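For example, a loop that installs packages per iteration might look like the following sketch (the CRAN and GitHub package names are placeholders; IRanges is the Bioconductor package used elsewhere in these docs):

```R
library(doAzureParallel)

# Assumes a cluster has already been created and registered
# with registerDoAzureParallel(cluster).
results <- foreach(
  i = 1:10,
  .packages = c("data.table"),          # CRAN packages (placeholder)
  github = c("someuser/somepackage"),   # GitHub packages (placeholder)
  bioconductor = c("IRanges")           # Bioconductor packages
) %dopar% {
  # The packages listed above are installed on the node before this runs.
  i * 2
}
```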
### Installing a GitHub Package

doAzureParallel supports GitHub packages with the **github** option.

Please do not use "https://github.com/" as a prefix for the GitHub package name.
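A quick illustration of the expected format (the repository name is only an example):

```R
# Correct: reference GitHub packages as "owner/repository"
github_packages <- c("Azure/rAzureBatch")

# Incorrect: do not include the "https://github.com/" prefix
# github_packages <- c("https://github.com/Azure/rAzureBatch")
```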
## Installing packages from a private GitHub repository
Clusters can be configured to install packages from a private GitHub repository by setting the __githubAuthenticationToken__ property in the credentials file. If this property is blank, only public repositories can be used. If a token is added, both public and private GitHub repositories can be used together.

When the cluster is created, the token is passed in as an environment variable called GITHUB\_PAT at start-up. It lasts the life of the cluster and is looked up whenever devtools::install_github is called.
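On a node this means the usual devtools call works unchanged; a sketch (the repository name is a placeholder taken from the example below):

```R
# GITHUB_PAT is set by doAzureParallel at cluster start-up,
# and devtools::install_github() reads it automatically.
Sys.getenv("GITHUB_PAT")
devtools::install_github("project/some_private_repository")
```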
Credentials file for the GitHub authentication token:
```json
{
  ...
  "githubAuthenticationToken": "",
  ...
}
```
Cluster file:
```json
{
  ...
  "rPackages": {
    "cran": [],
    "github": ["<project/some_private_repository>"],
    ...
  }
}
```
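With both files in place, the cluster is created as usual; a sketch using the standard doAzureParallel workflow (the file names are whatever you saved the configurations as):

```R
library(doAzureParallel)

setCredentials("credentials.json")      # loads githubAuthenticationToken as well
cluster <- makeCluster("cluster.json")  # private GitHub packages install at start-up
registerDoAzureParallel(cluster)
```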
_More information regarding GitHub authentication tokens can be found [here](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/)_

The default deployment of R used in the cluster (see [Customizing the cluster](./30-customize-cluster.md) for more information) includes the Bioconductor installer. Simply add Bioconductor packages to the cluster by adding them to the **bioconductor** array in the cluster configuration.
```json
{
  ...
  "poolSize": {
    ...
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "rocker/tidyverse:latest",
  "rPackages": {
    "cran": [],
    "github": [],
    "bioconductor": ["IRanges"]
  },
  "commandLine": [],
  "subnetId": ""
}
```
Note: Container images that are not provided by tidyverse do not support Bioconductor installs. If you choose another container, you must make sure that Bioconductor is installed.
## Installing Custom Packages
doAzureParallel supports custom package installation in the cluster. Custom package installation at the per-*foreach* loop level is not supported.

Steps for installing custom packages can be found [here](../samples/package_management/custom/README.md).

Note: If the package requires compilation steps such as apt-get installations, users will be required to build their own containers.
## Uninstalling a Package
Uninstalling packages from your pool is not supported. However, you may consider rebuilding your pool.
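Rebuilding typically means tearing down the current cluster and creating a new one from an updated configuration; a rough sketch (the configuration file name is a placeholder):

```R
library(doAzureParallel)

stopCluster(cluster)                             # delete the existing pool
cluster <- makeCluster("cluster-updated.json")   # recreate it with the new package list
registerDoAzureParallel(cluster)
```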
samples/azure_files/readme.md

This sample shows how to update the cluster configuration to create a new mount.

For large data sets or high-traffic applications, be sure to review the Azure Files [scalability and performance targets](https://docs.microsoft.com/en-us/azure/storage/common/storage-scalability-targets#scalability-targets-for-blobs-queues-tables-and-files).

For very large data sets, we recommend using Azure Blobs. You can learn more in the [persistent storage](../../docs/23-persistent-storage.md) and [distributing data](../../docs/21-distributing-data.md) docs.

samples/package_management/custom/README.md

doAzureParallel supports custom package installation in the cluster. Custom packages are R packages that cannot be hosted on GitHub or built into a Docker image. The recommended approach for custom packages is to build them from source and upload them to an Azure File Share.

Note: If the package requires compilation steps such as apt-get installations, users will be required to build their own containers.
### Building a Package from Source in RStudio
1. Open *RStudio*
2. Go to *Build* on the navigation bar
3. Go to *Build From Source*
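The same source tarball can also be produced from the R console with devtools (a sketch; assumes the devtools package is installed and the path points at your package source directory):

```R
# Builds a <package>_<version>.tar.gz source tarball from the package directory.
devtools::build("path/to/your/package")
```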
### Uploading a Custom Package to Azure Files
Detailed steps on uploading files to Azure Files in the Azure Portal can be found in the Azure Files documentation.
1) In order to build the custom packages' dependencies, we need to untar the R packages and build them within their directories. By default, we will build custom packages in the *$AZ_BATCH_NODE_SHARED_DIR/tmp* directory.
2) By default, the custom package cluster configuration file will install any packages that are `*.tar.gz` files in the file share. If users want to specify particular R packages, they must change the line shown below in the cluster configuration file, which finds files that end with `*.tar.gz` in the current Azure File Share directory:
```json
{
  ...
  "commandLine": [
    ...
    "mkdir $AZ_BATCH_NODE_STARTUP_DIR/tmp | for i in `ls $AZ_BATCH_NODE_SHARED_DIR/data/*.tar.gz | awk '{print $NF}'`; do tar -xvf $i -C $AZ_BATCH_NODE_STARTUP_DIR/tmp; done",
    ...
  ]
}
```
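For reference, once a tarball has been extracted on a node, the source package can be installed from its directory with base R; a sketch (the package directory name is a placeholder):

```R
# Install an extracted source package from the node's build directory.
pkg_dir <- file.path(Sys.getenv("AZ_BATCH_NODE_STARTUP_DIR"), "tmp", "mypackage")
install.packages(pkg_dir, repos = NULL, type = "source")
```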
3) For more information on using Azure Files with Batch, see our other [sample](./azure_files/readme.md) on using Azure Files.
4) Replace your Storage Account name, endpoint, and key in the cluster configuration file.