Wish
Add a command-line option to R and Rscript for specifying the number of parallel processes to use/allocate. Although it is possible for individual packages to define this themselves, it would be handy to have one standard option for this across the board. In R itself, we already have two related options with different names for this.
Alternatives? What do other tools use? Here are some examples:
make -j <n> / make --jobs=<n>
parallel -j <n>
xargs -P <n> / xargs --max-procs=<n>
julia -p, --procs {N|auto}
Examples:
Rscript foo.R (default; equivalent to Rscript -p $R_PROCESSES foo.R)
Rscript -p 1 foo.R - R uses a single process.
Rscript -p 2 foo.R - R uses two processes; the main process plus one more, cf. options(mc.cores=1L).
Rscript -p 3 foo.R - R uses three processes; the main process plus two more, cf. options(mc.cores=2L).
Also, we might want an analogous environment variable, e.g. R_PROCESSES.
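Until such an option exists, a script can approximate the intended semantics by reading the proposed R_PROCESSES environment variable itself. A minimal sketch, assuming R_PROCESSES holds the total number of processes as in the examples above (the fallback value of 1 and the parsing details are assumptions, not part of the proposal):

```r
## Minimal sketch: emulate the proposal by reading the suggested R_PROCESSES
## environment variable, e.g. run as: R_PROCESSES=3 Rscript foo.R
n <- suppressWarnings(as.integer(Sys.getenv("R_PROCESSES", unset = "1")))
if (is.na(n) || n < 1L) n <- 1L

## 'n' is the *total* number of processes, so reserve one for the main process;
## mc.cores must be at least 1 (with 1, mclapply() runs sequentially)
options(mc.cores = max(1L, n - 1L))

## With R_PROCESSES=3, this runs on two worker processes (mc.cores = 2)
y <- parallel::mclapply(1:10, function(i) i^2)
```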
What should it do?
Specifying this command-line option should set related R options, e.g. (see the sketch after this list):
options(mc.cores=n-1) - Number of additional cores/processes used by parallel::mclapply() and friends.
options(Ncpus=n) (or n-1?) - Number of processes used by install.packages() to install packages in parallel. HB: Does this include the main R process, or does it count only additional ones?
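A minimal sketch of that mapping, written as a helper that the startup code (or, today, a user's .Rprofile) could call. The helper name is hypothetical, and since the issue leaves open whether Ncpus should be n or n-1, the choice below is just an assumption:

```r
## Hypothetical helper: translate the total number of processes 'n'
## (as given by -p <n> or R_PROCESSES) into the existing R options
set_processes <- function(n) {
  n <- max(1L, as.integer(n))
  options(
    mc.cores = max(1L, n - 1L),  # workers *in addition to* the main process
    Ncpus    = n                 # or n - 1L; the issue leaves this open
  )
  invisible(n)
}

## Example: mimic 'Rscript -p 3 foo.R'
set_processes(3)
getOption("mc.cores")  # 2
getOption("Ncpus")     # 3
```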
Additional comments
Note that mc.cores, as defined by the parallel package, specifies the number of additional R processes. In other words, the total number of processes used is one more than mc.cores. Then why change the definition from the number of additional processes (n-1) to the total number of processes/cores (n)? Because specifying additional cores is confusing and not widely understood. One sign of this is that you see examples using options(mc.cores=detectCores()) when options(mc.cores=detectCores()-1) was really meant. More importantly, mc.cores is really defined for parallel::mclapply() and friends, which call a function on an additional set of processes while the main R process waits/polls for the results. However, one can imagine other implementations that also use the main process for full processing (and only poll occasionally). This is, for instance, supported by the future package.
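To make the off-by-one concrete, a small sketch contrasting the two idioms mentioned above (fork-based mclapply(), so Unix-alikes only):

```r
library(parallel)

## Oversubscribes: mclapply() forks 'mc.cores' workers *on top of* the main
## R process, so this uses detectCores() + 1 R processes in total
options(mc.cores = detectCores())

## What is usually meant: detectCores() processes in total, counting the
## main process that waits/polls for the workers' results
options(mc.cores = max(1L, detectCores() - 1L))

res <- mclapply(1:8, function(i) sqrt(i))
```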
Wish
Add command-line option to
R
andRscript
for specifying number of parallel processes to use/allocate. Although it is possible for individual packages to define this themselves, it would be handy to have one standard option for this across the board. In R itself, we already have two related options with different names for this.Suggested names for options (same for
R
):Alternatives? What do other software use? Here are some example:
Examples:
Rscript foo.R
(default; equivalent toRscript -p $R_PROCESSES foo.R
)Rscript -p 1 foo.R
- R uses a single process.Rscript -p 2 foo.R
- R uses two processes; the main process plus one more, cf.options(mc.cores=1L)
.Rscript -p 3 foo.R
- R uses three processes; the main process plus two more, cf.options(mc.cores=2L)
.Also, we might want an analogous environment variable, e.g.
R_PROCESSES
.What should it do?
Specifying this command line option should set related R options, e.g.
options(mc.cores=n-1)
- Number of additional cores/processes used by theparallel::mclapply()
and friends.options(Ncpus=n)
(orn-1
?) - Number of processes used byinstall.packages()
to install packages in parallel. HB: Is this including the main R processes or additional ones?Additional comments
Note that
mc.cores
as defined by the parallel package specifies additional R processes. In other words, the total number of processes used is one more thanmc.cores
. Then why change definition from additional (n-1
) to total number of processes/cores (n
)? Because specifying additional cores is confusing and also not known to many. One proof of this is that you see examples usingoptions(mc.cores=detectCores())
when they really meantoptions(mc.cores=detectCores()-1)
. More importantly,mc.cores
is really defined when usingparallel::mclapply()
and friends which calls a function on additional set of process and uses the main R process to wait/poll for results. However, you can imagine other implementations that uses also the main process for full processing (and poll only occasionally). This is for instance supported by the future package.Related
R-devel thread 'SUGGESTION: Environment variable R_MAX_MC_CORES for maximum number of cores', 2013-11-11 (obsolete)