WISH: Rscript -p <n> to specify number of parallel R processes #14

Open
HenrikBengtsson opened this issue Mar 5, 2016 · 0 comments

HenrikBengtsson commented Mar 5, 2016

Wish

Add a command-line option to R and Rscript for specifying the number of parallel processes to use/allocate. Although individual packages can define this themselves, it would be handy to have one standard option for this across the board. R itself already has two related options with different names for this (mc.cores and Ncpus, discussed below).

Suggested names for options (same for R):

Rscript -p <n>
Rscript --processes=<n>
Rscript --max-processes=<n>
Rscript --cores=<n>
Rscript --max-cores=<n>

Alternatives? What does other software use? Here are some examples:

make -j <n> / make --jobs=<n>
parallel -j <n>
xargs -P <n> / xargs --max-procs=<n>
julia -p, --procs {N|auto}

Examples:

  • Rscript foo.R (default; equivalent to Rscript -p $R_PROCESSES foo.R)
  • Rscript -p 1 foo.R - R uses a single process.
  • Rscript -p 2 foo.R - R uses two processes; the main process plus one more, cf. options(mc.cores=1L).
  • Rscript -p 3 foo.R - R uses three processes; the main process plus two more, cf. options(mc.cores=2L).

Also, we might want an analogous environment variable, e.g. R_PROCESSES.
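
Until such an option exists, a script can approximate the behavior itself. The following is a minimal sketch, not a proposal for the actual implementation: resolve_processes() is a hypothetical helper, the flags are read from the script's trailing arguments (so they go after the file name), and the fallback to 1 when nothing is set is an assumption.

```r
# Hypothetical helper; neither R nor Rscript provides this.
# Resolves the requested number of processes from, in order:
#   1. a trailing "-p <n>" argument,
#   2. a trailing "--processes=<n>" argument,
#   3. the R_PROCESSES environment variable,
#   4. a default of 1 (assumption: the wish does not define this case).
resolve_processes <- function(args = commandArgs(trailingOnly = TRUE),
                              env = Sys.getenv("R_PROCESSES", unset = NA)) {
  n <- NA_integer_

  # Short form: -p <n>
  idx <- match("-p", args)
  if (!is.na(idx) && idx < length(args)) {
    n <- suppressWarnings(as.integer(args[idx + 1L]))
  }

  # Long form: --processes=<n>
  if (is.na(n)) {
    hit <- grep("^--processes=", args, value = TRUE)
    if (length(hit) > 0L) {
      n <- suppressWarnings(as.integer(sub("^--processes=", "", hit[1L])))
    }
  }

  # Environment variable fallback
  if (is.na(n) && !is.na(env) && nzchar(env)) {
    n <- suppressWarnings(as.integer(env))
  }

  # Default when nothing valid was given
  if (is.na(n) || n < 1L) n <- 1L
  n
}
```

With this at the top of foo.R, one could call Rscript foo.R -p 2 or R_PROCESSES=2 Rscript foo.R. The flag has to follow the file name because Rscript itself does not know -p.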

What should it do?

Specifying this command line option should set related R options, e.g.

  • options(mc.cores=n-1) - Number of additional cores/processes used by the parallel::mclapply() and friends.
  • options(Ncpus=n) (or n-1?) - Number of processes used by install.packages() to install packages in parallel. HB: Does this count include the main R process, or only additional ones?
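
As a rough sketch of the mapping above, assuming n is the total number of processes: set_processes() is a hypothetical name, and Ncpus = n simply picks one of the two candidates listed. The clamp of mc.cores to at least 1 is an assumption added here, because parallel::mclapply() documents that mc.cores must be at least one.

```r
# Hypothetical helper; illustrates the proposed mapping only.
set_processes <- function(n) {
  n <- as.integer(n)
  stopifnot(length(n) == 1L, !is.na(n), n >= 1L)

  options(
    # Additional processes, i.e. total minus the main process.
    # Clamped to 1 (assumption) since mclapply() rejects values below 1.
    mc.cores = max(n - 1L, 1L),
    # Assumption: Ncpus is taken to be the total count (the "n" variant).
    Ncpus = n
  )
  invisible(n)
}
```

For example, set_processes(3) leaves getOption("mc.cores") at 2 and getOption("Ncpus") at 3.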

Additional comments

Note that mc.cores, as defined by the parallel package, specifies the number of additional R processes. In other words, the total number of processes used is one more than mc.cores. Then why change the definition from additional processes (n-1) to the total number of processes/cores (n)? Because specifying additional cores is confusing and not widely known. One sign of this is that you see examples using options(mc.cores=detectCores()) when the author really meant options(mc.cores=detectCores()-1). More importantly, mc.cores is really only defined for parallel::mclapply() and friends, which call a function on an additional set of processes and use the main R process to wait/poll for results. However, you can imagine other implementations that also use the main process for full processing (and poll only occasionally). This is, for instance, supported by the future package.
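
To make the detectCores() point concrete, here is a small sketch of the two conventions side by side. The variable names are illustrative only, and the handling of NA and of single-core machines is an assumption.

```r
library(parallel)

# Total number of cores reported by R; detectCores() may return NA.
n_total <- detectCores()
if (is.na(n_total)) n_total <- 1L

# Pattern often seen: treats mc.cores as the total count.
n_as_often_written <- n_total

# Pattern usually intended: reserve one core for the main R process.
# Kept at 1 or more (assumption) so the value stays valid for mclapply().
n_additional <- max(n_total - 1L, 1L)

cat("total cores:", n_total, "\n")
cat("mc.cores as often written:", n_as_often_written, "\n")
cat("mc.cores as usually intended:", n_additional, "\n")
```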

Related
