WISH: Allow for mc.cores=0 in parallel::mclapply() and friends #7
Comments
I agree the documentation and behavior are confusing. When run with Line 117 in `mclapply()`: `if (cores < 2L) return(lapply(X = X, FUN = FUN, ...))` i.e. same behavior you noted for
So, when But like I said, that is confusing and also inconsistent. It makes no sense to fork one additional process and have the main process simply wait. The idea of the main process simply waiting I guess is the rationale behind using
Thanks for the comments @ilarischeinin and for spotting the I've also fixed a typo in my code suggestion: It should be About the number of active versus computing processes: Yes, it's true that the main process shouldn't consume that much of the CPU since it is "just" spawning and polling for results. Having said this, I can imagine that there are obnoxiously strict compute clusters that would kick you out if you ran one process more than you requested (regardless of its CPU usage). Moreover, this holds for
Background
From `help("options")` we have that `mc.cores` is defined as the maximum number of additional R processes allowed to run in parallel to the current R process. From this definition, I would interpret `mc.cores` = 0, 1, 2 to mean:

- `mc.cores = 0`: Only the main R process may run.
- `mc.cores = 1`: The main R process plus one more forked process may run.
- `mc.cores = 2`: The main R process plus two more forked processes may run.

Comment: This means that from a computational point of view it makes little sense to use `mc.cores = 1` iff you're using `parallel::mclapply()` and friends, because it forks off a single R process (with the main process only polling/waiting for it to finish) and performs the same calculation that you could have done in the main R process alone. In this sense, `mc.cores = 1` could effectively be implemented the same as `mc.cores = 0`. (However, you could imagine implementations that make full use of exactly two R processes. This is, for instance, possible using the future package.)

On compute clusters with schedulers such as PBS and Slurm, you submit jobs and request the number of cores you need. If you request a single-core process, it makes sense to do all calculations in the main R process. Thus, we should really use `mc.cores = 0` whenever we are allocated a single-core R session. If we use `mc.cores = 1`, we are actually consuming two processes.

Problem
Currently, `mc.cores = 0` gives an error when used by `parallel::mclapply()` and friends. This means that in order to write code that adapts to cluster settings and works with any number of allocated cores, we need tedious code that treats the single-core case separately.
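To illustrate, here is a minimal sketch of both points. It first shows that `mc.cores = 0` is rejected, then branches by hand on the number of additional processes. The variable `ncores` and its value are assumptions made for illustration; on a real cluster the value would come from the scheduler.

```r
library(parallel)

# mc.cores = 0 is rejected: capture the error instead of stopping
res <- tryCatch(
  mclapply(1:3, function(x) x^2, mc.cores = 0),
  error = function(e) e
)
inherits(res, "error")

# Assumed for illustration: number of cores granted to this job
ncores <- 1L
# mc.cores counts *additional* processes besides the main one
mc.cores <- ncores - 1L

# The manual branching needed to support a single-core allocation
if (mc.cores == 0L) {
  y <- lapply(1:3, function(x) x^2)
} else {
  y <- mclapply(1:3, function(x) x^2, mc.cores = mc.cores)
}
```

Every call site that wants to honor a single-core allocation has to repeat this branch, which is the tedium the wish aims to remove.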
It is clear that not everyone is aware that `mc.cores` specifies the number of additional R processes. For instance, it is not uncommon to see `mc.cores = detectCores()` where the developer probably intended `mc.cores = detectCores() - 1`. (PS. It is not really a good idea to use `detectCores()` this way, cf. the help.)

Wish / Suggestion
Add support for `mc.cores = 0` in `mclapply()` and friends in the parallel package. Specifically:

- `mclapply()` and friends to allow for `mc.cores = 0`, which should fall back to `lapply()` or similar. Actually, `mc.cores = 1` could do the same thing.
- `mc.cores = 0` is a perfectly fine setting.
- `mc.cores` is a missing value.
- `mclapply()` on Windows to have argument `mc.cores = 0` and not `mc.cores = 1` as done currently.

Details
The current implementation of `parallel::mclapply()` already falls back to using `base::lapply()` whenever it is called by a multicore child process and recursive multicore processing is not explicitly enabled. Thus, it would take very little to extend it to also support `mc.cores = 0`. We may even want to use `if (mc.cores <= 1 || ...)` as suggested above.

UPDATE 2016-02-04: As @ilarischeinin points out in his comment, with `mclapply(..., mc.preschedule=TRUE)` (the default), a bit further down in the code it actually already says `if (cores < 2L) return(lapply(X = X, FUN = FUN, ...))`. Thus, it's clear that here the developer has had similar thoughts.
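The extension discussed here could look roughly like the following sketch. It is a user-level wrapper, not the actual source of the parallel package: the name `mclapply_sketch` is hypothetical, and the real function additionally checks whether it is running inside a forked child, which is left out here.

```r
library(parallel)

# Hypothetical wrapper illustrating the proposed early exit
mclapply_sketch <- function(X, FUN, ...,
                            mc.cores = getOption("mc.cores", 2L)) {
  cores <- as.integer(mc.cores)
  # Proposed validation: zero additional processes is allowed
  if (length(cores) != 1L || is.na(cores) || cores < 0L) {
    stop("'mc.cores' must be >= 0")
  }
  # Zero or one additional process: do the work in the main process
  if (cores <= 1L) {
    return(lapply(X = X, FUN = FUN, ...))
  }
  # Otherwise defer to the existing forking implementation
  mclapply(X, FUN, ..., mc.cores = cores)
}

# Runs sequentially in the main R process, without an error
mclapply_sketch(1:3, function(x) x + 1L, mc.cores = 0)
```

Treating `mc.cores = 1` like `mc.cores = 0` mirrors the `cores < 2L` shortcut quoted above, so the sketch only generalizes behavior that already exists for the prescheduled case.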
Continuing, on R for Windows, which does not support multicore processing / forking of processes, `mclapply()` falls back to calling `lapply()`. It would not be hard to update this one accordingly.
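A sketch of what such an updated Windows fallback might look like, under the assumption that the fallback only validates `mc.cores` and then calls `lapply()`. The function name `mclapply_windows_sketch` is hypothetical, and the choice to keep rejecting values above 1 is an assumption rather than something stated in this issue.

```r
# Hypothetical Windows-style fallback with mc.cores = 0 as the default
mclapply_windows_sketch <- function(X, FUN, ..., mc.cores = 0L) {
  cores <- as.integer(mc.cores)
  # Zero additional processes is valid; negative or missing is not
  if (length(cores) != 1L || is.na(cores) || cores < 0L) {
    stop("'mc.cores' must be >= 0")
  }
  # Assumption: forking is unavailable, so more than one is refused
  if (cores > 1L) {
    stop("'mc.cores' > 1 is not supported on Windows")
  }
  lapply(X, FUN, ...)
}

# The default now means "main R process only"
mclapply_windows_sketch(1:3, function(x) x * 2L)
```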
Interestingly, looking at `parallel::pvec()`, we can see that the developer also thinks it is unnecessary to fork off a process if `mc.cores = 1` (see also the paragraph on `mclapply(..., mc.preschedule=TRUE)` above). Thus, also here it is easy to update to support `mc.cores = 0`.