R parallel Errors
This is a loose collection of errors thrown by multiple R packages, together with potential solutions. These errors were collected over a span of, let me double check, 4 weeks.
BTW, big lie, "WINDOWS" never notified me when there was a solution available. (Assumption one, there was never a solution available; Assumption two, Windows lied aka here you go pig that will do; Assumption three I assume that.. who cares).
Reality check: this has not much to do with Windows itself (besides the DLL hell) but rather with sloppy programming and missing unit, version and dependency testing of R packages by multiple authors. The R dependency hell is especially unfortunate. Each package that depends on multiple other packages is doomed to fail if one of its weakest links fails. The R versioning hell adds another dimension, because updated R packages (good) may increase error rates due to incompatibility with older versions. In principle, that could only be solved by authoritative code review and testing of all (in-)dependent functions.
The dilemma is of course: do we want fireworks of new and exciting functions, quickly and widely available? Yes! On the other hand this will lead to incompatibilities, errors, crashes, versioning hell, dependency hell and other issues because of untested code. So here we go, a loose collection of a few R parallel errors.
In case of snow and other parallel libraries, make sure to kill the Rscript.exe and conhost.exe zombie processes too once the Rgui has crashed. If, for example, Rgui.exe is killed, the associated (independent) Rscript.exe processes stay alive as ghost processes and may eat CPU or your valuable memory. Under Windows, invoke the task manager by pressing Ctrl-Alt-Del or by calling taskmgr.exe in the run box. The same leftover processes also appear if stopCluster(cl) is not called at the end of a parallel session.
Solution: If more than 60 processes need to be removed, use "taskkill /IM rscript.exe" on the command line (add /F to force termination) or download PsTools and use PsKill.
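One way to avoid leftover workers in the first place is to tie stopCluster() to the function that creates the cluster. The following is only a minimal sketch using base library(parallel); the helper name with_cluster is made up for illustration and is not part of any package.

```r
library(parallel)

# Hypothetical helper: run FUN over X on a temporary local cluster and
# always shut the workers down, even if FUN throws an error.
with_cluster <- function(n_workers, X, FUN) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)  # runs on normal exit and on error
  parLapply(cl, X, FUN)
}

res <- with_cluster(2, 1:4, function(i) i * 2)
print(unlist(res))
```

This does not help once the master R process itself has been killed; in that case the workers still have to be removed by hand as described above.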
Error in mpi.comm.spawn(slave = mpitask, slavearg = args, nslaves = count, :
Other MPI error, error stack:
MPI_Comm_spawn(cmd="C:/RRO/R-3.2.1/bin/Rscript.exe", argv=0x0000000006DE2088, maxprocs=3, MPI_INFO_NULL, root=0, MPI_COMM_SELF, intercomm=0x000000001149C2F0, errors=0x0000000012067D68) failed
Function not implemented
Solution: None, the Rmpi spawn function is not implemented for Windows. See the Rmpi code.
In the special case of xcms, uninstall Rmpi and snow will be used instead.
This happens when any parallel library such as snow is invoked with 128 threads under Windows. The error output differs between R versions.
* Source: R/src/main/connections.c
* R : A Computer Language for Statistical Data Analysis
#define NCONNECTIONS 128 /* snow needs one per slave node */
That means any local cluster will never be able to exceed 127 nodes, unless R is recompiled. The error can be seen below
Starting snow cluster with 128 local sockets.
Error in file(con, "w") : all connections are in use
> sessionInfo()
Error in gzfile(file, "rb") : all connections are in use
Solution: Recompile R or PRO and/or ask for a fix. It is an R limitation of at most 127 nodes (2015). This is not a Windows threading limitation; however, the error can also be triggered by a memory limitation, because each Rscript.exe (plus conhost.exe) requires at least 44 MByte of RAM. A local Windows snow cluster with 99 clients therefore needs about 4 GByte of RAM.
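The arithmetic behind these limits can be written down as a small sanity check before starting a large local cluster. This is a rough sketch only: the two constants are the numbers quoted above, the helper names are hypothetical, and the practical ceiling may be lower than 127 because R itself keeps a few connections open.

```r
# Numbers taken from the text above; adjust for your own R build and machine.
max_connections <- 128   # NCONNECTIONS in R/src/main/connections.c
mb_per_worker   <- 44    # Rscript.exe + conhost.exe, rough lower bound

# Hypothetical helper: estimated RAM in GByte for n local workers
estimate_ram_gb <- function(n_workers, mb = mb_per_worker) {
  n_workers * mb / 1024
}

# Hypothetical helper: cap a requested worker count below the connection limit
safe_workers <- function(requested, limit = max_connections) {
  min(requested, limit - 1)
}

print(estimate_ram_gb(99))   # RAM estimate for 99 clients
print(safe_workers(512))     # capped below the hardcoded limit
```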
> install.packages("doParallel")
Warning in install.packages("doParallel") :
'lib = "C:/Program Files/R/R-3.2.2/library"' is not writable
Error in install.packages("doParallel") : unable to install packages
Solution: Under Windows 7, 8 and 10, packages need to be installed into a user-writable directory (for example C:\R\R-3.2.2), or R needs to be started as Administrator (right-click R in the start menu or Explorer, then choose "Run as administrator").
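As an alternative to running R as Administrator, a writable library directory can be put first on the library search path. The sketch below uses a temporary directory purely for demonstration; the helper name use_user_lib is hypothetical, and on a real machine one would more likely use the default Sys.getenv("R_LIBS_USER").

```r
# Hypothetical helper: make sure a writable library directory exists and is
# first on the library search path.
use_user_lib <- function(path = Sys.getenv("R_LIBS_USER")) {
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)
  .libPaths(c(path, .libPaths()))
  invisible(path)
}

lib <- use_user_lib(file.path(tempdir(), "demo-lib"))  # demo path only
print(.libPaths()[1])

# install.packages("doParallel", lib = lib)  # would install into that directory
```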
A command such as mclapply(..., mc.cores=8) will not run under Windows. That includes all forking commands from library(parallel); these originate from library(multicore) and are not supported under Windows.
system.time({out=mclapply(X=1:n,FUN = f, mc.cores=8)
out = unlist(out)})
Error in mclapply(X = 1:n, FUN = f, mc.cores = 8) :
'mc.cores' > 1 is not supported on Windows
Timing stopped at: 0 0 0
Solution:
See mclapply-WIN-hack
See Source on Github
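For code that has to run on both Windows and Unix-like systems, one possible workaround is a thin wrapper that falls back to a socket cluster on Windows. This is only a sketch of the idea and not the linked mclapply-WIN-hack itself; the function name par_lapply is an assumption made for this example.

```r
library(parallel)

# Hypothetical wrapper: fork with mclapply where available, otherwise
# use a local socket cluster via parLapply (works on Windows).
par_lapply <- function(X, FUN, ..., cores = 2L) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl), add = TRUE)  # never leave Rscript.exe zombies
    parLapply(cl, X, FUN, ...)
  } else {
    mclapply(X, FUN, ..., mc.cores = cores)
  }
}

res <- par_lapply(1:4, function(i) i^2, cores = 2)
print(unlist(res))
```

Note that with the socket variant the workers start empty, so any variables or packages used inside FUN still have to be exported or loaded on the workers.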
Explained in the mcfork {parallel} documentation. These are low-level functions, not available on Windows and not exported from the namespace. The question is, why deploy them in the first place when they do not work under Windows? These errors really get on my nerves. Basically every second code snippet for R parallel does not work under Windows. It reminds me of the DLL hell under Windows. Now it is R dependency hell and R library hell for the poor Windows guys (sob).
# example code not working under Windows
+ if (inherits(parallel:::mcfork(), "masterProcess")) {
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'mcfork' not found
Solution: Do not use Windows, just kidding, no seriously, just kidding. See also example code that will not run under Windows
Warning message:
> rf <- foreach(ntree=rep(250, 4), .combine=combine, .packages='randomForest') %dopar%
+ randomForest(x, y, ntree=ntree)
Warning message:
executing %dopar% sequentially: no parallel backend registered
Solution: Register a parallel backend before calling %dopar%, for example with registerDoParallel() from library(doParallel).
### Load parallel libraries (doMC or doSNOW also work)
library(doParallel)
library(foreach)
### Register parallel backend
cl <- makeCluster(detectCores())
registerDoParallel(cl)
getDoParWorkers()
#insert code here
### Stop cluster
stopCluster(cl)
Solution: the packages with the failing functions need to be distributed to all cluster nodes via the ".packages" switch (Example: .packages='randomForest'). 👍
This is a very funny error, and it is really reproducible; at least I wasted my time on it. It basically occurs if doSNOW and doParallel are freshly installed one after another and code examples for both are run sequentially using the same "cl" variable as cluster object (class(cl) = [1] "SOCKcluster" "cluster"). If that object cannot be overwritten, it will produce the error below. A classical cluster-fuck for lazily typed languages, if I may say so. I blame the user.
> library(doParallel);
Loading required package: nws
Error in make.socket(serverHost, port) : socket not established
In addition: Warning message:
library(doParallel); library(doSNOW);
{ # doParallel
cl <- makeCluster(detectCores()); registerDoParallel(cl);
getDoParWorkers(); stopCluster(cl);
# let it snow (doSNOW)
cl <- makeCluster(32,type="SOCK")
stopCluster(cl)
}
{
# let it snow first (doSNOW)
cl <- makeCluster(32,type="SOCK"); stopCluster(cl)
# doParallel
cl <- makeCluster(detectCores()); registerDoParallel(cl);
getDoParWorkers(); stopCluster(cl); registerDoSEQ()
}
Solution: Use registerDoSEQ(). Do NOT mix variables. Do NOT randomly copy/paste code examples (like my own code snippets). Clean up after you(r) code in R. Remove all your unused objects with remove(x), or here remove(cl) (does not help here).
This error occurs under the normal R 3.2.1 installation when mixing doParallel and doSNOW code.
Loading required package: Rmpi
Error : .onLoad failed in loadNamespace() for 'Rmpi', details:
call: inDL(x, as.logical(local), as.logical(now), ...)
error: unable to load shared object 'z:/R/R-3.2.1/library/Rmpi/libs/x64/Rmpi.dll':
LoadLibrary failure: The specified module could not be found.
In addition: Warning message:
package ‘Rmpi’ was built under R version 3.2.2
Error in makeMPIcluster(spec, ...) :
the `Rmpi' package is needed for MPI clusters.
Solution: Use Revolution PRO R Engine (I guess). Or install Rmpi correctly.
This is a common error. Assume the cluster node has no imprints; it is an innocent infant with no constants, no variables, no knowledge. One needs to tell that node everything. The proper function is clusterExport().
> loopMax = 100
> cl <- makeCluster(4,type="SOCK")
> tm1 <- snow.time(clusterCall(cl, loopMax, function(x) for (i in 1:loopMax) sum(x), x))
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
4 nodes produced errors; first error: could not find function "fun"
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
4 nodes produced errors; first error: object 'loopMax' not found
See below the working solution, with the constant loopMax exported by using the function clusterExport(cl,"loopMax").
# Use of clusterExport to copy variables and functions to each node.
library(doSNOW) # make sure library(doSNOW) is installed
cl <- makeCluster(4,type="SOCK"); x <- rnorm(1000000); loopMax = 100
# Make sure all nodes know all variables and functions
clusterExport(cl,"loopMax");
# Run some code on a 4 socket snow cluster
tm1 <- snow.time(clusterCall(cl, function(x) for (i in 1:loopMax) sum(x), x))
print(tm1); plot(tm1); stopCluster(cl);
Solution: Variables and constants must be exported to each node before doing any calculation. Under library(snow) use the function clusterExport(cl, c("var1", "var2", "...")), with the names passed as a character vector of quoted strings (weird).
This is an oldie but no goldie. It comes from using library(doSNOW) with incorrect parameters under Windows.
# not correct, forgot to define type under WINDOWS
>library(doSNOW)
>cl <- makeCluster(8)
Loading required package: nws
Error in make.socket(serverHost, port) : socket not established
# correct solution define (type="SOCK")
>library(doSNOW)
>cl <- makeCluster(8,type="SOCK")
# correct solution: use library(parallel)
>library(parallel)
>cl <- makeCluster(8)
Solution: Under WINDOWS define the socket type with cl <- makeCluster(8,type="SOCK") or use library(parallel).
Error messages under R and PRO can be different. I always hesitated to use PRO when it first came out, thinking it was a different R; well, it turns out it is a bit different. Most error messages are the same except for those parallel issues. Now I am mostly using PRO, because it can be 100-fold faster on a 16-core CPU.
# Error under R3.2.1 (WIN 64 bit) + Windows pop-up (MSMPI.dll is missing)
> library(doSNOW)
> cl <- makeCluster(8)
Loading required package: Rmpi
Error : .onLoad failed in loadNamespace() for 'Rmpi', details:
call: inDL(x, as.logical(local), as.logical(now), ...)
error: unable to load shared object 'D:/mathematics/R/R-3.2.1/library/Rmpi/libs/x64/Rmpi.dll':
LoadLibrary failure: The specified module could not be found.
In addition: Warning message:
package ‘Rmpi’ was built under R version 3.2.2
Error in makeMPIcluster(spec, ...) :
the `Rmpi' package is needed for MPI clusters.
#-------------------------------------------------
# Error under PRO 3.2.1 (WIN 64 bit)
> library(doSNOW)
> cl <- makeCluster(8)
Loading required package: nws
Error in make.socket(serverHost, port) : socket not established
In addition: Warning message:
package ‘nws’ was built under R version 3.2.2
Solution: See the previous error; under WINDOWS define the socket type with cl <- makeCluster(8,type="SOCK") or use library(parallel).
This happens when code from library(doSNOW) and library(parallel) is mixed.
> #stop it
> stopCluster(cluster)
Error in for (n in cl) stopNode(n) : invalid for() loop sequence
>
> # really stop it
> stopCluster(cl)
Solution: Call stopCluster() on the object that actually holds the cluster, here stopCluster(cl).
This error occurs with library(parallel) if a cluster was created, the cl object exists, and then another unreasonably large cluster is created (the number of connections is hardcoded to 128 in R/src/main/connections.c).
library(parallel)
cl <- makeCluster(1); cl;
stopCluster(cl)
library(parallel)
cl <- makeCluster(8*64);
stopCluster(cl)
Solution: Do not create parallel socket clusters with more than 127 nodes in R (2015). Nope. There is no reason to own clusters with more than 128 threads; also, I see no use for hard disks with more than 4 TByte and computers with more than 4 CPUs...
Yeah, I hear you. I know, there is no Samsung SSD with 16 TByte (2015), there is no Windows Server OS that supports more than 640 logical processors and certainly no CPU with more than 200 threads plus no R support.
This is related to the doSNOW parallel wrapper, which is the foreach parallel backend for snow. The error occurs, for example, after a caret training run when the cluster is initiated and then stopped, and then initiated and stopped again.
socket cluster with 32 nodes on host ‘localhost’
mlMethod <- train(diagnosis ~ ., data=training, method="rf")
Error in summary.connection(connection) : invalid connection
This can be resolved by using the Install-doSNOW-parallel-DeLuxe.R installation script.
# Make sure to stop cluster *plus* insert serial backend
# stop cluster and remove clients
stopCluster(cluster); print("Cluster stopped.")
# insert serial backend, otherwise error in repetitive tasks
registerDoSEQ()
Solution: Use registerDoSEQ() after stopping the cluster. The correct solution would be to define this function in the doSNOW wrapper package. At least that should be mentioned in the FAQ.
Solution: Restart. You are doomed. People used to blame Windows for bugs etc.; it turned out that the authors of independent libraries and programs caused most errors.