R parallel Errors

Tobias Kind edited this page Aug 16, 2017 · 84 revisions

This is a loose collection of errors thrown by multiple R packages, together with potential solutions. These errors were collected over the range of, let me double check, 4 weeks.

R has stopped working

BTW, big lie, "WINDOWS" never notified me when there was a solution available. (Assumption one, there was never a solution available; Assumption two, Windows lied aka here you go pig that will do; Assumption three I assume that.. who cares).

Reality check: this has not much to do with Windows itself (besides the DLL hell) but rather with sloppy programming and missing unit, version and dependency testing of R packages by multiple authors. The R dependency hell is especially unfortunate. Each package that depends on multiple other packages is doomed to fail if one of its weakest links fails. The R versioning hell adds another dimension, because updated R packages (good) may increase error rates due to incompatibilities with older versions. In principle that could only be solved by authoritative code review and testing of all (in-)dependent functions.

The dilemma is of course: do we want fireworks of new and exciting functions, quickly and widely available? Yes! On the other hand this will lead to incompatibilities, errors, crashes, versioning hell, dependency hell and other issues because of untested code. So here we go, a loose collection of a few R parallel errors.


R engine killed or unresponsive

With snow and other parallel libraries, make sure to also kill the Rscript.exe and conhost.exe zombie processes once the Rgui has crashed. If, for example, Rgui.exe is killed, the associated (independent) Rscript.exe processes stay alive as ghost processes and may eat CPU or your valuable memory. Under Windows, open the task manager by pressing Ctrl-Alt-Del or by running taskmgr.exe from the Run box. The same leftover processes appear if stopCluster(cl) is not called after a parallel session.

[Image: rscript-zombies-under-windows]

Solution: If more than 60 processes need to be removed, use "taskkill /IM rscript.exe" on the command line, or download PsTools and use PsKill.
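
To avoid leaving zombies behind in the first place, the cluster can be tied to the lifetime of a function call. The following is only a minimal sketch, assuming library(parallel) and a small local socket cluster; run_on_cluster is a hypothetical helper name, not part of any package.

```r
library(parallel)

# Hypothetical helper: creates a cluster, runs fun(cl) and always stops
# the workers again, even if fun(cl) throws an error.
run_on_cluster <- function(n_workers, fun) {
  cl <- makeCluster(n_workers)
  # on.exit() fires on normal return and on error alike, so no
  # Rscript worker processes are left behind by this function.
  on.exit(stopCluster(cl), add = TRUE)
  fun(cl)
}

squares <- run_on_cluster(2, function(cl) {
  parLapply(cl, 1:4, function(i) i^2)
})
print(unlist(squares))
```

This does not help when the Rgui itself is killed from outside; in that case the workers still have to be removed by hand as described above.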


RMPI

Error in mpi.comm.spawn(slave = mpitask, slavearg = args, nslaves = count,  : 
  Other MPI error, error stack:
MPI_Comm_spawn(cmd="C:/RRO/R-3.2.1/bin/Rscript.exe", argv=0x0000000006DE2088, maxprocs=3, MPI_INFO_NULL, root=0, MPI_COMM_SELF, intercomm=0x000000001149C2F0, errors=0x0000000012067D68) failed
Function not implemented

Solution: None, the RMPI spawn function is not implemented for Windows. See RMPI code

In the special case of xcms, uninstall Rmpi and snow will be used instead.


SNOW and doSNOW - all connections are in use

This happens when any parallel library such as snow is invoked with 128 or more worker processes under Windows. The error output differs from R version to R version.

*  Source: R/src/main/connections.c
*  R : A Computer Language for Statistical Data Analysis
   #define NCONNECTIONS 128 /* snow needs one per slave node */

That means any local cluster will never be able to exceed 127 nodes, unless R is recompiled. The error can be seen below:

Starting snow cluster with 128 local sockets.
Error in file(con, "w") : all connections are in use
>   sessionInfo()
Error in gzfile(file, "rb") : all connections are in use

Solution: Recompile R or PRO and/or ask for a fix. It is an R limitation to a maximum of 127 nodes (2015). This is not a Windows threading limitation; however, the same failure can also be triggered by a memory limitation, because each Rscript.exe (plus conhost.exe) requires at least 44 MByte of RAM. A local Windows snow cluster with 99 clients requires about 4 GByte of RAM.
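
The two limits above can be checked before a cluster is started. This is just a rough sketch: safe_workers is a hypothetical helper, and the numbers 127 (node limit) and 44 MByte (RAM per worker) are taken from this page, not measured or queried from R.

```r
# Assumed limits from this page (2015), not queried from R itself.
max_nodes         <- 127  # NCONNECTIONS is 128 in R/src/main/connections.c
mem_per_worker_mb <- 44   # Rscript.exe plus conhost.exe, lower bound

# Hypothetical helper: cap the requested worker count by the node limit
# and by the RAM (in MByte) the caller is willing to spend on workers.
safe_workers <- function(requested, ram_mb) {
  by_memory <- floor(ram_mb / mem_per_worker_mb)
  max(1, min(requested, max_nodes, by_memory))
}

# Example use, kept as a comment so that nothing is started here:
# cl <- parallel::makeCluster(safe_workers(512, ram_mb = 16000))
print(safe_workers(512, ram_mb = 16000))
```

The available RAM has to be supplied by the user here; the sketch makes no attempt to detect it.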


lib is not writable under Windows

> install.packages("doParallel")
Warning in install.packages("doParallel") :
  'lib = "C:/Program Files/R/R-3.2.2/library"' is not writable
Error in install.packages("doParallel") : unable to install packages

Solution: Under Windows 7, 8 and 10, packages need to be installed into a user-writable directory (for example C:\R\R-3.2.2), or R needs to be started as Administrator (in the start menu or Explorer, right-click and choose "Run as administrator").
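
Installing into a writable directory can also be done from within R. The following is a minimal sketch using base R only; use_user_library is a hypothetical helper and the directory passed to it is the user's choice.

```r
# Hypothetical helper: create a writable library directory and put it
# first on the library search path of the current session.
use_user_library <- function(lib_dir) {
  dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(lib_dir, .libPaths()))
  invisible(.libPaths()[1])
}

# Example use, kept as comments so that nothing is installed here:
# use_user_library("C:/R/R-3.2.2/library")
# install.packages("doParallel", lib = .libPaths()[1])
```

The change made by .libPaths() only lasts for the current session, so the call has to be repeated (or placed in a startup file) for later sessions.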


'mc.cores' > 1 is not supported on Windows under mclapply

A call such as mclapply(..., mc.cores=8) will not run under Windows. The same holds for all forking functions in library(parallel); these come from library(multicore) and rely on process forking, which is not supported under Windows.

system.time({out=mclapply(X=1:n,FUN = f, mc.cores=8)
out = unlist(out)})
Error in mclapply(X = 1:n, FUN = f, mc.cores = 8) : 
  'mc.cores' > 1 is not supported on Windows
Timing stopped at: 0 0 0 

Solution: See mclapply-WIN-hack
See Source on Github
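
Independent of the linked hack, one way around the problem is to fall back to a socket cluster on Windows. This is only a sketch under that assumption and not the code of the hack above; portable_lapply is a hypothetical name.

```r
library(parallel)

# Hypothetical wrapper: fork with mclapply() where forking exists,
# otherwise use a local socket cluster with parLapply().
portable_lapply <- function(X, FUN, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, X, FUN)
  } else {
    mclapply(X, FUN, mc.cores = cores)
  }
}

out <- unlist(portable_lapply(1:8, function(i) i * 2))
print(out)
```

Note that socket workers start as empty R sessions, so on the Windows branch any variables or packages used inside FUN still have to be sent to the workers first.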


object 'mcfork' not found under Windows

Explained in the mcfork {parallel} documentation. These are low-level functions, not available on Windows, and not exported from the namespace. Question is, why deploy them in the first place when they do not work under Windows? These errors really get on my nerves. Basically every second code snippet for R parallel does not work under Windows. Reminds me of the DLL hell under Windows. Now it's R dependency hell and R library hell for the poor Windows guys (sob).

# example code not working under Windows
+   if (inherits(parallel:::mcfork(), "masterProcess")) {
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : 
  object 'mcfork' not found

Solution: Do not use Windows, just kidding, no seriously, just kidding. See also example code that will not run under Windows


executing %dopar% sequentially: no parallel backend registered

Warning message:
> rf <- foreach(ntree=rep(250, 4), .combine=combine, .packages='randomForest') %dopar%
+ randomForest(x, y, ntree=ntree)
Warning message:
executing %dopar% sequentially: no parallel backend registered 

Solution: Register a parallel backend before calling %dopar%, for example using library(doParallel) with a cluster created by makeCluster() from library(parallel).

### Load parallel libraries (doMC or snow also work)
library(doParallel)
library(foreach)
### Register parallel backend
cl <- makeCluster(detectCores())
registerDoParallel(cl)
getDoParWorkers()
#insert code here
### Stop cluster
stopCluster(cl)

foreach - Error in { : task 1 failed - could not find function

Solution: the packages with the failing functions need to be distributed to all cluster nodes via the ".packages" switch (Example: .packages='randomForest'). 👍
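
A small sketch of the ".packages" switch, assuming foreach and doParallel are installed. The package tools is used here only because it ships with R and is not attached on fresh workers by default; the file names are made up for illustration.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

# file_ext() lives in package 'tools'. The .packages argument makes
# every worker load 'tools' before the loop body is evaluated.
exts <- foreach(f = c("data.csv", "notes.txt"),
                .combine = c,
                .packages = "tools") %dopar% file_ext(f)

stopCluster(cl)
registerDoSEQ()
print(exts)
```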


doParallel - socket not established

This is a very funny error and really reproducible; at least I wasted my time on it. It basically occurs if doSNOW and doParallel are freshly installed one after another and code examples for both are run sequentially using the same "cl" variable as the cluster object (class(cl) = [1] "SOCKcluster" "cluster"). If that object cannot be overwritten, it will cause the subsequent error. A classical cluster-fuck for lazy typed languages, if I may say so. I blame the user.

> library(doParallel);
Loading required package: nws
Error in make.socket(serverHost, port) : socket not established
In addition: Warning message:
library(doParallel); library(doSNOW);
{	  # doParallel
	  cl <- makeCluster(detectCores()); registerDoParallel(cl);
	  getDoParWorkers(); stopCluster(cl); 
	  
	  # let it snow (doSNOW)
	  cl <- makeCluster(32,type="SOCK")
	  stopCluster(cl)
} 
  	  
{         
	 # let it snow first (doSNOW)
	 cl <- makeCluster(32,type="SOCK"); stopCluster(cl)

	 # doParallel
	 cl <- makeCluster(detectCores()); registerDoParallel(cl);
	 getDoParWorkers(); stopCluster(cl); registerDoSEQ()
} 

Solution: Use registerDoSEQ(). Do NOT mix variables. Do NOT randomly copy/paste code examples (like my own code snippets). Clean up after your code in R: remove all your unused objects with remove(x), or here remove(cl) (although that does not help in this case).


R MSMPI.dll is missing

This error occurs under a standard R 3.2.1 installation when mixing doParallel and doSNOW code.

[Image: msmpi-dll-missing]

Loading required package: Rmpi
Error : .onLoad failed in loadNamespace() for 'Rmpi', details:
  call: inDL(x, as.logical(local), as.logical(now), ...)
  error: unable to load shared object 'z:/R/R-3.2.1/library/Rmpi/libs/x64/Rmpi.dll':
  LoadLibrary failure:  The specified module could not be found.

In addition: Warning message:
package ‘Rmpi’ was built under R version 3.2.2 
Error in makeMPIcluster(spec, ...) : 
  the `Rmpi' package is needed for MPI clusters.

Solution: Use the Revolution PRO R engine (I guess). Or install Rmpi correctly.


Snow cluster: nodes produced errors; first error: object 'x' not found

This is a common error. Assume the cluster node has no imprints; it's an innocent infant: no constants, no variables, no knowledge. One needs to tell that node everything. The proper function is clusterExport().

> loopMax = 100
> cl <- makeCluster(4,type="SOCK")
> tm1 <- snow.time(clusterCall(cl, loopMax, function(x) for (i in 1:loopMax) sum(x), x))
Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  4 nodes produced errors; first error: could not find function "fun"

Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  4 nodes produced errors; first error: object 'loopMax' not found

See below the working solution, with the constant loopMax exported by using the function clusterExport(cl,"loopMax").

# Use of clusterExport to copy variables and functions to each node.
library(doSNOW) # make sure library(doSNOW) is installed
cl <- makeCluster(4,type="SOCK"); x <- rnorm(1000000); loopMax = 100

# Make sure all nodes know all variables and functions
clusterExport(cl,"loopMax");

# Run some code on a 4 socket snow cluster
tm1 <- snow.time(clusterCall(cl, function(x) for (i in 1:loopMax) sum(x), x))
print(tm1); plot(tm1); stopCluster(cl);

Solution: Variables and constants must be exported to each node before doing any calculation. Under library(snow) use the function clusterExport(cl, c("var1", "var2")), with the variable names passed as a character vector, i.e. in quotation marks (weird).
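
The same pattern with more than one variable, as a minimal sketch using library(parallel) instead of snow; scaleBy is a made-up second variable for illustration.

```r
library(parallel)

loopMax <- 100
scaleBy <- 2   # hypothetical second variable, only for this example

cl <- makeCluster(2)

# Names go in as one character vector; envir says where to look them up.
clusterExport(cl, c("loopMax", "scaleBy"), envir = environment())

# Each node can now see both variables.
res <- clusterCall(cl, function() loopMax * scaleBy)

stopCluster(cl)
print(unlist(res))
```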


doSNOW: Error in make.socket(serverHost, port) : socket not established

This is an oldie but no goldie. It comes from using library(doSNOW) with incorrect parameters under Windows.

# not correct, forgot to define type under WINDOWS
>library(doSNOW)
>cl <- makeCluster(8)
Loading required package: nws
Error in make.socket(serverHost, port) : socket not established

# correct solution define (type="SOCK")
>library(doSNOW)
>cl <- makeCluster(8,type="SOCK")

# correct solution use parallel()
>library(parallel)
>cl <- makeCluster(8)

Solution: Under WINDOWS define the socket type cl <- makeCluster(8,type="SOCK") or use library(parallel).


Error messages under R and PRO (RevolutionAnalytics R) are different

Error messages under R and PRO can be different. I always hesitated to use PRO when it first came out, thinking it was a different R; well, it turns out it is a bit different. Most error messages are the same except for those parallel issues. Now I am mostly using PRO, because it can be 100-fold faster on a 16-core CPU.

# Error under R3.2.1 (WIN 64 bit) + Windows pop-up (MSMPI.dll is missing)
> library(doSNOW)
> cl <- makeCluster(8)
Loading required package: Rmpi
Error : .onLoad failed in loadNamespace() for 'Rmpi', details:
  call: inDL(x, as.logical(local), as.logical(now), ...)
  error: unable to load shared object 'D:/mathematics/R/R-3.2.1/library/Rmpi/libs/x64/Rmpi.dll':
  LoadLibrary failure:  The specified module could not be found.

In addition: Warning message:
package ‘Rmpi’ was built under R version 3.2.2 
Error in makeMPIcluster(spec, ...) : 
  the `Rmpi' package is needed for MPI clusters.

#-------------------------------------------------
# Error under PRO 3.2.1 (WIN 64 bit)
> library(doSNOW)
> cl <- makeCluster(8)
Loading required package: nws
Error in make.socket(serverHost, port) : socket not established
In addition: Warning message:
package ‘nws’ was built under R version 3.2.2 

Solution: See the previous error. Under WINDOWS define the socket type with cl <- makeCluster(8,type="SOCK") or use library(parallel).


Error in stopNode(n)

This happens when code from library(doSNOW) and library(parallel) is mixed.

> #stop it
> stopCluster(cluster)
Error in for (n in cl) stopNode(n) : invalid for() loop sequence
> 
> # really stop it
> stopCluster(cl)

Solution: Call stopCluster() on the object that actually holds the cluster, here stopCluster(cl).


Error in defaultCluster(cl) : object 'cl' not found

This error occurs with library(parallel) if a cluster was created, the cl object exists and then another unreasonably large cluster is created (the number of connections is hardcoded as 128 in R/src/main/connections.c).

library(parallel)
cl <- makeCluster(1); cl;
stopCluster(cl)

library(parallel)
cl <- makeCluster(8*64);  
stopCluster(cl)

Solution: Do not create parallel socket clusters with more than 127 nodes in R (2015). Nope. There is no reason to own clusters with more than 128 threads, also I see no use for hard disks with more than 4 TByte and computers with more than 4 CPUs..

Yeah, I hear you. I know, there is no Samsung SSD with 16 TByte (2015), there is no Windows Server OS that supports more than 640 logical processors and certainly no CPU with more than 200 threads plus no R support.


Error in summary.connection(connection) : invalid connection

This is related to doSNOW, the foreach parallel backend for snow. The error occurs, for example, after a caret training run when the cluster is initiated and then stopped, and then initiated and stopped again.

socket cluster with 32 nodes on host ‘localhost’
mlMethod <- train(diagnosis ~ ., data=training, method="rf"))
Error in summary.connection(connection) : invalid connection

This can be resolved by using the Install-doSNOW-parallel-DeLuxe.R installation script.

# Make sure to stop cluster *plus* insert serial backend
# stop cluster and remove clients
stopCluster(cluster); print("Cluster stopped.")

# insert serial backend, otherwise error in repetitive tasks
registerDoSEQ()

Solution: Use registerDoSEQ() after stopping the cluster. The correct solution would be to define this function in the doSNOW wrapper package. At least that should be mentioned in the FAQ.
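
The stop-then-register-sequential cleanup can be bundled into one function, so that repeated runs start from a clean state. This is only a sketch, assuming foreach and doParallel are installed (the same idea should apply to doSNOW); run_foreach is a hypothetical helper.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical helper: start a cluster, run a foreach loop, then stop
# the cluster and put the sequential backend back in place.
run_foreach <- function(n_workers, n_tasks) {
  cl <- makeCluster(n_workers)
  registerDoParallel(cl)
  on.exit({
    stopCluster(cl)
    registerDoSEQ()  # no stale cluster stays registered with foreach
  }, add = TRUE)
  foreach(i = seq_len(n_tasks), .combine = c) %dopar% sqrt(i)
}

# Start and stop twice in a row, as in the failing caret scenario.
first  <- run_foreach(2, 4)
second <- run_foreach(2, 4)
print(getDoParWorkers())
```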


R has stopped working

R has stopped working

Solution: Restart. You are doomed. People used to blame Windows for bugs etc.; it turned out that the authors of independent libraries and programs caused most errors.