Merging Tasks #175
Description
Is there a doAzureParallel-recommended way to merge (very) large lists after analysis?

I have a loop of ~500,000 iterations running across a 16-node pool. The Azure processing time within the foreach loop is fast, but merging a list that long takes a very long time - much longer than the analysis itself. I have set a number of arguments that should improve the time taken to return the completed list, but they've made very little difference (if any) to the total processing time. These arguments are:
- `.inorder = FALSE` - list order isn't important
- `.combine = function(...) rbindlist(list(...))` - apparently `rbindlist` is faster than the default combine
I have copied my loop below. It's essentially trying to brute-force the start values for a non-linear model. It runs through all combinations of coefficients (in the object `st1.1`) until it finds a set of parameters that doesn't produce a convergence error. Iterations that fail should be removed (`.errorhandling = "remove"`), leaving me with a list of potential start values out of the original 500,000.
```r
mod1.1 <- foreach(i = 1:nrow(st1.1),
                  .errorhandling = "remove",
                  .inorder = FALSE,
                  .combine = function(...) rbindlist(list(...)),
                  .options.azure = opt,
                  .packages = c("nls2")) %dopar% {
  nls(xx2 ~ b0 + b1 * exp(b2 * xx), data = xx.df,
      start = list(b0 = st1.1[i, 1], b1 = st1.1[i, 2], b2 = st1.1[i, 3]))
}
```
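One thing worth noting: each `nls` fit is a full model object that carries its data and environment with it, so merging 500,000 of them means serialising and binding a lot of heavy objects. Since the goal is a set of usable start values, returning only the fitted coefficients (a one-row table per iteration) may cut the merge cost dramatically. Below is a minimal sketch of that idea; the toy `st1.1` and `xx.df` are illustrative stand-ins for the question's objects, and `%do%` stands in for the `%dopar%`/Azure setup:

```r
library(foreach)
library(data.table)

set.seed(1)

# Toy stand-ins (illustrative only) for the question's st1.1 and xx.df.
st1.1 <- expand.grid(b0 = c(0, 1), b1 = c(0, 1), b2 = c(-0.1, -0.2))
xx    <- 1:20
xx.df <- data.frame(xx  = xx,
                    xx2 = 2 + 3 * exp(-0.1 * xx) + rnorm(20, sd = 0.01))

# Return only the fitted coefficients as a one-row data.table, not the
# whole nls object, so each task ships back ~3 numbers instead of a
# model plus its environment. Failed fits are still dropped by
# .errorhandling = "remove".
mod1.1 <- foreach(i = seq_len(nrow(st1.1)),
                  .errorhandling = "remove",
                  .inorder = FALSE,
                  .combine = function(...) rbindlist(list(...))) %do% {
  fit <- nls(xx2 ~ b0 + b1 * exp(b2 * xx), data = xx.df,
             start = list(b0 = st1.1[i, 1], b1 = st1.1[i, 2],
                          b2 = st1.1[i, 3]))
  as.data.table(as.list(coef(fit)))   # tiny payload per iteration
}
```

If the start-value combination itself is what you need, you could also add `i` (or the three start values) as extra columns to each returned row, so the surviving rows directly identify which candidates converged.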
This isn't technically an issue with the package, but I have nowhere else to ask such questions about doAzureParallel. Many thanks in advance.