Merging Tasks #175
Description
Is there a doAzureParallel-recommended way to merge (very) large lists after analysis?

I have a loop of ~500,000 iterations running across a 16-node pool. The Azure processing time within the foreach loop is fast, but merging a list that long takes a very long time - much longer than the analysis itself. I have set a number of arguments that should improve the time taken to return the completed list, but they've made very little difference (if any) to the total processing time. These arguments are:
- `.inorder = FALSE` - list order isn't important
- `.combine = function(...) rbindlist(list(...))` - apparently `rbindlist` is faster than the default combine
I have copied my loop below. It's essentially trying to brute-force the start values for a non-linear model. It runs through all combinations of coefficients (in the object `st1.1`) until it finds a set of parameters that doesn't produce a convergence error. Iterations that fail should be removed (`.errorhandling = "remove"`), leaving me with a list of potential start values out of the original 500,000.
```r
mod1.1 <- foreach(i = 1:nrow(st1.1),
                  .errorhandling = "remove",
                  .inorder = FALSE,
                  .combine = function(...) rbindlist(list(...)),
                  .options.azure = opt,
                  .packages = c("nls2")) %dopar% {
  nls(xx2 ~ b0 + b1 * exp(b2 * xx), data = xx.df,
      start = list(b0 = st1.1[i, 1], b1 = st1.1[i, 2], b2 = st1.1[i, 3]))
}
```
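One thing worth noting: each `nls` fit is a full model object that carries its data and environment with it, so merging 500,000 of them means serialising and binding a lot of heavy objects. Since the goal is a set of usable start values, returning only the fitted coefficients (a one-row table per iteration) may cut the merge cost dramatically. Below is a minimal sketch of that idea; the toy `st1.1` and `xx.df` are illustrative stand-ins for the question's objects, and `%do%` stands in for the `%dopar%`/Azure setup:

```r
library(foreach)
library(data.table)

set.seed(1)

# Toy stand-ins (illustrative only) for the question's st1.1 and xx.df.
st1.1 <- expand.grid(b0 = c(0, 1), b1 = c(0, 1), b2 = c(-0.1, -0.2))
xx    <- 1:20
xx.df <- data.frame(xx  = xx,
                    xx2 = 2 + 3 * exp(-0.1 * xx) + rnorm(20, sd = 0.01))

# Return only the fitted coefficients as a one-row data.table, not the
# whole nls object, so each task ships back ~3 numbers instead of a
# model plus its environment. Failed fits are still dropped by
# .errorhandling = "remove".
mod1.1 <- foreach(i = seq_len(nrow(st1.1)),
                  .errorhandling = "remove",
                  .inorder = FALSE,
                  .combine = function(...) rbindlist(list(...))) %do% {
  fit <- nls(xx2 ~ b0 + b1 * exp(b2 * xx), data = xx.df,
             start = list(b0 = st1.1[i, 1], b1 = st1.1[i, 2],
                          b2 = st1.1[i, 3]))
  as.data.table(as.list(coef(fit)))   # tiny payload per iteration
}
```

If the start-value combination itself is what you need, you could also add `i` (or the three start values) as extra columns to each returned row, so the surviving rows directly identify which candidates converged.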
This isn't technically an issue with the package, but I have nowhere else to ask such questions about doAzureParallel. Many thanks in advance.