
Feature request: parallel functions #9

Closed
kendonB opened this issue Jun 15, 2016 · 12 comments

kendonB commented Jun 15, 2016

Currently there is no nice way to get progress bars for the parallel::*apply functions. The best I have been able to do on Windows is have each process write to a .txt log file, which is cumbersome.

psolymos commented:

@kendonB: not sure about mclapply, but the par*apply functions split the workload and push it to the workers all at once, so the main process just idles in the meantime and there is no real progress to show. Right now I can't see an easy way of implementing the request, but I am open to suggestions.


kendonB commented Jun 15, 2016

Perhaps a solution is a text file on disk: the main process could periodically read the file and print something to the console. I have no idea how easy this would be, or how deep into the parallel package internals you would have to go to get the main process to periodically monitor something.
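
A rough sketch of that idea (hypothetical, not part of pbapply), where each worker appends a line to a shared log file per finished task. The hard part, as noted above, is that the master is blocked inside the cluster call and cannot poll while the workers run:

```r
## Hypothetical sketch: workers log completed tasks to a shared file;
## the master could in principle poll the file to estimate progress.
library(parallel)

cl <- makeCluster(2L)
log_file <- tempfile(fileext = ".txt")
file.create(log_file)
clusterExport(cl, "log_file")

n <- 20L
res <- clusterApplyLB(cl, seq_len(n), function(i) {
    Sys.sleep(0.1)                                 # simulated work
    cat("done\n", file = log_file, append = TRUE)  # one line per task
    i^2
})
## The master is blocked inside clusterApplyLB, so the polling would have
## to happen elsewhere; conceptually the fraction done is:
length(readLines(log_file)) / n
stopCluster(cl)
```

This illustrates why the file-based approach is cumbersome: a separate process (or deep changes to the dispatch loop) would be needed to read the log while the computation runs.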


kvnkuang commented Sep 7, 2016

Hi there, I recently created a package to track the parallel apply functions (mc*apply). It's on CRAN now: https://cran.r-project.org/web/packages/pbmcapply/index.html.
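
For reference, a minimal usage sketch (pbmcapply relies on forking, so this runs on Unix-alikes only; `mc.cores` as in parallel::mclapply):

```r
## pbmcapply provides progress-bar versions of the mc*apply functions
## (forking-based, so not available on Windows).
library(pbmcapply)

res <- pbmclapply(1:100, function(i) {
    Sys.sleep(0.01)   # simulated work
    sqrt(i)
}, mc.cores = 2L)
```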


psolymos commented Sep 7, 2016

@kvnkuang: thanks for the note, it is great to see a package addressing the feature request for forking-type parallelism. I consider #9 closed for now.

@psolymos psolymos closed this as completed Sep 7, 2016

kendonB commented Sep 7, 2016

I'd suggest keeping this open, as it doesn't yet work on Windows.


kvnkuang commented Sep 7, 2016

Hey @kendonB, since forking is not supported on Windows, mc*apply will throw an error if you try to run it on Windows with num.cores > 1. Unfortunately, that means the package cannot work on Windows.

@psolymos psolymos reopened this Sep 7, 2016

psolymos commented Sep 7, 2016

Maybe a solution similar to parLapplyLB could be implemented, at the cost of increased communication overhead between the master and the workers. This could work on Windows (and on other OSes as well).


psolymos commented Sep 8, 2016

@kendonB: see my take on a possible solution in 9bf861b. The same idea can be carried forward to similar functions (pbsapply and pbreplicate for sure, because these are based on pblapply). By adding the cl argument after ..., the parallel processing option becomes part of pblapply itself instead of living in a separate function.

The main difference relative to what parallel::parLapply does is this:

> parallel::splitIndices(10, 4)
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5

[[3]]
[1] 6 7

[[4]]
[1]  8  9 10

> splitpb(10, 4)
[[1]]
[1] 1 2 3 4

[[2]]
[1] 5 6 7 8

[[3]]
[1]  9 10

which means that instead of passing all the chunks to the workers at once, we do it in multiple rounds, updating the progress bar in between. This means increased communication overhead between the master and the workers, which is the price one pays for a progress bar. Currently I can't see any workaround to speed things up further. See the small example in the commit cited above for timings.
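
The chunked dispatch can be sketched roughly as follows (a simplified, hypothetical loop, not the actual pblapply internals):

```r
## Simplified sketch: dispatch one ncl-sized chunk of indices per round,
## as in splitpb(n, ncl), and advance the progress bar after each round.
library(parallel)

pblapply_sketch <- function(X, FUN, cl) {
    n <- length(X)
    ncl <- length(cl)
    chunks <- split(seq_len(n), ceiling(seq_len(n) / ncl))
    pb <- txtProgressBar(min = 0, max = length(chunks), style = 3)
    out <- vector("list", n)
    for (j in seq_along(chunks)) {
        idx <- chunks[[j]]
        out[idx] <- parLapply(cl, X[idx], FUN)  # one round of dispatch
        setTxtProgressBar(pb, j)                # bar advances once per round
    }
    close(pb)
    out
}

cl <- makeCluster(2L)
res <- pblapply_sketch(1:10, function(x) x^2, cl)
stopCluster(cl)
```

With n = 10 and a 2-worker cluster this makes 5 rounds of parLapply calls, which is exactly where the extra master-worker communication comes from.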

mclapply can be added in a similar manner, with cl given as an integer. I would rather remove the cluster-auto-detect feature, as I find it quite dangerous (e.g. you might have to push objects to the workers anyway due to the lack of shared memory in a non-forking situation, but more importantly, RNGs cannot be set up safely when the cluster is created AND destroyed within the function).
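
The dispatch on the type of cl could look like this (a hypothetical helper, not the actual implementation):

```r
## Hypothetical dispatch on cl: NULL means sequential, an integer means
## forking via parallel::mclapply, a cluster object means snow-type.
run_chunk <- function(X, FUN, cl = NULL) {
    if (is.null(cl)) {
        lapply(X, FUN)
    } else if (is.numeric(cl)) {
        parallel::mclapply(X, FUN, mc.cores = as.integer(cl))
    } else {
        parallel::parLapply(cl, X, FUN)
    }
}
```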


psolymos commented Sep 8, 2016

This is now in the pb-parallel branch. Here is a todo list:

  • implement the parallel option as part of pblapply through the cl argument
  • implement mclapply-based forking when is.integer(cl)
  • test forking on Unix
  • update the examples with the parallel feature and add timings (use \dontrun{})
  • remind folks that objects need to be pushed to the cluster
  • remind folks that setting up safe RNGs is their responsibility

@psolymos psolymos added this to the v1.3 milestone Sep 8, 2016
@psolymos psolymos self-assigned this Sep 8, 2016

psolymos commented Sep 8, 2016

Forking on Ubuntu Linux technically works, but the performance is poor. So far, neither my implementation nor @kvnkuang's pbmcapply::pbmclapply seems to give a huge improvement in this particular bootstrap example:

> n <- 10000
> x <- rnorm(n)
> y <- rnorm(n, crossprod(t(model.matrix(~x)), c(0,1)), sd=0.5)
> d <- data.frame(y, x)
> ## model fitting and bootstrap
> mod <- lm(y~x, d)
> ndat <- model.frame(mod)
> B <- 100
> bid <- sapply(1:B, function(i) sample(nrow(ndat), nrow(ndat), TRUE))
> fun <- function(z) {
+     if (missing(z))
+         z <- sample(nrow(ndat), nrow(ndat), TRUE)
+     coef(lm(mod$call$formula, data=ndat[z,]))
+ }
> system.time(res1 <- lapply(1:B, function(i) fun(bid[,i])))
   user  system elapsed
  1.444   0.016   1.460
> system.time(res1pb <- pblapply(1:B, function(i) fun(bid[,i])))
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 01s
   user  system elapsed
  1.460   0.036   1.495
> system.time(res2mc <- mclapply(1:B, function(i) fun(bid[,i]), mc.cores = 2L))
   user  system elapsed
  0.004   0.008   0.959
> system.time(res1pbmc <- pblapply(1:B, function(i) fun(bid[,i]), cl = 2L))
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 02s
   user  system elapsed
  3.848   0.900   1.612
> system.time(res1pbmcx <- pbmclapply(1:B, function(i) fun(bid[,i]), mc.cores = 2L))
  |========================================================| 100%   
   user  system elapsed
  0.152   0.020   1.564

As opposed to forking, snow-type clusters work much faster here, and the improvement is reasonable even with the increased overhead:

> cl <- makeCluster(2L)
> clusterExport(cl, c("fun", "mod", "ndat", "bid"))
> system.time(res1cl <- parLapply(cl = cl, 1:B, function(i) fun(bid[,i])))
   user  system elapsed
  0.004   0.000   0.984
> system.time(res1pbcl <- pblapply(1:B, function(i) fun(bid[,i]), cl = cl))
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 01s
   user  system elapsed
  0.076   0.008   1.163
> stopCluster(cl)

I am also tempted to find a clever way of tuning how splitpb works. Currently it splits the problem of nx jobs into nn = ceiling(nx / ncl) partitions. That is reasonable if, say, nn is <50 or <25, so that the progress bar advances smoothly. For larger problems, we might use a constant k to keep the number of partitions near a maximum, say 50 or 100: instead of splitpb(nx, ncl) I can use splitpb(nx, ncl*k). This would still give a smooth progress bar but minimize overhead for large problems. It could also help in the forking case when the number of iterations (B) is large.
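
A sketch of the tuning idea (splitpb_sketch is a hypothetical stand-in for the real splitpb, assuming chunks of the given size):

```r
## Hypothetical stand-in for splitpb: split nx jobs into chunks of size m,
## giving ceiling(nx / m) rounds of dispatch.
splitpb_sketch <- function(nx, m) {
    i <- seq_len(nx)
    split(i, ceiling(i / m))
}

## plain: one round per ncl-sized chunk -> many rounds when nx is large
length(splitpb_sketch(1e5, 2))        # 50000 rounds, heavy overhead

## tuned: choose k so the number of rounds is capped near 50
k <- ceiling(1e5 / (2 * 50))
length(splitpb_sketch(1e5, 2 * k))    # 50 rounds, same bar smoothness
```

The trade-off is explicit: fewer rounds means less master-worker communication, at the cost of coarser progress-bar updates.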

Additional todo items:

  • implement tuning for splitpb
  • test bootstrap case with B=1000 and see how much tuning helps.

psolymos commented:

See some timing results in this blog post.

@psolymos psolymos mentioned this issue Sep 14, 2016
psolymos commented:

PR #10 closes this feature request.
