
Feature request: parallel functions #9

Closed
kendonB opened this issue Jun 15, 2016 · 12 comments

kendonB commented Jun 15, 2016

Currently there is no nice way to get progress bars for the parallel::*apply functions. The best I have been able to do on Windows is have each process write to a .txt log file, which is cumbersome.

psolymos commented:

@kendonB: not sure about mclapply, but the par*apply functions split the workload and push it to the workers all at once, so the main process just idles in the meantime and there is no real progress to show. Right now I can't see an easy way of implementing the request, but I am open to suggestions.


kendonB commented Jun 15, 2016

Perhaps a solution is a text file on disk: the main process could periodically read the file and print something to the console. I have no idea how easy this would be, or how deep into the parallel package internals you would have to go to get the main process to periodically monitor something.
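
A rough sketch of that idea (hypothetical, not part of pbapply), where each worker appends a line to a shared log file per finished task. The hard part, as noted above, is that the master is blocked inside the cluster call and cannot poll while the workers run:

```r
## Hypothetical sketch: workers log completed tasks to a shared file;
## the master could in principle poll the file to estimate progress.
library(parallel)

cl <- makeCluster(2L)
log_file <- tempfile(fileext = ".txt")
file.create(log_file)
clusterExport(cl, "log_file")

n <- 20L
res <- clusterApplyLB(cl, seq_len(n), function(i) {
    Sys.sleep(0.1)                                 # simulated work
    cat("done\n", file = log_file, append = TRUE)  # one line per task
    i^2
})
## The master is blocked inside clusterApplyLB, so the polling would have
## to happen elsewhere; conceptually the fraction done is:
length(readLines(log_file)) / n
stopCluster(cl)
```

This illustrates why the file-based approach is cumbersome: a separate process (or deep changes to the dispatch loop) would be needed to read the log while the computation runs.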


kvnkuang commented Sep 7, 2016

Hi there, I recently created a package to track the parallel apply functions (mc*apply). It's on CRAN now: https://cran.r-project.org/web/packages/pbmcapply/index.html.
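
For reference, a minimal usage sketch (pbmcapply relies on forking, so this runs on Unix-alikes only; `mc.cores` as in parallel::mclapply):

```r
## pbmcapply provides progress-bar versions of the mc*apply functions
## (forking-based, so not available on Windows).
library(pbmcapply)

res <- pbmclapply(1:100, function(i) {
    Sys.sleep(0.01)   # simulated work
    sqrt(i)
}, mc.cores = 2L)
```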


psolymos commented Sep 7, 2016

@kvnkuang: thanks for the note, it is great to see a package addressing the feature request for forking-type parallelism. I consider #9 closed for now.

@psolymos psolymos closed this as completed Sep 7, 2016

kendonB commented Sep 7, 2016

I'd suggest keeping this open, as it doesn't yet work on Windows.


kvnkuang commented Sep 7, 2016

Hey @kendonB, since forking is not supported on Windows, mc*apply will throw an error if you try to run it on Windows with num.cores > 1. Unfortunately, that means the package cannot work on Windows.

@psolymos psolymos reopened this Sep 7, 2016

psolymos commented Sep 7, 2016

Maybe a solution similar to parLapplyLB could be implemented, at the cost of increased communication overhead between the master and the workers. This could work on Windows (and on other OSes as well).


psolymos commented Sep 8, 2016

@kendonB: see my take on a possible solution in 9bf861b. The same idea can be carried forward to similar functions (pbsapply and pbreplicate for sure, because these are based on pblapply). By adding the cl argument after ..., the parallel processing option becomes part of pblapply itself instead of living in a separate function.

The main difference relative to what parallel::parLapply does is this:

> parallel::splitIndices(10, 4)
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5

[[3]]
[1] 6 7

[[4]]
[1]  8  9 10

> splitpb(10, 4)
[[1]]
[1] 1 2 3 4

[[2]]
[1] 5 6 7 8

[[3]]
[1]  9 10

which means that instead of passing all the chunks to the workers at once, we do it in multiple rounds, updating the progress bar in between. This means increased communication overhead between the master and the workers, which is the price one pays for a progress bar. Currently I can't see any workaround to speed things up further. See the small example in the commit cited above for timings.
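
The chunked dispatch can be sketched roughly as follows (a simplified, hypothetical loop, not the actual pblapply internals):

```r
## Simplified sketch: dispatch one ncl-sized chunk of indices per round,
## as in splitpb(n, ncl), and advance the progress bar after each round.
library(parallel)

pblapply_sketch <- function(X, FUN, cl) {
    n <- length(X)
    ncl <- length(cl)
    chunks <- split(seq_len(n), ceiling(seq_len(n) / ncl))
    pb <- txtProgressBar(min = 0, max = length(chunks), style = 3)
    out <- vector("list", n)
    for (j in seq_along(chunks)) {
        idx <- chunks[[j]]
        out[idx] <- parLapply(cl, X[idx], FUN)  # one round of dispatch
        setTxtProgressBar(pb, j)                # bar advances once per round
    }
    close(pb)
    out
}

cl <- makeCluster(2L)
res <- pblapply_sketch(1:10, function(x) x^2, cl)
stopCluster(cl)
```

With n = 10 and a 2-worker cluster this makes 5 rounds of parLapply calls, which is exactly where the extra master-worker communication comes from.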

mclapply can be added in a similar manner, with cl given as an integer. I would rather remove the cluster-auto-detect feature, as I find it quite dangerous (e.g. you might have to push objects to the workers anyway due to the lack of shared memory in a non-forking situation, but more importantly, RNGs cannot be set up safely when the cluster is created AND destroyed within the function).
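
The dispatch on the type of cl could look like this (a hypothetical helper, not the actual implementation):

```r
## Hypothetical dispatch on cl: NULL means sequential, an integer means
## forking via parallel::mclapply, a cluster object means snow-type.
run_chunk <- function(X, FUN, cl = NULL) {
    if (is.null(cl)) {
        lapply(X, FUN)
    } else if (is.numeric(cl)) {
        parallel::mclapply(X, FUN, mc.cores = as.integer(cl))
    } else {
        parallel::parLapply(cl, X, FUN)
    }
}
```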


psolymos commented Sep 8, 2016

This is now in the pb-parallel branch. Here is a todo list:

  • implement the parallel option as part of pblapply through the cl argument
  • implement mclapply-based forking when is.integer(cl)
  • test forking on Unix
  • update the examples with the parallel feature and add timings (use \dontrun{})
  • remind folks that objects need to be pushed to the cluster
  • remind folks that setting up safe RNGs is their responsibility

@psolymos psolymos added this to the v1.3 milestone Sep 8, 2016
@psolymos psolymos self-assigned this Sep 8, 2016

psolymos commented Sep 8, 2016

Forking on Ubuntu Linux technically works, but the performance is poor. So far, neither my implementation nor @kvnkuang's pbmcapply::pbmclapply seems to give a huge improvement in this particular bootstrap example:

> n <- 10000
> x <- rnorm(n)
> y <- rnorm(n, crossprod(t(model.matrix(~x)), c(0,1)), sd=0.5)
> d <- data.frame(y, x)
> ## model fitting and bootstrap
> mod <- lm(y~x, d)
> ndat <- model.frame(mod)
> B <- 100
> bid <- sapply(1:B, function(i) sample(nrow(ndat), nrow(ndat), TRUE))
> fun <- function(z) {
+     if (missing(z))
+         z <- sample(nrow(ndat), nrow(ndat), TRUE)
+     coef(lm(mod$call$formula, data=ndat[z,]))
+ }
> system.time(res1 <- lapply(1:B, function(i) fun(bid[,i])))
   user  system elapsed
  1.444   0.016   1.460
> system.time(res1pb <- pblapply(1:B, function(i) fun(bid[,i])))
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 01s
   user  system elapsed
  1.460   0.036   1.495
> system.time(res2mc <- mclapply(1:B, function(i) fun(bid[,i]), mc.cores = 2L))
   user  system elapsed
  0.004   0.008   0.959
> system.time(res1pbmc <- pblapply(1:B, function(i) fun(bid[,i]), cl = 2L))
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 02s
   user  system elapsed
  3.848   0.900   1.612
> system.time(res1pbmcx <- pbmclapply(1:B, function(i) fun(bid[,i]), mc.cores = 2L))
  |========================================================| 100%   
   user  system elapsed
  0.152   0.020   1.564

As opposed to forking, snow-type clusters work much faster here, and the improvement is reasonable even with the increased overhead:

> cl <- makeCluster(2L)
> clusterExport(cl, c("fun", "mod", "ndat", "bid"))
> system.time(res1cl <- parLapply(cl = cl, 1:B, function(i) fun(bid[,i])))
   user  system elapsed
  0.004   0.000   0.984
> system.time(res1pbcl <- pblapply(1:B, function(i) fun(bid[,i]), cl = cl))
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 01s
   user  system elapsed
  0.076   0.008   1.163
> stopCluster(cl)

I am also tempted to find a clever way of tuning how splitpb works. Currently it splits the problem of nx jobs into nn = ceiling(nx / ncl) partitions. That is reasonable if, say, nn is <50 or <25, so that the progress bar advances smoothly. For larger problems, we might use a constant k to keep the number of partitions near a maximum, say 50 or 100: instead of splitpb(nx, ncl) I can use splitpb(nx, ncl*k). This would still give a smooth progress bar but minimize overhead for large problems. It could also help in the forking case when the number of iterations (B) is large.
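
A sketch of the tuning idea (splitpb_sketch is a hypothetical stand-in for the real splitpb, assuming chunks of the given size):

```r
## Hypothetical stand-in for splitpb: split nx jobs into chunks of size m,
## giving ceiling(nx / m) rounds of dispatch.
splitpb_sketch <- function(nx, m) {
    i <- seq_len(nx)
    split(i, ceiling(i / m))
}

## plain: one round per ncl-sized chunk -> many rounds when nx is large
length(splitpb_sketch(1e5, 2))        # 50000 rounds, heavy overhead

## tuned: choose k so the number of rounds is capped near 50
k <- ceiling(1e5 / (2 * 50))
length(splitpb_sketch(1e5, 2 * k))    # 50 rounds, same bar smoothness
```

The trade-off is explicit: fewer rounds means less master-worker communication, at the cost of coarser progress-bar updates.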

Additional todo items:

  • implement tuning for splitpb
  • test bootstrap case with B=1000 and see how much tuning helps.

psolymos commented:

See some timing results in this blog post.

@psolymos psolymos mentioned this issue Sep 14, 2016
psolymos commented:

PR #10 closes this feature request.
