parallel processing #611
Brilliant! One big pull request should be fine; after all, it's happening pretty much in one single file.
One worry (as always) is support for this by all CRAN platforms; see here -- we might want to look at how other packages do this, so that a single code base compiles on all CRAN platforms.
OpenMP is not a good solution. OSX does not seem to support this in the standard build train, nor do Windows builds. As I see it, GEOS unary predicates and operations may benefit directly (top-down split of the geometry column and lapply the function) if a parallel cluster is available. There is a start-up cost for clusters, be they PSOCK (Windows) or FORK (unix). FORK will be embarrassingly parallelisable. On M$ Open R, all of these may run foul of MKL (but since GEOS doesn't link to BLAS, this won't bite in most circumstances). I'll see if I can write a use case - experience with spdep uses package-specific options in an environment in the namespace to hold the number of cores and a parallel cluster, and simply splits the input data between cores and catenates the results.
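(A minimal sketch of the spdep-style pattern described above, with invented names: a package-local environment holds the core count and an optional cluster, the input geometries are split by index, and the per-chunk results are catenated. Illustrative only, not sf's or spdep's actual code.)

```r
library(parallel)

# hypothetical package-local environment holding the parallel options
.pkg_opts <- new.env()
assign("cores", 2L, envir = .pkg_opts)
assign("cluster", NULL, envir = .pkg_opts)

# split a list of geometries across cores, apply f to each chunk, catenate
par_apply <- function(geoms, f) {
  nc <- get("cores", envir = .pkg_opts)
  idx <- parallel::splitIndices(length(geoms), nc)  # contiguous, order-preserving chunks
  cl <- get("cluster", envir = .pkg_opts)
  res <- if (!is.null(cl)) {
    parLapply(cl, idx, function(i) f(geoms[i]))           # PSOCK: works on Windows
  } else {
    mclapply(idx, function(i) f(geoms[i]), mc.cores = nc)  # FORK: unix only
  }
  do.call(c, res)  # catenate the per-chunk results
}
```

This only pays off when f costs more per feature than the split and catenation steps, which is exactly the trade-off measured later in this thread.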
Solution to what? SO and WRE suggest that windows CRAN builds should now support OpenMP, and that osx should be possible but isn't done by CRAN right now. From the perspective of code & maintenance, I prefer the openMP way rather than doing things at the R level. We shouldn't forget that people with serious computational problems may as well prefer using serious operating systems. Is the
OpenMP is way out of my league, but there is a current issue thread related to OpenMP in the fst package that may be of interest: fstpackage/fst#109
I'll defer to others with more experience in writing and maintaining R packages about whether OpenMP is a viable solution. With regards to the user interface for setting thread usage, perhaps sf could supply a function like
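(The function name suggested above was lost in formatting. As illustration only, such a helper might look like the following; sf_set_threads() and the sf.threads option are invented names, not part of sf.)

```r
# hypothetical sketch: store the requested thread count in an R option,
# which compiled or interpreted code can then query at call time
sf_set_threads <- function(n = parallel::detectCores(logical = FALSE)) {
  stopifnot(is.numeric(n), n >= 1)
  options(sf.threads = as.integer(n))
  invisible(getOption("sf.threads"))
}

sf_get_threads <- function() getOption("sf.threads", 1L)
```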
The underlying problem is that, as WRE details (painfully), there are no general, portable solutions at the C/C++ level, and they are not coming at all, ever, because systems change over time. Feel free to follow OMP, but for embarrassingly parallelizable problems, the parallel package works cross-platform, and will only need a wrapper around calls out to split the input list into nthreads parts and mclapply() on OSX and Linux. This was why the parallel package was written. I'll look later at this. If OMP works even in some cases, that's helpful for some, but can also be handled in a wrapper. Still need to be able to set nthreads, though. And see this R-devel thread.
A compromise might be to have OMP off by default, and on only when installed with
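(The install flag mentioned above was lost in formatting. For context, the standard WRE mechanism for compiling an R package's C++ sources with OpenMP is via src/Makevars; a sketch of that mechanism, not sf's actual build setup:)

```make
# src/Makevars: these WRE-documented macros expand to the right OpenMP
# flags per platform/compiler, or to nothing where OpenMP is unsupported,
# so the package still builds (serially) everywhere
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS)
```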
Would RcppParallel work for this case?
Awesome idea @kendonB! It looks like RcppParallel ships with its own copies of parallelization libraries (TinyThread and Intel TBB), so this could potentially solve the problems with OpenMP not being available to specific compilers, and the fact that different compilers need different flags to enable OpenMP. From what I can tell though, the convenience functions in RcppParallel (e.g. parallelFor) might not be enough on their own. RcppParallel also exposes functions from the Intel TBB library (see http://rcppcore.github.io/RcppParallel/tbb.html). So one strategy could be to add RcppParallel to the package's DESCRIPTION (http://rcppcore.github.io/RcppParallel/index.html#r_packages) to standardise the installation/availability of the Intel TBB library, and we could use the Intel TBB functions in sf's C++ source files to handle the parallel processing. It's worth noting though that Solaris doesn't support Intel TBB, so we would still need to ensure that the package can run without Intel TBB functions. I'll see if I can modify the OpenMP PR (GH-613) to use Intel TBB and report back.
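(For readers unfamiliar with RcppParallel, a minimal sketch of its Worker/parallelFor pattern on a toy problem; illustrative only, not code from the PR:)

```cpp
#include <Rcpp.h>
#include <RcppParallel.h>
// [[Rcpp::depends(RcppParallel)]]

// worker that squares elements; parallelFor hands it index ranges
struct Square : public RcppParallel::Worker {
  const RcppParallel::RVector<double> input;
  RcppParallel::RVector<double> output;
  Square(const Rcpp::NumericVector in, Rcpp::NumericVector out)
    : input(in), output(out) {}
  void operator()(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; i++)
      output[i] = input[i] * input[i];
  }
};

// [[Rcpp::export]]
Rcpp::NumericVector par_square(Rcpp::NumericVector x) {
  Rcpp::NumericVector out(x.size());
  Square w(x, out);
  RcppParallel::parallelFor(0, x.size(), w);  // TinyThread/TBB under the hood
  return out;
}
```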
Ok, I've had a shot at implementing Intel TBB (GH-615). It seems to be easy enough to implement and appears to work on Windows, Linux, Mac OSX, and Solaris x86, so it might address most of the limitations associated with OpenMP?
Just my 2 cents from working with OpenMP in the
hope that helps, best and good luck with your efforts!
I'm still keen to start implementing parallelized versions of some of the more computationally demanding functions, so I just thought I'd ask whether the package development team has decided on using the parallel R package, OpenMP, or TBB (or another package/library) for parallel processing?
After my initial experiments under #615 I put it aside for a moment; I had planned to compare it to alternatives. I have no strong opinions over OMP vs TBB; the first seems to be more widely used altogether, the second more portable in R / CRAN. I'd be happy to see positive results, first, from something.
Ok, yeah, I'll try implementing
Several practical points, given that I have commit rights to master: I've created a local git branch, but do not want to push it now. So I need help with the Byzantine
Probably a missing
I use
OK, a missing include declaration in R/bbox.R then. The NA bbox can't find NA_crs. Don't need Rcpp, only interpreted code. What might happen if I push a branch?
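(An assumption on my part: if the missing declaration is roxygen2's collation directive, the fix would be a line like this near the top of R/bbox.R, so that the file defining the NA CRS object is collated first:)

```r
# hypothetical sketch: @include adjusts the Collate order in DESCRIPTION,
# ensuring crs.R is sourced before bbox.R at package build time
#' @include crs.R
NULL
```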
It will tell you that the branch doesn't have an upstream (= on GH), and tell you how to create it. Then you follow that instruction, push again, and then it's in a branch on GH.
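(Concretely, assuming the branch name parallel_cl used later in this thread, the exchange with git looks like:)

```sh
git push                                    # refused: the branch has no upstream yet
git push --set-upstream origin parallel_cl  # creates the branch on GH and tracks it
```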
Should I do that?
How else do you want to share a branch?
that is, what if I overwrite everything?
Then you press Ctrl-Z.
Unlike svn, for git, every local copy has the full history.
I'll wait until later; I'm getting errors on the nodes now, in the implementation, and have to find out how parallel plays with Rcpp, and whether I need to load anything on the nodes.
OK. Story so far OK, working with parallel_cl branch:
but as the plots (might) show, st_union() was a bad example, as there are country boundaries between countries in the split data set (two cores for me) which touch, but which don't get unioned. So the problems that are actually EP (embarrassingly parallel) are mostly trivial: problems that affect only one sfg at a time, like simple, valid, etc. Is there a non-trivial GEOS predicate or operation which would actually benefit from parallelisation? Problem pushing the branch:
I can push to r-spatial/spdep.git - is there a permissions issue?
Ah, you're not an owner of the sf repo. I can make you an owner; an alternative is to fork sf, push to your own fork and then make a PR. PRs are, from git's (and my) perspective, nothing else but branches. Less risky if you feel uncomfortable with branches on the main repo: you can only mess up your own work.
I have a very old fork from sfr, but made the local branch in the r-spatial/sf master, not my forked master. I've no idea how I might copy that across - recursive copying would probably copy the .git files too, wouldn't it? My old git sf repo has
Throw old repositories away. Fork on GH; git clone to local; go to your working dir, remove the upstream and set the new upstream to your fork.
How to throw away old repos? On github? GH says I have a fork already, and will not let go.
Everywhere.
I can remove local copies, but not on GH.
repo -> settings -> danger zone.
Done on GH, and forked again. But locally I have an sf/ repo with the parallel_cl branch, and need to 1. clone rsbivand/sf, 2. copy the branch from the existing clone of r-spatial/sf to the clone of my fork. Can I simply change the name of the directory containing the existing clone of r-spatial/sf? Sorry for the baby-steps, but git is strongly counter-intuitive.
no, that won't work. maybe easiest to
PR #616 is from changing the origin of my sf clone from r-spatial/sf to rsbivand/sf and pushing the parallel_cl branch there. I think it's not going anywhere unless we can see how to actually use parallel; maybe binary predicates and operators are a better bet than unary, where all of the features are needed anyway. I may continue trying with binary st_union, because x doesn't interact with x, only y.
For the record, I redirected the remote from r-spatial/sf.git to rsbivand/sf.git:
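(The commands themselves were lost in formatting; they were presumably along these lines, with the https URL form being an assumption:)

```sh
cd sf                                                         # the existing local clone
git remote set-url origin https://github.com/rsbivand/sf.git  # repoint origin at the fork
git push --set-upstream origin parallel_cl                    # push the branch to the fork
```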
Report on status at last push. Output on medium-sized objects for binary union (probably idiocies present, as two moderate-sized objects give enormous output also for standard binary union):

```r
library(sf)
# Linking to GEOS 3.6.2, GDAL 2.2.2, proj.4 4.9.3
library(parallel)
library(maptools)
# Loading required package: sp
# Checking rgeos availability: TRUE
data(wrld_simpl)
sf_wrld <- st_as_sfc(wrld_simpl)
sp_hex <- HexPoints2SpatialPolygons(spsample(wrld_simpl, n=400, type="hexagonal"))
sf_hex <- st_as_sfc(sp_hex)
system.time(xxx <- st_union(sf_hex, sf_wrld))
# although coordinates are longitude/latitude, st_union assumes that they are planar
# user system elapsed
# 12.736 0.131 12.892
object.size(xxx)
# 518383144 bytes
length(xxx)
# [1] 80442
set_cores_option(detectCores(logical=FALSE))
get_cores_option()
# [1] 4
set_quiet_option(FALSE)
system.time(xxx1 <- st_union(sf_hex, sf_wrld))
# parallel: 4 cores
# although coordinates are longitude/latitude, st_union assumes that they are planar
# although coordinates are longitude/latitude, st_union assumes that they are planar
# although coordinates are longitude/latitude, st_union assumes that they are planar
# although coordinates are longitude/latitude, st_union assumes that they are planar
# parallel: mclapply
# user system elapsed
# 14.954 1.038 14.934
attr(xxx1, "timings")
# user.self elapsed
# index_split 0.002 0.002
# run_par 1.430 4.733
# catenate 9.950 10.198
object.size(xxx1)
# 517419744 bytes
length(xxx1)
# [1] 80442
set_mc_option(FALSE)
cl <- makeCluster(get_cores_option())
set_cluster_option(cl)
system.time(xxx2 <- st_union(sf_hex, sf_wrld))
# parallel: 4 cores
# parallel: parLapply
# user system elapsed
# 8.498 0.208 12.640
attr(xxx2, "timings")
# user.self elapsed
# index_split 0.004 0.004
# run_par 0.856 4.869
# catenate 7.636 7.765
object.size(xxx2)
# 517419744 bytes
length(xxx2)
# [1] 80442
stopCluster(cl)
```

So it will be hard to know which combinations of objects (here 4x100 hex, 100 on each core, and wrld_simpl (n=246, but lots of islands)) will benefit from parallelising the actual binary union, as the catenation of the output is much more costly if there are many parts. The catenation lines, for
I guess that binary union isn't a great use case either. I also saw swapping last night on an 8GB/2-core laptop (multiple 0.5GB objects), but haven't reproduced major issues on a 16GB/4-core (8 with hyperthreading, not used here) today. Could others try the parallel_cl branch with other data sets? I think that the islands are causing trouble. I haven't added cluster startup or shutdown times; the overhead is usually observable.
By the way - if this is worth doing, it could be put in
@jeffreyhanson some other considerations related to TBB/RcppParallel here.
Since this discussion has stalled here, I'm closing the issue. As I see it now: we expect speed improvements by parallel computation of geometry operators. @jeffreyhanson has shown how this can be implemented at the C++ level using OpenMP and TBB; @rsbivand at the R level with package parallel. Neither has shown clear speed-ups, and for now we put the experiment to rest. To be continued, I hope!
Sorry I haven't had the time to experiment more with this; I've just been really busy with other stuff. I'll play around more with this when I can find the time.
Looking forward to it!
Hi, apologies for my lack of activity on this thread. I'm posting now because I think I've found an OpenMP implementation with a demonstrable performance improvement. Although the benchmark I've included below is encouraging, it would be great if anyone has ideas for further improving the implementation. Specifically, I've implemented a parallelized version of aggregate(). I've included a quick benchmark below comparing the parallel implementation to the standard implementation:

```r
# Initialization
## install openmp5 branch if needed
# devtools::install_github("jeffreyhanson/sf@openmp5")
## load packages
library(sf)
library(microbenchmark)
library(rnaturalearth)
library(lwgeom)
library(testthat)
## fetch example data and clean it
data <- ne_countries(scale = 50, returnclass = "sf")
data <- data[, "continent"]
data <- st_make_valid(data)
## plot raw data
plot(data)

# Main processing
## verify that parallel implementation has comparable output to original
## implementation. For further tests, see tests/testthat/geom.R
a1 <- aggregate(data, list(data$continent), FUN = dplyr::first)
a2 <- aggregate(data, list(data$continent), FUN = dplyr::first, threads = 4)
test_that("same outputs", expect_true(all(a1 == a2)))
## run benchmark
benchmark_data <- microbenchmark(
standard = aggregate(data, list(data$continent), FUN = dplyr::first),
parallel = aggregate(data, list(data$continent), FUN = dplyr::first,
threads = 4),
times = 20L, unit = "s")
# Exports
## plot aggregated data for visual comparison
plot(a1[, "continent"], main = "standard") plot(a2[, "continent"], main = "parallel") ## print benchmark results
print(benchmark_data)
## plot benchmark results
boxplot(benchmark_data, unit = "s", log = FALSE, main = "Benchmark")
```

For those interested, I've listed below the ways in which this specific implementation differs from my previous attempts. I've also included an explanation for why I think these design choices might contribute to greater performance compared to previous implementations:
I'd love to hear what everyone thinks, especially if anyone has any suggestions for improving this parallel implementation of aggregate().
For this particular problem, adding cores (I tried 32 on a 64-core machine) didn't change the results:
Hmm, that's disappointing, and also an inherent limitation of this implementation. It involves splitting up the geometries for each continent into a separate "task" (for lack of a better word), and then farming out these tasks to the pool of available cores. So, because there are only eight continents and thus eight "tasks", we wouldn't expect to see any increase in performance between 9 cores and 100 cores. Additionally, if one of the continents takes much longer to process than the other continents, we might not see any increase in performance at all between 1 core and 100 cores. Here's an example which should, hopefully, show a noticeable increase in performance between 4 and 32 cores. In this example, we are aggregating state-level data to country-level data for more than 100 countries.

```r
# Initialization
## install openmp5 branch if needed
## devtools::install_github("jeffreyhanson/sf@openmp5")
## load packages
library(sf)
library(microbenchmark)
library(rnaturalearth)
library(lwgeom)
library(testthat)
## fetch example data and clean it
data <- ne_states(returnclass = "sf")
data <- data[, "admin"]
data <- st_make_valid(data)
## plot raw data
plot(data)

## print number of countries
print(length(unique(data$admin)))
# Main processing
## verify that parallel implementation has comparable output to original
## implementation. For further tests, see tests/testthat/geom.R
a1 <- aggregate(data, list(data$admin), FUN = dplyr::first)
a2 <- aggregate(data, list(data$admin), FUN = dplyr::first, threads = 4)
test_that("same outputs", expect_true(all(a1 == a2)))
## run benchmark
benchmark_data <- microbenchmark(
standard = aggregate(data, list(data$admin), FUN = dplyr::first),
parallel = aggregate(data, list(data$admin), FUN = dplyr::first,
threads = 4),
times = 20L, unit = "s")
# Exports
## plot aggregated data for visual comparison
plot(a1[, "admin"], main = "standard") plot(a2[, "admin"], main = "parallel") ## print benchmark results
print(benchmark_data)
## plot benchmark results
boxplot(benchmark_data, unit = "s", log = FALSE, main = "Benchmark")
```
I just thought I'd ask if you were able to reproduce this benchmark on your system? I completely understand if you're really busy at the moment and don't have time for this.
This thread gives insight into the openMP race condition affecting data.table, the R garbage collector and the R byte compiler. The internals are quite complicated, but in that setting
Not well-qualified to comment on this but has the
I've been working with some large-ish data sets and I was thinking it would be great if the sf R package could use parallel processing. I noticed that there was an openmp branch, but I can't tell if much progress has been made on it, and I was wondering if anyone was working on this? If not, I have a little experience with Rcpp and OpenMP and I'm keen to take a shot at implementing this.
From what I can tell, for the functions that utilise GEOS, the trick seems to be creating a copy of the handler object (GEOSContextHandle_t) for each thread; for instance, implementing something like the sketch below.

If you think this would be a useful contribution, would you prefer me to submit a series of small pull requests (e.g. one pull request per function?) or would you prefer me to submit one big pull request with everything?
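(The snippet itself was lost in formatting; the sketch below is my reconstruction of the idea only, using GEOS's reentrant C API with OpenMP. The wrapper function and loop body are invented for illustration; the *_r calls are the real GEOS C API.)

```cpp
#include <cstdarg>
#include <geos_c.h>

static void msg_handler(const char *fmt, ...) { /* swallow GEOS messages */ }

// illustrative: one reentrant GEOS context per OpenMP thread
void parallel_is_valid(GEOSGeometry **geoms, int n, char *out, int nthreads) {
  #pragma omp parallel num_threads(nthreads)
  {
    // each thread initialises its own context; handles must never be shared
    GEOSContextHandle_t ctx = initGEOS_r(msg_handler, msg_handler);
    #pragma omp for
    for (int i = 0; i < n; i++)
      out[i] = GEOSisValid_r(ctx, geoms[i]);  // every *_r call takes the context
    finishGEOS_r(ctx);
  }
}
```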