st_buffer very slow for complex linestrings #2039
Buffering happens in the upstream libraries, GEOS or s2geometry (see for instance here); for performance issues with these libraries, this is the wrong place. What you could do in
If you are using a planar coordinate system, you can try geos directly -- it's a bit faster. You can also limit the number of segments (
library("sf")
library("geos")
n = 1000
x = simulate_linestring(n, "cluster") # this returns `linestr`
t = bench::mark(
check = FALSE,
st_buffer(x, dist = 1, nQuadSegs = 30),
st_buffer(x, dist = 1, nQuadSegs = 5),
geos_buffer(x, distance = 1)
)
t[, 1:4]
#>   expression                                min median `itr/sec`
#> 1 st_buffer(x, dist = 1, nQuadSegs = 30)  3.77s  3.77s     0.265
#> 2 st_buffer(x, dist = 1, nQuadSegs = 5)   3.62s  3.62s     0.276
#> 3 geos_buffer(x, distance = 1)            2.75s  2.75s     0.364
Thank you both, and apologies for the out-of-scope question. Good to know about For what it's worth, calling the GDAL buffervectors tool from R via the OSGEO4W shell with something like the code below is fast (even with the original data) and may be useful in some cases:
By removing about 40% of the vertices, I see a ~3x speedup on this example, and the results look very similar.
n = 2000
set.seed(123)
x = simulate_linestring(n, "cluster")
system.time(t1 <- st_buffer(x, dist = 1))
#> user system elapsed
#> 23.26 0.81 24.12
x = st_simplify(x, dTolerance = 1) # 1177 vertices
system.time(t2 <- st_buffer(x, dist = 1))
#> user system elapsed
#> 8.64 0.16 8.83
par(mfrow = c(1, 2))
plot(t1, main = "Original data")
plot(t2, main = "Simplified data")
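To put a number on "very similar" rather than comparing the plots by eye, one option is the area of the symmetric difference between the two buffers, relative to the area of the original buffer. The following is only a sketch: the random-walk test line and the names `b_full`, `b_simp` and `rel_diff` are made up for illustration and are not the data or objects used above.

```r
library(sf)

set.seed(123)
# Illustrative test line (an assumption, not the thread's data):
# a random walk with small steps, so vertices pile up in a cluster.
xy <- cbind(cumsum(rnorm(500, sd = 0.5)), cumsum(rnorm(500, sd = 0.5)))
x <- st_sfc(st_linestring(xy))

# Buffer the original line and a simplified copy of it.
b_full <- st_buffer(x, dist = 1)
b_simp <- st_buffer(st_simplify(x, dTolerance = 1), dist = 1)

# Area covered by exactly one of the two buffers,
# as a fraction of the area of the original buffer.
sym_diff <- st_sym_difference(b_full, b_simp)
rel_diff <- as.numeric(st_area(sym_diff)) / as.numeric(st_area(b_full))
rel_diff
```

A value close to 0 means the simplified buffer covers almost the same ground; what counts as acceptable depends on the application.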
Thank you, this is good. Using the attached GPS tracklog, I see no improvement in processing time of
The results all look very similar; even the buffer around the highly simplified track is close to the others. So
Some performance differences between
Are the results identical?
The results are identical, but on my PC (I use Windows too) the timings are similar. Strange.
linestr = sf::read_sf("gps_tracklog_with_clusters.shp")
system.time(t1 <- sf::st_buffer(linestr, dist = 20))
#> user system elapsed
#> 21.00 0.88 21.89
system.time(t2 <- geos::geos_buffer(linestr, distance = 20))
#> user system elapsed
#> 15.73 0.56 16.30
identical(sf::st_as_sfc(t1), sf::st_as_sfc(t2))
#> TRUE
sf::sf_extSoftVersion()
#>           GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ
#>        "3.9.3"        "3.5.2"        "8.2.1"         "true"         "true"        "8.2.1"
A fresh installation of sf fixed it by updating the external libraries from
to
Now the timing is similar to geos_buffer:
So it was something in the external libraries. Thank you very much!
Highly complex linestrings do cause the standard buffer algorithm to be slow, due to the number of buffer line segments that are generated and processed internally. An approach which can be much faster is to split the line into sections (say, 10 vertices long), buffer the sections, and then union the results. This can be tens of times faster. Ideally this heuristic improvement could be provided automatically by the buffer code, but it's hard to detect when it should be applied. It could easily be supplied as a separate buffer function, for use at the user's discretion (see JTS prototype implementation).
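The split, buffer and union idea can be tried from R with sf alone. What follows is a minimal sketch, not the JTS prototype and not an sf function: `buffer_by_sections()` and its argument `k` are hypothetical names, and it assumes a single 2D LINESTRING in a planar coordinate system. Adjacent sections share one vertex so that the unioned buffers leave no gaps.

```r
library(sf)

# Hypothetical helper (not part of sf): buffer a single LINESTRING by
# splitting it into sections of at most `k` vertices, buffering each
# section, and unioning the results.
buffer_by_sections <- function(line, dist, k = 10, ...) {
  stopifnot(k >= 2)
  xy <- st_coordinates(line)[, c("X", "Y"), drop = FALSE]
  n <- nrow(xy)
  # Section start indices; consecutive sections overlap by one vertex.
  starts <- seq(1, n - 1, by = k - 1)
  sections <- lapply(starts, function(i) {
    st_linestring(xy[i:min(i + k - 1, n), , drop = FALSE])
  })
  parts <- st_buffer(st_sfc(sections, crs = st_crs(line)), dist = dist, ...)
  st_union(parts)
}

# Usage on an illustrative random-walk line (made-up data).
set.seed(1)
line <- st_sfc(st_linestring(cbind(cumsum(rnorm(200)), cumsum(rnorm(200)))))
b_split <- buffer_by_sections(line, dist = 1, k = 10)
b_whole <- st_buffer(line, dist = 1)
```

The two results should cover essentially the same area, with small differences from how round joins and end caps are approximated. Whether the split version is actually faster depends on the data and the GEOS version, so it is worth timing both on the real tracklog.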
st_buffer() can be very slow with complex linestrings, e.g. from GPS tracklogs. Especially clusters of points (e.g. when a GPS device didn't move but kept recording points, as shown below) can take very long to process:
In the track shown above st_buffer() took 1250 seconds, but in QGIS gdal:buffervectors took about 50 seconds.
Is there any way to improve the performance of st_buffer() for such clustered points?
As a reproducible example, below is a function for creating random linestrings, either linear or clustered, with adjustable number of points, and some benchmark results to illustrate the differences between linear and clustered linestrings.
Simulate linestrings with varying number of points:
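The original function is not reproduced in this text. For anyone wanting to rerun the benchmarks in the comments, a stand-in with the same call shape, `simulate_linestring(n, "cluster")`, might look like the sketch below. The body is an assumption: the way points are generated here (steady progress along x for "linear", a small-step random walk for "cluster") is a guess at the intent, so timings will not match the ones reported in this issue.

```r
library(sf)

# Hypothetical stand-in for simulate_linestring(); not the original code.
# "linear":  vertices advance steadily along x with a little jitter in y.
# "cluster": a random walk with small steps, so vertices pile up in one
#            area, like a GPS device that keeps recording while stationary.
simulate_linestring <- function(n, type = c("linear", "cluster")) {
  type <- match.arg(type)
  if (type == "linear") {
    xy <- cbind(seq_len(n), rnorm(n, sd = 0.5))
  } else {
    xy <- cbind(cumsum(rnorm(n, sd = 0.5)), cumsum(rnorm(n, sd = 0.5)))
  }
  st_sfc(st_linestring(xy))
}

set.seed(123)
x_linear  <- simulate_linestring(100, "linear")
x_cluster <- simulate_linestring(100, "cluster")
```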
Difference between linear and clustered linestrings (both with their buffer):
st_buffer takes much longer to process the clustered points (~300x longer for 2000 vertices):