-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How can I set INTERLEAVED_READING options with PBF files #1213
Comments
I see the same warning, but why do you think it is an error? Do you have an indication that not all the features are being read? |
I never checked that (and I'm not sure how) but this morning I tested that hypothesis with the following expertiment (following the same ideas as in #1157). I think it shows some evidence that some features are missing with the OGR_INTERLEAVED_READING warning. # packages
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# download some data
download.file("http://download.geofabrik.de/europe/isle-of-man-latest.osm.pbf", "isle-of-man.osm.pbf", mode = "wb")
download.file("http://download.geofabrik.de/europe/andorra-latest.osm.pbf", "andorra.osm.pbf", mode = "wb")
download.file("http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf", "greater-london.osm.pbf", mode = "wb")
# isle of man example
isle_of_man <- st_read(
"isle-of-man.osm.pbf",
layer = "lines",
stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\isle-of-man.osm.pbf' using driver `OSM'
#> Simple feature collection with 12943 features and 9 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -20.05167 ymin: 48.18019 xmax: -2.913484 ymax: 56.7
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
nrow(isle_of_man %>% filter(highway == "primary"))
#> [1] 462
isle_of_man_query <- st_read(
"isle-of-man.osm.pbf",
layer = "lines",
query = "SELECT * FROM lines WHERE (highway = 'primary')",
stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\isle-of-man.osm.pbf' using driver `OSM'
#> Simple feature collection with 462 features and 9 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -4.792454 ymin: 54.06365 xmax: -4.31848 ymax: 54.41379
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
nrow(isle_of_man_query)
#> [1] 462
# andorra example
andorra <- st_read(
"andorra.osm.pbf",
layer = "lines",
stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\andorra.osm.pbf' using driver `OSM'
#> Simple feature collection with 5439 features and 9 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 0.975577 ymin: 42.32422 xmax: 1.824654 ymax: 42.78834
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
nrow(andorra %>% filter(highway == "primary"))
#> [1] 460
andorra_query <- st_read(
"andorra.osm.pbf",
layer = "lines",
query = "SELECT * FROM lines WHERE (highway = 'primary')",
stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\andorra.osm.pbf' using driver `OSM'
#> Simple feature collection with 460 features and 9 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 1.419741 ymin: 42.43526 xmax: 1.737154 ymax: 42.63297
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
nrow(andorra_query)
#> [1] 460
# greater london example
greater_london <- st_read(
"greater-london.osm.pbf",
layer = "lines",
stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\greater-london.osm.pbf' using driver `OSM'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in points layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#> Simple feature collection with 62785 features and 9 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -0.6124681 ymin: 51.2822 xmax: 0.3760979 ymax: 51.78736
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
nrow(greater_london %>% filter(highway == "primary"))
#> [1] 2094
greater_london_query <- st_read(
"greater-london.osm.pbf",
layer = "lines",
query = "SELECT * FROM lines WHERE (highway = 'primary')",
stringsAsFactors = FALSE
)
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp2FNMdi\reprex1bd45b8b2596\greater-london.osm.pbf' using driver `OSM'
#> Simple feature collection with 12600 features and 9 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -0.5022575 ymin: 51.28324 xmax: 0.2661656 ymax: 51.68595
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
nrow(greater_london_query)
#> [1] 12600 Created on 2019-12-10 by the reprex package (v0.3.0) I applied the same idea in all the examples: in the first part of each example I read the data using |
Just to confirm: I can reproduce this issue with GDAL 3.0.2 and have found a work-around, convert to another file format with ogr2ogr first: # Aim: test gdal's interleaved settings
gdal-config --version
wget http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf
ogr2ogr \
-f GPKG lnd.gpkg \
greater-london-latest.osm.pbf \
"lines" Then back in R: greater_london_gpkg = read_sf("lnd.gpkg")
nrow(greater_london_gpkg %>% filter(highway == "primary")) # 12600 |
Yes, or all in R: > gdal_utils("vectortranslate", "greater-london-latest.osm.pbf", "greater-london-latest.gpkg")
> st_layers("greater-london-latest.gpkg")
Driver: GPKG
Available layers:
layer_name geometry_type features fields
1 points Point 326046 10
2 lines Line String 352014 9
3 multilinestrings Multi Line String 4332 4
4 multipolygons Multi Polygon 544434 25
5 other_relations Geometry Collection 15859 4
> st_layers("greater-london-latest.osm.pbf", do_count = TRUE)
Driver: OSM
Available layers:
layer_name geometry_type features fields
1 points Point 326046 10
2 lines Line String 62788 9
3 multilinestrings Multi Line String 0 4
4 multipolygons Multi Polygon 0 25
5 other_relations Geometry Collection 0 4
Warning messages:
1: In CPL_get_layers(dsn, options, do_count) :
GDAL Error 1: Too many features have accumulated in lines layer. Use OGR_INTERLEAVED_READING=YES mode
2: In CPL_get_layers(dsn, options, do_count) :
GDAL Error 1: Too many features have accumulated in points layer. Use OGR_INTERLEAVED_READING=YES mode |
Yes, it is not part of the GDAL C API for GDAL utilities. |
Ah I see, thanks for clarifying! |
I just tested that code and it doesn't work on my laptop since I get the following error: library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
> gdal_utils("vectortranslate", "greater-london-latest.osm.pbf", "greater-london-latest.gpkg")
#> Error in gdal_utils("vectortranslate", "greater-london-latest.osm.pbf", :
#> gdal_utils vectortranslate: an error occured
#> In addition: Warning messages:
#> 1: In CPL_gdalvectortranslate(source, destination, options) :
#> GDAL Message 6: Normalized/laundered field name: 'admin_level' to 'admin_le_3'
#> 2: In CPL_gdalvectortranslate(source, destination, options) :
#> GDAL Error 6: Geometry type of `Geometry Collection' not supported in shapefiles. Type can be #> overridden with a layer creation option of SHPT=POINT/ARC/POLYGON/MULTIPOINT/POINTZ/ARCZ/POLYGONZ/MULTIPOINTZ/MULTIPATCH. Session infodevtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 3.6.1 (2019-07-05)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language EN
#> collate English_United Kingdom.1252
#> ctype English_United Kingdom.1252
#> tz Europe/Berlin
#> date 2019-12-10
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
#> callr 3.4.0 2019-12-09 [1] CRAN (R 3.6.1)
#> class 7.3-15 2019-01-01 [2] CRAN (R 3.6.1)
#> classInt 0.4-2 2019-10-17 [1] CRAN (R 3.6.1)
#> cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> DBI 1.0.0 2018-05-02 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.1)
#> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.1)
#> e1071 1.7-3 2019-11-26 [1] CRAN (R 3.6.1)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.1)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
#> KernSmooth 2.23-15 2015-06-29 [2] CRAN (R 3.6.1)
#> knitr 1.26 2019-11-12 [1] CRAN (R 3.6.1)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.1)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.1)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.1)
#> rlang 0.4.2 2019-11-23 [1] CRAN (R 3.6.1)
#> rmarkdown 1.18 2019-11-27 [1] CRAN (R 3.6.1)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> sf * 0.8-1 2019-12-10 [1] Github (r-spatial/sf@60208b7)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.1)
#> units 0.6-5 2019-10-08 [1] CRAN (R 3.6.1)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.1)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> xfun 0.11 2019-11-12 [1] CRAN (R 3.6.1)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] C:/Users/Utente/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.1/library so I think that it's a problem related to my GDAL version. I followed the instructions here for updating the GDAL distribution on windows using Conda and I completed all the steps but I still get the same "old" GDAL version linked to sf package. Sorry for the stupid question, but how can I update that? |
Switch to Linux ; ) |
Support for auto-recognizing target driver from file name extension (i.e. .gpkg means geopackage) was apparently added to GDAL after 2.2.3; you'd have to specify the output format, which you can do. Anyway, this is getting of topic, so closing now. |
Thanks, and just for future reference, I solved that problem as follows: gdal_utils(
util = "vectortranslate",
source = "greater-london-latest.osm.pbf",
destination = "greater-london-latest.gpkg",
options = c("-f", "GPKG")
) If you want I could create a PR and add this example to |
PR welcome! |
I'm not sure the original issue is solved. @edzer do you think the best solution in this situation, building on the reprex above, is to first convert the object and then read it in, e.g.: library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
download.file("http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf", "greater-london-latest.osm.pbf", mode = "wb")
gdal_utils(
util = "vectortranslate",
source = "greater-london-latest.osm.pbf",
destination = "greater-london-latest.gpkg",
options = c("-f", "GPKG")
)
greater_london_gpkg = read_sf("greater-london-latest.gpkg", layer = "lines")
nrow(greater_london_gpkg)
#> [1] 352158 Created on 2019-12-10 by the reprex package (v0.3.0) I imagine there could be a faster way, e.g. selecting the layer in the translate stage, but not sure. |
Out of interest I did a benchmark. An interesting question is 'at what point is interleaving/translate needed? For context, @agila5 and I are working on a package that aims to make access to large OSM datasets quicker and easier for R users (comments/suggestions on this welcome): https://github.com/ITSLeeds/geofabric library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
setwd("~/atfutures/who-data/")
get_highways1 = function(x, highway_type) {
x_gpkg = paste0(x, ".gpkg")
gdal_utils(
util = "vectortranslate",
source = x,
destination = x_gpkg,
options = c("-f", "GPKG")
)
full_res = read_sf(x_gpkg, layer = "lines")
full_res[full_res$highway == highway_type, ]
}
get_highways2 = function(x, highway_type) {
q = paste0("SELECT * FROM lines WHERE (highway = '", highway_type, "')")
read_sf(x, layer = "lines", query = q)
}
h1 = get_highways1("greater-london-latest.osm.pbf", "primary")
h2 = get_highways2("greater-london-latest.osm.pbf", "primary")
plot(st_geometry(h1))
plot(st_geometry(h2)) bench::mark(iterations = 1, check = F,
get_highways1("greater-london-latest.osm.pbf", "primary"),
get_highways2("greater-london-latest.osm.pbf", "primary")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#> expression min median
#> <bch:expr> <bch> <bch:>
#> 1 get_highways1("greater-london-latest.osm.pbf", "primary") 30.1s 30.1s
#> 2 get_highways2("greater-london-latest.osm.pbf", "primary") 2.7s 2.7s
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>, `gc/sec` <dbl> Created on 2019-12-11 by the reprex package (v0.3.0) |
IMO the issue is not solved from the geofabric package pov and for pbf much bigger than my original example. For example, the PBF file of the region where I live is here and it's almost 400MB. The corresponding GPKG file is 2.2GB and the conversion time is more than 15 minutes. > system.time({
+ gdal_utils(
+ util = "vectortranslate",
+ source = "nord-ovest-latest.osm.pbf",
+ destination = "nord-ovest-latest.gpkg",
+ options = c("-f", "GPKG")
+ )
+ })
user system elapsed
964.39 425.41 1464.15
> file.size("nord-ovest-latest.gpkg") / 1e9
[1] 2.404442 Still I'm not sure if it's possible to read the PBF file without the conversion e.g. by properly setting the INTERLEAVED_READING option (and check the timings) since I don't know how to do that.
I read the GDAL documentation of the PBF format but they did not mention anything about the "maximum size" (and I'm not sure it does make sense to talk about a max size). They explain the problem in the interleaved reading section. |
Interesting, does that section mean that
@edzer this issue does not seem resolved and am not sure why it was closed, other than it briefly going off topic. |
It seems edzer@gin-edzer:/tmp$ ogrinfo -oo INTERLEAVED_READING=YES *pbf points > x2
edzer@gin-edzer:/tmp$ ogrinfo -oo OGR_INTERLEAVED_READING=YES *pbf points > x1
Warning 6: driver OSM does not support open option OGR_INTERLEAVED_READING
ERROR 1: Too many features have accumulated in lines layer. Use OGR_INTERLEAVED_READING=YES mode but that the effect is the same (x1 and x2 are identical). My idea behind closing this was that I felt it was not an sf issue, but if anything, a GDAL issue. What do you think? |
I think it may be a GDAL issue and was considering raising an issue. If you're up for mentoring us through the process and help us develop a decent reprex (e.g. based on the rich examples above) I would happily submit one. Note, thought, that I've commented on a closed related issue here: OSGeo/gdal#1785 From what I can gather a good starting point for a GDAL reprex could be something like:
Although I must confess I'm not 100% sure what the second line does and suspect we may get a better response if you raise the issue on our (and sf's) behalf. |
I think the warning & error message sufficiently illustrate the problem, and that the first command
suggests the docs are correct, but that the error message gives the wrong hint. |
OK, shall I open a ticket on the GitHub Issue tracker? |
And will the solution to that issue resolve this issue sf side (or do we just need to add some incantation like |
What makes you think so? |
Tested it and no it doesn't work:
|
This one did though, issue solved I think 🎉
Can you reproduce this @agila5 ? |
I think it's the same example as in my first comment and the problem is that with that option it reads 0 features. It doesn't make sense, right? |
I agree, but I get the same when I try edzer@gin-edzer:/tmp$ ogrinfo -oo INTERLEAVED_READING=YES *pbf lines > l2
edzer@gin-edzer:/tmp$ ogrinfo *pbf lines > l1
ERROR 1: Too many features have accumulated in points layer. Use OGR_INTERLEAVED_READING=YES mode
edzer@gin-edzer:/tmp$ wc l1 l2
470818 1744967 19993830 l1
30 74 831 l2
470848 1745041 19994661 total suggesting it is a GDAL issue, not an sf issue. |
Correct, it returns 0 features, sorry for going round in circles. Looks like a GDAL issue that could solve this issue when closed (it may be worth keeping this issue open as an issue that requires changes in GDAL to be closed). Final question: how to I demonstrate that
yields 0 features? |
Open |
Please proof-read this issue: Dear GDAL dev team, I think there is an issue with the osm driver. The error message it produces does not seem to be useful (it says Please could you try to reproduce the (hopefully reproducible) example below? # Aim: demonstrate issues with the OSM driver
gdalinfo --version
# $ GDAL 2.4.2, released 2019/06/28
# get data
wget http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf
ogrinfo -oo OGR_INTERLEAVED_READING=YES *pbf lines > x2
# $ Warning 6: driver OSM does not support open option OGR_INTERLEAVED_READING
# $ ERROR 1: Too many features have accumulated in points layer. Use OGR_INTERLEAVED_READING=YES mode
ogrinfo -oo INTERLEAVED_READING=YES *pbf lines > x3
wc x3
# $ 27 65 740 x3 # contains no features This may be related to OSGeo/gdal#1785. See here for use case and some additional tests from within R: #1213 |
(If you post that comment as a github issue I would also cite the other related github issue, i.e. OSGeo/gdal#1785, maybe helps with the context.) |
I think we have a solution. In OSGeo/gdal#2100 Even Rouault suggests converting only the layer we need to a GeoPackage before reading. I was sceptical about this 2 stage process but it seems it is indeed the quickest way, as benchmarked below (with conversion only needing to run once, hence 2 get1 = function(x = "greater-london-latest.osm.pbf", highway_type = "residential") {
x_gpkg = paste0(x, ".gpkg")
gdal_utils(
util = "vectortranslate",
source = x,
destination = x_gpkg,
options = c("-f", "GPKG")
)
full_res = read_sf(x_gpkg, layer = "lines")
full_res[full_res$highway == highway_type, ]
}
get2 = function(x = "greater-london-latest.osm.pbf", highway_type = "residential") {
q = paste0("SELECT * FROM lines WHERE (highway = '", highway_type, "')")
read_sf(x, layer = "lines", query = q)
}
get3 = function(x = "greater-london-latest.osm.pbf", highway_type = "residential") {
q = paste0("SELECT * FROM lines WHERE (highway = '", highway_type, "')")
dest_filename = paste0(x, ".gpkg")
if(!file.exists(dest_filename)) {
sf::gdal_utils(
util = "vectortranslate",
source = "greater-london-latest.osm.pbf",
destination = dest_filename,
options = c("-f", "GPKG", "lines")
)
}
st_read(dest_filename, query = q)
}
h1 = get1()
h2 = get2()
h3 = get3()
nrow(h1)
nrow(h2)
nrow(h3)
file.remove("greater-london-latest.osm.pbf.gpkg")
bench::mark(iterations = 1, check = FALSE, get1(), get2(), get3(), get3()) Could you try to reproduce this @agila5 to check it works on Windows? If so I suggest we go with this solution in |
I get slighlty different results between h1 and h2 and h3 because > table(h1$highway, useNA = "ifany")
residential <NA>
70406 74234 Another thing that I noticed is that the geometry column in h2 isn't called geom or geometry > colnames(h2)
[1] "osm_id" "name" "highway" "waterway" "aerialway"
[6] "barrier" "man_made" "z_order" "other_tags" "_ogr_geometry_" If I correct those "problems", then I get:
Anyway I will add more tests and I think that now we can move the discussion to geofabrik repo. |
Great, all that is pointing towards the 'h3' solution for me, main thing, it works (touch wood)! Great thread, good idea to move it back to geofabric repo. |
Hi! I'm working on a project on spatial networks in Greater London and I'm downloading the network data from Geofabrik.
When I read the data I get a warning
Created on 2019-12-09 by the reprex package (v0.3.0)
so I checked on gdal website on the meaning of that error and, after reading your answers in #1157, I tried to that set parameter. The problem is that none of these options work:
Created on 2019-12-09 by the reprex package (v0.3.0)
How can I set that parameter? Do I have to modify the .ini config file?
This problem is related to ropensci/osmextract#12 so apologies for "crossposting".
The PBF file is approximately 55MB of data and the same problem/warning disapper if I consider a smaller PBF.
Session info
The text was updated successfully, but these errors were encountered: