Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GDAL Error 1: Too many features have accumulated in lines layer. Use OGR_INTERLEAVED_READING=YES mode #12

Closed
agila5 opened this issue Oct 8, 2019 · 21 comments

Comments

@agila5
Copy link
Collaborator

agila5 commented Oct 8, 2019

Reprex of the error:

library(geofabric)
get_geofabric(layer = "points")
#> No exact matching geofabric zone. Best match is West Yorkshire (28.7 MB)
#> Downloading http://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf to 
#> /tmp/RtmpfjodGC/west-yorkshire.osm.pbf
#> Old attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made
#> New attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made,building,natural,surface,source,power,amenity,shop,operator
#> Using ini file that can can be edited with file.edit(/tmp/RtmpfjodGC/ini_new.ini)
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in lines layer. Use
#> OGR_INTERLEAVED_READING=YES mode

Created on 2019-10-08 by the reprex package (v0.3.0)

Probably I'm missing something, but I'm not sure why it returns Too many features have accumulated in lines layer even if I set layer = "points". Anyway I tried added the OGR_INTERLEAVED_READING=YES option to st_read but with no success:

my_url <- "https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf"
sf::st_read(my_url, layer = "points", options = "OGR_INTERLEAVED_READING=YES")
#> options:        OGR_INTERLEAVED_READING=YES 
#> Reading layer `points' from data source `https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf' using driver `OSM'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in lines layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#> Simple feature collection with 72849 features and 10 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -2.326617 ymin: 53.3142 xmax: -1.039338 ymax: 54.03102
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

Then I read here some details on that error and here that the appropriate open option is called INTERLEAVED_READING but that doesn't work either:

my_url <- "https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf"
res <- sf::st_read(my_url, layer = "points", options = "INTERLEAVED_READING=YES")
#> options:        INTERLEAVED_READING=YES 
#> Reading layer `points' from data source `https://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf' using driver `OSM'
#> Simple feature collection with 72849 features and 10 fields (with 72849 geometries empty)
#> geometry type:  GEOMETRYCOLLECTION
#> dimension:      XY
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
res$geometry
#> Geometry set for 72849 features  (with 72849 geometries empty)
#> geometry type:  GEOMETRYCOLLECTION
#> dimension:      XY
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> First 5 geometries:
#> GEOMETRYCOLLECTION EMPTY
#> GEOMETRYCOLLECTION EMPTY
#> GEOMETRYCOLLECTION EMPTY
#> GEOMETRYCOLLECTION EMPTY
#> GEOMETRYCOLLECTION EMPTY

Created on 2019-10-08 by the reprex package (v0.3.0)

@agila5
Copy link
Collaborator Author

agila5 commented Oct 8, 2019

The problem does not exist for very small pbf files:

library(geofabric)
get_geofabric("Comores")
#> Downloading http://download.geofabrik.de/africa/comores-latest.osm.pbf to 
#> /tmp/RtmpYvIA3r/Comores.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(/tmp/RtmpYvIA3r/ini_new.ini)
get_geofabric("Comores", layer = "points")
#> Data already detected in /tmp/RtmpYvIA3r/Comores.osm.pbf
#> Old attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made
#> New attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made,building,natural,surface,source,power,amenity,shop,operator
#> Using ini file that can can be edited with file.edit(/tmp/RtmpYvIA3r/ini_new.ini)

Created on 2019-10-08 by the reprex package (v0.3.0)

@Robinlovelace
Copy link
Member

Yes it seems to be exclusively an issue with giant pbfs. I found this GDAL issue but not sure if it's the same thing:

OSGeo/gdal#1785

@agila5
Copy link
Collaborator Author

agila5 commented Oct 10, 2019

I don't think so since in that issue the OP simply solved his problems with a new version of the .pbf file. In my opinion in this case the questions is: "How to set Use OGR_INTERLEAVED_READING=YES mode"?

I tried with options = "OGR_INTERLEAVED_READING=YES" and options = "INTERLEAVED_READING=YES" (with and without the = sign) but none of them work.

@Robinlovelace
Copy link
Member

Agreed, I can reproduce the problem:

devtools::install_github("itsleeds/geofabric")
library(geofabric)
library(sf)
res = get_geofabric("west yorkshire")
resp = get_geofabric("west yorkshire", layer = "points")
f = file.path(tempdir(), "west yorkshire.osm.pbf")
sf::st_read(f, layer = "points", options = "OGR_INTERLEAVED_READING=YES")
nrow(resp)

@Robinlovelace
Copy link
Member

Seems the results are OK and that the OSM driver simply does not have the interleaved option.

Another reprex:

devtools::install_github("itsleeds/geofabric")
#> Skipping install of 'geofabric' from a github remote, the SHA1 (54b35d73) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(geofabric)
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.4.2, PROJ 5.2.0
res = get_geofabric("west yorkshire")
#> No exact matching geofabric zone. Best match is West Yorkshire (28.7 MB)
#> Downloading http://download.geofabrik.de/europe/great-britain/england/west-yorkshire-latest.osm.pbf to 
#> /tmp/RtmpZv7nDk/west yorkshire.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(/tmp/RtmpZv7nDk/ini_new.ini)
resp = get_geofabric("west yorkshire", layer = "points")
#> No exact matching geofabric zone. Best match is West Yorkshire (28.7 MB)
#> Data already detected in /tmp/RtmpZv7nDk/west yorkshire.osm.pbf
#> Old attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made
#> New attributes: attributes=name,barrier,highway,ref,address,is_in,place,man_made,building,natural,surface,source,power,amenity,shop,operator
#> Using ini file that can can be edited with file.edit(/tmp/RtmpZv7nDk/ini_new.ini)
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in lines layer. Use
#> OGR_INTERLEAVED_READING=YES mode
nrow(resp)
#> [1] 72897
f = file.path(tempdir(), "west yorkshire.osm.pbf")
resp = sf::st_read(f, layer = "points", options = "OGR_INTERLEAVED_READING='YES'")
#> options:        OGR_INTERLEAVED_READING='YES'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options),
#> quiet, : GDAL Message 6: driver OSM does not support open option
#> OGR_INTERLEAVED_READING
#> Reading layer `points' from data source `/tmp/RtmpZv7nDk/west yorkshire.osm.pbf' using driver `OSM'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in lines layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#> Simple feature collection with 72897 features and 10 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -2.326617 ymin: 53.3142 xmax: -1.039338 ymax: 54.03102
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
resp
#> Simple feature collection with 72897 features and 10 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -2.326617 ymin: 53.3142 xmax: -1.039338 ymax: 54.03102
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> First 10 features:
#>    osm_id           name barrier           highway ref address is_in place
#> 1  154941      Flushdyke    <NA> motorway_junction  40    <NA>  <NA>  <NA>
#> 2  154962        Bramham    <NA> motorway_junction  44    <NA>  <NA>  <NA>
#> 3  155014     Belle Isle    <NA> motorway_junction  43    <NA>  <NA>  <NA>
#> 4  155023      Lofthouse    <NA> motorway_junction  42    <NA>  <NA>  <NA>
#> 5  155035      Lofthouse    <NA> motorway_junction  42    <NA>  <NA>  <NA>
#> 6  318787          Haigh    <NA> motorway_junction  38    <NA>  <NA>  <NA>
#> 7  319339 Rothwell Haigh    <NA> motorway_junction  44    <NA>  <NA>  <NA>
#> 8  319387     East Leeds    <NA> motorway_junction  45    <NA>  <NA>  <NA>
#> 9  319577      Austhorpe    <NA> motorway_junction  46    <NA>  <NA>  <NA>
#> 10 319599      Hook Moor    <NA> motorway_junction  43    <NA>  <NA>  <NA>
#>    man_made          other_tags                   geometry
#> 1      <NA>                <NA>  POINT (-1.55581 53.68734)
#> 2      <NA> "name:signed"=>"no"  POINT (-1.34293 53.84462)
#> 3      <NA>                <NA> POINT (-1.517335 53.74997)
#> 4      <NA>                <NA> POINT (-1.514124 53.74169)
#> 5      <NA>                <NA> POINT (-1.516511 53.72566)
#> 6      <NA>                <NA> POINT (-1.552209 53.60796)
#> 7      <NA>                <NA> POINT (-1.499099 53.76189)
#> 8      <NA>                <NA> POINT (-1.477767 53.77228)
#> 9      <NA>                <NA> POINT (-1.414489 53.79955)
#> 10     <NA>                <NA> POINT (-1.338989 53.81439)
nrow(resp)
#> [1] 72897

@agila5
Copy link
Collaborator Author

agila5 commented Dec 14, 2019

Hi! Following the discussion in r-spatial/sf#1213 I tried to create some tests to find the best way to read .osm.pbf file.

# packages
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3

# download data
download.file("https://download.geofabrik.de/europe/great-britain/england/isle-of-wight-latest.osm.pbf", "isle-of-wight-latest.osm.pbf", mode = "wb") 
download.file("https://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf", "greater-london-latest.osm.pbf", mode = "wb")
download.file("https://download.geofabrik.de/europe/italy/nord-ovest-latest.osm.pbf", "nord-ovest-latest.osm.pbf", mode = "wb")

# test import from osm.pbf
read_from_osb_pbf <- function(path) {
  st_read(path, layer = "lines", quiet = TRUE)
}
# test import from gpkg
read_from_gpkg <- function(path) {
  gpkg_file <- paste0(tempfile(), ".gpkg")
  gdal_utils(
    util = "vectortranslate",
    source = path, 
    destination = gpkg_file, 
    options = c("-f", "GPKG", "lines")
  )
  res <- st_read(gpkg_file, quiet = TRUE)
  names(res)[which(names(res) == "geom")] <- "geometry"
  st_geometry(res) <- "geometry"
  res
}

# benchmarks
timings <- bench::mark(
  pbf = read_from_osb_pbf("isle-of-wight-latest.osm.pbf"), 
  gpkg = read_from_gpkg("isle-of-wight-latest.osm.pbf"), 
  iterations = 25
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
timings
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 pbf           4.94s    5.09s     0.196    37.7MB    0.807
#> 2 gpkg          3.62s    3.86s     0.259    36.6MB    1.08

IMO It's quite clear that, from this testings, gpkg approach is better. My main concern is that the creation of a new gpkg file can be a problem for bigger .osm.pbf file. For example

path <- "nord-ovest-latest.osm.pbf"
gpkg_file <- paste0(tempfile(), ".gpkg")
system.time({
  gdal_utils(
    util = "vectortranslate",
    source = path, 
    destination = gpkg_file, 
    options = c("-f", "GPKG", "lines")
  )
})
#>    user  system elapsed 
#>  126.30   76.83  204.33
file.size(path) / 1.049e+6 # in MB
#> [1] 378.5979
file.size(gpkg_file) / 1.049e+6 # in MB
#> [1] 643.4351

Now the problem is how to compare pbf and gpkg approaches for bigger file size since none of the followings work

st_read("greater-london-latest.osm.pbf", layer = "lines")
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp4IRONJ\reprex3b052bc5fd0\greater-london-latest.osm.pbf' using driver `OSM'
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in points layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#> Simple feature collection with 62773 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.6124681 ymin: 51.2822 xmax: 0.3760979 ymax: 51.78736
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
st_read("greater-london-latest.osm.pbf", layer = "lines", options = c("INTERLEAVED_READING=YES"))
#> options:        INTERLEAVED_READING=YES 
#> Reading layer `lines' from data source `C:\Users\Utente\AppData\Local\Temp\Rtmp4IRONJ\reprex3b052bc5fd0\greater-london-latest.osm.pbf' using driver `OSM'
#> Simple feature collection with 0 features and 9 fields
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

and if I manually add to my system the OGR_INTERLEAVED_READING variable default to YES then I get the same output as before, i.e. no feature is being read. The following works.

read_from_gpkg("greater-london-latest.osm.pbf")
#> Simple feature collection with 352555 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.6124681 ymin: 51.22684 xmax: 0.399669 ymax: 51.78736
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> First 10 features:
#>    osm_id             name     highway waterway aerialway barrier man_made
#> 1      73    Ballards Lane     primary     <NA>      <NA>    <NA>     <NA>
#> 2      74    Ballards Lane     primary     <NA>      <NA>    <NA>     <NA>
#> 3      75        High Road     primary     <NA>      <NA>    <NA>     <NA>
#> 4      79    East End Road     primary     <NA>      <NA>    <NA>     <NA>
#> 5     482 Cockfosters Road     primary     <NA>      <NA>    <NA>     <NA>
#> 6     488      Western Way residential     <NA>      <NA>    <NA>     <NA>
#> 7     489      The Linkway residential     <NA>      <NA>    <NA>     <NA>
#> 8     490    Sherrards Way residential     <NA>      <NA>    <NA>     <NA>
#> 9     491 Grasvenor Avenue residential     <NA>      <NA>    <NA>     <NA>
#> 10   1200      Hanger Lane       trunk     <NA>      <NA>    <NA>     <NA>
#>    z_order
#> 1        7
#> 2        7
#> 3        7
#> 4        7
#> 5        7
#> 6        3
#> 7        3
#> 8        3
#> 9        3
#> 10       8
#>                                                                                                                                                                                                                                 other_tags
#> 1                                                                                                                                 "abutters"=>"mixed","lit"=>"yes","maxspeed"=>"30 mph","oneway"=>"yes","ref"=>"A598","surface"=>"asphalt"
#> 2                                                                                                                            "abutters"=>"retail","lit"=>"yes","maxspeed"=>"30 mph","ref"=>"A598","sidewalk"=>"right","surface"=>"asphalt"
#> 3                                                                                                                                                                    "lit"=>"yes","ref"=>"A1000","abutters"=>"retail","maxspeed"=>"30 mph"
#> 4                                                                                                                                                                                          "lit"=>"yes","maxspeed"=>"30 mph","ref"=>"A504"
#> 5                                                                                                                                                                                          "lit"=>"yes","maxspeed"=>"30 mph","ref"=>"A111"
#> 6                                                                                                                                                "abutters"=>"residential","lit"=>"yes","maxspeed"=>"30 mph","maxspeed:proposed"=>"20 mph"
#> 7                                                                                                  "abutters"=>"residential","lit"=>"yes","maxspeed"=>"30 mph","maxspeed:proposed"=>"20 mph","postal_code"=>"EN5 2BX","surface"=>"asphalt"
#> 8                                                                                                                           "abutters"=>"residential","lit"=>"yes","maxspeed"=>"30 mph","maxspeed:proposed"=>"20 mph","surface"=>"asphalt"
#> 9                                                                                                                           "abutters"=>"residential","lit"=>"yes","maxspeed"=>"30 mph","maxspeed:proposed"=>"20 mph","surface"=>"asphalt"
#> 10 "cycleway"=>"track","lanes"=>"2","lit"=>"yes","maxspeed"=>"40 mph","maxspeed:enforcement"=>"average","not:name"=>"Hangar Lane (North Circular Road)","oneway"=>"no","operator"=>"Transport for London","ref"=>"A406","sidewalk"=>"both"
#>                          geometry
#> 1  LINESTRING (-0.1778849 51.6...
#> 2  LINESTRING (-0.193124 51.60...
#> 3  LINESTRING (-0.1767883 51.6...
#> 4  LINESTRING (-0.1979855 51.5...
#> 5  LINESTRING (-0.1607372 51.6...
#> 6  LINESTRING (-0.1886468 51.6...
#> 7  LINESTRING (-0.1883204 51.6...
#> 8  LINESTRING (-0.1896293 51.6...
#> 9  LINESTRING (-0.1892364 51.6...
#> 10 LINESTRING (-0.291608 51.51...

but I cannot compare it to the other option. Still, I think it's quite an extreme case to use much bigger pbf file than these examples so I think it's safe to assume that we should implement the gpkg approach.

Created on 2019-12-14 by the reprex package (v0.3.0)

@Robinlovelace
Copy link
Member

Robinlovelace commented Dec 14, 2019

Great summary @agila5, thanks for the detailed summary of pros (and cons?) of the gpkg approach. As far as I can see the only con is that gpkg files can be huge, right? Not such a big problem if you have a big computer 😺 impressive that your read-in all the roads for London on a laptop at all, let alone in an acceptable amount of time (how long did it take?).

@Robinlovelace
Copy link
Member

Just testing read-in times on my coputer now.

@Robinlovelace
Copy link
Member

Result: 12 seconds to read-in all of London. Not bad...

library(geofabric)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
f = geofabric::gf_filename("Greater London")
file.exists(f)
#> [1] TRUE
file.copy(f, ".")
#> [1] TRUE
path = "Greater London.osm.pbf"
read_from_gpkg <- function(path) {
  gpkg_file <- paste0(tempfile(), ".gpkg")
  gdal_utils(
    util = "vectortranslate",
    source = path, 
    destination = gpkg_file, 
    options = c("-f", "GPKG", "lines")
  )
  res <- st_read(gpkg_file, quiet = TRUE)
  names(res)[which(names(res) == "geom")] <- "geometry"
  st_geometry(res) <- "geometry"
  res
}
system.time({london = get_geofabric("Greater London")})
#> Data already detected in ~/hd/data/osm/Greater London.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(/tmp/RtmpiDxFrZ/ini_new.ini)
#> Warning in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
#> GDAL Error 1: Too many features have accumulated in points layer. Use
#> OGR_INTERLEAVED_READING=YES mode
#>    user  system elapsed 
#>   2.927   0.285   3.133
system.time({london = read_from_gpkg("Greater London.osm.pbf")})
#>    user  system elapsed 
#>  12.277   0.723  12.605

Created on 2019-12-14 by the reprex package (v0.3.0)

@agila5
Copy link
Collaborator Author

agila5 commented Jan 6, 2020

I think I found a problem with the gpkg approach. For example

# download data
rutland <- download.file(
  url = "http://download.geofabrik.de/europe/great-britain/england/rutland-latest.osm.pbf", 
  destfile = "rutland-latest.osm.pbf", 
  mode = "wb"
)

# create the ini_file
my_ini_file <- paste0(tempfile(), ".ini")
writeLines(
  text = geofabric::make_ini_attributes(c("maxspeed", "oneway", "lanes"), layer = "lines", append = TRUE), 
  con = my_ini_file
)
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,lanes

# read as .osm.pbf
sf::st_read("rutland-latest.osm.pbf", layer = "lines", options = paste0("CONFIG_FILE=", my_ini_file))
#> options:        CONFIG_FILE=/tmp/RtmpV4dKBl/file4b6371112665.ini 
#> Reading layer `lines' from data source `/tmp/RtmphU4r4K/reprex4524395caa51/rutland-latest.osm.pbf' using driver `OSM'
#> Simple feature collection with 6987 features and 12 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.8754549 ymin: 52.21852 xmax: -0.2619499 ymax: 53.22897
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

There are 12 columns as expected, i.e. the 9 default fields (6 + osm_id + z_value + geom) and the 3 additional fields.

# transform as gpkg and read that gpkg file. Set the same CONFIG_FILE option. 
my_gpkg_file <- paste0(tempfile(), ".gpkg")
sf::gdal_utils(
  util = "vectortranslate", 
  source = "rutland-latest.osm.pbf", 
  destination = my_gpkg_file, 
  options = c("-f", "GPKG", "-oo", paste0("CONFIG_FILE=", my_ini_file), "lines")
)

sf::st_read(my_gpkg_file, layer = "lines")
#> Reading layer `lines' from data source `/tmp/RtmpV4dKBl/file4b636ebbf131.gpkg' using driver `GPKG'
#> Simple feature collection with 6987 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.8754549 ymin: 52.21852 xmax: -0.2619499 ymax: 53.22897
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

Created on 2020-01-06 by the reprex package (v0.3.0)

In the gpkg approach there are only 9 attributes. How can we add the additional attributes? I used the "-oo" parameter because I read here that it should be used to set open options for the input dataset (like CONFIG_FILE).

@Robinlovelace
Copy link
Member

Hey @agila5 (and happy NY via GH : ) well spotted.

My first response is 🤦‍♂ , my second
image

Not sure what the best way forward is now? I suggest a question on https://gis.stackexchange.com/

@agila5
Copy link
Collaborator Author

agila5 commented Jan 6, 2020

Hey @agila5 (and happy NY via GH : ) well spotted.

Ahahhaha, happy NY 🚀

Not sure what the best way forward is now? I suggest a question on https://gis.stackexchange.com/

I will post a new question tomorrow!

@Robinlovelace
Copy link
Member

I will post a new question tomorrow!

@agila5 you may not need to, see OSGeo/gdal#2100

@Robinlovelace
Copy link
Member

Update, I still cannot work out how to do it from sf. This fails:

sf::gdal_utils(
  util = "vectortranslate", 
  source = "rutland-latest.osm.pbf", 
  destination = my_gpkg_file, 
  options = c("-f", "GPKG", "lines", "rutland-latest.osm.pbf", paste0("CONFIG_FILE=", my_ini_file))
)

I suspect an issue in sf may be required.

Heads-up @edzer, were're trying to work out how to do this from sf:

ogr2ogr -f gpkg -oo CONFIG_FILE=osmconf2.ini -oo INTERLEAVED_READING=YES osm3.gpkg greater-london-latest.osm.pbf

@edzer
Copy link

edzer commented Jan 6, 2020

have you tried to add option c("-oo", "CONFIG_FILE=osmconf2.ini")?

@agila5
Copy link
Collaborator Author

agila5 commented Jan 6, 2020

Hi @edzer and thank you for your help. I created the following reprex that should show my doubts:

rutland <- download.file(
  url = "http://download.geofabrik.de/europe/great-britain/england/rutland-latest.osm.pbf", 
  destfile = "rutland-latest.osm.pbf", 
  mode = "wb"
)

my_ini_file <- paste0(tempfile(), ".ini")
writeLines(
  text = geofabric::make_ini_attributes(c("maxspeed", "oneway", "lanes"), layer = "lines", append = TRUE), 
  con = my_ini_file
)
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,lanes

# works since there are 12 fields
sf::st_read("rutland-latest.osm.pbf", layer = "lines", options = paste0("CONFIG_FILE=", my_ini_file))
#> options:        CONFIG_FILE=/tmp/Rtmp8fdjka/file323d1c80a035.ini 
#> Reading layer `lines' from data source `/tmp/Rtmp04PdXF/reprex30fd402197d5/rutland-latest.osm.pbf' using driver `OSM'
#> Simple feature collection with 6987 features and 12 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.8754549 ymin: 52.21852 xmax: -0.2619499 ymax: 53.22897
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

my_gpkg_file <- paste0(tempfile(), ".gpkg")
sf::gdal_utils(
  util = "vectortranslate", 
  source = "rutland-latest.osm.pbf", 
  destination = my_gpkg_file, 
  options = c("-f", "GPKG", "-oo", paste0("CONFIG_FILE=", my_ini_file), "lines")
)
# doesn't work since there are 9 fields missing
sf::st_read(my_gpkg_file)
#> Reading layer `lines' from data source `/tmp/Rtmp8fdjka/file323d15106884.gpkg' using driver `GPKG'
#> Simple feature collection with 6987 features and 9 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.8754549 ymin: 52.21852 xmax: -0.2619499 ymax: 53.22897
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

system(paste0("ogr2ogr -f GPKG -oo CONFIG_FILE=", my_ini_file, " test.gpkg rutland-latest.osm.pbf lines"))
# works since there are 12 fields
sf::st_read("test.gpkg")
#> Reading layer `lines' from data source `/tmp/Rtmp04PdXF/reprex30fd402197d5/test.gpkg' using driver `GPKG'
#> Simple feature collection with 6987 features and 12 fields
#> geometry type:  LINESTRING
#> dimension:      XY
#> bbox:           xmin: -0.8754549 ymin: 52.21852 xmax: -0.2619499 ymax: 53.22897
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs

Created on 2020-01-06 by the reprex package (v0.3.0)

I don't understand what's the difference between the options provided to sf::gdal_utils and the options provided to ogr2ogr.

have you tried to add option c("-oo", "CONFIG_FILE=osmconf2.ini")?

I added c("-f", "GPKG", "-oo", paste0("CONFIG_FILE=", my_ini_file), "lines") to the sf::gdal_utils functions and I used the same .ini file in both examples.

edzer added a commit to r-spatial/sf that referenced this issue Jan 6, 2020
@edzer
Copy link

edzer commented Jan 6, 2020

You're right, the options following -oo need special handling; should work now.

@agila5
Copy link
Collaborator Author

agila5 commented Jan 6, 2020

I just tested it with the same example as in the previous comment and it works, many thanks @edzer.

edzer added a commit to r-spatial/sf that referenced this issue Jan 6, 2020
edzer added a commit to r-spatial/sf that referenced this issue Jan 6, 2020
@agila5
Copy link
Collaborator Author

agila5 commented Jan 21, 2020

In the last weeks I think I solved the problem we had reading pbf files creating ad-hoc functions for translation of pbf formato into gpkg format and reading of gpkg format. I didn't push anything to the package repo because I'm still not sure about a few things but if you are working on geofabric during these days I can create a PR later in the afternoon so we discuss a few ideas there.

@Robinlovelace
Copy link
Member

Great thanks for the update, sounds like great progress Andrea. Yes please do put in a PR, will be great to have this issue solved!

@agila5
Copy link
Collaborator Author

agila5 commented Jul 10, 2020

Fixed with the new approach following the previous suggestions

@agila5 agila5 closed this as completed Jul 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants