Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug in trim_osmdata() #253

Open
Mashin6 opened this issue Nov 23, 2021 · 8 comments
Open

Possible bug in trim_osmdata() #253

Mashin6 opened this issue Nov 23, 2021 · 8 comments

Comments

@Mashin6
Copy link
Contributor

Mashin6 commented Nov 23, 2021

There might be possibly a bug in how trim_osmdata() filters the data. It seems to be removing some objects that should be in the final dataset.

library(tidyverse)
library(sf)
#> Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright

townsQ <- getbb("Southeastern Connecticut COG", featuretype = "boundary") %>% 
            opq()
townsQ$prefix <- '[out:xml][timeout:250];\n rel[name="Southeastern Connecticut COG"]; \n map_to_area->.a;(\n'
townsQ$features <- "[\"admin_level\"=\"8\"](area.a)"

towns <- townsQ %>% 
            osmdata_sf()

towns$osm_multipolygons |> ggplot() + geom_sf()

area <- getbb("Southeastern Connecticut COG", featuretype = "boundary", format_out = "polygon")
towns <- opq(bbox=area, timeout = 250) %>%
                add_osm_feature(key="admin_level", value="8") %>%
                osmdata_sf() %>%
                trim_osmdata(area)

towns$osm_multipolygons |> ggplot() + geom_sf()

packageVersion("osmdata")
#> [1] '0.1.8'
R.Version()$version.string
#> [1] "R version 4.1.1 (2021-08-10)"

Created on 2021-11-23 by the reprex package (v2.0.1)

@mpadge
Copy link
Member

mpadge commented Jan 2, 2022

Thanks @Mashin6, and apologies for the delay in responding here. It's not a bug, it reflects the operation of the exclude parameter in the trim_osmdata() function:

library (ggplot2)
library (osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
area <- getbb("Southeastern Connecticut COG", featuretype = "boundary", format_out = "polygon")
towns <- opq(bbox = area) %>%
                add_osm_feature(key="admin_level", value="8") %>%
                osmdata_sf()
plot1 <- function (towns, exclude = TRUE) {
    towns |>
        trim_osmdata(area, exclude = exclude) |>
        magrittr::extract2 ("osm_multipolygons") |>
        ggplot() +
        geom_sf()
}
plot1 (towns, exclude = TRUE)
#> Loading required namespace: sf

plot1 (towns, exclude = FALSE)

Created on 2022-01-02 by the reprex package (v2.0.1.9000)

@mpadge mpadge closed this as completed Jan 2, 2022
@Mashin6
Copy link
Contributor Author

Mashin6 commented Jan 3, 2022

Though e.g. Colchester is completely within SCCOG, because the same ways that define COG boundary also define the boundary of the town. So I would still expect it to be included.
Hmm unless the polygon from Nominatim has wrong shape. I guess more reason to have server side trimming.

www.openstreetmap.org/relation/11065863
www.openstreetmap.org/relation/11059185

@mpadge
Copy link
Member

mpadge commented Jan 3, 2022

Yeah, right, it's an issue with how to interpret multipolygons. The trim operation for Colchester is extracting the internal component of the multipolygon. In cases where internal components of multipolygons are all distinct and do not share any boundaries, they will generally (in OSM terms) define holes within the surrounding main polygon. It might thus be more appropriate to default to extracting the primary surrounding polygon, rather than the internal components. I'll re-open to implement that.

@mpadge mpadge reopened this Jan 3, 2022
mpadge added a commit that referenced this issue Jan 3, 2022
@mpadge
Copy link
Member

mpadge commented Jan 3, 2022

@Mashin6 The above commit improves the trim_osmdata() function, but your issue still remains. It is, however, due to a bug in the OSM data set, as this reprex illustrates:

library (ggplot2)
library (sf) # load for sub-setting
#> Linking to GEOS 3.9.1, GDAL 3.3.1, PROJ 8.0.1; sf_use_s2() is TRUE
library (osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
area <- getbb("Southeastern Connecticut COG", featuretype = "boundary", format_out = "polygon")
towns <- opq(bbox =area) |>
                add_osm_feature(key="admin_level", value="8") |>
                osmdata_sf()
mp <- towns$osm_multipolygons$geometry [[22]] |>
    st_sfc (crs = 4326)
ggplot (mp) + geom_sf ()

bb <- st_sfc (st_polygon (list (area)), crs = 4326)
st_within (mp, bb)
#> Sparse geometry binary predicate list of length 1, where the predicate
#> was `within'
#>  1: (empty)

That result shows that the multipolygon does not lie within the bounding box. The following shows in a bit more detail why that happens (using sp for simple return of direct integer values), including first necessary step of transforming to non-elliptical CRS:

mp <- st_transform (mp, 3857) [[1]] [[1]] [[1]] # multipolygon
bb <- st_transform (bb, 3857) [[1]] [[1]]
pip <- sp::point.in.polygon (mp [, 1], mp [, 2],
                             bb [, 1], bb [, 2])
table (pip)
#> pip
#>   0   1   3 
#>   2  83 705

Created on 2022-01-03 by the reprex package (v2.0.1.9000)

The point.in.polygon function returns:

  • 0 for points outside the polygon
  • 1 for points which lie on the polygon
  • 3 for points which lie within the polygon

And so that result shows that 2 points on the boundary of Colchester lie geographically outside the bounding area. The overpass API correctly filters because it uses relation memberships based on OSM ID values. That works in this case even though it arguably shouldn't because the result is geographically invalid. The osmdata approach remains geographically valid, and will return identical results to server-side filtering as long as those non-geographic results are also geographically correct. In this (hopefully outlier) case, they are not geographically correct, and so osmdata does what I would consider to be the appropriate thing. Does that make sense?

@mpadge mpadge closed this as completed Jan 3, 2022
@Mashin6
Copy link
Contributor Author

Mashin6 commented Jan 3, 2022

Thanks for the analysis. I think you are right, I checked and Nominatim version of geometry for SCCOG is different from current OSM version. It seems Nominatim does not update geometry in cases where only nodes are moved, but way or relation is not touched. Might be worth adding a warning message to trim_osmdata() about possibility that results might not always be correct.

Side note, map_to_area; in OverPass does spatial subsetting of objects and does not need IDs. But it also includes objects that are only partially inside the area.

@mpadge
Copy link
Member

mpadge commented Jan 3, 2022

I notionally agree with you about adding a note, and tried doing that, but the function actually does return results that are strictly correct, and according to the above example, in fact more correct than the internal overpass trimming. Correctness can only be judged in geometric terms, and the function in current form is geometrically correct. Feel free to have a go yourself if you can think of a smart way to phrase such a note (as an extension of the Note in trim_osmdata), but please ensure it clearly states that results will always be geometrically correct, even if they are ... unexpected, or whatever adjective might be used to describe cases like the above.

My attempt went something like this:

The operation is also a strictly geometric trimming, and may not necessarily yield expected results when used in combination with bounding polygons obtained from other sources, potentially including Nominatim.

That would only make sense if it were clear what expected results might look like, which it is not. Don't know how better to do that for now.

@Mashin6
Copy link
Contributor Author

Mashin6 commented Jan 4, 2022

Right, this is a very special case of using trim_osmdata. But since it is suggested in the help page of the function, having a sentence or two might save users some time when scratching their heads why were some objects being removed.

What about this?

Caution is advised when using polygons obtained from Nominatim via getbb(..., format_out = "polygon"|"sf_polygon"). These shapes can be outdated and thus could cause that trimming operation might not give results expected based on the current state of OSM data.

@mpadge
Copy link
Member

mpadge commented Jan 4, 2022

That sounds great - could you please add that via a pull request? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants