osmdata_sc function #148
Currently can't handle points coz of this issue, but otherwise should be good to go. Plan is to generate
And the workaround for current lack of POINT objects in ping @mdsumner: This is a nifty solution for OSM structure in general, and requires neither modification to |
Sounds great, but is this function only code on your local computer? Note that I merged the binary branch today, and SC now only has object, edge, vertex tables (no object_link_edge) - there's no "edge_" ID, and "object_" is on edge. Do you think that's a bad idea? I feel like edges don't require an explicit ID, at least in most cases. |
If I may disagree - there are many cases where edges would require an explicit ID, plus I think that all levels of all tables should have IDs to enable consistent |
Sorry for intruding on a debate I know very little about but if |
If that makes any sense at all! If not feel free to ignore - excited to see where this newfangled |
@Robinlovelace yeah, I largely agree. I actually suspect that the simplicity of Ref to core |
Well true, I'm enamoured by having Dang, I have to revert to the original TRI and SC. |
Well, it's not hard to change (I'm finding!). I did have the edge/segment distinction before but wasn't clear. So, edge table gets .vx0, .vx1, edge_. The link table gets object_, edge_, direction_. Direction is a record of the input orientation. Either .vx0,.vx1 or .vx1,.vx0. So edge is unique, the link table records the instances. Makes sense? |
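A minimal base-R sketch of the edge/link split described in the comment above, not silicate's actual implementation. The input `segments` table and its `from`/`to` columns are hypothetical, and treating `direction_` as TRUE when the input orientation matches the stored one is an assumption made for illustration.

```r
# Hypothetical input: two triangles, A and B, sharing the edge between
# vertices "b" and "c", which they traverse in opposite directions.
segments <- data.frame(
  object_ = c("A", "A", "A", "B", "B", "B"),
  from    = c("a", "b", "c", "c", "b", "d"),
  to      = c("b", "c", "a", "b", "d", "c"),
  stringsAsFactors = FALSE
)

# Normalize each vertex pair by a parallel sort so that .vx0 <= .vx1.
vx0 <- pmin(segments$from, segments$to)
vx1 <- pmax(segments$from, segments$to)

# Edge table: one row per unique (unordered) vertex pair, with an explicit ID.
edge <- unique(data.frame(.vx0 = vx0, .vx1 = vx1, stringsAsFactors = FALSE))
edge$edge_ <- paste0("e", seq_len(nrow(edge)))

# Link table: one row per instance of an edge within an object.
# direction_ is TRUE when the input ran .vx0 -> .vx1 (assumed convention).
edge_key <- paste(edge$.vx0, edge$.vx1, sep = "-")
object_link_edge <- data.frame(
  object_    = segments$object_,
  edge_      = edge$edge_[match(paste(vx0, vx1, sep = "-"), edge_key)],
  direction_ = segments$from == vx0,
  stringsAsFactors = FALSE
)

# The shared edge appears once in `edge` and twice in the link table.
shared <- object_link_edge[duplicated(object_link_edge$edge_) |
  duplicated(object_link_edge$edge_, fromLast = TRUE), ]
```

Under this convention the edge table stays unique, while the two instances of the shared edge carry opposite `direction_` values in the link table.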
Topojson stores orientation, I think the direction of an arc around a polygon - presumably osm does the same for ways? |
There's no sense of orientation in OSM, everything is simply sequential, and inner polygons are simply tagged as such. But I've another quick Q for you Mike: What is the motivation behind having both |
In the initial implementation the edge table included "edge_" and "segment_", because I thought segment was a good name for the instances of an edge. But now I think it was confusing.
Edges are normalized by a parallel sort of .vx0, .vx1 - so we don't know their original orientation - but it is recorded on the link table whether the edge was re-oriented or not - one will be TRUE and one FALSE when the edge is shared. E.g. in my local branch:

```r
x <- SC(minimal_mesh)
purrr::map_int(x, nrow)
#>           object object_link_edge             edge           vertex             meta
#>                2               16               15               14                1
print(x)
#> class       : SC
#> type        : Primitive
#> vertices    : 14
#> primitives  : 15 (1-space)
#> crs         : NA
```

14 unique coordinates (two are shared). 15 edges, one is repeated where the features touch. The sf object has 19 coordinates, 3 are repeats (at the end of each path) and 2 are shared. So 19 - 2 - 3 = 14.

```r
sc_coord(minimal_mesh)
#> # A tibble: 19 x 2
```

It could be one edge table, and maybe it should be - but I see this as pretty key. SC0 has a nested edge table with object implicitly, so it's not far away - or you can join back up from the link table. |
Oh, now I see -thanks! All good to leave as is. I'm nearly there ... |
Awright, behold the topology branch. SC0, SC, and TRI all work as intended. PATH and ARC will have to wait. SC derives from SC0, and SC can ingest TRI (which is cool), but not the other way around (TRI is PATH-based, not edge-based). SC0 takes points! SC is purely edges. There are plot and print methods. |
Awright, behold
It's got all sorts of nested polygons for islands and guff like that. |
And somewhat contradicting my comments in this commit, the full benchmarks now have |
I folded all new stuff into silicate master branch, just FYI |
That's great - that'll make it much easier for me to start delving in. Nice work! |
Would it be reasonable to write It may not work, but it'd help me to see an attempt to decompose OSM in that way. If you're adept at xml2 (maybe it's slow?) it'd be great to be able to attack the doc that way as well, at least for comparing. |
This is the first issue re-opener as anticipated above. |
Reopened to add proper vertex info, extending from this |
On where info belongs, I just meant in some circumstances - because, if you measure x, y, z, time, temperature, then all those belong on the vertices, they are uncontroversially measurements in "geometry". It's just that if we normalize these data (make unique in x, y or x, y, z ...) - then whatever wasn't in the unique-ifying set belongs on the instances of vertex table. I find this requirement bites me in different directions and I haven't sorted it all out yet.

The other thing I've been thinking about is the models, PATH, SC etc. - it's starting to seem like the 'object' table should really be the paths or the edges, and higher level stuff exists on more tables. This came to me while thinking about plot methods, it would be nice to be able to pass in n-colours when there are n-paths - rather than always working on the "feature level" grouping. Then your sphier ideas really come into it. I only think tables should be split when there's been some de-duplication, vertices can store anything, including text properties.

Of course, it's also important that we don't get too carried away - I think the SC, TRI, PATH, and ARC structures as they are (and when they work properly) are pretty right. I get caught by the de-duplication thing when I split a DEL mesh for feature constants (e.g. height as SIDS79, because now unique in x, y, z not x, y) - and then I cannot re-triangulate that with Triangle because of the non-xy uniqueness. I feel like that was leading me down a tangled path. |
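The normalization rule in the comment above can be sketched in base R. This is an illustration only: the `coords` data, the `vertex_instance` table name and the `vertex_` ID format are hypothetical, and nothing here is claimed to match silicate's internals.

```r
# Hypothetical track-like data: x, y plus extra measurements per record.
coords <- data.frame(
  x    = c(0, 1, 1, 0, 1),
  y    = c(0, 0, 1, 0, 0),
  z    = c(10, 12, 15, 11, 12),
  time = 1:5
)

# De-duplicate in x, y only: these are the unique vertices.
vertex <- unique(coords[c("x", "y")])
vertex$vertex_ <- paste0("v", seq_len(nrow(vertex)))

# Everything outside the unique-ifying set (z, time) goes on the
# instances-of-vertex table, keyed back to the vertex table.
xy_key <- paste(coords$x, coords$y)
vertex_instance <- data.frame(
  vertex_ = vertex$vertex_[match(xy_key, paste(vertex$x, vertex$y))],
  z       = coords$z,
  time    = coords$time,
  stringsAsFactors = FALSE
)

# De-duplicating in x, y, z instead keeps more rows, so the result is
# no longer unique in x, y alone.
vertex_xyz <- unique(coords[c("x", "y", "z")])
```

Here `vertex_xyz` has more rows than `vertex`, which is the situation described above where a mesh that is unique in x, y, z cannot be passed straight back to a triangulator that expects x, y uniqueness.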
The last two commits have sorted out the basic table structure. FYI @mdsumner it now delivers perfectly standard tables for Still, however, no points, and no ways of re-mapping ID values in the |
hey I've so far been assuming that

```r
library(osmdata)
x <- opq ("hampi india") %>%
    add_osm_feature (key="historic", value="ruins") %>%
    osmdata_sc ()
library(dplyr)
# Is every object_ ID unique within the object table?
nrow(distinct(x$object, object_)) == nrow(x$object)
# Which objects have no rows in the object-edge link table?
setdiff(x$object$object_, x$object_link_edge$object_)
```
I'm having a look at multiple object instances (this is where sphier extension is really needed) - and how to handle those with |
Oh yes there is:
And we have:
Easy! Then what happens if we want to group the coordinates of all ways? (This will be needed if
Looks good. Full
And |
yes, but with caveat above that I posted ~20s before you ;) It's late so I'm out for today |
Ah damn ... I can't get There are also "meta" objects - the OSM "relations" - that do not map directly on to edges. They simply contain other objects which are internally referenced in the As you clearly realise, what I've tried to do here is effectively sneak in a bit of the |
I think it's cool, just needs silicate to be more robust - so we can plot and convert and so on. Print seems fine, and that implies a regime of rules about matching that doesn't assume stuff like this . Aand it's late ;) |
I am a bit worried about this, the process so far has been to unjoin tables up the chain, so (in my speak) you would have an "instances of the objects" table by going distinct on

```r
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
x <- opq("hobart") %>% # Just returns the inner city, so is small and workable here
  add_osm_feature(key = "highway") %>%
  osmdata_sc ()
ujd <- unjoin::unjoin(x$object, object_, key_col = "obj_pk")
x1 <- x
x1$object <- ujd$obj_pk
## this step is weird, because downstream plot etc. is not robust to an object that
## has no edges (but we still have object_instance as a record)
x1$object <- x1$object %>% dplyr::filter(object_ %in% x1$object_link_edge$object_)
x1$object_instance <- ujd$data
library(silicate)
#>
#> Attaching package: 'silicate'
#> The following object is masked from 'package:stats':
#>
#> filter
plot(x1)
plot(SC0(x1)) ## even this works, cool but useless
plot(anglr::DEL(x1))
#> dropping untriangulatable objects
```

Created on 2018-11-27 by the reprex package (v0.2.1)

So now x1 is SC with primary keys from object down, and we have a dangling "what kind of extension object is this ..." question. I'm not wedded to this, just airing my thoughts. I'm a little worried about #89 which would otherwise have to do something similar every time it converted or plotted, so it begs the question of how we'd keep those extra data at all. |
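For readers without the packages installed, the idea behind the reprex above can be roughed out in base R. This is an analogue of the concept only, not of how `unjoin::unjoin()` is implemented; the toy `object` and `object_link_edge` tables, and the `key`/`value` column names, are hypothetical.

```r
# Hypothetical long-form object table: object_ repeats, one row per tag.
object <- data.frame(
  object_ = c("w1", "w1", "w2", "w3", "w3"),
  key     = c("highway", "name", "highway", "highway", "surface"),
  value   = c("residential", "Main St", "footway", "primary", "asphalt"),
  stringsAsFactors = FALSE
)

# Hypothetical link table: note that "w2" is not linked to any edge.
object_link_edge <- data.frame(
  object_ = c("w1", "w1", "w3"),
  edge_   = c("e1", "e2", "e3"),
  stringsAsFactors = FALSE
)

# Primary-key table: one row per distinct object.
object_pk <- unique(object["object_"])

# The original repeated rows are kept as "instances of the objects".
object_instance <- object

# Drop objects with no edges, since downstream code may assume every
# object is linked.
object_pk <- object_pk[object_pk$object_ %in% object_link_edge$object_, ,
  drop = FALSE]

# Instance records whose object was dropped are the dangling ones.
dangling <- setdiff(object_instance$object_, object_pk$object_)
```

The `dangling` vector makes the open question concrete: the instance table still mentions an object that the primary-key table no longer contains.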
I think your "robustness" is exactly what should be aimed for here, and this ought not be a real biggy. The extra data don't need to be kept at all, they just need to be presumed to be potentially present. In the slight change I made, the |
Cool, you are already onto it - that's what I'm in the middle of explaining in a reprex - I'll still add it above so you can see what I'd do. So, the reprex above is 1) and there's also 2)
My concerns still with 2) are that it provides some usability problems, but you're exactly right that these kinds of things will occur anyway so it has to be dealt with. |
One more note, I'm happy for you to close this
```r
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
x <- opq("hobart") %>% # Just returns the inner city, so is small and workable here
add_osm_feature(key = "highway") %>%
osmdata_sc ()
ujd <- unjoin::unjoin(x$object, object_, key_col = "obj_pk")
x1 <- x
x1$object <- ujd$obj_pk
## this step is weird, because downstream plot etc. is not robust to an object that
## has no edges (but we still have object_instance as a record)
x1$object <- x1$object %>% dplyr::filter(object_ %in% x1$object_link_edge$object_)
x1$object_instance <- ujd$data
x1$object <- tibble::tibble(object_ = "I am the one true object")
x1$object_link_edge$object_ <- "I am the one true object"
library(silicate)
#>
#> Attaching package: 'silicate'
#> The following object is masked from 'package:stats':
#>
#> filter
plot(anglr::DEL(x1))
```
![](https://i.imgur.com/IIVMeJF.png)
<sup>Created on 2018-11-27 by the [reprex package](https://reprex.tidyverse.org) (v0.2.1)</sup> |
What are your thoughts now on |
I'm still worried about this, I haven't been able to focus on it since it came up. Thus far I've been constantly assuming that everything in a model belongs and is connected, and so this idea of unlinked entities is disconcerting. I'm worried that "robustifying" becomes an endless tail chasing, where the simplest form of |
Okay, then I'll push on with things as they are, which means a mucked-up |
Ah ok, I need reassurance like this - we good. |
Do these osm data often have points, lines and polygons? That is a natural split: shared vertices, shared edges, sets of objects with shared attributes - not necessarily hierarchical but can be. The obvious next case is objects with no geometry, so it's perfectly on-vision. |
They have just three classes of "nodes" (= points, but can also have attributes / features), "ways" = sequences of points whatever they may be (no necessary distinction between lines and polygons, but that can be made if desired), and "relations" = any and all higher-order relationships between sets of ways, points, or combinations. So in your terms above, "ways" define the edges, and "ways" = shared vertices, and "ways" can also define shared objects because the vertices, as well as the entire ways themselves, can have and share attributes. "relations" are then strict sets of objects with shared attributes. And importantly here, "relations" themselves have no geometry, rather the geometry merely inheres within the objects (to) which they relate. The two things that affect the current vision of an
and
In that case, "key" is serving to define an OSM "role", which is an arbitrary string describing the role of the member, but which is often used to define inner and outer components of multipolygon objects (as in this case), with "val" naming the member way. |
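To make the node / way / relation description above concrete, here is a small base-R sketch. All table and column names (`node`, `way_node`, `relation_member`, `role`, `order_`) are hypothetical and chosen for illustration; they are not the tables that `osmdata_sc()` returns.

```r
# Nodes: points with coordinates (they may also carry their own tags).
node <- data.frame(
  node_ = paste0("n", 1:7),
  x     = c(0, 4, 4, 0, 1, 2, 1),
  y     = c(0, 0, 4, 4, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Ways: ordered sequences of node references. A closed way repeats its
# first node at the end.
way_node <- data.frame(
  way_   = c(rep("w1", 5), rep("w2", 4)),
  node_  = c("n1", "n2", "n3", "n4", "n1", "n5", "n6", "n7", "n5"),
  order_ = c(1:5, 1:4),
  stringsAsFactors = FALSE
)

# Relation: no geometry of its own, only members with a free-text role.
relation_member <- data.frame(
  relation_ = c("r1", "r1"),
  member    = c("w1", "w2"),
  role      = c("outer", "inner"),
  stringsAsFactors = FALSE
)

# Geometry for the relation is recovered by joining through ways to nodes.
relation_geometry <- merge(relation_member, way_node,
  by.x = "member", by.y = "way_")
relation_geometry <- merge(relation_geometry, node, by = "node_")
relation_geometry <- relation_geometry[
  order(relation_geometry$member, relation_geometry$order_), ]
```

The relation table holds only references and roles; coordinates appear only after the joins, which matches the point that geometry inheres in the member objects rather than in the relation itself.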
Aha, so we definitely need to be careful about edges - relations are completely general links between entity tables. But, from a geospatial perspective edges are a necessary decomposition of structured data. I feel like we are reaching for a "relations" concept that's not supported yet, much more like your sphier ideas. Silicate edges are definitely about linking vertices, I don't care about what the fields/columns/attributes are on the vertices, but the vertex table is what "edge" in silicate is about. I'm also inspired by models where location is the question, we have streams of data where we use location as the solution/s for wads of measured data - the fact that the animal travelled between those locations in an ordered sequence in time is not controversial - where it actually was geographically at/near the nodes is the crux, the measured data is pretty clean but using them as a proxy for long/lat is messy. Relations, edges in abstract graphs that describe all these things are much more general than SC. SC can represent a more general graph, no question - but I'm concerned that we are over-reaching a bit here. The vertex table is not nodes, and so maybe that's the missing entity here to bridge to sphier. |
@mdsumner Another status request from my side here. This is the only outstanding issue prior to next release and major upgrade to Solutions:
The latter would be my current preference, and I presume yours too. I'll describe normalization below for reference here, and that will leave only two primary questions here for you:
Normalization
Current
The latter is the only potential conflict with current
Sub-issue (1): Vertices are not nodes
In OSM, nodes have properties, yet these cannot or ought not be considered or stored as "objects" in the above schema, nor are they vertices. These properties thus have to be stored in an additional table. There could be just one additional table called
Sub-issue (2): Relations table
Putting nodes in their own table would enable a
The As long as |
I like the idea of I am still inclined to make silicate robust to dangles hanging around elsewhere, i.e. unlinked vertices or edges - I think they will occur, and could even be useful for implicit data or data that exists elsewhere. But, it's not a good starting point to allow anything.

Sub (1) The nodes thing is tricky, but I think it's fair to assume that normal normalization is planar, and so the common assumption that these are nodes in x/y (even if those are implicit) is fair enough. I've only ever used different assumptions with triangles (break the mesh for discrete polygons that aren't neighbours in z), or for track data (unique in x, y and so other data goes on a link table - time, z, temperature whatever - this is a general concept of normalizing geometry but is rightly a specializing extension).

Sub (2) I think I understand, I'm happy with your points overall.

Finally, I haven't moved on this just because of other tricky work that's rather pressing but I should be able to do more before xmas. |
@mdsumner In the hope that we can judge how close we're getting by the scope of the questions, this one is tiny: The above commit implements what I described above, and ends with
I admit to no current understanding at all what this |
Yes all good, join_ramp is best ignored |
See this silicate issue for motivation here, and ping @mdsumner