Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plan database re-import #1504

Closed
6 tasks
matthijsmelissen opened this issue Apr 28, 2015 · 114 comments
Closed
6 tasks

Plan database re-import #1504

matthijsmelissen opened this issue Apr 28, 2015 · 114 comments

Comments

@matthijsmelissen
Copy link
Collaborator

This is a management issue for the upcoming re-import of the database. A re-import of the database gives us the opportunity to change the keys that are available to the stylesheet.

We will use the lua tag transformation option from osm2pgsql, and enable the hstore option.

Questions to be answered

  • Decide for which keys we want to change the polygon vs linestring behaviour for closed ways.
  • Decide for which key/value combinations we want to change the polygon vs linestring behaviour for closed ways.
  • Is there anything in particular that needs to be done to prepare for hstore?
  • Do we want to move the z_order code for roads to lua?
  • Are there any other changes we want to make?

Action plan

  • Decide on the questions above.
  • Prepare SQL for hstore
  • Implement changes in lua file.
  • Contact server admins to discuss date of database reload.
  • Database reload.
  • Start resolving issues that need the new database scheme.

Requested changes

The following changes have been requested:

Keys to be added

Adding hstore will avoid the need to manually add these keys.

Keys to be added

Adding hstore will avoid the need to manually add these keys.

Keys to be treated as linestring

Keys to be treated as polygon

Key/value combinations to be treated as linestring

Pull request written, see osm2pgsql-dev/osm2pgsql#346.

Key/value combinations to be treated as polygon

Pull request written, see osm2pgsql-dev/osm2pgsql#346.

Also include as way

This has already been fixed in the current default lua style file.

@matthijsmelissen
Copy link
Collaborator Author

@pnorman @gravitystorm Could you have a look what else should be added to this issue?

@matthijsmelissen
Copy link
Collaborator Author

@pnorman wrote in #1243:

At the same time as introducing hstore, the number of fixed columns should be reduced, and this should lead to a noticeable performance gain, but this requires benchmarking work and query rewrites.

This is already quite a big project, and it lingers already for quite some time. Would it be an option to only enable hstore in the first database reload, and drop the columns that have become spurious due to hstore only in the second database reload (in 6-12 months)? I think that would also make things operationally much simpler, because we wouldn't need to coordinate the database reload and the stylesheet rollout that tightly.

As I said, it's quite a big project given the amount of developer time, and I'm afraid that if we want to do too much as the same time, we will keep pushing this issue forward forever. On the other hand, the performance benefit would be very welcome.

@imagico
Copy link
Collaborator

imagico commented Apr 28, 2015

Just for clarification - hstore will allow later use of keys that were not originally on the list?

Otherwise a larger scale brainstorming on what important keys are currently missing would be in order.

And it's leaf_type, not leaftype by the way.

@matthijsmelissen
Copy link
Collaborator Author

Just for clarification - hstore will allow later use of keys that were not originally on the list?

Yes, but only for objects that have another key that is in the list. For example, if we have an object with man_made=pipeline and location=underground, we would be able to access underground even though underground is not in the list. However, we wouldn't be able to access an object only tagged emergency=ambulance_station if emergency is not in the list.

Otherwise a larger scale brainstorming on what important keys are currently missing would be in order.

Yes, I think we should have a larger scale brainstorming anyway (but hstore should reduce the number of tags that we need).

And it's leaf_type, not leaftype by the way.

Thanks, corrected, things seem to have changed since the issue was created.

@imagico
Copy link
Collaborator

imagico commented Apr 28, 2015

Ok - so the most important thing is to have all primary keys on the list while secondary keys can also be used ad hoc later if the object has a known primary key.

What comes to mind is geological - not that widely used but widespread possible applications.

Other primary keys that are fairly widespread in use and could be considered for rendering are:

For secondary keys i would have glacier:type and seasonal

@pnorman
Copy link
Collaborator

pnorman commented Apr 28, 2015

This is already quite a big project, and it lingers already for quite some time. Would it be an option to only enable hstore in the first database reload, and drop the columns that have become spurious due to hstore only in the second database reload (in 6-12 months)? I think that would also make things operationally much simpler, because we wouldn't need to coordinate the database reload and the stylesheet rollout that tightly.

I've gotten most of the way through the style file converting to hstore, so I should soon be able to start seeing what columns shouldn't be in hstore.

@matthijsmelissen
Copy link
Collaborator Author

I've gotten most of the way through the style file converting to hstore, so I should soon be able to start seeing what columns shouldn't be in hstore.

That's good news! Any work in progress you can show?

@pnorman
Copy link
Collaborator

pnorman commented Apr 28, 2015

@matthijsmelissen
Copy link
Collaborator Author

Great work! It looks indeed like most work on rewriting the queries has been done.

Would it be easy to convert the data from the .style file to the .lua file?

I saw that you dropped for example aerialway from the .style file. Just to check my understanding, I suppose you'll need to add it back at a later point?

@pnorman
Copy link
Collaborator

pnorman commented Apr 28, 2015

I saw that you dropped for example aerialway from the .style file. Just to check my understanding, I suppose you'll need to add it back at a later point?

That's the unknown. There is no need to have any columns for specific tags, as anything not in a column is in with hstore.

That doesn't mean it's a good idea. For some, for example amenity, landuse, highway, natural, and building, it is almost certainly a good idea to have them as a column. To my knowledge, no one has done any rigorous testing of what should and should not be a column. That's my intent.

@matthijsmelissen
Copy link
Collaborator Author

That's the unknown. There is no need to have any columns for specific tags, as anything not in a column is in with hstore.

I'm still a bit confused about it all. How will osm2pgsql know whether to add an object with an aerialway as a row to the database? Currently we filter out objects that do not have any of the keys in the list of keys. Will this remain the case? If so, how does osm2pgsql know not to filter out aerialways if there is no aerialway in the list of keys?

However, we wouldn't be able to access an object only tagged emergency=ambulance_station if emergency is not in the list.

Do I understand it correctly that this is incorrect?

@pnorman
Copy link
Collaborator

pnorman commented Apr 28, 2015

However, we wouldn't be able to access an object only tagged emergency=ambulance_station if emergency is not in the list.
Do I understand it correctly that this is incorrect?

Yes, the statement you quoted is incorrect.

For this part of SQL rewrites I am importing with --hstore-all but this would be --hstore.

https://github.com/openstreetmap/osm2pgsql/blob/master/docs/usage.md#hstore might help explain the options a bit better. We would not be using --hstore-match-only, as it would require us to reload the database for every new tag key we wanted to render, and will always offer minimal gains for the standard style

Previous research showed

It is immediately obvious that --match-only has virtually no impact when used with the standard style because most objects have at least one tag used by default.style.

Other hstore options like -j/--hstore-all and --match-only are of minimal use except for specialized cases.

.style development is one of the limited situations where --hstore-all is worthwhile, as it reduces the number of DB reloads I need to do while changing the SQL.

@matthijsmelissen
Copy link
Collaborator Author

Thanks for the clarification, I updated the first post.

We should still decide if there are keys / values for which we want to change whether closed lines are interpreted as polygon or linestring, right?

@brycenesbitt
Copy link

Tag toilets:disposal would be nice, so that the option exists to render flush toilets differently from chemical and pit toilets. #1508

drinking_water or drinking_water=no would also be good, to render campgrounds and huts with no water with a modified symbol indicating the lack of water.

denotation may be there already. If not, it lets you render prominent trees more, um, prominently.

@matkoniecz
Copy link
Contributor

it lets you render prominent trees more, um, prominently

I think that more prominent rendering for named trees would be enough. I also see a problem with that tag (see http://wiki.openstreetmap.org/wiki/Talk:Key:denotation ).

@gravitystorm
Copy link
Owner

As @pnorman has explained, the discussion of which tags to "add" is unnecessary, so lets not spend time on it. All features will appear when we use --hstore and all keys will be available (either in hstore or in normal columns).

@brycenesbitt
Copy link

@gravitystorm even on review, the above discussion does not clearly reflect the magnitude of this shift, especially given the long history of the database limitations. Could you confirm:

  1. All keys will now be available to the rendering, even keys invented in the future. The list of new keys above does not represent a limited set of new rendering options.
  2. There is performance advantage for certain key/values to end up with dedicated columns.
  3. @pnorman is trying to work out a more optimal set of keys for dedicated columns.

@matthijsmelissen
Copy link
Collaborator Author

@brycenesbitt the best person to ask these questions to is @pnorman.

@mboeringa
Copy link

As @pnorman has explained, the discussion of which tags to "add" is unnecessary, so lets not spend time on it. All features will appear when we use --hstore and all keys will be available (either in hstore or in normal columns).

+1, it's great to have this, it will help you all a lot. This is what I already have available to me for my personal renderer in development (thanks to ESRI implementing it's own key/value storage for OSM based data).

@gravitystorm even on review, the above discussion does not clearly reflect the magnitude of this shift, especially given the long history of the database limitations. Could you confirm:

  1. All keys will now be available to the rendering, even keys invented in the future. The list of new keys above does not represent a limited set of new rendering options.
  2. There is performance advantage for certain key/values to end up with dedicated columns.
  3. @pnorman is trying to work out a more optimal set of keys for dedicated columns.

Ultimately, @pnorman indeed may answer best, but from what I know and read, I think these are all sound conclusions...

@pnorman pnorman self-assigned this May 1, 2015
@gravitystorm
Copy link
Owner

Could you confirm:

  1. Yes 2) Yes 3) Yes

:-)

@mboeringa
Copy link

As a mild reminder, this of course does not solve the true "cartographic" problems of how to cope with, and display, all of this data at various scales... that will still require thorough human evaluation and consideration (not even taking into account possible technical challenges when more features are going to be rendered).

But that should certainly not withhold on implementing this in any way as @gravitystorm already pointed out ;-)

@pnorman
Copy link
Collaborator

pnorman commented May 12, 2015

9 tags left in my conversion, then I can start benchmarking.

There's a bunch of SQL cleanups I want to do to master once we've worked out the .style file and before a more precise conversion.

@kocio-pl
Copy link
Collaborator

kocio-pl commented Jun 3, 2015

Are there any news of progress or outstanding problems regarding database re-import?

@pnorman
Copy link
Collaborator

pnorman commented Dec 7, 2016

I was looking at the keys which exclusively or almost exclusively indicate areas when used on a closed way.

We specifically want to avoid the situation where closing a loop changes semantics. We'd have to look at what tags indicate areas and what don't, not what keys.

@imagico
Copy link
Collaborator

imagico commented Dec 7, 2016

Another thing regarding the re-import i forgot but remembered when reading #2108 (comment) - would it be possible to have non-polygon relations in the database somehow, in particular route and site relations of course. The elegant way to represent them would be to have a separate relation table and an incidence table indicating memberships and roles. Another option would be representing them as GeometryCollection. I suppose this is all difficult due to lack of support in osm2pgsql currently but i wanted to bring it up none the less.

@pnorman
Copy link
Collaborator

pnorman commented Dec 7, 2016

separate relation table

The pgsql backend has four tables.

particular route and site relations of course

We currently have route relations, and should continue to have them. But I doubt we've tested that.

@imagico
Copy link
Collaborator

imagico commented Dec 7, 2016

The current way route relations are supported is not really great since it lacks membership information. See also #596. I guess the same problem would apply when representing relations in general as GeometryCollection.

But i see this is more an osm2pgsql issue and not really specific to this style.

@jojo4u
Copy link

jojo4u commented Dec 7, 2016

For rendering capitals from relations membership information would also be needed (#2365).

@pnorman
Copy link
Collaborator

pnorman commented Dec 7, 2016

But i see this is more an osm2pgsql issue and not really specific to this style.

Yep - different ways of processing relations or anything involving membership information is outside the scope of what we can do in the style.

@jojo4u
Copy link

jojo4u commented Dec 8, 2016

On http://lua.osm-carto.paulnorman.ca/#8/49.178/6.803 I see something called "Deutschland - Luxembourg / Luxemburg / Lëtzebuerg" (overpass) rendered. Probably the relation of type=multilinestring (1628284).
luxembourg

@jojo4u
Copy link

jojo4u commented Dec 8, 2016

Below z10 the lua branch paints railways over roads, resulting in blacker roads (location on osm.org)

non-lua
non-lua
lua
lua

@kkofler
Copy link

kkofler commented Dec 9, 2016

A re-import of the database gives us the opportunity to change the keys that are available to the stylesheet.

Does that mean we can finally get public_transport properly rendered and will be able to stop adding the legacy highway= or railway= tags for them (e.g., highway=bus_stop, railway=tram_stop, railway=platform) in addition to the modern tagging scheme?

@pnorman
Copy link
Collaborator

pnorman commented Dec 9, 2016

On http://lua.osm-carto.paulnorman.ca/#8/49.178/6.803 I see something called "Deutschland - Luxembourg / Luxemburg / Lëtzebuerg" (overpass) rendered. Probably the relation of type=multilinestring (1628284).

It's not that relation. I wouldn't expect it to be in the DB, but I checked, and it's not in any of the tables.

It looks to be this relation

osm_id          | -3900594
st_geometrytype | ST_Polygon
tags            | "name:de"=>"Deutschland - Luxemburg", "name:en"=>"Germany - Luxembourg", "name:fr"=>"Allemagne - Luxembourg", "name:lb"=>"Däitschland - Lëtzebuerg", "name:nl"=>"Duitsland - Luxemburg", "name:ru"=>"Франция - Германия", "name:gsw"=>"Frankrich - Ditschland", "name:hsb"=>"Francoska - Němska", "border_type"=>"nation", "protect_class"=>"5", "protection_title"=>"Naturpark"
name            | Deutschland - Luxembourg / Luxemburg / Lëtzebuerg
boundary        | administrative
admin_level     | 2

So a multipolygon handling issue.

@pnorman
Copy link
Collaborator

pnorman commented Dec 9, 2016

https://lua.osm-carto.paulnorman.ca/ moved to HTTPS and HTTP/2 - loading should be much faster now, and generally on par with osm.org, or faster given my low load.


Below z10 the lua branch paints railways over roads

This is a bit of a strange one, but I see what's happening. roads-low-zoom uses z_order.

osm2pgsql's C transform z_order puts railways between tertiary and secondary, while the lua transforms use the z_order we use at higher zooms, putting railways above all highways. This runs into the problem that there is no universal z_order. For another style I put railways into the ::casing attachment on low zooms to have road fillls drawn above them and into the ::fill attachment to have rails on top of everything at high zooms.

We could

  1. Keep the lua as-is and adjust the low zoom roads, either by
    1. Adding SQL like in the other roads layers
    2. Trying something with casing and fill attachments
  2. Add a z_order_low_zoom column with a different ordering.
  3. Drop any z_order column and do it all in SQL for all roads queries

I'm inclined towards 1 i for now and then look at 1 ii later. 3 is something we can do post-merge without a major version upgrade, because removing a column is possible without a reload.

cc @math1985

@jojo4u
Copy link

jojo4u commented Dec 9, 2016

Does that mean we can finally get public_transport properly rendered

This will be technically possible then, yes.

@pnorman
Copy link
Collaborator

pnorman commented Dec 13, 2016

So a multipolygon handling issue.

relation 5372584 is another case, but I think the first is easier to debug.

@pnorman
Copy link
Collaborator

pnorman commented Dec 20, 2016

The MP problem appears to be in filter_tags_relation_member and the member_superseded return. See https://github.com/openstreetmap/osm2pgsql/blob/master/docs/lua.md for more info.

For an example, see

require ("openstreetmap-carto")
_, _, ms, _, _, _ filter_tags_relation_member({type="multipolygon", landuse="forest"}, {{}}, {}, 1)
for k, v in pairs( m ) do
   print(k, v)
end

Output is

1       1

I'm going to be looking into this for ClearTables with the issue https://phabricator.wikimedia.org/T153799 and hope to reuse tests for here

@pnorman
Copy link
Collaborator

pnorman commented Dec 24, 2016

I think I've got the MP stuff solved and will do a PR against the lua branch. We are currently not handling road relations, which would block #596 and in turn #508 because we wouldn't have the information in the DB.

@pnorman
Copy link
Collaborator

pnorman commented Dec 28, 2016

I've reloaded https://lua.osm-carto.paulnorman.ca/ to use the latest code, the above problems with MPs are fixed, but it's possible it's introduced new ones.

@matthijsmelissen
Copy link
Collaborator Author

matthijsmelissen commented Dec 28, 2016

How does this affect adding differential data to the database? For example, what happens when a user changes an old-style multipolygon to a new-style one, and this change is appended with osm2pgsql?

@pnorman
Copy link
Collaborator

pnorman commented Dec 28, 2016

How does this affect adding differential data to the database? For example, what happens when a user changes an old-style multipolygon to a new-style one, and this change is appended with osm2pgsql?

Diffs go through the transforms, just like initial imports. The contents of the rendering tables should be the same regardless of if it was an initial import + diffs, or a reimport of new data.

@matthijsmelissen
Copy link
Collaborator Author

This sounds non-trivial to me (both in the old and new situation). What if only an outer element is changed, will the multipolygon be processed again as well? Will all elements be processed again if a tag on the multipolygon is changed? I see tens of ways this could be going wrong.

@pnorman
Copy link
Collaborator

pnorman commented Dec 29, 2016

The ways this could go wrong are the exact same ways that could go wrong with the C tag transforms - a relation is modified, a way is modified, or both are modified.

@osMatt
Copy link

osMatt commented Dec 29, 2016

I've reloaded https://lua.osm-carto.paulnorman.ca/ to use the latest code, the above problems with MPs are fixed, but it's possible it's introduced new ones.

I looked out for some differences and here's what I found:

@HolgerJeromin
Copy link
Contributor

Am i correct that the data is a few days old?

leisure=track is ignored if not on an multipolygon (ref #2238):

An open way leisure=track https://www.openstreetmap.org/way/144179903 is not rendered at all.

The closed way leisure=track https://www.openstreetmap.org/way/388151425 is also not rendered.

As this has no area tag i expected a loop (was an area on master branch).
I found hundreds of closed ways leisure=track in western Germany which no area tag. This will have an huge impact. Nearly all tracks with an soccer field in the middle have no area or MP.

@pnorman
Copy link
Collaborator

pnorman commented Dec 29, 2016

Am i correct that the data is a few days old?

The data is from 161212

leisure=track is ignored if not on an multipolygon (ref #2238):

In #2238 we decided that that leisure=track doesn't indicate an area, but I'm not sure we have styling for linear leisure=track yet.

@pnorman
Copy link
Collaborator

pnorman commented Dec 29, 2016

There seems to be a difference in rendering of nature reserves depending on the zoom level. Large multipolygon nature reserves with several outer rings have smaller rings rendered right away in the lua version - only the largest outer rings appear at comparable zoom levels in the existing rendering. See for example http://www.openstreetmap.org/relation/6267039 or http://www.openstreetmap.org/relation/4120974. However, that might also be only a difference like the order of streets and railways reported above.

In master with relation 6267039 there are about a dozen POLYGONs, all completely independent, and in lua there is one MULTIPOLYGON. This means that there's only one feature to render or not, as opposed to a dozen disjoint areas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests