v4.12.0 has unusable performance #3280

Closed
pnorman opened this issue Jun 26, 2018 · 39 comments

@pnorman
Collaborator

pnorman commented Jun 26, 2018

From openstreetmap/chef#168

We now have confirmation from @woodpeck that at z10 average render times have jumped from 1.5s with 4.11.0 to 33.7s with the 4.12.0 stylesheet.

This is slow to the point of being unusable, and right now I'm recommending that people do not upgrade to v4.12.0.

The issue lies in the surface changes. If we can't figure out what the problem is reasonably quickly, we should revert the PR and release v4.12.1.

I will not have time to step through the SQL to identify problems in the near future.

@pnorman
Collaborator Author

pnorman commented Jun 26, 2018

#2640 is the PR that appears to have introduced the problems.

@kocio-pl added this to the Bugs and improvements milestone Jun 26, 2018
@kocio-pl
Collaborator

The interesting thing is that on z16+ (where I usually test things) I have not seen such visible problems - it looked a bit suspicious, but nothing clear enough to claim a performance problem.

@imagico
Collaborator

imagico commented Jun 26, 2018

If reverting #2640 indeed fixes the performance issues, I would suggest doing that and then reviewing what in the change causes this without time pressure. There is also #3279, which I would assume to be caused by the same change, so a more thorough review overall might not be such a bad idea. The documentation now written by @sommerluk (https://github.com/sommerluk/roadpatternrendering) should help with that.

If a 20x performance loss went unnoticed during the ~6 weeks this change was already in master, that should also be an indicator that some procedural changes for better change evaluation and pre-release checks might not be such a bad idea.

@kocio-pl
Collaborator

kocio-pl commented Jun 26, 2018

Sounds OK to me. Could you take care of the revert and the v4.12.1 release?

As for performance: we have a dedicated ticket for this, yet nobody has come up with an idea for a performance framework (#1287). When you have 20 zoom levels and the problem is visible only on some of them, manual checking is not an option IMO. Do you have a suggestion or solution for this?

@sommerluk
Collaborator

Bad thing…

Up to z9, the layer roads-low-zoom is used. Starting from z10, we have roads-fill and bridges. Following the information at openstreetmap/chef#168 there is no problem at z0-z9, so roads-low-zoom should be okay. Also, roads-fill is not so different from roads-low-zoom, so I suppose that bridges is the real problem - it is the one that has a SELECT DISTINCT in the SQL code. And in openstreetmap/chef#168 @woodpeck said that this is the query that runs really often. We could try to revert only the bridges SQL. (Whether support for “surface” on bridges is useful at all was a subject of discussion in the PR anyway.)…
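
For illustration, the suspect part has roughly this shape (a simplified sketch, not the literal project.mml SQL; it assumes the standard osm2pgsql columns layer and bridge and abbreviates the bridge value list):

-- The DISTINCT subquery builds the list of layer values present among
-- bridges, but on its own it carries no spatial filter at all.
SELECT DISTINCT COALESCE(layer, 0) AS layernotnull
  FROM planet_osm_line
 WHERE bridge IN ('yes', 'viaduct');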

@matthijsmelissen
Collaborator

We could try to revert only the bridges SQL.

Our first priority now is to get master stable again. We can debug later, but for now we should revert #2640 entirely.

@imagico
Collaborator

imagico commented Jun 26, 2018

@sommerluk - you could try the version I showed in #2640 (comment) or the CTE variant I sketched earlier in #2640 (comment). For just testing whether the problem is indeed in that part, you can also simply test with a static list of layer values.
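
One way to read "a static list of layer values", as a minimal sketch (the values here are hypothetical; a real test would use the layer values that actually occur):

-- Replace the per-tile DISTINCT subquery with a fixed set of layer values,
-- purely to test whether the DISTINCT part is what makes rendering slow.
SELECT layernotnull
  FROM (VALUES (-1), (0), (1), (2)) AS layers_list(layernotnull);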

But you'd need to check the performance reliably in some way. And yes, I think it is better to do that without being rushed by the need to prepare a bugfix release.

@kocio-pl - No, I have not contributed or merged any changes for quite some time, so I am not familiar with the more recent changes and would not be comfortable doing a revert & release.

@mboeringa

mboeringa commented Jun 26, 2018

One thing that does stand out is that this issue seems to occur mainly at medium / low zoom. The more zoomed out you are, the less effective any spatial index is, and the more PostgreSQL needs to rely on its attribute indexes to prevent large scans over tables with many records.

Looking at the list of default carto-suggested indexes, it strikes me that the current list of indexes is really conservative, with only an absolutely minimal number of indexes defined considering the number of keys used in a style like this. That saves a lot of disk space, but might affect low-zoom rendering performance.

Maybe indexing a few more of the attributes / keys involved in the SQL could really help boost performance. Currently much of the rendering performance seems to rely on the spatial index doing the main job, which is logical at zoomed-in extents, but less so when zoomed out.

E.g. in a really bad hypothetical situation, zoomed out to z0-z2, you might end up doing a full table scan on 300M records to fetch only 10 or so, just because there is no attribute index that could help select the few records required by the SQL.

@woodpeck
Contributor

On a general note, doing a full table scan on z0-z2 would be a negligible performance hit since these zoom levels only contain one single meta tile. On the other hand, having to keep a spatial index up to date affects every single database update, so that has to be kept in mind.

In the concrete case at hand, as I wrote in openstreetmap/chef#168, there appears to be a full table scan on planet_osm_line going on at z10. Looking further into it, it becomes obvious that the SELECT DISTINCT layernotnull subquery is at fault. I can repair the problem by adding a ... AND "way" && ST_SetSRID(...) condition to that subquery (the sample query then takes only a few milliseconds as opposed to ~40 seconds). This means you can likely fix the issue by adding a && !bbox! at that location; however, I'm at the moment not sure whether having one of these means that Mapnik will cease adding the spatial condition at the end, which would mean you need a second && !bbox! at the end of the query.
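
A sketch of the kind of condition described (illustrative only; the literal project.mml SQL differs, and the envelope coordinates here are placeholders):

-- For a manual test, the inner DISTINCT subquery gets an explicit bbox;
-- in project.mml the equivalent would be "AND way && !bbox!", which Mapnik
-- replaces with the tile's bounding box at render time.
SELECT DISTINCT COALESCE(layer, 0) AS layernotnull
  FROM planet_osm_line
 WHERE bridge IN ('yes', 'viaduct')
   AND way && ST_MakeEnvelope(-485000, 570000, -484000, 571000, 3857);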

@pnorman
Collaborator Author

pnorman commented Jun 26, 2018

On a general note, doing a full table scan on z0-z2 would be a negligible performance hit since these zoom levels only contain one single meta tile. On the other hand, having to keep a spatial index up to date affects every single database update, so that has to be kept in mind.

Unless there's some kind of partial index (e.g. WHERE boundary='administrative' AND admin_level IN ('2','3','4')) that is very selective, a sequential scan of the entire table will be faster than an index scan, and postgres will correctly choose to do it. The same is true on z3 and possibly z4.
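
For illustration, the kind of very selective partial index meant here would look roughly like this (a sketch; the index name is made up and this is not necessarily one of the indexes the style ships):

-- Spatial index restricted to low-zoom admin boundaries only, so it stays
-- small and is only considered for queries with a matching WHERE clause.
CREATE INDEX planet_osm_line_admin_low
    ON planet_osm_line USING gist (way)
 WHERE boundary = 'administrative' AND admin_level IN ('2', '3', '4');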

Slow rendering within reason doesn't matter for z0-z4, because there are a total of 33 metatiles to be rendered. There are 16k z10 metatiles.

@sommerluk
Collaborator

I will not oppose a revert…

@giggls

giggls commented Jun 27, 2018

I cannot confirm the suspicion that this is not happening on zoom levels 16+.
Here is what I get at zoom level 19, rendering a single tile only:

$ time render_single_tile.py -s osm.xml -o test4120.png -u /19/274450/180045.png
real	8m2.860s
user	0m1.448s
sys	0m0.472s

$ time render_single_tile.py -s osm.xml -o test4110.png -u /19/274450/180045.png
real	0m14.980s
user	0m1.208s
sys	0m0.400s

@kocio-pl
Collaborator

Thanks for checking it, Sven!

This proves that an automated testing framework is needed anyway. I was able to detect such a problem before v4.10.0 (see #3159 (comment)), but not this time. I think the tiles I was using were not that slow to render, but we still need something more solid than just my gut feeling.

@matthijsmelissen
Collaborator

I could not reproduce this on z19.

This was with a small database and old indexes. Perhaps the issue only shows up on databases of reasonable size?

@giggls

giggls commented Jun 27, 2018

I did not change anything database-related; I just called the single-tile rendering script with both versions of the style. However, this might be related to my hstore-only database layout.

@sommerluk
Collaborator

@pnorman Thanks.

@kocio-pl
Collaborator

Does this mean we can release v4.12.1 now? Could anyone do the actual release? I've been a bit busy with other things lately.

@kocio-pl
Collaborator

And before the release - @giggls @woodpeck, could you just check whether the problem is fixed now?

@giggls

giggls commented Jun 28, 2018

Current master works fine again.

@mboeringa

mboeringa commented Jul 1, 2018

Unless there's some kind of partial index (e.g. WHERE boundary='administrative' AND admin_level IN ('2','3','4')) that is very selective, a sequential scan of the entire table will be faster than an index scan, and postgres will correctly choose to do it. The same is true on z3 and possibly z4.

I think this is a bit too generic a statement. A sequential scan does have its cost (and a considerable one) as well. You do not need highly specific partial indexes to take advantage of indexing. In my experience, one important thing to look for is OR clauses in your queries and the keys they combine.

E.g. look at the query below, which is quite typical for situations you may encounter in OSM, where due to tagging conflation and changes in tagging schemes, multiple keys may be needed to extract a specific thematic layer.

SELECT
	*
FROM
	gis.public.planet_osm_polygon_osv
WHERE
	( osm_man_made IN ('mineshaft')
	OR osm_disused_58_man_made IN ('mineshaft')
	OR osm_abandoned_58_man_made IN ('mineshaft'))
	AND ( osm_mineshaft_type IS NULL
	OR osm_mineshaft_type IN ('winding'))

Now look at the image below, which shows the query execution plan with only the osm_man_made field having an index. Unsurprisingly, this ends up as a sequential scan, because the OR clauses mean the entire table also has to be evaluated for the osm_disused_58_man_made and osm_abandoned_58_man_made keys, which is fastest as a sequential scan if no other indexes are available. This does, however, come at a cost (this is the entire Europe extract, by the way). Extracting just 612 records from a >165M record materialized view takes 165 seconds, so almost 3 minutes using the sequential scan (on SSD storage).

[image: query plan using only the osm_man_made index, ending in a sequential scan]

Now watch what happens when I do add indexes for the osm_disused_58_man_made and osm_abandoned_58_man_made fields as well:

[image: query plan using all three attribute indexes]

Notice the time is now down to just 152 ms - so well under one second to extract the 612 records - with PostgreSQL using all three key indexes instead of a sequential scan.

Of course, indexing all of these fields has costs in terms of disk space and index maintenance, especially in the constant-diff-update environment that openstreetmap-carto seems to operate in. This is just an illustration that in specific cases it may still be beneficial to add indexes to solve a performance issue, although it all needs to be seen in the context of a primarily spatially accessed database, which complicates the picture further. Ultimately, the best thing to do is simply to test the specific query that has a performance issue: add the index(es) you think are relevant, and see how it does.
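
For reference, the extra attribute indexes described above would be created roughly like this (a sketch against the materialized view used in the example; plain b-tree indexes, with partial "... WHERE col IS NOT NULL" variants as an option to save disk space):

-- One index per key appearing in an OR branch of the query above.
CREATE INDEX ON planet_osm_polygon_osv (osm_disused_58_man_made);
CREATE INDEX ON planet_osm_polygon_osv (osm_abandoned_58_man_made);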

@sommerluk
Collaborator

Okay, I’ve played around a little bit with the code, and honestly I do not understand the results. I’ve fixed #3279 in the code for unpaved rendering. Then I’ve loaded a current Ivory Coast extract into the database and rendered with Kosmtik, measuring the time until all tiles in the browser window are loaded. Rendering from z19 down to z0 gives these results:

zoom   4.12.1 (seconds)   with pattern (seconds)
19     12                 17
18     4                  8
17     3                  16
16     3                  9
15     3                  10
14     6                  25
13     14                 55
12     11                 40
11     16                 65
10     12                 60
9      4                  6
8      5                  13
7      7                  8
6      6                  6
5      10                 6
4      4                  5
3      4                  4
2      3                  4
1      2                  2
0      2                  1

The results are not very exact; repeating the same steps, they can vary by up to 100%.

Anyway, the real problem seems to start at z10 and above.

Given that the reports say the SQL is slower by a factor of 20, while I see only a factor of 5 here at z10, I suppose that SQL execution time matters less and Mapnik rendering matters more when the database is small (I have only Ivory Coast loaded), but that this will change when the database is big (whole planet)?

Now I have tried to isolate what is so time-consuming. Up to z9, we use the roads-low-zoom SQL query. From z10 on, we use the roads-fill SQL query and the corresponding bridges and tunnels SQL queries. All of these (roads-low-zoom, roads-fill, bridges…) are caught by the same code in roads.mss. I’ve called up fixed coordinates at z20 in Kosmtik, then switched directly to z10.

  • This takes about 20 seconds for v4.12.1 (significantly slower than the 12 seconds in the table above going from z9 to z10).
  • With the unpaved pattern, it takes 60 seconds.
  • Interesting: with roads.mss from the unpaved pattern but project.mml from v4.12.1 (just with some NULL AS int_surface so that the code does not crash) it still takes 40 seconds, even though nothing of the pattern is rendered. Even when I replace all surface-based code in project.mml with a dummy NULL AS int_surface, it still takes 40 seconds. z0-z9 however works as fast as v4.12.1 does. That’s something I do not understand. The SQL is the same as in v4.12.1. It renders the same as v4.12.1. But it’s slower. The only difference I see is the attachment order in roads.mss. But can this really be the reason? Because then it should cause performance problems at z9 as well. What’s so different at z10? As my modified v4.12.1 SQL query does not return any globalboundingbox objects nor any surface values, none of the comp-op operations is really triggered.

@imagico
Collaborator

imagico commented Jul 8, 2018

I created two sketches for possible solutions based on the 4.12.0 code:

https://github.com/imagico/osm-carto-alternative-colors/tree/surface-fix1
https://github.com/imagico/osm-carto-alternative-colors/tree/surface-fix2

The first led to a ~80 times performance improvement for the query alone on a small test database, the second to about a 100 times improvement.

Note I did not test whether this is functionally the same, i.e. whether it leads to the same rendering results. It should, but nonetheless this should be tested. I have also not tested whether there are other performance issues - this is just based on #3280 (comment) (which is pretty convincing).

@pnorman
Collaborator Author

pnorman commented Jul 10, 2018

I think this is a bit too generic a statement

The statement you quoted is about z3 and z4 rendering. The rest of your comment is not really relevant to roads.


I created two sketches for possible solutions based on the 4.12.0 code:

Diffs are
v4.12.1...imagico:surface-fix1
v4.12.1...imagico:surface-fix2

sommerluk added a commit to sommerluk/openstreetmap-carto that referenced this issue Jul 20, 2018
@sommerluk
Collaborator

Okay. At https://github.com/sommerluk/openstreetmap-carto/tree/unpaved20 is the code for unpaved rendering; it fixes #110, does not re-introduce #3279, and has imagico:surface-fix2 applied. It still feels slow at z10-z13, but faster at the higher zoom levels. How could I measure the performance in a more objective way?

@imagico
Collaborator

imagico commented Jul 20, 2018

The easiest way to benchmark queries (AFAIK) is to enable extended logging in postgres, as explained in:

https://gis.stackexchange.com/questions/60448/making-mapnik-verbose

You can also take specific queries from those logs and EXPLAIN ANALYZE them - as @woodpeck did.
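
For example, a query copied from the log can be prefixed with EXPLAIN ANALYZE and run in psql (a sketch; the WHERE clause stands in for the bbox condition taken from the log):

-- Look for "Seq Scan" vs. "Index Scan" nodes and the reported execution time.
EXPLAIN ANALYZE
SELECT way, highway
  FROM planet_osm_line
 WHERE highway IS NOT NULL
   AND way && ST_MakeEnvelope(-485000, 570000, -484000, 571000, 3857);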

@giggls

giggls commented Jul 20, 2018

As an addition to this comment:
The output of EXPLAIN ANALYZE will likely be OK in any case other than a "full table scan". The perfect output in our case would be an "Index Scan" using one of the geospatial indexes.

@kocio-pl
Collaborator

Maybe render_speedtest might be useful here, see #1287 (comment).

@sommerluk
Collaborator

Okay, I need some more help with the SQL setup. I can’t get postgres -D /path/to/postgres/directory -c log_min_duration_statement=0 to work.

Usually I start Postgres with sudo systemctl start postgresql. I can find the path to my config file with # SHOW config_file; but when trying postgres -D /path/to/postgres/directory -c log_min_duration_statement=0 (as a normal user) it complains that I do not have access rights to the file. If I do the same as the root user, postgres informs me that it cannot be executed as root for security reasons. I suppose another type of Postgres setup is needed?

@imagico
Collaborator

imagico commented Jul 21, 2018

You need to run postgres as the dedicated user created for that (usually called postgres). You stop the normally running service and then start it manually using sudo. Like:

sudo -u postgres /usr/lib/postgresql96/bin/postgres -D /var/lib/pgsql/data -c log_min_duration_statement=0 -c log_destination=stderr -c logging_collector=false 2> debug.log
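
If in doubt whether the manually started server actually picked up the setting, it can be checked from psql (a quick sanity check, not something mentioned above):

SHOW log_min_duration_statement;  -- should report 0 when full statement logging is active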

@sommerluk
Collaborator

@imagico Thanks a lot! That works!

The result of EXPLAIN ANALYSE for the query with fix2 from @imagico (the bbox is copy-pasted from the Postgres log of a Kosmtik run at z10) is:

                                                                                                                                                                                                                                                                                                                                               QUERY PLAN                                                                                                                                                                                                                                                                                                                                               
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=12176.48..12176.49 rows=1 width=358) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Subquery Scan on bridges  (cost=12176.48..12185.33 rows=708 width=358) (never executed)
         ->  Sort  (cost=12176.48..12178.25 rows=708 width=386) (never executed)
               Sort Key: features.layernotnull, features.z_order, (CASE WHEN ("substring"(features.feature, 1, 8) = 'railway_'::text) THEN 2 ELSE 1 END), (CASE WHEN (features.feature = ANY ('{railway_INT-preserved-ssy,railway_INT-spur-siding-yard,railway_tram-service}'::text[])) THEN 0 ELSE 1 END), (CASE WHEN (features.feature = 'highway_road'::text) THEN 0 ELSE 1 END), (CASE WHEN (features.feature = ANY ('{highway_platform,railway_platform}'::text[])) THEN 0 ELSE 1 END), (CASE WHEN (features.access = ANY ('{no,private}'::text[])) THEN 0 WHEN (features.access = 'destination'::text) THEN 1 ELSE 2 END), (CASE WHEN (features.int_surface = 'unpaved'::text) THEN 0 ELSE 2 END)
               ->  Subquery Scan on features  (cost=6055.92..12142.96 rows=708 width=386) (never executed)
                     ->  Append  (cost=6055.92..12108.45 rows=708 width=362) (never executed)
                           CTE features_real
                             ->  Seq Scan on planet_osm_line planet_osm_line_1  (cost=0.00..6055.92 rows=504 width=559) (never executed)
                                   Filter: ((highway IS NOT NULL) AND (way && '01030000A0110F00000100000005000000010000E0FFFFEFC7010000E0FFFFEFC70000000000000000010000E0FFFFEFC7010000E0FFFFEF470000000000000000010000E0FFFFEF47010000E0FFFFEF470000000000000000010000E0FFFFEF47010000E0FFFFEFC70000000000000000010000E0FFFFEFC7010000E0FFFFEFC70000000000000000'::geometry) AND (bridge = ANY ('{yes,boardwalk,cantilever,covered,low_water_crossing,movable,trestle,viaduct}'::text[])))
                           ->  CTE Scan on features_real  (cost=0.00..10.08 rows=504 width=360) (never executed)
                           ->  Subquery Scan on layers_list  (cost=11.34..15.34 rows=200 width=360) (never executed)
                                 ->  HashAggregate  (cost=11.34..13.34 rows=200 width=4) (never executed)
                                       Group Key: features_real_1.layernotnull
                                       ->  CTE Scan on features_real features_real_1  (cost=0.00..10.08 rows=504 width=4) (never executed)
                           ->  Seq Scan on planet_osm_line  (cost=0.00..6027.06 rows=4 width=559) (never executed)
                                 Filter: ((railway IS NOT NULL) AND (way && '01030000A0110F00000100000005000000010000E0FFFFEFC7010000E0FFFFEFC70000000000000000010000E0FFFFEFC7010000E0FFFFEF470000000000000000010000E0FFFFEF47010000E0FFFFEF470000000000000000010000E0FFFFEF47010000E0FFFFEFC70000000000000000010000E0FFFFEFC7010000E0FFFFEFC70000000000000000'::geometry) AND (bridge = ANY ('{yes,boardwalk,cantilever,covered,low_water_crossing,movable,trestle,viaduct}'::text[])))
 Planning time: 1.608 ms
 Execution time: 0.104 ms

@sommerluk
Collaborator

The database of the above example had loaded Ivory Coast only.

@imagico
Collaborator

imagico commented Jul 21, 2018

I think you got the wrong query, probably something Kosmtik is running as an advance test. You need a query from rendering an actual tile with a real bounding box and without a zero limit. Preferably from a tile where the query returns some data.

And then you can compare the performance with and without the change at various zoom levels.

This will not be quantitatively representative for a full database but it will tell you if there are any serious issues.

@giggls

giggls commented Jul 21, 2018

Maybe my render_single_tile.py script (https://github.com/giggls/openstreetmap-carto-de/blob/master/scripts/render_single_tile.py) would be useful for you. I use this script in combination with Makefile targets to do manual tests when merging upstream changes.

sommerluk added a commit to sommerluk/openstreetmap-carto that referenced this issue Jul 29, 2018
sommerluk added a commit to sommerluk/openstreetmap-carto that referenced this issue Aug 21, 2018
@sommerluk
Collaborator

New PR at #3357

@giggls Thanks for the render_single_tile.py script. That’s nice! Could you take a look at the performance of #3357?

@giggls

giggls commented Aug 22, 2018

My Makefile will probably also be useful for you then:
https://github.com/giggls/openstreetmap-carto-de/blob/upstream%2Bl10n/Makefile
Rendering time for running "make test" is roughly the same for me in your branch and origin/master now.

@kocio-pl
Collaborator

I'm curious what the numbers are in your case. Are they comparable with #3355?

@giggls

giggls commented Aug 22, 2018

Running "make test", which renders one single tile per zoom level, took 5m10s in my setup for both the master and unpaved20 branches (second pass, so with the query cache filled).
In comparison, rendering v4.12.0 is still ongoing. I will post the results later, but I would expect an order-of-magnitude difference.

Whether or not there are full table scans should be more meaningful than rendering times anyway.
I'm using a slightly different database layout and views to be able to render upstream carto, so @rrzefox should be able to provide more comparable results.

@giggls

giggls commented Aug 22, 2018

v4.12.0 took 93 minutes.

@sommerluk
Collaborator

@giggls Thanks a lot for testing this!
