Performance testing framework #1287
My full-world tests actually take <1h to run (on fast hardware). They do cost more than $1 to run on EC2.
In terms of total rendering time, it's high-zoom biased, I believe.
EC2 is subject to about an order of magnitude of performance variation between instances. For a particular benchmark configuration repeated many times on identically configured Amazon instances, the 10th-percentile result was 587 TPS and the 90th-percentile 2537 TPS. A 200% swing in rendering throughput would indicate a pretty catastrophic style change, yet it couldn't be distinguished from that instance-to-instance noise on EC2.
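For concreteness, a minimal sketch of how that kind of spread could be summarised from repeated benchmark runs (plain Python; the throughput numbers below are made up, not measurements):

```python
import statistics

# Throughput (tiles/s) from one benchmark configuration repeated on many
# identically configured EC2 instances -- the numbers are illustrative only.
tps_runs = [587, 812, 1450, 990, 2537, 1200, 1800, 640, 2100, 1600]

# quantiles(n=10) returns the nine decile cut points:
# index 0 is the 10th percentile, index 8 the 90th.
deciles = statistics.quantiles(tps_runs, n=10)
p10, p90 = deciles[0], deciles[8]
print(f"p10={p10:.0f} TPS  p90={p90:.0f} TPS  spread={p90 / p10:.1f}x")
```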
Do we already have some tools/scripts to test rendering speed, even roughly? I think Kosmtik automation could be a nice option. We could, for example, craft some export URLs with a big enough bbox and execute them automatically for many zoom levels, but maybe there is a more elegant way of doing it.
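A minimal sketch of that kind of automation, fetching and timing one tile per zoom level from a locally running Kosmtik instance; the port and the /tile/{z}/{x}/{y}.png URL pattern are assumptions to check against your own setup:

```python
import math
import time
import urllib.request

# Assumed local Kosmtik endpoint -- adjust host/port/path to your setup.
TILE_URL = "http://localhost:6789/tile/{z}/{x}/{y}.png"

def deg2tile(lat, lon, zoom):
    """Convert WGS84 coordinates to slippy-map tile numbers."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

# Time one tile over a city centre (here: Berlin) at a range of zooms.
lat, lon = 52.52, 13.40
for zoom in range(8, 15):
    x, y = deg2tile(lat, lon, zoom)
    start = time.monotonic()
    urllib.request.urlopen(TILE_URL.format(z=zoom, x=x, y=y)).read()
    print(f"z{zoom:2d}: {time.monotonic() - start:6.2f}s")
```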
There are enough differences that it's not a great option, except for catching horrible performance failures.
It's essential that the workload is realistic. The method I've used lately has been to randomly sample the running queries and ignore the time spent in Mapnik.
Do you have any scripts that other people could use and compare results, or is it just manual testing?
I haven't needed any scripts, it's all been one-line command line stuff.
Could you share it anyway? Even if it's short, we don't have any standard tools to measure performance and compare results at the moment.
I'm not sure, but I think this message relates to osm-carto PostgreSQL performance testing (comparison of the current setup with partitioned tables): https://lists.openstreetmap.org/pipermail/dev/2018-March/030168.html
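Since the exact one-liner wasn't shared in the thread, here is a hedged sketch of what sampling running queries via pg_stat_activity could look like (connection details and sampling interval are assumptions; Mapnik time is ignored simply because only SQL shows up here):

```python
import time
import psycopg2

# Connection settings are assumptions -- point this at your rendering database.
conn = psycopg2.connect(dbname="gis")
conn.autocommit = True

counts = {}
for _ in range(300):  # sample once per second for five minutes
    with conn.cursor() as cur:
        # pg_stat_activity lists currently executing statements, so repeated
        # sampling approximates where query time is actually being spent.
        cur.execute("""
            SELECT query FROM pg_stat_activity
            WHERE state = 'active' AND pid <> pg_backend_pid()
        """)
        for (query,) in cur.fetchall():
            counts[query] = counts.get(query, 0) + 1
    time.sleep(1)

# Print the ten most frequently sampled queries.
for query, hits in sorted(counts.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{hits:4d}  {query[:120]}")
```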
There's a tool called
Don't render down to z14. Any rendering test that tries to render the world past z12 will give distorted results.
What causes this distortion?
The fact that it doesn't represent a realistic workload. All performance testing needs to test something that matters, and the time to render the world at z13+ doesn't matter because no one does it. In particular, the average complexity of the metatiles will differ from a tile server's workload, as will the balance between different zooms. There are 4x as many z14 tiles as z13 tiles, but not 4x as many z14 tiles rendered as z13 tiles rendered. https://planet.openstreetmap.org/tile_logs/renderd/renderd.yevaud.20150503.log.xz is an old log of what was rendered, taking into account the tile CDN and the renderd tile store.
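To turn that observation into a test workload, one could weight tile selection by the per-zoom shares seen in a real renderd log rather than by raw tile counts. A sketch follows; the weights are placeholders, not values extracted from the linked log, and spatial sampling is left uniform for brevity even though a realistic list would also be city-biased:

```python
import random

# Placeholder per-zoom shares of actually-rendered metatiles -- derive real
# weights from a renderd log such as the one linked above.
ZOOM_WEIGHTS = {10: 0.10, 11: 0.15, 12: 0.20, 13: 0.30, 14: 0.25}

def sample_workload(n_tiles):
    """Pick (zoom, x, y) tuples so zooms appear in realistic proportions,
    rather than in the 4x-per-level ratio of 'render everything'."""
    zooms = random.choices(
        population=list(ZOOM_WEIGHTS),
        weights=list(ZOOM_WEIGHTS.values()),
        k=n_tiles,
    )
    # Uniform spatial sampling is a simplification; a realistic workload
    # would bias towards cities as well.
    return [(z, random.randrange(2 ** z), random.randrange(2 ** z)) for z in zooms]

for z, x, y in sample_workload(5):
    print(f"z{z}/{x}/{y}")
```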
This is very surprising. It's not something we've encountered at my job (web application performance testing). I would be interested in understanding this issue better. @pnorman, do you remember what instance type this was?
Those aren't my test results; they were from a comprehensive test comparing lots of different cloud options. gp2 storage had just come out. I'm sure it's gotten better, but there's still going to be variation, and before benchmarking I'd want to test the machine with
What if we use some smarter testing pattern, like "old - new - old - new - old - new..." on the same machine (instead of old and new each being run once, on different machines at different times)? That would help compare tests more directly and avoid non-systematic errors. Do we have a machine for such testing? Maybe Travis could be used as a first line of defense?
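A sketch of that interleaved pattern, assuming a hypothetical ./render_benchmark.sh that checks out the given commit, renders a fixed tile set, and prints the elapsed seconds (both the script name and the commit ids are made up):

```python
import statistics
import subprocess

OLD, NEW = "abc1234", "def5678"  # hypothetical commit ids
RUNS = 5

def benchmark(commit):
    """Run the (hypothetical) benchmark script and parse seconds from stdout."""
    out = subprocess.run(
        ["./render_benchmark.sh", commit],
        check=True, capture_output=True, text=True,
    )
    return float(out.stdout.strip())

# Interleave old and new so slow drift on the machine hits both sides equally.
old_times, new_times = [], []
for _ in range(RUNS):
    old_times.append(benchmark(OLD))
    new_times.append(benchmark(NEW))

print(f"old median: {statistics.median(old_times):.1f}s")
print(f"new median: {statistics.median(new_times):.1f}s")
```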
As a partial solution, could we run
How much data would you load for this?
I am thinking of using a small European country, such as Portugal. I made a script that runs EXPLAIN on all queries in the MML. Even running it on Luxembourg would catch the addresses sequential scan (#3937). Commit 05dc392, just before the fix:
Commit e66889e, with the fix:
However, even though it's quite successful in this case, these kinds of bugs seem quite rare to me, and I doubt the script brings great benefits in monitoring performance in general.
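The script itself isn't included in the thread; a minimal sketch of the same idea, assuming a YAML project.mml whose layers carry their SQL in Datasource.table (as osm-carto's does) and substituting arbitrary illustrative values for Mapnik's !bbox!/!scale_denominator!/!pixel_width!/!pixel_height! tokens:

```python
import psycopg2
import yaml

# Mapnik placeholder tokens must become valid SQL before EXPLAIN will accept
# the query; these substitution values are arbitrary illustrative choices.
TOKENS = {
    "!bbox!": "ST_MakeEnvelope(0, 0, 100000, 100000, 3857)",
    "!scale_denominator!": "500000",
    "!pixel_width!": "156543",
    "!pixel_height!": "156543",
}

conn = psycopg2.connect(dbname="gis")  # connection details are assumptions
conn.autocommit = True                 # keep each EXPLAIN independent

with open("project.mml") as f:
    project = yaml.safe_load(f)

for layer in project.get("Layer", []):
    table = layer.get("Datasource", {}).get("table")
    if not table:
        continue
    for token, value in TOKENS.items():
        table = table.replace(token, value)
    with conn.cursor() as cur:
        # The table entry is a parenthesised subquery with an alias, so it
        # can be dropped straight into a FROM clause.
        cur.execute("EXPLAIN SELECT * FROM " + table)
        plan = "\n".join(row[0] for row in cur.fetchall())
    if "Seq Scan" in plan:
        print(f"Layer {layer.get('id')}: sequential scan in plan")
```

Run against a database loaded with a small extract such as Luxembourg, this kind of check can flag a regression like the #3937 sequential scan without rendering anything.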
Cartography is more important than performance, but performance is much harder to test! We've been heavily reliant on @pnorman to run performance tests on his setup, but I'd like to open that up so that more people can become involved and we can automate it as much as possible.
There's a whole load of hardware considerations that make cross-server test results impossible to compare, but we should concentrate on being able to compare two different commits and give relative answers. It would be useful to have:
The tests should be roughly indicative of real-world rendering patterns, e.g. city-biased and mid-zoom biased.