Reduce duplication of rendering effort #101

@zerebubuth

From openstreetmap/chef#85, with apologies to anyone following the breadcrumbs...

The rendering machines are currently completely independent. This is great for redundancy and fail-over, as they are effectively interchangeable. However, it means that both the tiles stored on disk and the tiles rendered are duplicated. Duplication of tiles on disk is somewhat desirable in the case of fail-over, but duplicating the renders is entirely pointless.

Adding a third server is therefore unlikely to cut the rendering load on the existing servers by a third. However, a lot of the load comes from serving still-fresh tiles off disk to "back-stop" the CDN, and that load would be split amongst the servers (sort of evenly).

What would be great, as @pnorman and I were discussing the other day, is a way to "broadcast" rendered tiles in a PUB-SUB fashion amongst the rendering servers so that they can opportunistically fill their own caches with work from other machines. At the moment, it's no more than an idea, but it seems like a feasible change to renderd.
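As a rough illustration of the idea only: the sketch below models the broadcast with a toy in-process broker. Nothing here is renderd code or an existing renderd feature; `Broker`, `RenderServer`, the `(zoom, x, y)` key and the fake render payload are all hypothetical names made up for the example, and it ignores expiry, dirty tiles and network transport entirely.

```python
from typing import Callable

# Hypothetical metatile key: (zoom, x, y).
Metatile = tuple[int, int, int]
Callback = Callable[[str, Metatile, bytes], None]


class Broker:
    """Toy in-process stand-in for a PUB-SUB channel between render servers."""

    def __init__(self) -> None:
        self._subscribers: list[Callback] = []

    def subscribe(self, callback: Callback) -> None:
        self._subscribers.append(callback)

    def publish(self, sender: str, key: Metatile, data: bytes) -> None:
        # Deliver every rendered metatile to every subscriber.
        for callback in self._subscribers:
            callback(sender, key, data)


class RenderServer:
    """Hypothetical render server with a local cache and a render counter."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.cache: dict[Metatile, bytes] = {}
        self.renders = 0
        broker.subscribe(self._on_broadcast)

    def _on_broadcast(self, sender: str, key: Metatile, data: bytes) -> None:
        # Opportunistically fill the local cache with other servers' work.
        if sender != self.name:
            self.cache.setdefault(key, data)

    def get(self, key: Metatile) -> bytes:
        if key not in self.cache:
            # Cache miss: "render" locally (faked here) and broadcast the result.
            self.renders += 1
            data = f"metatile {key} rendered by {self.name}".encode()
            self.cache[key] = data
            self.broker.publish(self.name, key, data)
        return self.cache[key]


broker = Broker()
yevaud = RenderServer("yevaud", broker)
orm = RenderServer("orm", broker)

key = (12, 2048, 1361)          # arbitrary example metatile
first = yevaud.get(key)         # rendered on yevaud and broadcast
second = orm.get(key)           # served from orm's cache, no second render
print(yevaud.renders, orm.renders)
```

In this toy model the second request costs no render, which is the saving the broadcast is meant to capture; whether that holds in practice depends on things the sketch leaves out, such as bandwidth between sites and how quickly tiles go stale.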

Currently the two servers are independent, and clients go to one based on geoip. This means that the rendering workload is not fully duplicated between the two servers, as users in the US tend to view tiles in the US and users in Germany tend to view tiles in Germany. This has been tested by swapping locations and seeing an increase in load.

Unfortunately, this doesn't scale well to higher numbers of servers.

Yesterday, yevaud rendered 963,135 distinct metatiles and orm rendered 859,036, of which 303,923 were rendered by both. If only one copy of each of those had been rendered, that would be an overall saving of 17% (303,923 out of 1,822,171 renders), which is nowhere near as large as I'd have hoped.
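For anyone wanting to check the arithmetic, here is a short sketch using only the three figures quoted above; the variable names are just for illustration.

```python
# Figures quoted above for one day's rendering.
yevaud = 963_135   # distinct metatiles rendered by yevaud
orm = 859_036      # distinct metatiles rendered by orm
both = 303_923     # metatiles rendered by both servers

total_renders = yevaud + orm            # renders actually performed
needed_renders = total_renders - both   # renders needed if shared metatiles were rendered once
saving = both / total_renders           # fraction of renders that were duplicates

print(f"total renders:  {total_renders:,}")
print(f"needed renders: {needed_renders:,}")
print(f"saving:         {saving:.1%}")
```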

If a single server has a capacity of 1, then two have a capacity of 1.66, since each spends 17% of its capacity on duplicated work.

Assume the statistics remain the same, so that when a server renders a tile there is a 17% chance that any specific other server also has it. With three servers, there is then a 31% chance (1 - 0.83^2) that at least one of the other two has the tile. With each server spending 31% of its capacity duplicating work, the total capacity is 2.07, an increase of 25% over two servers. If everything were distributed ideally it would be an increase of 50% (2x actual).

My gut tells me that the statistics will not be the same and 3 servers will be slightly better than this model, but it gives us a place to start.

With four servers, there is a 43% chance of duplicating work and a total capacity of 2.29, an increase of 11% over three servers instead of the ideal 33% (3x actual).

So we could set up three servers and not be too badly off for duplication, but beyond that it gets worse.
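The model above can be written down in a few lines. This is only a sketch of the back-of-envelope reasoning, assuming the 17% pairwise overlap stays constant and is independent between servers; `duplication_chance` and `total_capacity` are names invented for the example.

```python
def duplication_chance(servers: int, pairwise: float = 0.17) -> float:
    """Chance that at least one of the other servers already has the tile.

    Assumes each other server has it independently with probability `pairwise`.
    """
    return 1 - (1 - pairwise) ** (servers - 1)


def total_capacity(servers: int, pairwise: float = 0.17) -> float:
    """Useful capacity, where one server with no duplication has capacity 1."""
    return servers * (1 - duplication_chance(servers, pairwise))


previous = None
for n in range(1, 7):
    capacity = total_capacity(n)
    line = f"{n} servers: duplication {duplication_chance(n):.1%}, capacity {capacity:.2f}"
    if previous is not None:
        # Gain relative to running one fewer server.
        line += f", gain {capacity / previous - 1:.1%}"
    print(line)
    previous = capacity
```

For two, three and four servers this gives the 1.66, 2.07 and 2.29 quoted above. Under these assumptions each extra server adds less than the last, and the loop runs to six servers to show how the curve flattens.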

Labels: service:tiles (The raster map on tile.openstreetmap.org)
