Change to tiles acceptable usage policy #113
Comments
A couple thoughts:
Given that Nominatim has similar problems, I would suggest including the search acceptable usage policy in the discussion. Whatever the outcome, we should apply the same rules for tiles and search.
Although I have views on this issue, given my conflicts of interest I recuse myself from this topic.
@iandees thanks! Those are very good points.
PS: If anyone has strong feelings about this, please comment - this is an open issue and OWG wants to hear your views.
Do we have any (approximate) breakdown[1] of sites vs apps; and any breakdown of how many sites use what % of capacity? My rough impression as an interested onlooker is that there are three broad categories of user, other than mapping tools:
I'm not overly bothered about 1 or 2, but to cut off 3 would concern me. Having OSM tiles on (say) a small shop's "Where we are" page is good visibility for OSM yet minimal load on the servers, even in aggregate. OSM also has a wider social role in encouraging people to "use [maps] in creative, productive, or unexpected ways", and to prevent small sites from using our maps as soon as they add their first AdSense embed or affiliate link would damage this role. On a broader issue, I worry that this might have the effect of making the best-funded service companies into gatekeepers for the OSM ecosystem. If long-tail users are immediately required to sign up for a 'starter plan' with a services company, this may reduce their attachment to OSM per se and to the broader ecosystem. But if I'm way off on the numbers then this is all moot. (And thank you for soliciting wider contributions! :) ) [1] I ask this knowing how much @zerebubuth enjoys a bit of numerical analysis
I would take issue with the underlying hypothesis that OSM does not benefit from use of the OSMF-provided services in "closed" environments. I can see no logical reason why this should be the case and would suggest that this be substantiated before changing rules in a way that might affect a larger number of users in the short term. Note: if we do change something we should take #114 into account
I think this is a really shortsighted proposal. It will certainly lower tile usage levels, but it will also drive lots of people away from OpenStreetMap altogether. Sending people to go read switch2osm when they're using osm.org tiles sounds like 'go away'. Switch2osm was started as a guide on how to use OSM - not as a way to say "don't use osm, go install your own osm". Google's prices aren't that high compared to self-hosted OSM. Speaking for myself: I'm lazy nowadays. I know how to set up tiles and have done so numerous times. I have two options now:
"Commercial apps can not use the server" - we're using QGIS with the QuickMapServices plugin. If we buy commercial support for QGIS, helping its development, can we not use the osm.org tile cluster anymore? We're using Carto tiles as the background for (hundreds of) GPS traces that had issues with routing, and we fix osm.org roads each time we identify an issue with them. Should this be banned, as it is business usage, or should this be embraced, as it is about improving the map and finding rare and obscure issues? What we lack is transparency, IMHO:
We're trying to apply social enforcement where there are technical solutions. There were rumors about a Varnish migration. We're not bandwidth-limited now; we're rendering-power-limited - how about implementing logic of:
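The details of the proposed logic were lost in this thread's formatting, but a minimal sketch of the kind of rendering-priority triage being discussed might look like the following. All names, priorities, and thresholds here are hypothetical, not part of any real OSM service:

```python
# Hypothetical triage: prioritise fresh rendering for recent mappers,
# serve everyone else cached (possibly stale) tiles on a lazy queue.
HIGH, LOW = 0, 1  # render queue priorities

def classify_request(is_recent_contributor, tile_age_days, max_stale_days=7.0):
    """Return (queue_priority, serve_cached) for a tile request."""
    if is_recent_contributor:
        return HIGH, False          # mappers always get a fresh render
    if tile_age_days <= max_stale_days:
        return LOW, True            # everyone else: a stale tile is fine
    return LOW, False               # too stale: re-render, but lazily
```

The point of such a split is that it trades freshness (which most third-party users don't need) for render capacity, without blocking anyone outright.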
Thank you komzpa for running a tile server. I believe identifying top websites using osm.org tiles and asking them to donate would send the wrong message. What if a website refuses for one reason or another? Or donates very little? Or donates once and never again in the following years? Would we then switch off access? That model would be seen as "give us money or else". Same with returning tiles saying "out of capacity, donate". https://switch2osm.org/ includes instructions on how to set up a tile server but also lists tile provider companies. Pointing users to the website doesn't mean they have to run their own tile server. Not being bandwidth-limited currently is lucky, as bandwidth (at least for the hosting in London) is donated.
Isn't that up to CartoDB's tile policy? But you raise a good point here: editors (people fixing the map) should ideally never be restricted.
This would also ban, for example, my local file that makes a map of places with missing OSM tags (I make this map to find places for mapping), and all kinds of things that are in development, even ones intended to be open to the public. For reference - my file, in a version showing places with bicycle parking missing in OSM but present according to data released by the city of Kraków: https://gist.github.com/matkoniecz/bac244f38693f307b3560e4e71bf8e04
This would be highly useful. I suspect that there are some sites/apps/scrapers like Pokemap with very high usage and a long tail of tiny/optimized sites with acceptable levels of usage.
Can you give a link to that? (I am curious how it was done from the technical side.)
I'm afraid it was done in the simplest and least repeatable way possible: Taking the tile render server Apache logs and running |
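The actual one-liner was lost from this comment, but an illustrative reconstruction of that kind of throwaway analysis is sketched below: counting tile requests per referer host from Apache combined-format access logs. The log format and field positions are assumptions about a typical setup, not the real server's configuration:

```python
# Count tile requests per referer host from Apache combined-log lines.
import re
from collections import Counter
from urllib.parse import urlparse

# Matches: "GET /z/x/y.png HTTP/1.1" <status> <bytes> "<referer>"
REFERER_RE = re.compile(r'"[A-Z]+ \S+ \S+" \d+ \d+ "([^"]*)"')

def referer_counts(lines):
    counts = Counter()
    for line in lines:
        m = REFERER_RE.search(line)
        if m:
            host = urlparse(m.group(1)).netloc or "(none)"
            counts[host] += 1
    return counts
```

Sorting `referer_counts(...)` by count would yield exactly the kind of "top sites vs. long tail" breakdown posted later in this thread.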
I've read that this thread is for policy discussion, not for tech details, but just for better understanding I would like to know: is it possible to deploy a configuration that has 2 separate flows? One for core activities (like mapping editors, tiles at openstreetmap.org etc.) with a higher quality of service and not for 3rd party usage; and another flow for all others, where service is provided on a best-effort basis (if there is capacity for it)? I agree with Komzpa that it is better to have a technical solution (if possible) rather than a legal one. Because in this case we would be able to block users that are the source of an issue (bandwidth usage etc. - it could be a non-profit website like fastpokemonmap), rather than ones that are harmless but closed-source/commercial, for example.
Here's the breakdown based on renderer requests on yevaud for the preceding 24h period - this is cache misses, so it will undercount any sites whose tiles are served entirely from cache. But those aren't the ones which cause us a problem anyway.
Unfortunately, the 23,995 sites in the long tail together make up almost 40% of usage, yet form a very long, smooth tail:
Where
The sum of the "long tails" for user agents and referers is 61.72%, which means that almost 2/3rds of our requests are in your 3rd category. Taken individually, they are not problematic levels of use. As a whole, they do cause a problem. I think everyone would agree that we want to support mapping activity. The requirements for supporting this are clear: we want quick updates and lots of detail. The problems start to appear when we try to support generic web mapping use-cases, which have a different set of requirements; they are not so bothered about updates, and favour "less cluttered" maps over more detailed ones. In fact, it's better for the generic web mapping use case to update less frequently, as this means better cache hit ratios and faster load times. "Less cluttered" map tiles tend to be smaller, making them faster to load and making better use of edge and device caches. It is possible to support all of this, with more hardware and more donations. However, using discretionary spending (and to a lesser extent, donated hardware) to support generic web mapping means we have fewer resources to support other OSMF-run services, such as the API, planet, Nominatim, etc.
Maybe start by blocking things like
? A 15% reduction is a big part (though most would probably start using proper identification rather than disappearing) and it would encourage proper identification by users. And maybe there are big, inefficient users hiding behind generic user agents.
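A later comment names "Android", "CFNetwork" and "libwww-perl" as the generic agents in question. A minimal sketch of that kind of filter is below; the function name and the exact matching rule are illustrative assumptions, not the real tile CDN configuration:

```python
# Generic user agents discussed in this thread: bare framework/library
# names with no app identification.
GENERIC_AGENTS = ("Android", "CFNetwork", "libwww-perl")

def is_generic_agent(user_agent):
    # A bare framework name (optionally with a version) counts as generic;
    # a string like "MyMapApp/1.2 (Android)" identifies an app and passes.
    ua = user_agent.strip()
    return any(ua == g or ua.startswith(g + "/") for g in GENERIC_AGENTS)
```

A server could return an HTTP 403 (or a watermarked "please identify your app" tile, as suggested later in the thread) when this predicate matches.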
Maybe it would be a good idea to ping them about this discussion?
That's a good idea, and it's what we've been doing for a few years now: tightening access requirements and trying to find "abusive" users. We're at the point where the vast majority (~62%) of tile accesses aren't coming from a small number of abusive users or apps; instead they come from a huge pool of small sites and apps which aren't directly related to OSM - other than using it as their free map. There might be some benefit to having all those sites display attribution to OSM (assuming they do, of course). On the other hand, there's a clear trade-off between using resources for tiles and for other things more directly related to mapping activities.
I think there's an issue of trying to make savings instead of making growth. Can we translate all these numbers into costs, according to, say, Amazon's pricing? I've installed Orux Maps - they do show a "Donate" dialog, proposing to donate to them. Given Google Play reports 10 000+ "donate app" installs at a $2.74 price, that translates into an estimated income of $20 000. Orux has GPS tracking support. It might be hard to trace directly, but osm.org has a GPX upload feature - how many of those uploads are from Orux users? If we have need for capacity in other fields, then please highlight those fields as such.
But is there any technical problem with blocking the remaining uses without a proper user agent?
I think it also has a social impact that should be well thought out. How exactly is blocking performed? Does it show just a generic "Access blocked" message, or "Have your developer send a correct user-agent, and donate"?
Hello all; from OruxMaps: I think that the benefits of OruxMaps are a bit overestimated. There are approximately 20,000+ donations, but spread over 7 years. The donation was 2 € until last year. Google takes 30%, VAT is 21%, plus approximately an additional 25% tax for my country, plus paying for hosting... The calculation of the current profit is now easier. Recently I increased the donation to 3 €, because it was almost costing me money. Surely there are many paid applications that use osm more intensively and do not identify themselves. OruxMaps also allows users to upload GPX track files to the OSM servers. In this way I think it helps osm. If I have to pay for the services of map servers, honestly I think I would have to remove them from the app, leaving only support for offline maps. There are other excellent servers that the app cannot use because of their prices. I can make specific donations, but if I have to pay for all the map servers, it would be impossible. I could offer users the option to donate directly to the provider of the maps, or to pay affordable prices. I really do not know which is the best solution.
While OSM tries to provide recent tiles for everyone, this causes a high render workload and a low cache hit rate, yet recent tiles are needed only by a minority of users. I think the current situation could be solved by splitting the map cache into two parts: one for signed-in OSM users, with high rendering priority as it works now for everyone, and a second for everyone else with a big cache size, where tiles are invalidated after a week or even a month. It should reduce tile renderer usage and focus it on editors, without blocking hot resources like the Pokemon map or something like it.
There's no need to split up the cache itself. It can be achieved with rather simple means:
What do you think?
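The "simple means" could be as little as varying the `Cache-Control` lifetime per request rather than maintaining two caches. A sketch under stated assumptions - the cookie name `_osm_session` and the two TTLs are hypothetical choices, not the real site's values:

```python
# Same cache, different lifetimes: signed-in mappers get short-lived
# tiles (fresh map), anonymous traffic gets week-long cacheability.
FRESH_TTL = 3600            # 1 hour for signed-in mappers
STALE_TTL = 7 * 24 * 3600   # 1 week for everyone else

def cache_header(cookies):
    ttl = FRESH_TTL if "_osm_session" in cookies else STALE_TTL
    return f"public, max-age={ttl}"
```

Longer `max-age` for anonymous users means browsers and CDN edges re-fetch far less often, which directly reduces render load without any request being refused.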
My few thoughts... I mostly disagree with a differentiated policy for open source / commercial / publicly-accessible / behind a firewall or VPN. I think it is a nightmare just to define such a policy, and I prefer the current one, with rule #1 being heavy use. I believe it is very much in the OpenStreetMap spirit that commercial use is also accepted.
For me the only sector OSMF/OWG ought to support (as more deserving) is anything contribution-related to the core OpenStreetMap project; not any external project or website, whether open or closed, free or commercial. Just adjust the rate for overall restriction and keep a whitelist for the website and core projects. I like the direction of #113 (comment).
Be careful with "commercial", since it's vague. Is the BBC commercial?
@zerebubuth Thank you; that's really interesting, though it makes clear that there are no easy answers! Depending on how the stats are presented, you could make a case that we are serving 11.74% to osm.org; 22.62% to generic webmapping uses; and 65.64% to scrapers, apps, super-heavy users and other potential "abusers". My guess is that a non-commercial restriction would be lucky to reduce that 22.62% by much more than 5%, to (say) 17%. To get even that much of a reduction from the 24,000 sites would require intensive policing, which given the scale of the issue would likely be carried out by the community at large rather than by OWG. Given the (shall we say) alacrity with which some community members have approached attribution issues in the past, I fear that would be counter-productive for the goodwill of OSM. So for an alternative suggestion which might be more closely aligned to the numbers:
where n and 17 are figures decided by OWG. (Slight side-issue: I wonder how much mapping activity is generated by the 22.62% of webmapping uses. In other words, people or organisations (however defined) who add content to OSM so that they can show it on the map on their own sites. I suspect it's significant in terms of mapping activity, but insignificant in terms of server load; if so, any policy should seek to preserve that linkage.)
Tiles are a great advertisement platform for OSM. What do you propose to say about OSM on these banners?
So, while we have this platform, it is essential to send correct message if we want to.
Scraping will soon go away - if scrapers see their cache poisoned by such tiles, they will stop.
Uhh, the current tile usage policies already say "Valid HTTP User-Agent identifying application". Why aren't "Android", "CFNetwork" and "libwww-perl" already blocked?
There are two recurring points in the discussion where I strongly disagree:
I did like the original suggestion because it fits well into the open data theme. From a more practical point of view it might be better to draw a line that is easier to implement. @systemed's suggestions are probably better in that regard and would have my support as well (I would even go as far as supporting restricting the highest zoom levels for everybody, but I'm not sure how much gain there is in that).
We have 20x globally distributed caches. See: http://dns.openstreetmap.org/tile.openstreetmap.org.html , hardware is listed here: https://hardware.openstreetmap.org/#tile-caches |
tile.openstreetmap.org uses GeoDNS to point users to a local cache, ensuring fast response and a regionally hot cache. We peak at just over 1Gb/s traffic outbound from the caches: http://munin.openstreetmap.org/openstreetmap/tile.openstreetmap/index.html
The 20x caches are monitored and automatically rebalanced if there is an outage. Same for the rendering backends.
To give a sense of the scale of the traffic tile.openstreetmap.org serves: we currently serve over 6800GB/day in over 528 million requests (an average of around 13.6KB per request).
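The quoted per-request average can be sanity-checked from the two figures given. Whether "GB" here means 10^9 or 2^30 bytes isn't stated, so both conventions are computed; the quoted ~13.6KB sits in the same ballpark as both:

```python
# Back-of-envelope check: 6800 GB/day over 528 million requests/day.
requests_per_day = 528_000_000
gb_per_day = 6800

avg_decimal_kib = gb_per_day * 10**9 / requests_per_day / 1024   # GB = 10^9 B
avg_binary_kib = gb_per_day * 2**30 / requests_per_day / 1024    # GB = 2^30 B
# avg_decimal_kib ≈ 12.6 KiB, avg_binary_kib ≈ 13.5 KiB
```

Either way, the average confirms these are small raster tiles rather than bulk data transfers.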
Do I understand correctly that the rendering servers use separate caches, and there is no sync between the caches?
The caches do peer with nearby caches. The stack is: each tile cache server (20) has a memory and a filesystem cache. Each render backend server (3) has a local disk cache and renders on demand if a tile is missing.
Is there replication of the cache between rendering backend servers?
No. The render backend servers' caches are "hot" for the regions they serve, i.e. sticky: tile-cache-A normally uses render-backend-A, tile-cache-B normally uses render-backend-B... tile-cache-B might switch to render-backend-A if there is an outage of B, but will switch back when B is available again. Syncing the cache between the backends would be non-trivial and a fairly expensive (CPU + IO + network) operation. 2 of the 3 render backend servers currently use nearly 100% of the available disk space for cache.
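The sticky routing described above can be modelled in a few lines. This is an illustrative toy, not the actual cache configuration; the server names and the health-set interface are assumptions:

```python
# Each tile cache prefers its "home" render backend and only fails
# over while that backend is down, then switches back automatically.
HOME_BACKEND = {"tile-cache-A": "render-backend-A",
                "tile-cache-B": "render-backend-B"}
FALLBACKS = ["render-backend-A", "render-backend-B", "render-backend-C"]

def pick_backend(cache, healthy):
    home = HOME_BACKEND.get(cache)
    if home in healthy:
        return home  # sticky: keeps the backend's disk cache "hot"
    for backend in FALLBACKS:  # outage: take any healthy backend
        if backend in healthy:
            return backend
    raise RuntimeError("no healthy render backend")
```

Stickiness is what makes each backend's disk cache regionally hot, which is why syncing the caches buys little for its cost.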
To follow @cleder's tile. / no-cache. suggestion, what are the actual cache settings on the CDN servers?
My understanding is that developers may not always be able to influence either the Referer or the User-Agent, e.g. when setting up a website which loads tiles using Leaflet/OpenLayers. Could we update the tile usage policy to ask (not require) developers to make their contact data available through their client's requests, e.g. by appending a URL parameter that will be visible in the log files? Something like an extra query parameter. We could extend the tile usage policy like this:
If enough developers implement this, we should have a clear picture of who is using how much bandwidth. The tile usage policy already states
but obviously it's difficult to contact those responsible.
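The opt-in identification being suggested could be as simple as the following sketch. The parameter name `contact` is purely hypothetical - it is not part of any actual OSM policy or API:

```python
# Append an optional contact parameter to the tile URL so operators
# can find the responsible party in their access logs.
from urllib.parse import urlencode

def tile_url(z, x, y, contact=None):
    url = f"https://tile.openstreetmap.org/{z}/{x}/{y}.png"
    if contact:
        url += "?" + urlencode({"contact": contact})
    return url
```

One caveat worth noting: caches generally key on the full URL, so such a parameter would fragment the cache per contact value unless the CDN is configured to ignore it when computing cache keys.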
@grinapo WMF tends to have a fairly liberal view on third parties using our services, so yes, in principle it does make sense for WMF to expose its tiles to anyone, with the usual limitations (we reserve the right to block abusive / excessive traffic, we ask that anyone planning to send a statistically significant amount of traffic makes contact first, ...). We still have a few things that need to be sorted out on our side. You can follow the discussion at https://phabricator.wikimedia.org/T141815.
If they're setting up a website then browsers will send referer headers by default, which is fine.
No. #101 discusses this in more detail; I'd say progress on that is in the hands of developers right now, not ops.
Many commercial providers require developers to sign up and get some sort of API key these days for their application / web site. Maybe this could be an approach to better control overall tile usage, especially when referers and user agents are unavailable, and most importantly, to have a feedback channel rather than a black hole. I don't know if we should go as far as providing paid plans (and create direct competition to commercial providers). We could as well just defer power users to commercial offerings once they have used up their quota. Also, paid plans would raise questions about SLAs. Regarding "legacy" users: maybe we could treat all those clients without an API key as some kind of "micro plan" user with very low usage limits, giving developers some incentive to sign up. On a Mapzen page, I read that those API keys can even be used in a way that is cache friendly, which was one of my initial concerns before writing this.
There are many techniques to limit or control access, and I'd like to have a discussion about them after we've reached some kind of consensus on what the policy for access should be. At the moment there's a very wide range of views about who we should be serving tiles to - just mappers, everyone, etc.? I think the question of "should we do activity X or not" is quite different from the dilemma we face when running the servers, as it suggests that the alternative to "action X" is inaction, rather than some other improvement to the OSM servers. Although not perfect, some better questions might be:
Of course, in real life neither is a binary decision. Underneath all the real-world complexity, however, it is true that time or money can only be spent once, and there could never be enough of either. If you'd like to help out then:
Hi, a lot of important considerations have been made in this thread already and I have little to add. One observation: from the numbers @zerebubuth provided, it seems the long tail is - as to be expected - fairly thick at the beginning. In other words, the top 100-200 of the 24k sites using tiles probably account for at least 20 percent of the total traffic of the tile servers. It might be a good idea to have a closer look at these, maybe even make this list publicly available on a regular basis. Since we are only talking about tiles that are not served from cache here, I have to admit I have very little idea what to expect in such a list.
I quite like the idea of prioritising tile rendering for contributors and putting everybody else on the lazy-render queue (assuming that such prioritising would help performance). A few people here have suggested URL schemes to achieve this, but what about using the IP address of actual contributors instead? When a changeset or trace is uploaded, remember the uploader's IP for 24 hours. When scheduling a render, look up the requester's IP in the contributor db. There are a few key/value stores that support sets, TTLs, distribution, and HA, making the db part easy enough.
@vincentdephily: Interesting idea to recognize actual mappers and give them priority in rendering! IPs by themselves may not be such a sharp indicator, since there are mechanisms like NAT and carrier-grade NAT that may give multiple users (even at different locations) the same IP address. Editors, e.g. JOSM, know their users' credentials. How about using OAuth to enable this priority access? A word of caution: may such a mechanism invite people to do dummy edits to gain priority access?
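A toy version of the contributor-IP idea might look like the sketch below. A real deployment would use a shared store with native TTL support (e.g. Redis), as the original comment suggests; this in-process dict is purely illustrative:

```python
# Remember uploader IPs for 24 hours; check membership when scheduling
# a render to decide whether the request gets contributor priority.
import time

TTL = 24 * 3600
_recent_contributors = {}  # ip -> timestamp of last upload

def record_upload(ip, now=None):
    _recent_contributors[ip] = now if now is not None else time.time()

def is_contributor(ip, now=None):
    now = now if now is not None else time.time()
    ts = _recent_contributors.get(ip)
    return ts is not None and now - ts < TTL
```

As the reply above notes, NAT means one IP can cover many users, so this would grant priority somewhat coarsely; an OAuth-based check would be sharper at the cost of requiring authentication.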
The idea of two tile services, or of linking edits to rate-limiting, doesn't solve the problem of resources; it makes it worse! With more moving components, even more time will be spent on this secondary service instead of primary ones like the API. Linking edits with rate-limiting would require development work, and I'm pretty sure most of the developers involved do not consider it a priority.
I volunteer to help maintain and make software adjustments to the tile service, if overload on the current sysadmin team really is the issue.
A better way to go would be to let people contribute caching proxies, without necessarily running their own renderers (which is much more demanding). As these proxies installed by some web/app developer will accumulate requests from many users of those websites/apps, their tile usage (from the same few source IPs) would be much higher than for normal users; they currently cannot do that without being blocked for excessive use. However, if a proxy also allows people around the world to use these caches, not just the users of the webapp/site, we could transform it into a more powerful CDN than the existing one. But is the usage exploding because of serving prerendered tiles, or because there are too many tiles to redraw in the renderers for OSM.org's Mapnik style? This is a second problem that a simple CDN cannot solve: in the past there was Tiles@Home for delegating the rendering to many contributed renderers, but this has stopped. So how can we improve the efficiency of the caches and help them coordinate better to distribute the workload/IO/storage/network usage? Isn't this a more general problem, not specific to OSM but common to any web application whose content is delivered to and used by many people (including for example Wikimedia wikis)? Is there any discussion in the development of well-known caching proxies? Is there a way to help them synchronize with each other faster and more efficiently (possibly using a sideband protocol)? Some P2P protocol could help build and secure a large network of interconnected proxies, with signed contents (using some "blockchain"?) to avoid other kinds of abuse (such as relaying illegal or pirated content). There may also be research in this domain in the W3C or an HTTP working group. For software distribution everyone now uses efficient protocols such as rsync and torrents.
These protocols are not easy to implement on small mobile devices (not really P2P capable), but if the developer of a web app or website wants performance and reliability for their mobile users, they need to do something and implement some server-side helpers: implementing a cache and synchronizing it efficiently, while still being able to use the power of the rest of the network and contribute some share back to it, is something to think about. For now developers fear installing their own servers. But we have to convince developers that things won't get better if all the hard work is systematically handed off and taken for free from another highly demanded source. Before we have a better protocol, all we can do (and should do now) is extend our too-small CDN with more servers worldwide. Not all these caches need to be directly managed by the foundation; we just need an agreement and some service quality monitoring tools to evaluate the best working mirror.
Participating in this thread super-late, but I only saw it today, so: I feel strongly in agreement with @Komzpa’s Nov 1 comment that OSM tiles offer both infrastructure and persuasion for the project, as well as the Oct 28 comment suggesting that tile usage offers an opening to seek support and new collaborators. I don't love a policy change which limits the usability of OSM tiles like this. It’s clear from the OWG members participating here that this is a source of stress for the services, so I would prefer to see a strategy which seeks to expand technical and financial support for all tile usage. It does make sense to offer two tiers of tile usage, though: one for editors who need to view the latest updates (10-15%?) and another for sites and apps who don’t need anything close to that level of freshness. The boundary between the two will always be fluid and hard to define, but supplying a right way to use tiles for games and apps may help underscore the value of the tiles as well as elicit support for their hosting.
@migurski Really good to see you in this thread as I think you're probably better positioned than pretty much anyone here to help make things better. It's clear that the demise of MQ Open's unmetered tile service caused additional stress to OSMF's tileservers, which since the early days of the project have principally been provided "for editors who need to view the latest updates". It's undisputed, I think, that there are wider benefits if tiles made from OSM data are readily available to young/small-scale "sites and apps", though there is rightly much concern at heavy users (Pokemon Go trackers, etc.) freeloading off services which are funded principally through donations. I would therefore suggest that this is an ideal opportunity for Mapzen, as the best-funded organisation with significant raster tile infrastructure expertise, to step into the role formerly occupied by MQ Open and provide an OSM-based free-access tile product of the sort that you and @Komzpa are advocating. If you were feeling generous you could eschew your own branding and arrange with OWG for it to be available at public-tile.openstreetmap.org, but actually most users seemed happy with the MQ Open arrangement and maybe Mapzen-provided free tiles with Mapzen branding would fulfil everyone's wishes here. How about it?
I think it’d be really interesting, at least as a speculative conversation. @zerebubuth’s team are the caretakers of Mapzen’s vector tile infrastructure. I’m also very personally familiar with the role of MQ Open — for many years, my standard recommendation to Code for America fellowship projects depended on MQ because it was a reliable, free source of easy commodity tiles. These days, Stamen seems to fill that role with Toner and Terrain cartography. None of these options look quite like the OSM-Carto designs, though, which strikes me as important for delivering on the promotional idea. Just anecdotally, I've received mailers from local real estate agents at my house who’ve screen-capped OSM tiles and provided correct attribution. It was surprising in a good way. It also sounds from Evgen’s suggestion and further comments from Grant that there is a technical opportunity here to introduce shared caches? I respect that Matt asked for policy-only opinions here, but this does feel like a situation where the right policy will be strongly influenced by feedback from operations, fundraising, and technology.
I have a couple of points against going to external vendor for this kind of service:
Regarding building more automated analysis tools, mentioned in @zerebubuth's comment: I sent a request to open up the raw munin data to operations@osmfoundation.org shortly before New Year - I had hoped to dig into it during the New Year holidays, and I'm still waiting for a response.
I think this issue has stagnated. There is clearly a lot of opposition to changing the tile usage policy in the way that we hoped to, but no clear way forward that resolves the problems OWG continues to face is apparent either. I will close this issue since there is nothing more to be done here now, and there has been no progress in the last six months. I encourage everyone to read through the list of ways you can help that @zerebubuth posted earlier.
Blocking users of apps that are not identifiable would be a bad thing. Yes, we should think about creating an alternate tile service with long cache lifetimes and less frequent updates, and its own CDN: instead of blocking users, this could easily be used to immediately redirect requests from unidentifiable sources, or from sources that are known to be imprecise ("Android" for any mobile app not using an API key, tuning its referrer, or using any OAuth user identification), or in any case as a fallback (where the live renderer cannot support the current load, including for editing users, or if an authenticated user is using more than some threshold). The libraries used in common frameworks should be able to detect and manage the switch to another cache, without failing and without mixing the two caches, but I don't know if it is possible with existing versions of Leaflet or similar frameworks to set them up so they support alternate sources with distinct cache delays. The HTTP(S) protocols allow setting cache properties, but only per resource, not globally for a source or domain, or for more specific paths (such as zoom levels): this would allow avoiding re-querying a "fast source" when it would reply that there's another preferable source to use globally for the same client or for a specific subpath. This could be defined by sending documented cookies, or by adding a custom "service info" API to tile servers, queried before fetching actual tiles for a specific zoom level. Maybe that API could also be tuned to support selection of tile servers per region (a bounding box of x/y tile positions: the API could return a list of bounding boxes, each one associated with the preferred source to query and its cache delay). But existing frameworks (Leaflet, and so on...) would need to support it (ideally it should be proposed and documented in the TMS protocol so that common application frameworks support it by default).
Another footnote: OpenCycleMap (aka Thunderforest) started to watermark tiles requested without an API key, actually informing users about what they are being asked to do. It would be possible to notify users of undefined user agents about the problem instead of blocking them, and check the stats to see whether anything changes.
The usage of the OSM tile infrastructure by the OpenStreetMap website was recently measured at approximately 11% of tile server output, so the vast majority of OSM's tiles are rendered to support 3rd party sites and apps. Although we want to support use of OpenStreetMap data, there are some use cases for which OSM and the surrounding ecosystem receive little or no benefit.
It has been suggested that OWG introduce two new rules to the tiles acceptable usage policy:
The technical implementation of this is a detail, so please keep discussion to the policy of whether or not we want to begin to restrict usage in this way.