Too frequent download attempts #577

Open
woodpeck opened this issue Aug 20, 2024 · 6 comments

@woodpeck

I maintain the site download.geofabrik.de.

In my log files I am seeing a client identifying themselves as "mediagis/nominatim-docker:4.2.4" which attempts to download the same .osm.pbf file every 15 seconds (on average - this has been going on for days now). The server replies with a "304 not modified" response.

I don't know if this is standard behaviour or if there is maybe some sort of malfunction.

Since the download.geofabrik.de server only updates files once per day, it does not make sense to ask for a new file every 15 seconds. A site intent on speedy updates would not ask for full .osm.pbf files anyway but consume updates instead.
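
A minimal sketch of a client-side guard that would avoid this, assuming the client can record when it last checked for a new extract. The function name `should_check_for_update` and the 24-hour interval are illustrative assumptions, not part of the mediagis image:

```python
import time

# Assumption: the server publishes new extracts about once per day,
# so checking more often than this cannot yield a new file.
MIN_CHECK_INTERVAL_SECONDS = 24 * 60 * 60


def should_check_for_update(last_check, now=None,
                            min_interval=MIN_CHECK_INTERVAL_SECONDS):
    """Return True if enough time has passed since the last check.

    last_check: Unix timestamp of the previous check, or None if the
                client has never checked before.
    now:        current Unix timestamp (defaults to time.time()).
    """
    if last_check is None:
        return True
    if now is None:
        now = time.time()
    return (now - last_check) >= min_interval
```

With such a guard, a request made 15 seconds after the previous one would simply be skipped on the client side.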

Sites making thousands of requests per day for a prolonged time will be blocked from accessing download.geofabrik.de.

It would be good if the mediagis image could make sure not to issue an undue number of requests to download.geofabrik.de.

@woodpeck woodpeck added the bug label Aug 20, 2024
@leonardehrenfried
Collaborator

leonardehrenfried commented Aug 20, 2024

Hi Frederik,

First of all, thanks for maintaining download.geofabrik.de, and I'm sorry this image is giving you grief.

We have had a very similar problem previously: #416

The bottom line is that this image makes it very easy to build a nominatim instance. This leads to an unfortunate situation where less skilled users set up installations that lead to the sort of problems you're seeing here.

My guess is that an installation encounters some sort of error and then retries the installation forever. The nature of container images is that there is no knowledge of the previous instance. To put it bluntly, a careless user ruins the experience for everyone else.
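
If the guess about endless retries is right, one common mitigation is capped exponential backoff between attempts. The following is only a sketch of that technique, not a description of how the image behaves; `backoff_delays` and its default values are hypothetical. Because a fresh container has no memory of earlier attempts, the attempt count would have to be persisted somewhere outside the container (for example on a mounted volume) for this to help across restarts.

```python
import itertools


def backoff_delays(base=60.0, factor=2.0, cap=86400.0):
    """Yield wait times in seconds: base, base*factor, ... never above cap."""
    delay = base
    while True:
        yield min(delay, cap)
        # Grow the delay, but never beyond the cap.
        delay = min(delay * factor, cap)


# First five waits with the default (hypothetical) settings.
first_waits = list(itertools.islice(backoff_delays(), 5))
# first_waits == [60.0, 120.0, 240.0, 480.0, 960.0]
```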

Can you block the user agent by IP address? That would be totally reasonable from my point of view.

@woodpeck
Author

Yes, once an IP has racked up a 5-digit access count within a few days, that IP will usually be blocked. The sad thing is that we can't convey a message back to them saying "you have been blocked because of X".

I can't see details about the request in my log file, but since the server sends a "not modified" response, I guess the request must contain an "If-Modified-Since" header. So the client must have some state, otherwise it could not know which timestamp to put in that header?
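
For illustration, a conditional request of this kind could be built as shown below. This is a sketch in Python, not the image's actual download code (which is not shown in this thread); the function name, URL and user agent string are hypothetical. The point is that the timestamp has to come from persistent state, for example the modification time of a previously downloaded file.

```python
import email.utils
import urllib.request


def build_conditional_request(url, last_modified_ts=None,
                              user_agent="example-client/0.1"):
    """Build a GET request, adding If-Modified-Since when state exists.

    last_modified_ts: Unix timestamp remembered from an earlier download,
                      or None if the client has no state.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    if last_modified_ts is not None:
        # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        http_date = email.utils.formatdate(last_modified_ts, usegmt=True)
        request.add_header("If-Modified-Since", http_date)
    return request
```

A client without any stored timestamp would send a plain GET and receive the full file rather than a 304, which is why the 304 responses suggest that some state survives between attempts.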

@leonardehrenfried
Collaborator

Again, I'm really sorry for causing you problems.

I think rather than fiddling with the setup and dealing with this problem for a long time, I would just make the image repository private so that there are some obstacles for new users. I don't have the energy and attention to keep dealing with careless users anymore.

cc @philipkozeny

@woodpeck
Author

That would probably be a somewhat extreme reaction ;) A quick look at the log files tells me that I've had over 100 different IPs asking for something with a mediagis user agent in the last 24 hours, and only 10 of them made more than 100 requests (5 made more than 1000). You'd probably be throwing the baby out with the bathwater.
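
A per-IP tally like the one above can be produced with a few lines of Python. This sketch assumes an access log in the common "combined" format, where the client IP is the first field and the user agent is the last quoted string; the actual log format on download.geofabrik.de is not stated in this thread, and the function names are hypothetical.

```python
from collections import Counter


def count_requests_by_ip(log_lines, agent_substring="mediagis"):
    """Count requests per client IP for user agents containing agent_substring."""
    counts = Counter()
    for line in log_lines:
        # The user agent is the last double-quoted field on the line.
        parts = line.rstrip("\n").rsplit('"', 2)
        if len(parts) < 3:
            continue  # line does not look like combined log format
        user_agent = parts[1]
        if agent_substring in user_agent:
            ip = line.split(" ", 1)[0]
            counts[ip] += 1
    return counts


def ips_over(counts, threshold):
    """Return the sorted IPs that made more than threshold requests."""
    return sorted(ip for ip, n in counts.items() if n > threshold)
```

Calling `ips_over(counts, 100)` and `ips_over(counts, 1000)` on the result would then give the two groups of heavy clients.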

@philipkozeny
Collaborator

Apologies for the trouble you're experiencing with the image! Just a quick question: Are you noticing this behavior with the latest 4.4 version, or only with earlier versions? I ask because we recently removed the "restart: always" option from the contrib Docker template.

@leonardehrenfried
Collaborator

That's a good point. We probably want to remove the "restart: always" option from all of our docs, even the old ones.
