Slow-down on 429 maybe not optimal #1750
Comments
Another thing I just noticed: when you visit one of the URLs that mwoffliner gets rate-limited on in the Zimfarm, you also see the 429. So did their CDN/cache cache the 429 and show it to everyone, or is the backend rate-limiting the cache?
@uriesk I agree that the current system is suboptimal. But all of this really depends on how the WAF is configured, and it can vary from one MediaWiki instance to another. We should investigate whether there is a bug in HTTP 429 response caching at Wikimedia.
Now the image returns a 500 Internal Server Error, causing mwoffliner to slow down for no reason.
Looking at it a bit more, most cases look like this: 500 error -> immediately try again -> 500 error -> try again -> 500 -> try again -> ... -> rate limited. So a basic improvement would be to wait a minute before retrying after an error (see the sketch below). The weird upstream caches might not have much of an effect after all.
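For illustration, a minimal sketch of that idea, assuming a fetch-based downloader. The names, the one-minute delay, and the retry cap are hypothetical and not mwoffliner's actual code:

```typescript
// Hypothetical sketch: pause before retrying on a server error instead of
// retrying immediately, so repeated thumbnail failures do not pile up into
// a burst that trips the upstream rate limit.
const RETRY_DELAY_MS = 60_000; // assumed one-minute pause
const MAX_RETRIES = 3;          // assumed retry cap

async function fetchWithDelayedRetry(url: string): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    if (res.status < 500 || attempt >= MAX_RETRIES) {
      return res; // success, client error, or retries exhausted
    }
    // Wait before the next attempt rather than hammering the backend.
    await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS));
  }
}
```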
I'm in favour of a proper backoff strategy using Fibonacci or similar; the current strategy is primitive. I guess it should be doable using a module which already exists. But at the core of the problem is the HTTP 500. Like you said, errors when generating thumbnails are pretty common on the Wikimedia image backend.
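A rough sketch of what Fibonacci backoff could look like, just to make the idea concrete. This is not an existing mwoffliner module; the function names and the six-attempt default are assumptions:

```typescript
// Delays grow as the Fibonacci sequence: 1, 1, 2, 3, 5, 8... seconds.
function fibonacciDelaysSeconds(maxAttempts: number): number[] {
  const delays = [1, 1];
  while (delays.length < maxAttempts) {
    delays.push(delays[delays.length - 1] + delays[delays.length - 2]);
  }
  return delays.slice(0, maxAttempts);
}

async function withFibonacciBackoff<T>(task: () => Promise<T>, maxAttempts = 6): Promise<T> {
  const delays = fibonacciDelaysSeconds(maxAttempts);
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait 1 s, 1 s, 2 s, 3 s, 5 s... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delays[attempt] * 1000));
    }
  }
  throw lastError;
}
```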
We will never get a satisfying backoff strategy, because that
@uriesk Agreed that images might be better treated differently from text content.
Currently, when we get a 429, we slow down by running less downloads in parallel. So basically by indirectly limiting bandwidth.
But if the upstream rate limit is based on requests per minute, this is not optimal: we will eventually run into lots of small files, hit the limit sooner, and slow down much further than we have to by the time we download larger files again.
Do we know how the upstream Wikimedia rate limit works for downloading files?
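To illustrate the difference between the two throttling models, here is a hypothetical sketch of a sliding-window limiter that caps request rate directly instead of shrinking the number of parallel downloads. The class name and the per-minute figure are assumptions, not mwoffliner's actual behaviour:

```typescript
// Hypothetical sliding-window limiter: at most N requests per minute,
// regardless of file size or download concurrency.
class PerMinuteLimiter {
  private timestamps: number[] = [];

  constructor(private maxRequestsPerMinute: number) {}

  async acquire(): Promise<void> {
    const now = Date.now();
    // Drop requests older than the one-minute window.
    this.timestamps = this.timestamps.filter((t) => now - t < 60_000);
    if (this.timestamps.length >= this.maxRequestsPerMinute) {
      // Wait until the oldest request falls out of the window, then retry.
      const waitMs = 60_000 - (now - this.timestamps[0]);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      return this.acquire();
    }
    this.timestamps.push(now);
  }
}

// Usage: await limiter.acquire() before each file download, whatever its size.
```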