50x errors and open files limit #20
> I think that […]
I have concurrent_requests set to 16. Could 8*16 = 128 concurrent requests count as "too aggressively", keeping in mind that aquarium uses Tor (I suppose to avoid emitting all requests from a single IP)? Either way, I reduced concurrent requests from 16 to 8 and increased splash_wait from 10 to 50. At first sight this reduced the number of 50x errors, but I still get bursts of 20-30 50x responses, just less often. I will keep an eye on the final results once the crawl finishes.
Another question: if I see this in the logs:
Does it mean that the resource itself returned 503, or is it a status code from Splash? In the aquarium logs I see that containers are restarting pretty often, so could it be that the request is routed to a container that is in the process of restarting?
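For context, here is a minimal sketch of where these knobs live on the Scrapy side; the values are just the ones mentioned in this thread, and SPLASH_URL is an assumption taken from the render.json URL in the log below:

```python
# settings.py -- sketch of the settings discussed above (values from this
# thread; SPLASH_URL is an assumption taken from the retry log below).
CONCURRENT_REQUESTS = 8                  # reduced from 16
SPLASH_URL = "http://172.22.0.1:8050"    # aquarium endpoint seen in the log

# "splash_wait" presumably ends up as Splash's 'wait' argument; with
# scrapy-splash that is usually passed per request rather than globally,
# e.g. SplashRequest(url, callback, args={"wait": 50}).
```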
Hi,
I am using aquarium to scrape some data from websites. My configuration is:
For several of the site lists I am experiencing some issues. The Scrapy logs show the following:
<container_name> | 2019-02-04 13:23:50 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET <site_name> via http://172.22.0.1:8050/render.json> (failed 1 times): 503 Service Unavailable
for the first 20-30 URLs; then Scrapy scrapes successfully for about 3 to 5 URLs, and then there are again 20 to 30 503 errors. There are also 502 and 504 errors, but in smaller numbers.
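Those retry lines look like what scrapy-splash produces when requests go through the /render.json endpoint; a minimal sketch of such a spider (the spider name, URL, and wait value are made up for illustration):

```python
import scrapy
from scrapy_splash import SplashRequest

class ExampleSpider(scrapy.Spider):
    # Hypothetical spider, only to illustrate how requests end up as
    # "GET <site> via http://<splash-host>:8050/render.json" in the retry log.
    name = "example"
    start_urls = ["https://example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            # endpoint="render.json" matches the endpoint in the log;
            # "wait" is how long Splash waits before returning the page.
            yield SplashRequest(url, self.parse, endpoint="render.json",
                                args={"wait": 10})

    def parse(self, response):
        # A 503 here is the status returned for the /render.json call, so it
        # can come from an overloaded or restarting Splash instance rather
        # than from the target site itself.
        self.logger.info("Got %s for %s", response.status, response.url)
```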
At the same time I see the following logs from aquarium:
splash0_1 | 2019-02-04 13:23:50.346828 [-] Open files limit: 1048576
splash0_1 | 2019-02-04 13:23:50.346965 [-] Can't bump open files limit
Also, I don't know if it's important, but the user that starts the Docker process has soft and hard open files limits of 1024 and 4096 respectively.
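In case it's related: the per-container limit can also be set explicitly in the Compose file; a sketch assuming an aquarium-style docker-compose.yml with a splash0 service (the service name and numbers are illustrative, not taken from this issue):

```yaml
# docker-compose.yml (sketch) -- raise the open-files limit for one Splash
# service; service name and values are illustrative.
services:
  splash0:
    ulimits:
      nofile:
        soft: 1048576
        hard: 1048576
```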
At the end of the scrape the results are as follows:
At the same time, other sites have been scraped successfully with the same setup.
Also, on a successful scrape there are only around 100k files in the output folder, so even if Scrapy does not close all of the files it opens, I don't see why the 1 million open files limit would need to be bumped.
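One way to check whether a Splash container actually gets anywhere near that limit is to count its open descriptors; a sketch, with the container name being an assumption:

```sh
# Count open file descriptors of PID 1 inside a Splash container
# (replace the container name with the real one from `docker ps`).
docker exec <project>_splash0_1 sh -c 'ls /proc/1/fd | wc -l'
```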
What could be the issue?