Consider replacing the requests library with pycurl #244
Comments
Support this! I've been having ConnectionError timeouts and "max retries exceeded" errors. Passing keyword args through to the Requests library via get_session is kind of a hassle (and I'm not sure it will be any better than the defaults).
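For reference, a minimal sketch of the kind of session tweaking mentioned above, assuming the `http_adapter_kwargs` parameter of `get_session` is available in the installed version; the identifier is just an example:

```python
from internetarchive import get_session

# Bump urllib3's retry count through the HTTP adapter; whether this actually
# helps with the ConnectionError timeouts described above is an open question.
session = get_session(http_adapter_kwargs={'max_retries': 10})
item = session.get_item('nasa')  # example identifier
item.download()
```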
I'd be in favor of this as well. I've been running into similar timeout issues downloading large files (> 2 GB). A simple hack that seems to be working for me is to manually crank up the timeout here: internetarchive/internetarchive/files.py, line 243 (as of 98be93d).

I just noticed that a much longer timeout is already used in the upload case: internetarchive/internetarchive/item.py, line 592 (as of d093cde). As a short-term solution, request_kwargs could be similarly exposed in the download case.
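A hedged sketch of what that short-term workaround could look like from the caller's side, assuming `download()` grew a `request_kwargs` parameter analogous to the one `upload()` already accepts (the parameter does not exist at the referenced commit; the identifier is just an example):

```python
from internetarchive import get_item

item = get_item('nasa')  # example identifier
# Hypothetical: forward per-request options to the underlying requests call,
# here a (connect, read) timeout tuple generous enough for multi-GB files.
item.download(request_kwargs={'timeout': (12, 600)})
```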
I wrote a library similar to internetarchive for my own use, and it uses libcurl in the background for uploads and downloads. Using Curl generally makes transfers very reliable and would probably avoid many of the transfer-related errors that are listed as issues on this repository. There are some problems caused by the Internet Archive's error codes and responses, though. For example, Curl's Retry-After support is nice for uploads when the server load is too high: the server correctly returns a Retry-After response in that case, which Curl can handle without any further intervention. But the server also wrongly returns a status 500 (the proper code would be 400) if the identifier is already in use, which results in an endless loop when bundled with Curl's retry behaviour unless it is handled separately. So there are definitely some caveats, mostly due to errors on the Archive's side.

I have not looked at PyCurl in particular, but it can be a bit unidiomatic to use a wrapper for libcurl instead of a native web request library. That depends entirely on how well PyCurl is written and might not be a problem, just something to keep in mind.
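To make the caveat described above concrete, here is a rough pycurl sketch of upload retry logic that honours Retry-After but treats the Archive's misleading 500 for an already-used identifier as fatal. The retry policy, error-text matching, and endpoint handling are assumptions for illustration, not the library's actual behaviour:

```python
import time
from io import BytesIO

import pycurl


def put_with_retries(url, data, max_retries=5):
    """Upload bytes with PUT, retrying on failure and honouring Retry-After."""
    status = None
    for attempt in range(max_retries + 1):
        body = BytesIO()
        headers = {}

        def collect_header(line):
            line = line.decode('iso-8859-1')
            if ':' in line:
                name, value = line.split(':', 1)
                headers[name.strip().lower()] = value.strip()

        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.UPLOAD, 1)
        c.setopt(pycurl.READDATA, BytesIO(data))
        c.setopt(pycurl.INFILESIZE, len(data))
        c.setopt(pycurl.WRITEDATA, body)
        c.setopt(pycurl.HEADERFUNCTION, collect_header)
        c.perform()
        status = c.getinfo(pycurl.RESPONSE_CODE)
        c.close()

        if status < 400:
            return body.getvalue()
        if status == 500 and b'already in use' in body.getvalue():
            # Assumed error text. The server should answer 400 here; blindly
            # retrying this case would loop forever.
            raise RuntimeError('identifier is already in use')
        if attempt < max_retries:
            retry_after = headers.get('retry-after', '')
            # Retry-After may also be an HTTP date; fall back to backoff then.
            delay = int(retry_after) if retry_after.isdigit() else 2 ** attempt
            time.sleep(delay)
    raise RuntimeError(f'upload failed with status {status}')
```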
Thanks for the feedback @maxz, I appreciate it! I haven't really considered this change in a while, and I'm not sure it's something I'd still consider doing at this point. I'll keep this open in case anyone else has feedback, though. Thanks again!
I keep thinking about ways to move to PyCurl, although I have not come to a satisfying solution yet. One way would be to write a wrapper which provides the same interface that the current session classes provide. The wrapper would be a bit of a dirty solution and would further increase the cruft that we already have in some parts of the code base; the other way would be to rewrite the relevant parts directly on top of PyCurl. The wrapper and the rewrite could also be combined, so that the wrapper serves as a stand-in as long as the relevant parts have not been rewritten. As previously mentioned, I think it would have some real benefits for the reliability of transfers. Did you drop the idea because it is so much work, or what was the matter, @jjjake?
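For illustration, a very rough sketch of the wrapper idea: an object exposing a small requests.Session-like `get()` that performs the transfer with pycurl and repackages the result as a `requests.Response`, so calling code is unaffected. The real session classes carry far more state (auth, config, retries), so this is only a shape, not the actual interface:

```python
from io import BytesIO

import pycurl
import requests
from requests.structures import CaseInsensitiveDict


class CurlSession:
    """Sketch only: a requests-like get() backed by pycurl."""

    def get(self, url, headers=None, timeout=None, **kwargs):
        buf = BytesIO()
        header_lines = []

        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.FOLLOWLOCATION, 1)
        c.setopt(pycurl.WRITEDATA, buf)
        c.setopt(pycurl.HEADERFUNCTION, header_lines.append)
        if headers:
            c.setopt(pycurl.HTTPHEADER, [f'{k}: {v}' for k, v in headers.items()])
        if timeout:
            c.setopt(pycurl.TIMEOUT, int(timeout))
        c.perform()

        # Repackage the transfer as a requests.Response so callers are unaffected.
        response = requests.Response()
        response.status_code = c.getinfo(pycurl.RESPONSE_CODE)
        response.url = c.getinfo(pycurl.EFFECTIVE_URL)
        response._content = buf.getvalue()
        response.headers = CaseInsensitiveDict()
        for line in header_lines:
            decoded = line.decode('iso-8859-1').strip()
            if ':' in decoded:
                name, value = decoded.split(':', 1)
                response.headers[name.strip()] = value.strip()
        c.close()
        return response
```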
I know this is a bit of a dumb reply, but could a requests-compatible pycurl wrapper work for this? Like this.
It probably could, depending on how well it covers all of Curl and how closely it follows the behaviour and API of Requests. A very important factor would also be how easily we can access the underlying Curl API: if it is only Requests with a different backend, then we won't gain the features necessary for better upload and download control. It's probably not a feature that anyone would want to work on right now, especially since we are currently converting everything to be type annotated and still have some Python 2 cleanup to do.
I'm also considering alternatives to pycurl, but so far it seems to be the best fit. These changes would be mostly invisible to users (e.g., methods would still return Response objects similar to the requests.Response objects that are returned now).
Feedback is welcome.
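As a hypothetical illustration of that invisibility, using the CurlSession sketch from an earlier comment and a public metadata URL purely as an example, code written against requests would not need to change:

```python
# CurlSession is the hypothetical pycurl-backed session sketched above.
session = CurlSession()
r = session.get('https://archive.org/metadata/nasa', timeout=60)
r.raise_for_status()  # works unchanged: r is a real requests.Response
print(r.status_code, len(r.content))
```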