Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unhandled HTTPErrors When Negative Offset Not Supported #15

Closed
EmmaJaneBonestell opened this issue Jan 12, 2023 · 5 comments · Fixed by #16
Closed

Unhandled HTTPErrors When Negative Offset Not Supported #15

EmmaJaneBonestell opened this issue Jan 12, 2023 · 5 comments · Fixed by #16

Comments

@EmmaJaneBonestell
Copy link

EmmaJaneBonestell commented Jan 12, 2023

When trying to use remotezip with PyPi, I discovered that while their server supports range requests, it does not support using a negative offset to get {content-length - bytes}-{content-length}.

When raise_for_status is called, an HTTP 501 error is returned (it could theoretically also be a 405) and remotezip aborts.

def _request(self, kwargs):
if self._session:
res = self._session.get(self._url, stream=True, **kwargs)
else:
res = requests.get(self._url, stream=True, **kwargs)
res.raise_for_status()

Reproducer:

from remotezip import RemoteZip

url = "https://files.pythonhosted.org/packages/71/6d/95777fd66507106d2f8f81d005255c237187951644f85a5bd0baeec8a88f/paramiko-2.12.0-py2.py3-none-any.whl"
  with RemoteZip(url) as wzip:
      wzip.extract('METADATA')

I was going to do a pull request, but not being super good with Python myself, I found it was not obviously fixible (to me at least) with a simple try/catch since it's going through constructors, etc.

Checking for a 501 or 405 error before raise_for_status and falling back to getting self.__file_size by a separate http request for content-length should fix this.

Something like:

def _request(self, kwargs): 
     if self._session: 
         res = self._session.get(self._url, stream=True, **kwargs) 
     else: 
         res = requests.get(self._url, stream=True, **kwargs) 
if res._status_code == 501 or 405:
    (do whatever needs to be done so that
    self._file_size = requests.get(self._url, stream=True).headers['Content-Length']
    )
res.raise_for_status()

Edit: I also submitted a bug/feature request to PyPI/warehoise about this on their server; I don't anticipate they will implement it quickly, but if they do, the given reproducer may not work.

@EmmaJaneBonestell
Copy link
Author

EmmaJaneBonestell commented Jan 12, 2023

There's probably a much better and less verbose way to do this, but I did fix it with the following:

def _request(self, kwargs):
        if self._session:
            res = self._session.get(self._url, stream=True, **kwargs)
        else:
            res = requests.get(self._url, stream=True, **kwargs)
        if res.status_code == 501 or 405:
            if self._session:
                content_size = self._session.get(self._url, stream=True).headers['Content-Length']
            else:
                content_size = requests.get(self._url, stream=True).headers['Content-Length']
            size = int(content_size) - int(kwargs['headers']['Range'].lstrip("bytes=-"))
            new_range = str(size) + '-' + content_size
            kwargs['headers']['Range'] = "bytes=%s" % (new_range)
            if self._session:
                res = self._session.get(self._url, stream=True, **kwargs)
            else:
                res = requests.get(self._url, stream=True, **kwargs)
        res.raise_for_status()
        if 'Content-Range' not in res.headers:
            raise RangeNotSupported("The server doesn't support range requests")
        return res.raw, res.headers['Content-Range']

@gtsystem
Copy link
Owner

Hi, I have few questions:

  1. Why do you think 405 is a valid code for this scenario? My interpretation of "Method Not Allowed" is that a specific http method is not allowed (example GET, POST, etc.), but this doesn't seems to be the case.
  2. In general this solution double the number of requests needed to get a file: 1. request negative range 2. fallback to request file size 3. request central directory 4. request the actual file. Do you think it's still better than downloading the full payload?
  3. Would it be reasonable to pass in the constructor a parameter to inform the library that negative ranges are not supported? In this case we will need 3 requests instead of 4..

@EmmaJaneBonestell
Copy link
Author

@gtsystem

  1. After reading RFC-9110 more closely, I agree it should just be 501, not 405.
  2. I think whether or not it's worth the extra requests would depend on how large the entire file is, how bandwidth constrained the user is, etc. For small files with my quick fix, a crude benchmark showed a less than 0.4 second overhead vs fetching the entire file.
  3. A constructor parameter would be much better, since not supporting negative ranges seems to be uncommon, it's simple for end-users to add that parameter, and it would reduce the number of requests.

gtsystem added a commit that referenced this issue Jan 15, 2023
… miss support for suffix range requests.
gtsystem added a commit that referenced this issue Jan 15, 2023
* v0.12.0: Fixes #15. Add new parameter `support_suffix_range` for backends that miss support for suffix range
@gtsystem
Copy link
Owner

@EmmaJaneBonestell, added support_suffix_range in v.0.12.0 (See usage example below). Let me know if it solve your problem.

>>> url = "https://files.pythonhosted.org/packages/71/6d/95777fd66507106d2f8f81d005255c237187951644f85a5bd0baeec8a88f/paramiko-2.12.0-py2.py3-none-any.whl"
>>> from remotezip import RemoteZip
>>> with RemoteZip(url, support_suffix_range=False) as wzip:
...   wzip.printdir()
... 
File Name                                             Modified             Size
paramiko/__init__.py                           2022-11-04 22:23:48         3988
paramiko/_version.py                           2022-11-04 22:32:40           81
paramiko/_winapi.py                            2021-11-28 21:07:48        11358
paramiko/agent.py                              2022-11-04 22:23:48        13221
paramiko/auth_handler.py                       2022-11-04 22:23:48        36804
paramiko/ber.py                                2022-11-04 22:23:48         4355
paramiko/buffered_pipe.py                      2022-11-04 22:23:48         7468
paramiko/channel.py                            2022-11-04 22:23:48        49338
...

@EmmaJaneBonestell
Copy link
Author

@gtsystem
Works perfectly! Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants