ValueError: invalid literal for int() with base 16: b'' #505

Closed
edsu opened this issue Jul 6, 2021 · 5 comments

@edsu
Member

edsu commented Jul 6, 2021

I'm seeing this again while running twarc stream and am confused why twarc isn't catching/restarting:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 685, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 752, in read_chunked
    self._update_chunk_length()
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 689, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 560, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 781, in read_chunked
    self._original_response.close()
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3/dist-packages/urllib3/response.py", line 443, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ed/.local/bin/twarc2", line 8, in <module>
    sys.exit(twarc2())
  File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/ed/.local/lib/python3.8/site-packages/twarc/decorators2.py", line 127, in __call__
    return self.f(*args, **kwargs)
  File "/home/ed/.local/lib/python3.8/site-packages/twarc/command2.py", line 1155, in stream
    for result in T.stream(event=event):
  File "/home/ed/.local/lib/python3.8/site-packages/twarc/client2.py", line 539, in stream
    for line in resp.iter_lines():
  File "/usr/lib/python3/dist-packages/requests/models.py", line 794, in iter_lines
    for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
  File "/usr/lib/python3/dist-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
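
For context, the innermost error in the traceback is what Python raises when the size line that precedes each chunk of a chunked HTTP response comes back empty, which is what a reader sees once the server has dropped the connection mid-stream. A minimal sketch of that step follows; `parse_chunk_size` is a hypothetical helper written for illustration, not urllib3's actual code.

```python
def parse_chunk_size(line: bytes) -> int:
    """Parse the hexadecimal size line that precedes each HTTP chunk."""
    # Drop any chunk extension after ';' and the trailing CRLF.
    size_field = line.split(b";", 1)[0].strip()
    return int(size_field, 16)


# A normal size line parses to the chunk length in bytes.
print(parse_chunk_size(b"1a\r\n"))

# An empty read (connection closed) reproduces the ValueError above.
try:
    parse_chunk_size(b"")
except ValueError as error:
    print(error)
```
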
@igorbrigadir
Contributor

@edsu
Member Author

edsu commented Jul 8, 2021

On the one hand, it's good there's a fix coming. On the other, I'm disappointed I haven't had time to work on a fix in twarc that should be able to catch this and reconnect.

@igorbrigadir
Contributor

Yes, on the bright side, #477 should help address future disconnects

@edsu
Member Author

edsu commented Jul 9, 2021

I think twarc.decorators2.catch_request_exceptions isn't working with twarc.Twarc2.stream because, even though the call to Twarc2.stream() is wrapped, it returns a response that is then streamed with response.iter_lines(). The exception is raised during that iteration over iter_lines(), after the wrapped call has already returned, so that is where we need additional exception handling.
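
The effect described above can be sketched without twarc or requests. Everything in this example is a hypothetical stand-in: `ConnectionBroken` plays the role of `requests.exceptions.ChunkedEncodingError`, `catch_exceptions` is a simplified decorator (not twarc's actual implementation), and `lines()` imitates a lazily consumed `iter_lines()`.

```python
import functools


class ConnectionBroken(Exception):
    """Stand-in for requests.exceptions.ChunkedEncodingError."""


def catch_exceptions(f):
    """Hypothetical decorator: only guards the call to f() itself."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except ConnectionBroken:
            return "caught by decorator"
    return wrapper


def lines():
    """Lazy iterator that fails partway through, like a dropped stream."""
    yield "tweet 1"
    raise ConnectionBroken("IncompleteRead(0 bytes read)")


@catch_exceptions
def connect():
    # Returns immediately; nothing has been read yet, so nothing can fail here.
    return lines()


def consume():
    seen = []
    try:
        for line in connect():
            seen.append(line)
    except ConnectionBroken:
        # The error surfaces during iteration, outside the decorated call.
        seen.append("escaped decorator")
    return seen


print(consume())  # ['tweet 1', 'escaped decorator']
```

The decorator's `try` block has already exited by the time the first line is read, which is why the handling has to wrap the loop itself.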

edsu added a commit that referenced this issue Jul 9, 2021
This commit reuses twarc.decorators2.catch_request_exceptions in the
context of streaming responses with iter_lines. Hopefully this will
address #505 but it will require testing by people who continue seeing
the error in the wild.
edsu added a commit that referenced this issue Jul 9, 2021
This commit adds an internal method Twarc2._stream which is used by both
the Twarc2.stream and Twarc2.sample methods to connect to the Twitter
API and stream results. It is wrapped with the catch_request_exceptions
decorator, which ought to catch any request errors, log them and retry,
and hopefully this will finally address #505. 🤞
edsu added a commit that referenced this issue Jul 11, 2021
The catch_request_exceptions decorator wasn't working well with
streaming responses which return an object successfully, but later throw
an exception when iter_lines() is called and iterated over. Now there is
a somewhat duplicative bit of logic in Twarc2._stream for catching
request exceptions, trying again with an exponential backoff. Refs #505
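
A rough sketch of the approach that last commit message describes: retry with exponential backoff around the iteration itself rather than around the call that returns the response. This is not the code in Twarc2._stream; `stream_with_retry`, `ConnectionBroken`, and the parameters `max_tries` and `base_delay` are all assumptions made for illustration.

```python
import time


class ConnectionBroken(Exception):
    """Stand-in for requests.exceptions.ChunkedEncodingError."""


def stream_with_retry(connect, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Yield lines from connect(), reconnecting with exponential backoff.

    connect is any callable that returns an iterable of lines.
    """
    failures = 0
    while True:
        try:
            # The loop is inside the try block, so errors raised while
            # iterating are caught here, not just errors from connect().
            for line in connect():
                failures = 0  # a successful read resets the backoff
                yield line
            return  # the stream ended cleanly
        except ConnectionBroken:
            failures += 1
            if failures >= max_tries:
                raise
            # Wait 1x, 2x, 4x, ... base_delay between consecutive failures.
            sleep(base_delay * 2 ** (failures - 1))
```

Passing `sleep` as a parameter is only there so the backoff can be exercised in a test without actually waiting.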
@igorbrigadir
Contributor

I think this is now fixed on Twitter's end: https://twittercommunity.com/t/filtered-stream-request-breaks-in-5-min-intervals/153926/20?u=igorbrigadir. Unfortunately, if there's still doubt about our own error handling, it may not be possible to reproduce the error unless we mock a connection and manually insert some bogus b'' data or something like that.
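
One possible way to mock such a connection, sketched with only the standard library: a throwaway local server sends one valid chunk and then hangs up without the terminating zero-length chunk, so the client reads an empty chunk-size line. The function names here are hypothetical, and the client is `http.client` rather than requests, so the error appears as `http.client.IncompleteRead`; through requests the same condition should surface as the `ChunkedEncodingError` shown in the traceback.

```python
import http.client
import socket
import threading


def serve_truncated_stream(server_sock):
    """Send one valid chunk, then close without the final 0-length chunk."""
    conn, _ = server_sock.accept()
    with conn:
        conn.recv(65536)  # read and discard the request
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Transfer-Encoding: chunked\r\n"
            b"\r\n"
            b"5\r\nhello\r\n"
        )
    # Leaving the with block closes the socket mid-stream.


def reproduce():
    """Return the partial body received before the stream broke."""
    server_sock = socket.socket()
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    thread = threading.Thread(
        target=serve_truncated_stream, args=(server_sock,), daemon=True
    )
    thread.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        response = client.getresponse()
        try:
            response.read()
        except http.client.IncompleteRead as error:
            return error.partial
        return None
    finally:
        client.close()
        thread.join(timeout=5)
        server_sock.close()


print(reproduce())
```

Pointing a retry wrapper at a server like this would be one way to check that reconnection logic actually fires.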

@edsu edsu closed this as completed Jul 17, 2021