Fixing of infinite loop during connection setup #2084

Closed
wants to merge 2 commits into from

@sibiryakov sibiryakov commented Jul 15, 2020

Here's a stack trace that was flooding our logs.

[07/15/2020 08:51:14.799: ERROR/kafka.producer.sender] Uncaught error in kafka producer I/O thread
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/kafka/producer/sender.py", line 60, in run
    self.run_once()
  File "/usr/local/lib/python3.6/site-packages/kafka/producer/sender.py", line 160, in run_once
    self._client.poll(timeout_ms=poll_timeout_ms)
  File "/usr/local/lib/python3.6/site-packages/kafka/client_async.py", line 580, in poll
    self._maybe_connect(node_id)
  File "/usr/local/lib/python3.6/site-packages/kafka/client_async.py", line 390, in _maybe_connect
    conn.connect()
  File "/usr/local/lib/python3.6/site-packages/kafka/conn.py", line 426, in connect
    if self._try_handshake():
  File "/usr/local/lib/python3.6/site-packages/kafka/conn.py", line 505, in _try_handshake
    self._sock.do_handshake()
  File "/usr/local/lib/python3.6/ssl.py", line 1077, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/local/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
OSError: [Errno 0] Error

The problem is that Python 3.6 raises OSError here, which the code does not expect. The exception propagates to the caller, so the code that closes and recycles the connection never runs. As a result, the producer is guaranteed to hit the same exception on the next call to poll().

Raising OSError from do_handshake() doesn't seem to be documented, even in the latest Python docs (see the 3.8 docs), but there are signs of it in the 3.8 source code.
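The failure mode described above can be reproduced in miniature without Kafka. The following is only a sketch: `FakeSocket`, `Conn`, the `catch_oserror` flag, and `poll_twice` are hypothetical names invented for illustration and are not kafka-python's classes. It shows that when the handshake error escapes, the connection stays in its handshake state and every later poll raises again, whereas catching the error and closing the connection breaks the cycle.

```python
class FakeSocket:
    """Hypothetical stand-in for an SSL socket whose handshake always fails."""

    def do_handshake(self):
        # Same errno and message as in the traceback above.
        raise OSError(0, "Error")


class Conn:
    """Hypothetical, heavily simplified connection state machine."""

    def __init__(self, catch_oserror):
        self.state = "HANDSHAKE"
        self.catch_oserror = catch_oserror
        self._sock = FakeSocket()

    def close(self):
        # Recycling the connection: drop the socket and leave HANDSHAKE.
        self.state = "DISCONNECTED"
        self._sock = None

    def connect(self):
        if self.state != "HANDSHAKE":
            return self.state
        try:
            self._sock.do_handshake()
            self.state = "CONNECTED"
        except OSError:
            if not self.catch_oserror:
                raise  # state is still HANDSHAKE, so the next call fails too
            self.close()
        return self.state


def poll_twice(conn):
    """Call connect() twice, the way a polling loop would, and record outcomes."""
    outcomes = []
    for _ in range(2):
        try:
            outcomes.append(conn.connect())
        except OSError as exc:
            outcomes.append("raised " + type(exc).__name__)
    return outcomes


print(poll_twice(Conn(catch_oserror=False)))
print(poll_twice(Conn(catch_oserror=True)))
```

With `catch_oserror=False` both polls raise; with `catch_oserror=True` the connection ends up disconnected and can be re-established by whatever reconnect logic the caller has.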


@moshez moshez left a comment

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @sibiryakov)


kafka/conn.py, line 514 at r1 (raw file):

            log.warning('SSL connection closed by server during handshake.')
            self.close(Errors.KafkaConnectionError('SSL connection closed by server during handshake'))
        # Other SSLErrors will be raised to user

Sounds like this comment is no longer accurate, unless there are other possible SSL errors?

@moshez moshez left a comment

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions (waiting on @sibiryakov)


kafka/conn.py, line 511 at r1 (raw file):

            pass
        except (SSLZeroReturnError, ConnectionError, TimeoutError, SSLEOFError, ssl.SSLError, OSError) as e:
            log.exception(e)

Why do we need the traceback here? It doesn't add information: we already know what the problem is, namely that an error filtered up from the networking level. When a problem happens, this will spew a lot of logs.
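To make the difference in log volume concrete, here is a small standard-library sketch. The logger name `handshake_demo`, the helper `capture`, and the warning message text are assumptions made up for this illustration. It compares what `log.exception(e)` and a single `log.warning(...)` call write for the same error.

```python
import io
import logging


def capture(log_call):
    """Run log_call inside an except block and return everything it logged."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("handshake_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        try:
            raise OSError(0, "Error")
        except OSError as exc:
            log_call(logger, exc)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


# log.exception() logs at ERROR level and appends the full traceback.
with_traceback = capture(lambda log, exc: log.exception(exc))

# log.warning() writes only the formatted message.
one_line = capture(lambda log, exc: log.warning("SSL handshake failed: %s", exc))

print(len(with_traceback.splitlines()), "lines with log.exception")
print(len(one_line.splitlines()), "line with log.warning")
```

If the handshake fails on every poll, the per-failure line count is multiplied accordingly, which is the concern raised in the comment above.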

sibiryakov commented Aug 7, 2020

@moshez

> Sounds like this comment is no longer accurate, unless there are other possible SSL errors?

No, it is still accurate. There are other SSL errors: https://docs.python.org/3/library/ssl.html?highlight=ssl#exceptions
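For reference, the exception classes listed at that link can be inspected directly. This sketch uses only the standard library `ssl` module; the helper name `ssl_error_subclasses` is made up for illustration. It lists the direct subclasses of `ssl.SSLError` and checks how the classes named in the `except` tuple relate to `OSError` on Python 3.

```python
import ssl


def ssl_error_subclasses():
    """Return the names of the direct subclasses of ssl.SSLError, sorted."""
    return sorted(cls.__name__ for cls in ssl.SSLError.__subclasses__())


names = ssl_error_subclasses()
print(names)

# On Python 3, ssl.SSLError is itself a subclass of OSError,
# and so are ConnectionError and TimeoutError.
print(issubclass(ssl.SSLError, OSError))
print(issubclass(ConnectionError, OSError), issubclass(TimeoutError, OSError))
```

Which of these errors actually reach the user therefore depends on the order and contents of the `except` clauses around `do_handshake()`, so it is worth checking the inline comment against the final tuple.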

@sibiryakov

The continuation is here: #2100

@sibiryakov sibiryakov closed this Aug 7, 2020