Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #1083: Deal with brokers that disappear, reappear with different IP address #1085

Closed

Conversation

kaiterramike
Copy link
Contributor

When KafkaClient connects to a broker in _maybe_connect, it inserts into self._conns a BrokerConnection configured with the current host/port for that node. The BrokerConnection remains there forever, though, so if the broker's IP or host ever changes, KafkaClient has no way to deal with this.

The fix is to compare the latest metadata with the current node's connection, and if the host/IP has changed, decommission the old connection and allow a new one to be created.

There's also a common race condition on broker startup where the initial metadata request sometimes returns an empty list of brokers, but subsequent requests behave normally. So, we must deal with broker being None here. This change is conservative in that it doesn't remove the connection from self._conns unless the new broker metadata contains an entry for that same node with a new IP/port.

When KafkaClient connects to a broker in _maybe_connect, it inserts into self._conns a BrokerConnection configured with the current host/port for that node.  The BrokerConnection remains there forever, though, so if the broker's IP or host ever changes, KafkaClient has no way to deal with this.

The fix is to compare the latest metadata with the current node's connection, and if the host/IP has changed, decommission the old connection and allow a new one to be created.

There's also a common race condition on broker startup where the initial metadata request sometimes returns an empty list of brokers, but subsequent requests behave normally.  So, we must deal with broker being None here.  This change is conservative in that it doesn't remove the connection from self._conns unless the new broker metadata contains an entry for that same node with a new IP/port.
@kaiterramike
Copy link
Contributor Author

Tested with a running producer and consumer connected to a single Kafka broker, which I repeatedly killed and restarted, sometimes with a new IP. Let me know if there's any docs that describe how to run the other automated tests.

In testing, producer and consumer always reconnected, but under some circumstances a few messages were lost: depending on when the reconnect happens during the broker's boot procedure, the error message returned could be different, so sometimes Sender._can_retry will return true and sometimes not. Different issue.

@jeffwidman
Copy link
Collaborator

jeffwidman commented Apr 25, 2017

This fixes #1083

under some circumstances a few messages were lost: depending on when the reconnect happens during the broker's boot procedure, the error message returned could be different, so sometimes Sender._can_retry will return true and sometimes not. Different issue.

Can you file a issue for this? Don't want to lose track of it.

@dpkp
Copy link
Owner

dpkp commented Apr 29, 2017

Looks good! Test failures are unrelated, caused by a pylint bug.

@dpkp
Copy link
Owner

dpkp commented Apr 29, 2017

I've fixed the pylint errors in master. Can you rebase to get a clean test run?

@jeffwidman
Copy link
Collaborator

bump @originsmike

@dpkp
Copy link
Owner

dpkp commented Jun 17, 2017

_maybe_connect is called before every network operation, so I think we should be very conservative about when to run this code. Can we skip this check if the existing socket is connected and let a normal connection error propagate if a broker disappears? Once the socket is disconnected, the next _maybe_connect should handle removing the stale conn object.

dpkp pushed a commit that referenced this pull request Jun 19, 2017
…1085)

When KafkaClient connects to a broker in _maybe_connect,
it inserts into self._conns a BrokerConnection configured
with the current host/port for that node.  The BrokerConnection
remains there forever, though, so if the broker's IP or host
ever changes, KafkaClient has no way to deal with this.

The fix is to compare the latest metadata with the current
node's connection, and if the host/IP has changed, decommission
the old connection and allow a new one to be created.

There's also a common race condition on broker startup where
the initial metadata request sometimes returns an empty list
of brokers, but subsequent requests behave normally.  So, we
must deal with broker being None here.  This change is conservative
in that it doesn't remove the connection from self._conns unless
the new broker metadata contains an entry for that same node
with a new IP/port.
@dpkp
Copy link
Owner

dpkp commented Jun 19, 2017

Pushed to master w/ fixup to only check disconnected nodes. Thanks!

@dpkp dpkp closed this Jun 19, 2017
88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Aug 25, 2017
…pkp#1085)

When KafkaClient connects to a broker in _maybe_connect,
it inserts into self._conns a BrokerConnection configured
with the current host/port for that node.  The BrokerConnection
remains there forever, though, so if the broker's IP or host
ever changes, KafkaClient has no way to deal with this.

The fix is to compare the latest metadata with the current
node's connection, and if the host/IP has changed, decommission
the old connection and allow a new one to be created.

There's also a common race condition on broker startup where
the initial metadata request sometimes returns an empty list
of brokers, but subsequent requests behave normally.  So, we
must deal with broker being None here.  This change is conservative
in that it doesn't remove the connection from self._conns unless
the new broker metadata contains an entry for that same node
with a new IP/port.
88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Aug 25, 2017
88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Aug 25, 2017
…pkp#1085)

When KafkaClient connects to a broker in _maybe_connect,
it inserts into self._conns a BrokerConnection configured
with the current host/port for that node.  The BrokerConnection
remains there forever, though, so if the broker's IP or host
ever changes, KafkaClient has no way to deal with this.

The fix is to compare the latest metadata with the current
node's connection, and if the host/IP has changed, decommission
the old connection and allow a new one to be created.

There's also a common race condition on broker startup where
the initial metadata request sometimes returns an empty list
of brokers, but subsequent requests behave normally.  So, we
must deal with broker being None here.  This change is conservative
in that it doesn't remove the connection from self._conns unless
the new broker metadata contains an entry for that same node
with a new IP/port.

# Conflicts:
#	kafka/client_async.py
88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Aug 25, 2017
88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Oct 6, 2017
…pkp#1085)

When KafkaClient connects to a broker in _maybe_connect,
it inserts into self._conns a BrokerConnection configured
with the current host/port for that node.  The BrokerConnection
remains there forever, though, so if the broker's IP or host
ever changes, KafkaClient has no way to deal with this.

The fix is to compare the latest metadata with the current
node's connection, and if the host/IP has changed, decommission
the old connection and allow a new one to be created.

There's also a common race condition on broker startup where
the initial metadata request sometimes returns an empty list
of brokers, but subsequent requests behave normally.  So, we
must deal with broker being None here.  This change is conservative
in that it doesn't remove the connection from self._conns unless
the new broker metadata contains an entry for that same node
with a new IP/port.
88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Oct 6, 2017
88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Jul 16, 2018
…pkp#1085)

When KafkaClient connects to a broker in _maybe_connect,
it inserts into self._conns a BrokerConnection configured
with the current host/port for that node.  The BrokerConnection
remains there forever, though, so if the broker's IP or host
ever changes, KafkaClient has no way to deal with this.

The fix is to compare the latest metadata with the current
node's connection, and if the host/IP has changed, decommission
the old connection and allow a new one to be created.

There's also a common race condition on broker startup where
the initial metadata request sometimes returns an empty list
of brokers, but subsequent requests behave normally.  So, we
must deal with broker being None here.  This change is conservative
in that it doesn't remove the connection from self._conns unless
the new broker metadata contains an entry for that same node
with a new IP/port.
88manpreet pushed a commit to Yelp/kafka-python that referenced this pull request Jul 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants