Remove external calls to disconnectBroker #338

eapache · 2015-03-12T17:13:20Z

It is now only called from one place in the client. This looks simple but is
actually super-subtle, and depends on lazy broker connections (PR #309).

disconnectBroker does a whole bunch of different things:

- calls `Close` on the broker connection
- adds the address to the internal `deadBrokerAddrs` map even if the broker is
  not a seed, which I think is wrong, since resurrectDeadBrokers will use it to
  repopulate the seedBrokerAddrs list
- rotates seedBrokers (if the broker was a seed broker)
- removes it from the brokers map (otherwise)
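
For readers who don't have the client code open, the four effects listed above can be modelled roughly as follows. This is a simplified sketch, not the actual Sarama implementation: `fakeBroker`, `fakeClient` and `newFakeClient` are hypothetical names, and "rotating" the seeds is assumed here to mean dropping the seed currently in use (index 0) so the next one is tried.

```go
package main

import "fmt"

// fakeBroker is a hypothetical stand-in for a broker connection.
type fakeBroker struct {
	addr   string
	closed bool
}

// Close marks the connection as closed.
func (b *fakeBroker) Close() { b.closed = true }

// fakeClient models only the client state named in the description.
type fakeClient struct {
	seedBrokerAddrs []string // assumption: index 0 is the seed currently in use
	deadBrokerAddrs map[string]struct{}
	brokers         map[int32]*fakeBroker
}

func newFakeClient(seeds ...string) *fakeClient {
	return &fakeClient{
		seedBrokerAddrs: seeds,
		deadBrokerAddrs: make(map[string]struct{}),
		brokers:         make(map[int32]*fakeBroker),
	}
}

// disconnectBroker mirrors the four effects from the description.
func (c *fakeClient) disconnectBroker(id int32, b *fakeBroker) {
	b.Close()                              // 1. close the connection
	c.deadBrokerAddrs[b.addr] = struct{}{} // 2. recorded as dead, seed or not
	if len(c.seedBrokerAddrs) > 0 && c.seedBrokerAddrs[0] == b.addr {
		c.seedBrokerAddrs = c.seedBrokerAddrs[1:] // 3. rotate away from this seed
	} else {
		delete(c.brokers, id) // 4. forget the non-seed broker
	}
}

func main() {
	c := newFakeClient("seed-a:9092", "seed-b:9092")
	b := &fakeBroker{addr: "broker-1:9092"}
	c.brokers[1] = b

	c.disconnectBroker(1, b)

	_, dead := c.deadBrokerAddrs[b.addr]
	fmt.Println("closed:", b.closed)
	fmt.Println("in deadBrokerAddrs:", dead)
	fmt.Println("still in brokers map:", c.brokers[1] != nil)
	fmt.Println("seeds:", c.seedBrokerAddrs)
}
```

In this model a non-seed address ends up in `deadBrokerAddrs`, which is the behaviour questioned in the second bullet.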

In the producer and consumer where we used to call disconnectBroker:

- We now call `Close` directly on the broker.
- The broker we are dealing with is not a seed broker, so the seedBrokers do not
  need rotating, and I don't think it's a problem that it no longer gets added
  to `deadBrokerAddrs`.
- The reason we removed it from the broker map was so that the next request for
  that broker would trigger a metadata request and reopen the connection. The
  producer and consumer both manually trigger metadata requests when necessary,
  and the fact that we now have lazy connection opening means simply closing it
  (which we do, see first bullet) is enough to cause the connection to reopen
  the next time it is requested, even if no metadata refresh is requested.
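
The last bullet is the subtle part, so here is a minimal sketch of why closing alone is sufficient once connections are opened lazily. It is an illustration under assumptions, not the code from #309: `lazyBroker`, `ensureOpen`, `Send` and the `dials` counter are hypothetical, and no real network connection is made.

```go
package main

import "fmt"

// lazyBroker is a hypothetical broker that connects on first use.
type lazyBroker struct {
	addr      string
	connected bool
	dials     int // how many times a connection was (re)opened
}

// ensureOpen opens the connection only if it is not already open.
func (b *lazyBroker) ensureOpen() {
	if !b.connected {
		b.dials++
		b.connected = true
	}
}

// Close drops the connection but leaves the broker usable.
func (b *lazyBroker) Close() { b.connected = false }

// Send pretends to send a request, reconnecting first if needed.
func (b *lazyBroker) Send(req string) string {
	b.ensureOpen()
	return "ack:" + req
}

func main() {
	// The broker stays in the map for the whole run.
	brokers := map[int32]*lazyBroker{1: {addr: "broker-1:9092"}}

	first := brokers[1].Send("produce-1")
	fmt.Println(first, "dials so far:", brokers[1].dials)

	// An error occurred: close the connection, but do not remove the entry.
	brokers[1].Close()

	// The next request through the same map entry reopens the connection.
	second := brokers[1].Send("produce-2")
	fmt.Println(second, "dials so far:", brokers[1].dials)
}
```

The broker is never deleted from the map and no metadata refresh is involved; the second `Send` reconnects on its own.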

In a subsequent PR, the last remaining call will be inlined and refactored. This will effectively fix the last non-trivial issue spawned from #15.

@Shopify/kafka


eapache · 2015-03-13T14:25:57Z

The lazy connections have been merged so this is ready for review. The code is trivial (not much to review there); I'm more interested in somebody verifying that my logic is correct around deadBrokerAddrs, connection bouncing, etc.

As part of #309 this has already undergone some decent practical testing using the vagrant cluster and willem's stressproducer.

wvanbergen · 2015-03-13T14:38:48Z

Did we run any consumer tests by any chance?

eapache · 2015-03-13T14:53:22Z

Just ran some of those using your new console-consumer, but I'm not too concerned. The changes are symmetric so if the producer works fine then the consumer should (and does) also.

wvanbergen · 2015-03-13T17:14:55Z

I think the logic here is sound.

I don't think it's necessary to Close() the broker connection for every error we see in the producer or consumer, but this is one of these optimizations that are easy to fuck up.

What is the number of (producer) retries we potentially waste under relatively normal operations?

- A produce request fails (e.g. not leader for partition), so it asks for new metadata.
- The metadata update request fails. This is already unlikely, because it happens only if it completely drains the Metadata retries as well, right?

eapache · 2015-03-13T17:39:17Z

> I don't think it's necessary to Close() the broker connection for every error we see in the producer or consumer

You don't? If a broker gives us a network error or an unexpected protocol error then I don't see what choice we have besides disconnecting from that one and asking metadata for another one?

> What is the number of (producer) retries we potentially waste under relatively normal operations?

No more than we ever have?

wvanbergen · 2015-03-13T19:13:43Z

I meant, not for every error. E.g. if we get a NotLeaderForPartition, we don't technically have to disconnect. Or does that error never end up in abort?

Anyway, that's an optimization we don't really need.

wvanbergen · 2015-03-13T19:14:03Z

Remove external calls to disconnectBroker

eapache · 2015-03-13T19:16:24Z

> I meant, not for every error. E.g. if we get a NotLeaderForPartition, we don't technically have to disconnect. Or does that error never end up in abort?

abort cleans up the entire worker - the only errors that end up there are ones returned from broker.Fetch(), nothing at the protocol layer.
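
To make that distinction concrete, here is a rough sketch of the split between errors returned by the fetch call itself and error codes carried inside a successful response. All names (`fetchResult`, `classify`, the `action` values, `errTransport`) are hypothetical and only illustrate the idea; the only external fact used is that Kafka's NotLeaderForPartition error code is 6.

```go
package main

import (
	"errors"
	"fmt"
)

// Kafka protocol error code for NotLeaderForPartition.
const errNotLeaderForPartition int16 = 6

// errTransport is an example of an error returned by the fetch call itself.
var errTransport = errors.New("connection reset")

// fetchResult is a hypothetical, stripped-down fetch response.
type fetchResult struct {
	errCode int16 // 0 means no protocol-level error
}

type action string

const (
	actionAbort   action = "abort worker and close broker"
	actionInBand  action = "handle protocol error, keep connection"
	actionDeliver action = "deliver messages"
)

// classify decides what to do with the outcome of one fetch.
func classify(res *fetchResult, err error) action {
	if err != nil {
		// The call itself failed: the connection is suspect.
		return actionAbort
	}
	if res.errCode != 0 {
		// The broker answered, so the connection is fine.
		return actionInBand
	}
	return actionDeliver
}

func main() {
	fmt.Println(classify(nil, errTransport))
	fmt.Println(classify(&fetchResult{errCode: errNotLeaderForPartition}, nil))
	fmt.Println(classify(&fetchResult{}, nil))
}
```

In this sketch a NotLeaderForPartition response never reaches the abort path, so it never causes a disconnect.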

eapache force-pushed the die-disconnect-broker-die branch 3 times, most recently from f3953ca to ae411cd on March 12, 2015 21:34

eapache mentioned this pull request Mar 13, 2015

Lazily connect to brokers in the client #309

Merged

eapache force-pushed the die-disconnect-broker-die branch from ae411cd to a778a8e on March 13, 2015 14:22

wvanbergen added a commit that referenced this pull request Mar 13, 2015

Merge pull request #338 from Shopify/die-disconnect-broker-die

20f98a6

Remove external calls to disconnectBroker

wvanbergen merged commit 20f98a6 into master Mar 13, 2015

eapache deleted the die-disconnect-broker-die branch March 13, 2015 19:15
