Producer refactor #544
Conversation
Useless abstraction right this instant, but will be very useful soon.
Add an eachPartition to the produceSet
Now that nobody uses `batch` after the set is constructed, we no longer have to nil out its entries when we return errors.
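A minimal sketch of what such an `eachPartition` helper might look like, assuming a simplified `produceSet` that buckets messages by topic and partition (the types and names here are stand-ins for illustration, not Sarama's actual ones):

```go
package producer

// ProducerMessage is a stand-in for sarama.ProducerMessage.
type ProducerMessage struct {
	Topic     string
	Partition int32
	Value     []byte
}

// produceSet buckets buffered messages by topic and then partition.
type produceSet struct {
	msgs map[string]map[int32][]*ProducerMessage
}

// eachPartition invokes cb once per (topic, partition) bucket, so callers can
// return errors/successes/retries per partition without touching a raw
// `batch` slice.
func (ps *produceSet) eachPartition(cb func(topic string, partition int32, msgs []*ProducerMessage)) {
	for topic, partitions := range ps.msgs {
		for partition, msgs := range partitions {
			cb(topic, partition, msgs)
		}
	}
}
```

A caller like `returnErrors` can then be a single callback over the set instead of a pair of nested loops with nil checks.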
Temporary breakage ensues.
Get rid of keyCache and valueCache
This simplifies a lot of things because we can filter out messages before we batch them (except when they fail to encode).
Clean up logging
This moves key and value encoding earlier in the process without the need for the old keyCache/valueCache.
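A rough sketch of the "encode earlier" idea, using illustrative types rather than the real Sarama code: the key and value are encoded once, when the message is buffered, and the encoded bytes travel with the buffered entry, so the parallel keyCache/valueCache slices are unnecessary and messages that fail to encode never enter the batch.

```go
package producer

// Encoder mirrors the shape of sarama.Encoder for this sketch.
type Encoder interface {
	Encode() ([]byte, error)
}

type ProducerMessage struct {
	Topic      string
	Partition  int32
	Key, Value Encoder
}

// bufferedMessage carries the encoded bytes alongside the original message,
// replacing the old parallel keyCache/valueCache slices.
type bufferedMessage struct {
	msg        *ProducerMessage
	key, value []byte
}

// encodeForBuffer encodes key and value up front; a failure here means the
// message is rejected immediately instead of being nil-ed out later.
func encodeForBuffer(msg *ProducerMessage) (*bufferedMessage, error) {
	buf := &bufferedMessage{msg: msg}
	var err error
	if msg.Key != nil {
		if buf.key, err = msg.Key.Encode(); err != nil {
			return nil, err
		}
	}
	if msg.Value != nil {
		if buf.value, err = msg.Value.Encode(); err != nil {
			return nil, err
		}
	}
	return buf, nil
}
```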
State management is now sane; we no longer have to flush just because we see a chaser message.
We now only build the request when we have messages, which means the request can never be nil, and so the input into the errors channel can never be nil either.
That was a bad pattern: subtle bugs could easily have let it get out of sync with what was actually sent to the flusher. Instead, have the flusher pass back the set it sent alongside the response. This forces us to use a struct, which lets us combine the `responses` and `errors` channels too.
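A hedged sketch of that pattern, with illustrative names (not necessarily the exact ones used in the PR): the flusher sends back a single struct pairing the broker's response or error with the exact set that was flushed, replacing the separate `responses` and `errors` channels.

```go
package producer

// ProduceResponse and produceSet are stand-ins for the real types.
type ProduceResponse struct{ /* broker response fields elided */ }
type produceSet struct{ /* buffered messages elided */ }

// brokerProducerResponse is what the flusher goroutine hands back: because it
// carries the set that was actually sent, the sender can never get out of
// sync with what went over the wire.
type brokerProducerResponse struct {
	set *produceSet
	err error            // transport-level failure, if any
	res *ProduceResponse // broker response, if the request succeeded
}
```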
The problem is that receiving a response can change the retry state of a partition (or the entire broker). This potentially requires us to retry any messages we'd had buffered for that partition, which changes the size of the produceSet we'd accumulated, potentially even making it empty again. Things are complicated further by the fact that we must be prepared to handle responses while forcing blocking flushes, midway through the message processing. This does a quick fix, but I'm not really happy with it:
- Flush the buffer in the appropriate response/error cases.
- Move the blocking flush to the beginning of the message processing so any state changes it causes are picked up correctly for that message.
- When we handle a response which leaves our buffer empty, don't try to send it.

This isn't perfect in that a response which shrinks our buffer below the flush threshold may still end up getting flushed, but at least we won't flush a buffer that's entirely empty.
Added the requested comment, cleaned up the (mis)-use of the
- Turn the "if wouldOverflow then flush" logic into a more complete `waitForSpace` method which handles all of that logic, including re-checking error states and buffer sizes as necessary.
- Extract the "when should we be setting `output`" logic into the end of every single iteration. This requires tracking an additional `timerFired` instance variable on the `brokerProducer`, but is much easier to follow and allows us to be more precise when e.g. a response flushes part but not all of our buffer.
- Extract the "if this is a chaser, change state" logic out of `retryReason` and rename it to `needsRetry`. This makes it a pure function and avoids unnecessary work in `waitForSpace`.
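An illustrative sketch of the `output` gating described above, assuming a simplified `brokerProducer` (the real code differs in detail): at the end of each loop iteration, `output` is set to the real channel only when the buffer is non-empty and either the timer has fired or the flush conditions are met; a nil channel disables that case in the `select`.

```go
package producer

import "time"

// produceSet is a stand-in tracking only what this sketch needs.
type produceSet struct{ bufferCount int }

func (ps *produceSet) empty() bool { return ps.bufferCount == 0 }

// readyToFlush is illustrative; the real conditions also consider bytes,
// retry state, and configuration.
func (ps *produceSet) readyToFlush() bool { return ps.bufferCount >= 100 }

type brokerProducer struct {
	buffer     *produceSet
	output     chan<- *produceSet // nil disables the send case in the select
	realOutput chan<- *produceSet
	timer      <-chan time.Time
	timerFired bool
}

// updateOutput runs at the end of every iteration of the main loop: it is the
// single place that decides whether we should currently be trying to flush.
func (bp *brokerProducer) updateOutput() {
	if bp.buffer.empty() {
		bp.output = nil // never flush an empty buffer
		return
	}
	if bp.timerFired || bp.buffer.readyToFlush() {
		bp.output = bp.realOutput
	} else {
		bp.output = nil
	}
}
```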
Force-pushed from 1480a1f to 5fc13db.
OK, another refactor, and I think I'm happy with this again. This most recent commit ended up solving another problem with the version currently in master: if a response is received which invalidates some of the messages we have buffered, to the point where we no longer meet our flush conditions, we now stop trying to flush until the conditions are re-met.
Worth noting: this morning's rework has partially lost us benefit point 3 (retry state being handled message-at-a-time), since we have to handle responses arriving when a batch is already collected. I think the only way around that would be to not collect batches while a response is pending, which is of course unacceptably slow in the common success case.
I am going to extract this into several stand-alone chunks which should be easier to reason about and review.
First stand-alone chunk extracted from #544. This introduces and uses a `produceSet` structure which takes care of collecting messages, rejecting ones which don't encode, and turning the rest into a `ProduceRequest`. This has several knock-on effects:
- we no longer nil out array entries, so nil checks in several places (e.g. `returnErrors`) can be removed
- `parseResponse` gets re-indented since it loops over the partitions in the set via a callback now rather than a pair of nested loops
- `groupAndFilter` is much simpler, a lot of its logic now lives in the `produceSet`
- the flusher has to use the set to return errors/successes/retries, rather than `batch`, which may now contain messages not in the eventual request
- `keyCache` and `valueCache` can be removed
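A rough illustration of the "loop via a callback" point above, with simplified stand-in types (not the real Sarama signatures): response handling walks the set once through an `eachPartition`-style helper and looks up the matching response block, instead of nesting loops over the request's topics and partitions.

```go
package producer

// Simplified stand-ins for the real sarama types.
type ProducerMessage struct {
	Topic     string
	Partition int32
}

type ProduceResponseBlock struct {
	Err    error
	Offset int64
}

type ProduceResponse struct {
	Blocks map[string]map[int32]*ProduceResponseBlock
}

type produceSet struct {
	msgs map[string]map[int32][]*ProducerMessage
}

func (ps *produceSet) eachPartition(cb func(topic string, partition int32, msgs []*ProducerMessage)) {
	for topic, partitions := range ps.msgs {
		for partition, msgs := range partitions {
			cb(topic, partition, msgs)
		}
	}
}

// handleResponse shows the shape of callback-based response parsing: one pass
// over the set that was sent, matching each bucket against its response block.
func handleResponse(sent *produceSet, response *ProduceResponse,
	onSuccess, onRetry func(msgs []*ProducerMessage)) {

	sent.eachPartition(func(topic string, partition int32, msgs []*ProducerMessage) {
		block := response.Blocks[topic][partition]
		switch {
		case block == nil:
			onRetry(msgs) // no block for a partition we sent: treat as retriable
		case block.Err != nil:
			onRetry(msgs)
		default:
			onSuccess(msgs)
		}
	})
}
```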
Second stand-alone chunk extracted from #544 (first chunk: #549). This uses the `produceSet` struct in the aggregator as well, and moves the `wouldOverflow` and `readyToFlush` methods to methods on the `produceSet`. Knock-on effects:
- now that we do per-partition size tracking in the aggregator we can do much more precise overflow checking (see the compressed-message-batch-size-limit case in `wouldOverflow`, which has changed), which will be more efficient in high-volume scenarios
- since the `produceSet` encodes immediately, messages which fail to encode are now rejected from the aggregator and don't count towards batch size
- we still have to iterate the messages in the flusher in order to reject those which need retrying due to the state machine; for simplicity I add them to a second `produceSet` still, which means all messages get encoded twice; this is a definite major performance regression which will go away again in part 3 of this refactor
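A hedged sketch of what per-partition overflow checking can look like once the sizes live on the `produceSet`; the limits and field names here are illustrative, not Sarama's actual configuration:

```go
package producer

// Illustrative limits; the real values come from the producer configuration.
const (
	maxRequestBytes     = 100 * 1024 * 1024 // overall request limit
	maxCompressedBatch  = 1024 * 1024       // per-partition limit when compressing
	maxBufferedMessages = 10000
)

type ProducerMessage struct {
	Topic     string
	Partition int32
}

type produceSet struct {
	bufferBytes int
	bufferCount int
	// perPartitionBytes tracks encoded bytes per (topic, partition), which is
	// what makes the compressed-batch check precise rather than a guess.
	perPartitionBytes map[string]map[int32]int
	compressing       bool
}

// wouldOverflow reports whether adding msg (with the given encoded size)
// would push the set past one of its limits.
func (ps *produceSet) wouldOverflow(msg *ProducerMessage, msgSize int) bool {
	switch {
	case ps.bufferBytes+msgSize >= maxRequestBytes:
		return true
	case ps.bufferCount+1 > maxBufferedMessages:
		return true
	case ps.compressing &&
		ps.perPartitionBytes[msg.Topic][msg.Partition]+msgSize >= maxCompressedBatch:
		// when compressing, each partition's batch becomes a single message,
		// so it is the per-partition total that must stay under the limit
		return true
	default:
		return false
	}
}
```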
This is an experimental refactor of the entire aggregator/flusher pipeline. If successful it would close #433 as well as properly solve the issue papered over by #538.
I did a decent job of making each commit stand-alone and self-explanatory, so you can read commit-at-a-time, or just treat it as entirely new code. The total diff against the old code is impossible to follow.
High-level, this is what I did:
1. Introduce a `produceSet` structure which collects messages, rejecting those which fail to encode and incrementally constructing a `ProduceRequest` out of those which succeed.
2. Use the `produceSet` in the aggregator instead of just a `[]*ProducerMessage`. Move all the "how many bytes/messages in this batch so far?" logic into the `produceSet`.
3. Merge the aggregator and the flusher into a single `brokerProducer`. Revive the flusher as a trivial goroutine whose sole purpose is to let the aggregator `select` on the result of its network calls (a rough sketch of this loop appears at the end of this description).

This new layout has a number of benefits:

1. Messages are owned by exactly one `produceSet`, which makes them much easier to track vs. having them spread across a slice, a map, and a long-lived request object.
2. Messages only enter a `produceSet` when they encode properly, which means that a whole raft of logic can go away around handling requests which encoded successfully but contained no successful messages. This also means that we no longer need to nil out array entries for failed messages, getting rid of another bunch of ugly logic.
3. Retry state is handled message-at-a-time as messages enter the `brokerProducer`, rather than batch-at-a-time as they enter the `flusher`. This is much easier to follow, cleans up a few hacks that were necessary for the aggregator to correctly pass along flagged messages, and makes for a sane state machine that clearly terminates in all cases.
4. The `produceSet` can track how many bytes it has accumulated in each individual partition. It also has access to the actual encoded length, not just the estimate provided by the `Encoder` interface. This makes it more precise, and lets it accumulate much larger batches safely. In particular, this lets it accumulate requests larger than `MaxMessageSize` (usually 1MB) when using compression.

@wvanbergen
cc @kvs @andremedeiros @bai @cep21
P.S. I don't want to make any claims about performance without measurements to back them up, but in theory this is a performance win too: we do fewer passes over each message, we generate less garbage for the collector, and we do more work without needing to context-switch.
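To make the merged aggregator/flusher layout concrete, here is a rough sketch of the shape such a run loop could take, using simplified stand-in types (an approximation for illustration, not the code in this PR): the `brokerProducer` selects over incoming messages, flush results, and the flush timer, while the revived flusher is just a goroutine that performs the blocking network call and reports back.

```go
package producer

import "time"

// Stand-ins for the real types used elsewhere in this description.
type ProducerMessage struct{}
type ProduceResponse struct{}
type produceSet struct{ bufferCount int }

func newProduceSet() *produceSet            { return &produceSet{} }
func (ps *produceSet) add(*ProducerMessage) { ps.bufferCount++ }
func (ps *produceSet) empty() bool          { return ps.bufferCount == 0 }

// brokerProducerResponse pairs a flushed set with its result (see earlier sketch).
type brokerProducerResponse struct {
	set *produceSet
	res *ProduceResponse
	err error
}

// sendToBroker is a placeholder for the blocking network call.
func sendToBroker(set *produceSet) (*ProduceResponse, error) { return &ProduceResponse{}, nil }

type brokerProducer struct {
	input     <-chan *ProducerMessage
	flush     chan *produceSet // consumed by the flusher goroutine
	responses chan *brokerProducerResponse
	buffer    *produceSet
}

// run is the single state machine that replaces the separate aggregator and
// flusher loops; the flusher survives only as the goroutine started below.
func (bp *brokerProducer) run() {
	go func() { // trivial flusher: do the blocking call, report back
		for set := range bp.flush {
			res, err := sendToBroker(set)
			bp.responses <- &brokerProducerResponse{set: set, res: res, err: err}
		}
	}()

	timer := time.NewTicker(500 * time.Millisecond) // flush frequency is illustrative
	defer timer.Stop()
	timerFired := false

	for {
		// output stays nil (and thus never selected) unless we should flush.
		var output chan *produceSet
		if timerFired && !bp.buffer.empty() {
			output = bp.flush
		}

		select {
		case msg, ok := <-bp.input:
			if !ok {
				close(bp.flush)
				return
			}
			bp.buffer.add(msg) // retry state would be checked per-message here
		case response := <-bp.responses:
			_ = response // successes/errors/retries handled via response.set
		case <-timer.C:
			timerFired = true
		case output <- bp.buffer:
			bp.buffer = newProduceSet() // start accumulating the next batch
			timerFired = false
		}
	}
}
```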