fix: avoid starvation in subscriptionManager #2109

dnwe · 2022-01-12T14:26:29Z

The first few fetches from Kafka may only fetch data from one or two partitions, starving the rest for a very long time (depending on message size / processing time).

Once a member joins the consumer group and receives its partitions, they are fed into the "subscription manager" from different goroutines. The subscription manager then performs batching and executes a fetch for all the partitions. However, the batching logic in `subscriptionManager` appears to be faulty: it seems to assume that the order of the `case` clauses decides which one is handled when several are ready. That is not so, according to the Go spec (https://golang.org/ref/spec#Select_statements):

If one or more of the communications can proceed, a single one that can proceed is chosen via a uniform pseudo-random selection. Otherwise, if there is a default case, that case is chosen. If there is no default case, the "select" statement blocks until at least one of the communications can proceed.
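To make the quoted rule concrete, here is a minimal sketch (not code from sarama; `countSelectChoices` is a hypothetical helper) that keeps two channels ready at the same time and counts which `case` the `select` picks. The case listed first gets no priority.

```go
package main

import "fmt"

// countSelectChoices runs a two-case select n times with both channels
// ready on every round and reports how often each case was chosen.
func countSelectChoices(n int) (first, second int) {
	a := make(chan struct{}, 1)
	b := make(chan struct{}, 1)
	for i := 0; i < n; i++ {
		// Make both receives ready before entering the select.
		a <- struct{}{}
		b <- struct{}{}
		select {
		case <-a: // listed first, but not preferred
			first++
			<-b // drain the other channel for the next round
		case <-b:
			second++
			<-a
		}
	}
	return first, second
}

func main() {
	first, second := countSelectChoices(10000)
	fmt.Printf("first case: %d, second case: %d\n", first, second)
}
```

Both counters end up well above zero, which is the behaviour the batching logic has to account for.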

For example, if you receive 64 partitions, each is handled in its own goroutine, which sends the partition information to the `bc.input` channel. After an iteration there is a race between `case event, ok := <-bc.input`, which adds the partition to the batch, and `case bc.newSubscriptions <- buffer`, which triggers an immediate fetch for the 1 or 2 partitions that made it into the batch.
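The race can be sketched in isolation. The code below is a simplified model, not the sarama implementation: partitions are plain ints, `filledInput`, `firstBatchRacy` and `firstBatchDrained` are hypothetical names, and a buffered channel stands in for a `bc.newSubscriptions` receiver that is always ready. The second function shows one possible way to avoid the race (drain whatever is immediately available before handing off), which is not necessarily what this PR does.

```go
package main

import "fmt"

// filledInput returns a buffered channel that already holds n partition IDs,
// standing in for n goroutines that have all sent to bc.input.
func filledInput(n int) chan int {
	input := make(chan int, n)
	for i := 0; i < n; i++ {
		input <- i
	}
	return input
}

// firstBatchRacy mimics the loop described above: once the batch is
// non-empty, "grow the batch" and "hand the batch off" compete in one select.
func firstBatchRacy(n int) int {
	if n <= 0 {
		return 0
	}
	input := filledInput(n)
	out := make(chan []int, 1) // assumed stand-in for a ready receiver
	batch := []int{<-input}
	for {
		select {
		case p := <-input:
			batch = append(batch, p)
		case out <- batch:
			return len(<-out) // size of the batch that triggers the fetch
		}
	}
}

// firstBatchDrained collects everything that is immediately available on
// the input before handing the batch off.
func firstBatchDrained(n int) int {
	if n <= 0 {
		return 0
	}
	input := filledInput(n)
	batch := []int{<-input}
	for {
		select {
		case p := <-input:
			batch = append(batch, p)
		default:
			return len(batch) // nothing more is waiting, so hand off now
		}
	}
}

func main() {
	fmt.Println("racy first batch:", firstBatchRacy(64))
	fmt.Println("drained first batch:", firstBatchDrained(64))
}
```

In this model the racy variant usually hands off after only a partition or two, while the drained variant always collects all 64.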

This issue only really affects slow consumers with short messages. If the race is lost with only 1 partition in the batch (even though 63 more partitions have been claimed but did not make it into the batch), the fetch asks for 1MB (by default) of messages from that single partition. If the messages are only a few bytes long and the processing time is minutes, you will not perform another fetch for hours.
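A rough back-of-the-envelope version of that estimate, as a sketch. The inputs are assumptions for illustration only (100 bytes per message on the wire, 60 seconds of processing per message), `hoursPerFetch` is a hypothetical helper, and the model assumes no further fetch happens until every message from the response has been processed.

```go
package main

import "fmt"

// hoursPerFetch estimates how long a consumer is busy with the messages of
// a single fetch response that came from one partition only.
func hoursPerFetch(fetchBytes, messageBytes int, secondsPerMessage float64) float64 {
	if messageBytes <= 0 {
		return 0
	}
	messages := fetchBytes / messageBytes // whole messages that fit in the response
	return float64(messages) * secondsPerMessage / 3600
}

func main() {
	// 1 MiB fetch, assumed 100-byte messages, assumed 60 s per message.
	fmt.Printf("%.2f hours\n", hoursPerFetch(1<<20, 100, 60))
}
```

With these illustrative inputs the estimate comes to about 175 hours, during which the other 63 partitions receive nothing.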

Contributes-to: #1608 #1897

dnwe · 2022-01-12T14:27:12Z

@pavius I extracted this change out of your original issue and fork so we can test and review it in isolation — thanks for your work on this, please could you sign the CLA?

dnwe · 2022-01-13T07:45:14Z

consumer.go

@@ -752,7 +752,7 @@ func (c *consumer) newBrokerConsumer(broker *Broker) *brokerConsumer {
 	bc := &brokerConsumer{
 		consumer:         c,
 		broker:           broker,
-		input:            make(chan *partitionConsumer),
+		input:            make(chan *partitionConsumer, 4096),


@pavius can you explain the switch to a buffered channel with size 4096 here?

pavius · 2022-01-21

This was quite a while ago, so I can only guess that I wanted to reduce the chance that writers would block when writing to the channel.
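For reference, the difference in blocking behaviour can be shown with a small sketch (`sendsWithoutReceiver` is a hypothetical helper, not part of sarama). It attempts non-blocking sends on a channel that nobody is reading and counts how many are accepted.

```go
package main

import "fmt"

// sendsWithoutReceiver reports how many of n non-blocking sends succeed on
// a channel of the given capacity while nothing is receiving from it.
func sendsWithoutReceiver(capacity, n int) int {
	ch := make(chan int, capacity)
	accepted := 0
	for i := 0; i < n; i++ {
		select {
		case ch <- i:
			accepted++
		default:
			// A plain `ch <- i` would block here until a receiver shows up.
		}
	}
	return accepted
}

func main() {
	fmt.Println("unbuffered:", sendsWithoutReceiver(0, 64))
	fmt.Println("buffered (4096):", sendsWithoutReceiver(4096, 64))
}
```

With an unbuffered channel every writer has to wait for the receiver. With a capacity of 4096, 64 writers can all hand over their partition without waiting.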

dnwe · 2022-01-31T23:23:28Z

Hmm. I've been trying to debug the failing test on this one. After fixing up the test to be a bit more event-driven, I do seem to reach a state of deadlock in the consumer: subscriptionManager is blocked on bc.wait <- none{}, but nothing is receiving from that channel because subscriptionConsumer is stuck receiving in its range loop, for newSubscriptions := range bc.newSubscriptions.
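The shape of that stuck state, reduced to a sketch. This is not the sarama code: the two goroutines only imitate the blocking operations named above, `managerStalls` is a hypothetical helper, and a timeout is used to observe that the send never completes.

```go
package main

import (
	"fmt"
	"time"
)

// none mirrors the empty struct type used in the expression bc.wait <- none{}.
type none struct{}

// managerStalls reproduces the shape of the reported state and reports
// whether the manager side is still blocked after timeoutMs milliseconds.
func managerStalls(timeoutMs int) bool {
	newSubscriptions := make(chan []int) // nothing is ever sent here
	wait := make(chan none)
	sent := make(chan struct{})

	// Consumer side: parked in the range loop, so it never reaches <-wait.
	go func() {
		for range newSubscriptions {
			<-wait
		}
	}()

	// Manager side: blocked on the send because nobody is receiving.
	go func() {
		wait <- none{}
		close(sent)
	}()

	select {
	case <-sent:
		return false
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return true // both goroutines are still parked (and leak in this sketch)
	}
}

func main() {
	fmt.Println("manager stalled:", managerStalls(50))
}
```

Each side is waiting for the other to make the first move, so neither can proceed.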

Co-authored-by: Dominic Evans <dominic.evans@uk.ibm.com>

dnwe requested a review from bai as a code owner January 12, 2022 14:26

ghost added the cla-needed label Jan 12, 2022

dnwe mentioned this pull request Jan 12, 2022

Solve the slow consumer, subscription problem #1899

Closed

2 tasks


ghost removed the cla-needed label Jan 21, 2022

dnwe force-pushed the dnwe/subscription-starvation branch from 5770556 to 3483f0f Compare January 31, 2022 15:36

dnwe force-pushed the dnwe/subscription-starvation branch 4 times, most recently from 2e30c2c to c59477d Compare February 25, 2022 14:05

dnwe force-pushed the dnwe/subscription-starvation branch from a2a3ee4 to dadcd80 Compare February 25, 2022 15:34

dnwe added the fix label Feb 25, 2022

dnwe merged commit 0ad6651 into main Feb 25, 2022

dnwe deleted the dnwe/subscription-starvation branch February 25, 2022 16:04

pavius mentioned this pull request Mar 27, 2022

Fix deadlock introduced in #2109 #2196

Closed

This was referenced May 11, 2022

Update module github.com/Shopify/sarama to v1.33.0 shift/vflow#33

Merged

fix(deps): update module github.com/shopify/sarama to v1.33.0 shortlink-org/shortlink#3960

Merged
