
KIP-74: Add fetch response size limit and implement round-robin on client side #1812

Closed
wants to merge 65 commits

Conversation

@nepal (Contributor) commented Sep 1, 2016

This PR is an implementation of KIP-74, which was originally motivated by KAFKA-2063.

Your comments are greatly appreciated.

@@ -278,6 +279,7 @@ object KafkaConfig {
val ReplicaFetchMaxBytesProp = "replica.fetch.max.bytes"
Contributor:

I wonder if we should deprecate this in favour of a replica.fetch.max.partition.bytes. It seems quite confusing that this one applies to a single partition when every other replica.fetch.* setting applies to the request. Maybe something for later.

Contributor:

gets my vote

Reviewer:

It's tempting to do this now while we're still in the KIP window. We'll be less inclined to do it in the future if it requires a new KIP.

Contributor:

What you say is true, but I think it's too late to add this to the KIP as we don't have a great story for deprecating properties.

@ijuma (Contributor) commented Sep 2, 2016

Thanks for the PR @nepal. It looks to me like it's going in the right direction. Left some comments/questions and looking forward to the missing bits (when do you think you'll have them?). By the way, I would add one point to your to-do list: tests.

this(replicaId, maxWait, minBytes, fetchData, ProtoUtils.latestVersion(ApiKeys.FETCH.id), maxBytes);
}

public FetchRequest(int replicaId, int maxWait, int minBytes, Map<TopicPartition, PartitionData> fetchData, int version, int maxBytes) {
Contributor:

I suggest making the version the first parameter.

Contributor:

Since the ordering is relevant here, would it be better to use LinkedHashMap in the signature?

@nepal (PR author):

Not sure whether it is a good idea to accept only LinkedHashMap in the constructor - it is used for all versions of FetchRequest. Maybe we should just add a comment that FetchRequest will preserve the ordering of fetchData.

Contributor:

Using LinkedHashMap for all versions should be fine, right? The previous versions don't care about ordering, so either way is fine. Or am I missing something?
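For readers following the thread, a minimal sketch of the point above, using the constructor shown in the diff. The helper name, the rotatedPartitions/fetchOffsets parameters, and the PartitionData constructor arguments are assumptions for illustration, not code from this PR. Passing a LinkedHashMap means v3 requests see the intended partition order, while older versions, which ignore ordering, are unaffected:

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.requests.FetchRequest;

// Illustrative helper, not part of this PR: build fetchData in the desired
// (round-robin) order. A LinkedHashMap preserves insertion order, so a v3
// FetchRequest serializes partitions in exactly this order, while older
// versions simply ignore the ordering.
final class FetchRequestOrderingExample {
    static FetchRequest buildFetch(List<TopicPartition> rotatedPartitions,
                                   Map<TopicPartition, Long> fetchOffsets,
                                   int replicaId, int maxWait, int minBytes,
                                   int maxBytes, int maxPartitionBytes, int version) {
        Map<TopicPartition, FetchRequest.PartitionData> fetchData = new LinkedHashMap<>();
        for (TopicPartition tp : rotatedPartitions)
            fetchData.put(tp, new FetchRequest.PartitionData(fetchOffsets.get(tp), maxPartitionBytes));
        return new FetchRequest(replicaId, maxWait, minBytes, fetchData, version, maxBytes);
    }
}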

@junrao (Contributor) commented Sep 2, 2016

@nepal : Thanks for the patch. A couple of other comments.

  1. Could you change the upgrade doc to cover the protocol change, since we now require setting inter.broker.protocol for a rolling upgrade to 0.10.1?
  2. It seems that we need to change FetchResponse.scala to take into account the ordering of the partitions.

@ijuma (Contributor) commented Sep 6, 2016

@nepal any idea when you will update the PR with the missing bits? Let us know if you need any help making progress on this. Thanks!

@nepal (PR author) commented Sep 6, 2016

I am currently working on the missing bits. I think it will take three days. Maybe we could merge an intermediate version of this PR if I am blocking someone?

@ijuma (Contributor) commented Sep 6, 2016

Thanks for the update @nepal, 3 days is fine. Probably easier to get the one PR in since it's not too complicated.

* <code>fetch.max.bytes</code>
*/
public static final String FETCH_MAX_BYTES_CONFIG = "fetch.max.bytes";
private static final String FETCH_MAX_BYTES_DOC = "The maximum amount of data the server should return for a fetch request. This is not an absolute maximum - if there is a single message which is larger than fetch.max.bytes, it will still be returned.";
@hachikuji commented Sep 6, 2016:

Do you think it's worth mentioning here that the consumer may send multiple parallel fetches and that this setting only applies to each fetch individually? In a future KIP, it might make sense to add a max.in.flight.fetches setting so that users have better control over memory. I went ahead and opened a JIRA to collect comments on this: https://issues.apache.org/jira/browse/KAFKA-4133.

Contributor:

Another approach might be KIP-72 which provides a memory pool to control the memory footprint.
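For context on how this config is used, a minimal consumer setup sketch. The fetch.max.bytes property and its doc string come from the diff above; the broker address, group id, and the 50 MB value are illustrative assumptions. As the thread notes, the limit applies to each fetch individually, and the existing per-partition cap still applies within a fetch:

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class FetchMaxBytesExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "kip74-example");           // illustrative
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        // Cap the data returned by a single fetch response; a single message larger
        // than this is still returned, as the doc string above notes.
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 50 * 1024 * 1024);
        // The per-partition cap still applies within each fetch.
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1024 * 1024);

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        // ... subscribe and poll as usual ...
        consumer.close();
    }
}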

ijuma and others added 7 commits September 16, 2016 13:22
KAFKA-2063: Address Jun's comments, remove warnings and add producer test
…ch-response-size

* apache/trunk:
  KAFKA-4093; Cluster Id (KIP-78)
  KAFKA-4173; SchemaProjector should successfully project missing Struct field when target field is optional
  KAFKA-4183; Corrected Kafka Connect's JSON Converter to properly convert from null to logical values
  KAFKA-3776: Unify store and downstream caching in streams
KAFKA-2063: Preserve behaviour of fetch requests version 2 and below and add tests
@ijuma (Contributor) commented Sep 17, 2016

This PR is complete, so @nepal, can you please remove the WIP from the PR title and update the PR description so it's up to date? The PR title and description become the commit message of the squashed commit that gets merged.

@nepal changed the title from "KIP-74: Add Fetch Response Size Limit in Bytes (KIP-74) [WIP]" to "KIP-74: Add fetch response size limit and implement round-robin on client side" on Sep 17, 2016
@@ -25,15 +25,12 @@ import kafka.api._
import kafka.cluster.{Partition, Replica}
import kafka.common._
import kafka.controller.KafkaController
import kafka.log.{LogAppendInfo, LogManager}
import kafka.log.{FileMessageSet, LogAppendInfo, LogManager}
Contributor:

unused import FileMessageSet

@@ -19,4 +19,5 @@ package kafka.server

import kafka.message.MessageSet

case class FetchDataInfo(fetchOffsetMetadata: LogOffsetMetadata, messageSet: MessageSet)
case class FetchDataInfo(fetchOffsetMetadata: LogOffsetMetadata, messageSet: MessageSet,
messageSetIncomplete: Boolean = false)
Contributor:

To be clearer, messageSetIncomplete should probably be firstMessageSetIncomplete?

Contributor:

Agreed, this also came up in a conversation between Jason and myself and I was thinking of renaming it along the lines you suggest. :)

@junrao (Contributor) commented Sep 17, 2016

Thanks for the latest patch. Made a pass on the server-side changes. They look good to me. Just a couple of minor comments; both can be addressed in a follow-up patch if needed.

Runtime.getRuntime.halt(1)
}
}

def warnIfMessageOversized(messageSet: ByteBufferMessageSet, topicAndPartition: TopicAndPartition): Unit = {
Contributor:

This warning may still be useful during rolling upgrade?

Contributor:

I thought that it would fix itself once the rolling upgrade is over, but it's probably safer to keep it for now (as you said). Will add it back and add a comment that it only applies to older brokers.

* This class is a useful building block for doing fetch requests where topic partitions have to be rotated via
* round-robin to ensure fairness and some level of determinism given the existence of a limit on the fetch response
* size. Because the serialization of fetch requests is more efficient if all partitions for the same topic are grouped
* together, we do such grouping in the method `set`.

Reviewer:

For follow-up: I think it's worth elaborating a little that this heuristic can diverge from the ideal over time as partitions are fetched in different orders and partition leadership changes.
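To make the round-robin idea concrete for readers of the thread, here is a simplified, hypothetical sketch (not the class from this PR, which additionally groups partitions of the same topic together when building the request): rotate the starting position between fetches so partitions that were starved by the response size limit move toward the front of subsequent requests.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.kafka.common.TopicPartition;

// Hypothetical illustration of the round-robin idea: keep partitions in a fixed
// cyclic order and advance the starting point after each fetch, so partitions
// cut off by the response size limit move toward the front next time.
final class RoundRobinRotationExample {
    private final List<TopicPartition> partitions;
    private int start = 0;

    RoundRobinRotationExample(List<TopicPartition> partitions) {
        this.partitions = new ArrayList<>(partitions);
    }

    /** Returns the partitions in the order they should appear in the next fetch. */
    List<TopicPartition> nextOrder() {
        List<TopicPartition> ordered = new ArrayList<>(partitions);
        if (ordered.isEmpty())
            return ordered;
        Collections.rotate(ordered, -start);          // bring the element at index `start` to the front
        start = (start + 1) % partitions.size();
        return ordered;
    }
}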

/**
* Create a replica fetch request for the current version.
*/
public static FetchRequest fromReplica(int replicaId, int maxWait, int minBytes, int maxBytes,

Reviewer:

Static factory methods are totally the way to go for these objects. Having a name makes it much less error-prone and keeps us from having to jump through hoops to keep argument lists different.

Contributor:

Are you OK with the other constructors remaining as they are for now?

Reviewer:

Yeah, I'm fine with it. I was just noting that this is a pattern I'd like to see more of.
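To spell out the pattern being discussed, a hedged sketch of two named factories. fromReplica mirrors the signature shown in the diff above; fromConsumer, the sentinel replica id, and the package paths are assumptions added purely to show how names keep similar int-heavy parameter lists from being mixed up:

import java.util.Map;

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.protocol.ApiKeys;
import org.apache.kafka.common.protocol.ProtoUtils;
import org.apache.kafka.common.requests.FetchRequest;

// Sketch only: fromReplica mirrors the factory in the diff above; fromConsumer is a
// hypothetical counterpart shown to illustrate why named factories are less
// error-prone than overloaded constructors with near-identical parameter lists.
final class FetchRequestFactoriesExample {
    static FetchRequest fromReplica(int replicaId, int maxWait, int minBytes, int maxBytes,
                                    Map<TopicPartition, FetchRequest.PartitionData> fetchData) {
        return new FetchRequest(replicaId, maxWait, minBytes, fetchData,
                ProtoUtils.latestVersion(ApiKeys.FETCH.id), maxBytes);
    }

    static FetchRequest fromConsumer(int maxWait, int minBytes, int maxBytes,
                                     Map<TopicPartition, FetchRequest.PartitionData> fetchData) {
        int consumerReplicaId = -1; // assumed sentinel value for non-replica fetches
        return new FetchRequest(consumerReplicaId, maxWait, minBytes, fetchData,
                ProtoUtils.latestVersion(ApiKeys.FETCH.id), maxBytes);
    }
}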

}

def shuffle(requestInfo: Seq[(TopicAndPartition, PartitionFetchInfo)]): Seq[(TopicAndPartition, PartitionFetchInfo)] =
random.shuffle(requestInfo)

Reviewer:

Do you think it would be a little safer to shuffle by topic and by partition separately to keep the fetch request size consistent with older versions?

Contributor:

That's a good idea. I'll do this in a follow-up and include some tests.
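A sketch of the suggestion, in Java rather than the Scala of the fetcher thread and purely illustrative: shuffle the set of topics first, then the partitions within each topic, so partitions of a topic stay contiguous, each topic name is serialized once per group, and the request size stays in line with older versions:

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

import org.apache.kafka.common.TopicPartition;

// Illustrative only: a two-level shuffle that preserves per-topic grouping.
final class TopicAwareShuffleExample {
    static List<TopicPartition> shuffle(List<TopicPartition> partitions, Random random) {
        Map<String, List<TopicPartition>> byTopic = new LinkedHashMap<>();
        for (TopicPartition tp : partitions)
            byTopic.computeIfAbsent(tp.topic(), t -> new ArrayList<>()).add(tp);

        List<String> topics = new ArrayList<>(byTopic.keySet());
        Collections.shuffle(topics, random);                  // shuffle topic order

        List<TopicPartition> result = new ArrayList<>(partitions.size());
        for (String topic : topics) {
            List<TopicPartition> group = byTopic.get(topic);
            Collections.shuffle(group, random);               // shuffle partitions within the topic
            result.addAll(group);
        }
        return result;
    }
}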

} catch {
// NOTE: Failed fetch requests metric is not incremented for known exceptions since it
// is supposed to indicate un-expected failure of a broker in handling a fetch request
case utpe: UnknownTopicOrPartitionException =>

Reviewer:

minor: Would it be worthwhile to combine some of these cases since they are all handled the same?

@hachikuji commented:

@nepal @ijuma Fantastic work on this KIP! I left a couple of minor comments which can be addressed in a follow-up (if needed). Overall, LGTM.

ijuma and others added 8 commits September 17, 2016 21:10
…ch-response-size

* apache/trunk:
  KAFKA-3492; Secure quotas for authenticated users
…ch-response-size

* apache/trunk:
  MINOR: Update the README.md to include a note about GRADLE_USER_HOME
  KAFKA-4157; Transient system test failure in replica_verification_test.test_replica_lags
  HOTFIX: changed quickstart donwload from 0.10.0.0 to 0.10.0.1
  HOTFIX: Increase timeout for bounce test
KAFKA-2063: Merge conflicts with trunk and address review feedback
@ijuma (Contributor) commented Sep 18, 2016

One failure in the system test run:

test_id: 2016-09-17--001.kafkatest.tests.connect.connect_distributed_test.ConnectDistributedTest.test_bounce.clean=True
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/549/console

It looks unrelated as we had the same failure in trunk recently (http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-09-17--001.1474133882--apache--trunk--ecc1fb1/report.html).

The Jenkins PR builder timed out after 180 minutes while running a Streams test, but all client and core tests passed (I've restarted the build). The tests passed locally.

@asfgit closed this in d04b099 on Sep 18, 2016
sp1ff pushed a commit to sp1ff/sarama that referenced this pull request Jun 30, 2017
This commit adds support for version 3 of the FetchRequest API. The
KIP can be found here:

    https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes

the PR here:

    apache/kafka#1812

and the JIRA here:

    https://issues.apache.org/jira/browse/KAFKA-2063

Should document the fact that the per-partition limits take
precedence (so the returned message may be larger than the
requested limit).