KAFKA-14020: Performance regression in Producer #12365
Conversation
As part of the KAFKA-10888 work, a call to time.milliseconds() got moved under the queue lock; this change moves it back outside the lock. The call may be expensive and cause lock contention.
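For illustration, a minimal sketch of the pattern this change restores (class and method names are loosely modeled on the accumulator code and are not the real implementation): read the clock before taking the queue lock so a slow Time.milliseconds() call never extends the lock hold time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; not the actual RecordAccumulator code.
class AppendSketch {
    private final Deque<Object> batches = new ArrayDeque<>();

    long milliseconds() {                 // stand-in for Time.milliseconds(), which may be expensive
        return System.currentTimeMillis();
    }

    void append(Object record) {
        // Read the clock BEFORE taking the lock so a slow clock call cannot extend lock hold time.
        long nowMs = milliseconds();
        synchronized (batches) {
            // Only cheap work happens under the lock; nowMs may be slightly stale, which is acceptable here.
            batches.addLast(record);      // placeholder for tryAppend(nowMs, ...)
        }
    }
}
```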
@artemlivshits : Thanks for the PR. Does this resolve all of the perf regression reported in #12342?

Haven't checked the streams benchmark yet. But it is a regression that is visible in the lock profile, so from that perspective this seems to be a net positive.
```java
// Update the current time in case the buffer allocation blocked above.
// NOTE: getting time may be expensive, so calling it under a lock
// should be avoided.
nowMs = time.milliseconds();
```
This change has some other side-effects to the code logic:

1. In case of retries (multiple iterations of the while loop) when buffer allocation blocks, prior to this change `tryAppend` on line 283 was being called with an older `nowMs`. After this change it's a more recent time. This is a positive change.
2. In case of retries, when buffer allocation does not occur, prior to this change the time was computed inside `appendNewBatch` and hence was guaranteed to be the latest. After this change, there might be threads blocked on `synchronized`, or the time consumed by the previous retry isn't factored into the `nowMs` being passed to `appendNewBatch`. Hence, the `nowMs` being passed to `appendNewBatch` might be stale by some amount (depending on how long threads were waiting to acquire the lock). Is that acceptable?
@divijvaidya : For 2, before KAFKA-10888, nowMs was also computed before synchronized. So, it has the same behavior as this PR.

Looking at the code, I am not sure if nowMs is strictly needed. nowMs is used to populate ProducerBatch.lastAppendTime. However, since KAFKA-5886, expiration is based on createTime and not on lastAppendTime. lastAppendTime is only used to upper bound lastAttemptMs. This may not be needed. @hachikuji : Could we just get rid of ProducerBatch.lastAppendTime?
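For context on why lastAppendTime may be redundant: since KAFKA-5886 the delivery timeout is measured from the batch's creation time. A simplified sketch of that check, with field and method names assumed rather than copied from ProducerBatch:

```java
// Simplified sketch of delivery-timeout expiration based on batch creation time.
// Names are assumptions modeled on ProducerBatch, not verbatim Kafka code.
class BatchExpirySketch {
    final long createdMs;        // set once when the batch is created
    long lastAppendTime;         // updated on every append, but not consulted for expiration below

    BatchExpirySketch(long createdMs) {
        this.createdMs = createdMs;
        this.lastAppendTime = createdMs;
    }

    boolean hasReachedDeliveryTimeout(long deliveryTimeoutMs, long nowMs) {
        // Expiration depends only on createdMs; a stale lastAppendTime cannot delay or hasten it.
        return deliveryTimeoutMs <= nowMs - createdMs;
    }
}
```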
Did we reach a conclusion regarding this?
I synced up with Jun offline; for this change it makes sense to preserve the current behavior (too close to release).
Do we want to file a JIRA for changing this for the next release?
As part of the KAFKA-10888 work, the reference to the ProducerRecord was held in the batch completion callback, so it was kept alive as long as the batch was alive, which may increase the amount of memory used in certain scenarios and cause excessive GC work. Now the reference is reset early, so the ProducerRecord lifetime isn't bound to the batch lifetime.
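A hedged sketch of the lifetime problem and the fix (hypothetical class, not the actual producer internals): if the completion callback keeps a field pointing at the ProducerRecord, the record's key and value stay reachable until the batch completes; clearing the field once it is no longer needed lets the GC reclaim them earlier.

```java
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical sketch of a per-record completion callback; not the actual producer code.
class InterceptorCallbackSketch<K, V> {
    private volatile ProducerRecord<K, V> record;   // held only until the partition is known
    private final String topic;                     // extracted up front, cheap to retain

    InterceptorCallbackSketch(ProducerRecord<K, V> record) {
        this.record = record;
        this.topic = record != null ? record.topic() : null;
    }

    void onPartitionKnown(int partition) {
        // ... any tracing that still needs the record would go here ...
        // Drop the reference so the record's key/value payload doesn't live as long as the batch.
        record = null;
    }

    String topic() {
        return topic;
    }
}
```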
Hey @artemlivshits, I just re-ran the Streams benchmarks that originally found the regression. It looks like it's resolved, as of your latest commit! As a reminder, this was the baseline for "good" performance: [benchmark results]. And when I ran the same benchmark on 3a6500b, I got: [benchmark results].
junrao left a comment
@vvcephei : Thanks for the update.
@artemlivshits : Thanks for the updated PR. Just one minor comment.
```diff
     private final Callback userCallback;
     private final ProducerInterceptors<K, V> interceptors;
-    private final ProducerRecord<K, V> record;
+    private ProducerRecord<K, V> record;
```
Should we make record volatile since it's being updated and read by different threads?
Will do. And partition should then be volatile as well.
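A tiny, generic illustration of why volatile matters here (illustrative field names, not the PR's exact code): one thread writes the field once the partition is chosen and another thread reads it later; volatile guarantees the reader sees the write.

```java
// Generic visibility example; not the PR's actual fields.
class PartitionHolder {
    // volatile guarantees that a write by the partitioning thread is visible
    // to the thread that later reads it when completing the batch.
    private volatile int partition = -1;

    void setPartition(int partition) {   // called on the application thread
        this.partition = partition;
    }

    int partition() {                    // may be called on the sender/I/O thread
        return partition;
    }
}
```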
Fix test failures.
```java
this.record = record;
// Note a record would be null only if the client application has a bug, but we don't want to
// have NPE here, because the interceptors would not be notified (see .doSend).
topic = record != null ? record.topic() : null;
```
Can you elaborate on this? What kind of application bug would surface itself in a silent way like this?
https://github.com/apache/kafka/blob/trunk/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L1041 has a test, which effectively codifies the contract. I would agree that it's weird to have a contract about null handling, but at this point I'd rather preserve whatever behavior is codified.
I don't think this answered my question. What application bug would result in this?
Would passing a null record not be a bug? I've changed the comment to not mention that it would be a bug.
If a user passes in a null record to send(), we will be throwing a NullPointerException somewhere. So, we probably could just throw an exception early in that case without going through the callback and fix the test accordingly. We probably could do that in a separate PR in trunk.
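A sketch of the early validation being suggested (hypothetical placement at the top of send(); not the current code):

```java
import java.util.Objects;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical early check at the API boundary, as suggested for a follow-up PR.
class SendValidationSketch<K, V> {
    public void send(ProducerRecord<K, V> record, Callback callback) {
        // Fail fast with a clear message instead of hitting an NPE deeper in the append path
        // (interceptors would not be invoked for a null record under this approach).
        Objects.requireNonNull(record, "ProducerRecord passed to send() must not be null");
        // ... rest of the send path ...
    }
}
```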
I looked at the test and it seems to check that an exception is thrown? As @junrao said, this can be done by validating what send receives instead of polluting the whole codebase. I'm OK if we file a JIRA for that and do it as a separate PR. But we should remove this code when we do that.
It checks that the exception is thrown and then it checks that interceptors are called. Probably the test is just sloppy and could use a different error condition. KAFKA-14086
```diff
-            ? ProducerInterceptors.extractTopicPartition(record)
-            : new TopicPartition(record.topic(), partition);
+        if (partition != RecordMetadata.UNKNOWN_PARTITION)
+            return new TopicPartition(topic, partition);
```
It's a bit surprising to allocate every time a method like this is called. Can we not allocate the topic partition once and reuse it?
topicPartition() is called once in the success case (maybe twice in the error case). I'll add a comment.
Even so, the way this method is used can change over time. And then you end up with a lot of unexpected allocation. The way I suggested is more robust.
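A sketch of the reuse being suggested (assumed names, not the code in this PR): compute the TopicPartition lazily once and hand back the cached instance on later calls.

```java
import org.apache.kafka.common.TopicPartition;

// Sketch of lazily caching the TopicPartition instead of allocating on every call.
// Names are assumptions for illustration, not the PR's implementation.
class TopicPartitionCacheSketch {
    private final String topic;
    private volatile int partition = -1;            // set once the partition is known
    private volatile TopicPartition topicPartition; // cached on first use

    TopicPartitionCacheSketch(String topic) {
        this.topic = topic;
    }

    void setPartition(int partition) {
        this.partition = partition;
    }

    TopicPartition topicPartition() {
        TopicPartition tp = topicPartition;
        if (tp == null && partition >= 0) {
            tp = new TopicPartition(topic, partition);
            topicPartition = tp;                    // benign race: worst case a duplicate allocation
        }
        return tp;
    }
}
```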
```java
}

// Reset record to null here so that it doesn't have to be alive as long as the batch is.
record = null;
```
It's a bit surprising that a method called setPartition resets the record. Maybe we can make the method name clearer. It would also be useful for the comment to state why we no longer need the record after this.
The method overrides a callback that is called setPartition to reflect what the caller does with it (it sets partition). I agree that it's a little non-intuitive to do a state transition here but there doesn't seem to be a better place to do it if we want to preserve the behavior -- we need record until here to do the tracing and we cannot do tracing earlier because we may not know the partition; at the same time, we don't want to keep it longer.
We can add another callback method since this is an internal interface, right? This kind of thing leads to a lot of maintenance pain down the line.
I think it would be non-intuitive to control record lifetime from the RecordAccumulator.append (that calls the callback) -- here we know that we don't need the record once the partition is set, but RecordAccumulator.append doesn't know it (in fact, it doesn't even know that we have the record). But I can change it if you think this would make it easier to understand.
Thanks for the explanation. Is there a possibility the trace could be done in the caller context? Or is that missing some of the required information?
It's missing the partition info. Previously, partition was calculated before doing RecordAccumulator.append so we could do tracing in the doSend, but now the partition may be calculated at the beginning of RecordAccumulator.append, so tracing needs to happen after it's known, but before the actual append proceeds.
I find that the code complexity to achieve this trace logging is a bit high. I have some ideas on how to improve it, but we can leave that for later. A simple suggestion for now would be to change setPartition to onPartitionAssigned or something like that. This would indicate a general callback that can do anything once the partition is known.
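A sketch of the kind of renamed callback described above (hypothetical interface and names, purely illustrative): a general hook invoked once the partition is known, under which tracing and any cleanup can live.

```java
// Hypothetical internal callback shape; not an actual Kafka interface.
interface PartitionCallback {
    // General-purpose hook: invoked by the accumulator as soon as the partition is known,
    // before the append proceeds. Implementations may trace, release references, etc.
    void onPartitionAssigned(int partition);
}

class TracingPartitionCallback implements PartitionCallback {
    private final String topic;

    TracingPartitionCallback(String topic) {
        this.topic = topic;
    }

    @Override
    public void onPartitionAssigned(int partition) {
        // Trace only once the partition is known; any no-longer-needed state could be dropped here too.
        System.out.println("Assigned partition " + partition + " for topic " + topic);
    }
}
```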
Since the only info we need from record is record.partition(), could we keep record.partition() in the instance instead of the whole record? Since record.partition() is much smaller, maybe there is no need to nullify it in setPartition()?
Updated to extract all record info in the constructor.
```diff
@@ -1491,20 +1495,25 @@ public void setPartition(int partition) {

             if (log.isTraceEnabled()) {
```
A couple of lines above we use a language-level assert. In Kafka, we typically use assert-like methods such as those in the Objects class since language-level asserts are disabled by default.
The intention of an assert is to run in tests, but be disabled in prod, so if my understanding is correct, this is the proper usage.
That's not considered best practice in Java and we don't typically do it in Kafka. What would be the reason to run it in tests only?
All my previous life I was using asserts extensively in C/C++, they provide both validation and contract documentation. They do redundant validation in builds that are used in system tests without adding perf cost in prod. I can remove it, if it's not compatible with style, though I don't think this is just style -- using asserts makes a material difference in early bug detection and in code comprehension.
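For reference, a self-contained example of Java's language-level assert: the check only runs when assertions are enabled (java -ea), and with the default JVM flags it is skipped entirely.

```java
// Generic illustration, unrelated to the PR's exact code.
public class AssertDemo {
    public static void main(String[] args) {
        int partition = 3;
        // Only evaluated when the JVM is started with -ea; with default flags it is a no-op.
        assert partition >= 0 : "partition should not be negative, but it was " + partition;
        System.out.println("assertions enabled: " + AssertDemo.class.desiredAssertionStatus());
    }
}
```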
It seems we have been mostly using Objects.requireNonNull for null assertion in our code. It doesn't seem to add too much overhead and helps identify issues in production early on. For consistency, perhaps we could use Objects.requireNonNull instead of assert.
@ijuma : What do you recommend that we use for assertions like assert partitionInfo == stickyPartitionInfo.get()?
The point is that you can run these checks in prod without measurable cost. Then why limit it to tests?
Correct @junrao, Objects.requireNonNull would be the recommended way to assert non-null. The reference equality check is less common; we could add our own utility method in Utils for that or inline it.
The main thing is to get the appropriate signal if this happens in prod when the cost is low (both examples would be in that category).
Ok, so what's the suggestion in this change? Should I leave the code as is or remove the assert? Creating a new utility seems to be out of scope for this change. We could have an offline discussion about asserts; I would be happy to see them used more often in Kafka.
For the line in this method, you could do something like:

```java
if (partition < 0)
    throw new IllegalArgumentException("partition should be positive, but it was " + partition);
```

Which is more informative and idiomatic and checks the more general case that we expect partitions to be positive. But I see that we have sprinkled the same check in other methods. So, having an assertPartitionIsPositive would probably be a better approach. In any case, since this code was introduced in a different change, we can file a JIRA and do it as a separate PR.
I am happy to discuss more, but we should be clear about terminology. Language level asserts in Java aren't used much. Checking preconditions through API boundaries is useful. Within a given boundary, it's best to use the type system to avoid having noise all over the code.
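A sketch of the helper suggested above; the method name follows the suggestion in this thread, while the class and placement are assumptions:

```java
// Sketch of a shared precondition helper; assertPartitionIsPositive is the name suggested in
// this discussion, not an existing Kafka utility.
final class ProducerPreconditions {
    private ProducerPreconditions() { }

    static int assertPartitionIsPositive(int partition) {
        if (partition < 0)
            throw new IllegalArgumentException("partition should be positive, but it was " + partition);
        return partition;
    }
}
```

A call site would then read `this.partition = ProducerPreconditions.assertPartitionIsPositive(partition);`, keeping the check and its message in one place.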
ijuma left a comment
Thanks for the PR. A few comments below.
Address review comments.
junrao left a comment
@artemlivshits : Thanks for the updated PR. LGTM. Waiting for the tests to pass.
@jsancio : We plan to cherry-pick this to the 3.3 branch since this fixes a performance issue introduced in KAFKA-10888.
Address review comments.
artemlivshits left a comment
Addressed @ijuma comments.
junrao left a comment

@artemlivshits : Thanks for the updated PR. A couple more comments.
Address review comments.
junrao left a comment
@artemlivshits : Thanks for the updated PR. Just a minor comment.
```java
this.interceptors = interceptors;
this.record = record;
// Extract record info as we don't want to keep a reference to the record during
// whole lifetime of the batch.
```
Could we move these two lines to the immediate line before where we set recordPartition?
I think it applies to all 3 fields: topic, recordPartition and recordLogString - we extract all this info from the record, so the comment is before we do that (in the PR it's kind of hard to see because of the inline discussion). Let me know if you think otherwise.
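A sketch of the shape described here (the three field names come from this discussion; everything else is illustrative): the constructor pulls out what the callback needs later, so only small immutable pieces have to outlive the append.

```java
import org.apache.kafka.clients.producer.ProducerRecord;

// Illustrative sketch only; mirrors the idea of extracting record info up front.
class ExtractedRecordInfo<K, V> {
    private final String topic;
    private final Integer recordPartition;
    private final String recordLogString;

    ExtractedRecordInfo(ProducerRecord<K, V> record, boolean traceEnabled) {
        // Extract record info eagerly so the callback does not need to keep the record
        // (and its key/value payload) alive for the whole lifetime of the batch.
        topic = record != null ? record.topic() : null;
        recordPartition = record != null ? record.partition() : null;
        recordLogString = traceEnabled && record != null ? record.toString() : "";
    }

    String topic() { return topic; }
    Integer recordPartition() { return recordPartition; }
    String recordLogString() { return recordLogString; }
}
```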
Thanks for the explanation. Sounds good.
Sounds good @junrao. I set the fix version for KAFKA-14020 to 3.3.0.
junrao left a comment
@artemlivshits : Thanks for the explanation. LGTM. Waiting for the tests to pass.
@artemlivshits : Are the test failures related to the PR? @ijuma : Do you have any other comments?
Fix test break.
Yes, just pushed the fix.
Looked at the failed tests; they seem unrelated and pass locally.
@junrao the updated version looks good to me. Thanks @artemlivshits for the patience and iterations.
junrao left a comment
@artemlivshits : Thanks for the latest PR. LGTM
As part of the KAFKA-10888 work, there were a couple of regressions introduced:

A call to time.milliseconds() got moved under the queue lock. The call may be expensive and cause lock contention. Now the call is moved back outside of the lock.

The reference to ProducerRecord was held in the batch completion callback, so it was kept alive as long as the batch was alive, which may increase the amount of memory used in certain scenarios and cause excessive GC work. Now the reference is reset early, so the ProducerRecord lifetime isn't bound to the batch lifetime.

Tested via a manually crafted benchmark: the lock profile shows ~15% lock contention on the ArrayQueue lock without the fix and ~5% with the fix (which is also consistent with the pre-KAFKA-10888 profile). The alloc profile shows ~10% spent in ProducerBatch.completeFutureAndFireCallbacks without the fix vs. ~0.25% with the fix (which is also consistent with the pre-KAFKA-10888 profile).

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
Cherry-picked the PR to 3.3.
Hello Artem @artemlivshits, I was studying the PR / ... Happy to make a PR in case you see it reasonable. Regards, Eugene
Hi @etolbakov, making this method private sounds reasonable to me. Thank you for the suggestion.
As part of the KAFKA-10888 work, there were a couple of regressions introduced:

A call to time.milliseconds() got moved under the queue lock. The call may be expensive and cause lock contention. Now the call is moved back outside of the lock.

The reference to ProducerRecord was held in the batch completion callback, so it was kept alive as long as the batch was alive, which may increase the amount of memory used in certain scenarios and cause excessive GC work. Now the reference is reset early, so the ProducerRecord lifetime isn't bound to the batch lifetime.

Tested via a manually crafted benchmark: the lock profile shows ~15% lock contention on the ArrayQueue lock without the fix and ~5% with the fix (which is also consistent with the pre-KAFKA-10888 profile).

The alloc profile shows ~10% spent in ProducerBatch.completeFutureAndFireCallbacks without the fix vs. ~0.25% with the fix (which is also consistent with the pre-KAFKA-10888 profile).
Will add a proper jmh benchmark for producer (looks like we don't have one) in a follow-up change.
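A minimal JMH skeleton of the kind of follow-up benchmark mentioned above (hypothetical class; the measured body is only a stand-in for the real append path):

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

// Hypothetical skeleton of a producer append benchmark; not part of this PR.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ProducerAppendBenchmark {

    private final byte[] payload = new byte[100];

    @Benchmark
    @Threads(8)   // multiple threads to surface contention on the accumulator's queue lock
    public int append() {
        // Placeholder for the real append path; returning a value keeps JMH from
        // dead-code-eliminating the work.
        int acc = 0;
        for (byte b : payload)
            acc += b;
        return acc;
    }
}
```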