-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-16143: New JMX metrics for AsyncKafkaConsumer #17199
base: trunk
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @FrankYang0529 for working on this!
I am intentionally not going to look at this too in-depth because it's a draft at the moment. However, it appears that the KafkaConsumerMetrics
object is being used in both the application thread and background thread. From a quick glance, KafkaConsumerMetrics
doesn't look thread-safe 🤔
If that's the case, we'll need to figure out how we can avoid potential synchronization issues.
Thanks!
@FrankYang0529—I wanted to check in and see if you had any questions or needed anything to progress on this. Thanks! |
Hi @kirktrue, sorry for late. I will make this PR ready today. |
b1b848f
to
f68c568
Compare
Hi @kirktrue, good catch. There is kafka/clients/src/main/java/org/apache/kafka/common/metrics/Sensor.java Lines 230 to 244 in bb6ebd8
The PR is almost ready. I will wait to see CI result. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @FrankYang0529!
This looks pretty comprehensive. I haven't had time to dive as deep as I would like, but I left a first pass of comments.
Thanks!
kafkaConsumerMetrics.recordBackgroundEventQueueTime(time.milliseconds() - event.addedToQueueMs()); | ||
long startMs = time.milliseconds(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the metric determine how long it takes to process the background event queue, or how long it takes to process background events? If it's the latter, we want to update the metric inside each loop, but if it's the former we should update once outside the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this metric determine how long a background event is taking to be dequeued. In BackgroundEventHandler#add
, we run BackgroundEvent#setAddedToQueueMs
. When we start to process it, we can use current time - event.addedToQueueMs
to know how long an event is in the queue.
@@ -139,6 +155,9 @@ void runOnce() { | |||
processApplicationEvents(); | |||
|
|||
final long currentTimeMs = time.milliseconds(); | |||
final long timeSinceLastPollMs = lastPollTimeMs != 0L ? currentTimeMs - lastPollTimeMs : currentTimeMs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the first invocation of runOnce()
, timeSinceLastPollMs
will be something like 1728954137284
. Is that correct?
public ConsumerNetworkThread(LogContext logContext, | ||
Time time, | ||
BlockingQueue<ApplicationEvent> applicationEventQueue, | ||
CompletableEventReaper applicationEventReaper, | ||
Supplier<ApplicationEventProcessor> applicationEventProcessorSupplier, | ||
Supplier<NetworkClientDelegate> networkClientDelegateSupplier, | ||
Supplier<RequestManagers> requestManagersSupplier) { | ||
this(logContext, time, applicationEventQueue, applicationEventReaper, applicationEventProcessorSupplier, | ||
networkClientDelegateSupplier, requestManagersSupplier, Optional.empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to remove this constructor? Can't we call the existing constructor with Optional.empty()
?
public NetworkClientDelegate( | ||
final Time time, | ||
final ConsumerConfig config, | ||
final LogContext logContext, | ||
final KafkaClient client, | ||
final Metadata metadata, | ||
final BackgroundEventHandler backgroundEventHandler) { | ||
this(time, config, logContext, client, metadata, backgroundEventHandler, Optional.empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to remove this constructor? Can't callers invoke the existing constructor with Optional.empty()
?
public void setAddedToQueueMs(long addedToQueueMs) { | ||
this.addedToQueueMs = addedToQueueMs; | ||
} | ||
|
||
public long addedToQueueMs() { | ||
return addedToQueueMs; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public void setAddedToQueueMs(long addedToQueueMs) { | |
this.addedToQueueMs = addedToQueueMs; | |
} | |
public long addedToQueueMs() { | |
return addedToQueueMs; | |
} | |
public void setEnqueuedMs(long enqueuedMs) { | |
enqueuedMs = enqueuedMs; | |
} | |
public long enqueuedMs() { | |
return enqueuedMs; | |
} |
@@ -73,7 +90,9 @@ public ApplicationEventHandler(final LogContext logContext, | |||
*/ | |||
public void add(final ApplicationEvent event) { | |||
Objects.requireNonNull(event, "ApplicationEvent provided to add must be non-null"); | |||
event.setAddedToQueueMs(System.currentTimeMillis()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to get the current time in milliseconds from the Time
object that was passed in to the constructor.
void setAddedToQueueMs(final long addedToQueueMs) { | ||
this.addedToQueueMs = addedToQueueMs; | ||
} | ||
|
||
long addedToQueueMs() { | ||
return addedToQueueMs; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's a bit nit-picky, but can we change it to:
void setAddedToQueueMs(final long addedToQueueMs) { | |
this.addedToQueueMs = addedToQueueMs; | |
} | |
long addedToQueueMs() { | |
return addedToQueueMs; | |
} | |
void setEnqueuedMs(final long enqueuedMs) { | |
this.enqueuedMs = enqueuedMs; | |
} | |
long enqueuedMs() { | |
return enqueuedMs; | |
} |
public void setAddedToQueueMs(long addedToQueueMs) { | ||
this.addedToQueueMs = addedToQueueMs; | ||
} | ||
|
||
public long addedToQueueMs() { | ||
return addedToQueueMs; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public void setAddedToQueueMs(long addedToQueueMs) { | |
this.addedToQueueMs = addedToQueueMs; | |
} | |
public long addedToQueueMs() { | |
return addedToQueueMs; | |
} | |
public void setEnqueuedMs(long enqueuedMs) { | |
this. enqueuedMs = enqueuedMs; | |
} | |
public long enqueuedMs() { | |
return enqueuedMs; | |
} |
public BackgroundEventHandler(final Queue<BackgroundEvent> backgroundEventQueue) { | ||
this(backgroundEventQueue, Optional.empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with the other changes. I'd rather make the callers pass in Optional.empty()
.
@@ -42,6 +51,8 @@ public BackgroundEventHandler(final Queue<BackgroundEvent> backgroundEventQueue) | |||
*/ | |||
public void add(BackgroundEvent event) { | |||
Objects.requireNonNull(event, "BackgroundEvent provided to add must be non-null"); | |||
event.setAddedToQueueMs(System.currentTimeMillis()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here: we need to update the constructor to provide a Time
object and then use that here.
f68c568
to
33e8f2c
Compare
Hi @kirktrue, I address all comments. Could you take a look when you have time? Thank you. |
33e8f2c
to
16b5b03
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @FrankYang0529!
Overall, I think it's headed in the right direction. I have mostly minor requests, but the one that's slightly larger is class hierarchy around the KafkaConsumerMetrics
. If you want to punt on that, file a Jira and I'll take care of it later.
I didn't get to the unit tests yet, though 😢
Thanks!
public void setEnqueuedMs(long enqueuedMs) { | ||
this.enqueuedMs = enqueuedMs; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like that we're introducing mutability into all the events. I understand that this changes a lot less code than having the constructor take the creation time or something.
Can you add a comment to the enqueued
variable that states that because of its mutability that it should not be used in hashCode()
or equals()
or toStringBase()
(or something along those lines)?
@@ -37,6 +37,7 @@ public enum Type { | |||
* {@link #equals(Object)} and can be used in log messages when debugging. | |||
*/ | |||
private final Uuid id; | |||
private long enqueuedMs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as per above with ApplicationEvent
.
kafkaConsumerMetrics.ifPresent(metrics -> metrics.recordApplicationEventQueueTime(time.milliseconds() - event.enqueuedMs())); | ||
long startMs = time.milliseconds(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we swap the ordering of these two lines to avoid the extra call to time.milliseconds()
?
kafkaConsumerMetrics.ifPresent(metrics -> metrics.recordApplicationEventQueueTime(time.milliseconds() - event.enqueuedMs())); | |
long startMs = time.milliseconds(); | |
long startMs = time.milliseconds(); | |
kafkaConsumerMetrics.ifPresent(metrics -> metrics.recordApplicationEventQueueTime(startMs - event.enqueuedMs())); |
@@ -162,15 +171,20 @@ void runOnce() { | |||
private void processApplicationEvents() { | |||
LinkedList<ApplicationEvent> events = new LinkedList<>(); | |||
applicationEventQueue.drainTo(events); | |||
kafkaConsumerMetrics.ifPresent(metrics -> metrics.recordApplicationEventQueueSize(applicationEventQueue.size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we drain applicationEventQueue
, it'll be empty, right? If, so why not just do:
kafkaConsumerMetrics.ifPresent(metrics -> metrics.recordApplicationEventQueueSize(applicationEventQueue.size())); | |
kafkaConsumerMetrics.ifPresent(metrics -> metrics.recordApplicationEventQueueSize(0)); |
private ApplicationEventProcessor applicationEventProcessor; | ||
private NetworkClientDelegate networkClientDelegate; | ||
private RequestManagers requestManagers; | ||
private volatile boolean running; | ||
private final IdempotentCloser closer = new IdempotentCloser(); | ||
private volatile Duration closeTimeout = Duration.ofMillis(DEFAULT_CLOSE_TIMEOUT_MS); | ||
private volatile long cachedMaximumTimeToWait = MAX_POLL_TIMEOUT_MS; | ||
private long lastPollTimeMs = 0L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only ever written to and read from the same thread, right?
@@ -42,6 +47,8 @@ public BackgroundEventHandler(final Queue<BackgroundEvent> backgroundEventQueue) | |||
*/ | |||
public void add(BackgroundEvent event) { | |||
Objects.requireNonNull(event, "BackgroundEvent provided to add must be non-null"); | |||
event.setEnqueuedMs(System.currentTimeMillis()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add a Time
variable to this class so that we can do this instead:
event.setEnqueuedMs(System.currentTimeMillis()); | |
event.setEnqueuedMs(time.milliseconds()); |
import java.util.concurrent.TimeUnit; | ||
|
||
import static org.apache.kafka.clients.consumer.internals.ConsumerUtils.CONSUMER_METRICS_SUFFIX; | ||
|
||
public class KafkaConsumerMetrics implements AutoCloseable { | ||
private final GroupProtocol groupProtocol; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to create a subclass of KafkaConsumerMetrics
instead of adding group protocol-specific bits on here? I see that the ShareConsumerImpl
uses a custom KafkaShareConsumerMetrics
that at first glance is entirely a subset of KafkaConsumerMetrics
, so there's some refactoring that could be done here.
private Sensor timeBetweenNetworkThreadPollSensor; | ||
private Sensor applicationEventQueueSizeSensor; | ||
private Sensor applicationEventQueueTimeSensor; | ||
private Sensor applicationEventQueueProcessingTimeSensor; | ||
private Sensor backgroundEventQueueSizeSensor; | ||
private Sensor backgroundEventQueueTimeSensor; | ||
private Sensor backgroundEventQueueProcessingTimeSensor; | ||
private Sensor unsentRequestsQueueSizeSensor; | ||
private Sensor unsentRequestsQueueTimeSensor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make these final
too? It'll be a bit ugly in the else
block of the constructor to mark them all as null
. But that goes away if we make this a subclass.
metricGroupName, "The number of seconds since the last poll() invocation."); | ||
metricGroupName, "The number of seconds since the last poll() invocation."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove these whitespace changes? They seem extraneous.
|
||
if (groupProtocol == GroupProtocol.CONSUMER) { | ||
Arrays.asList( | ||
timeBetweenNetworkThreadPollSensor.name(), | ||
applicationEventQueueSizeSensor.name(), | ||
applicationEventQueueTimeSensor.name(), | ||
applicationEventQueueProcessingTimeSensor.name(), | ||
backgroundEventQueueSizeSensor.name(), | ||
backgroundEventQueueTimeSensor.name(), | ||
backgroundEventQueueProcessingTimeSensor.name(), | ||
unsentRequestsQueueSizeSensor.name(), | ||
unsentRequestsQueueTimeSensor.name() | ||
).forEach(metrics::removeSensor); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another place that having a subclass would help.
16b5b03
to
f121073
Compare
Signed-off-by: PoAn Yang <payang@apache.org>
f121073
to
90af010
Compare
Add following metrics to AsyncKafkaConsumer:
Committer Checklist (excluded from commit message)