KAFKA-15974: Enforce that event processing respects user-provided timeout #15640

cadonna merged 160 commits into apache:trunk from
Conversation
Yes, the network layer changes are captured in KAFKA-16200 and build on top of this PR.
| }
|
| @Test
| void testEnsureEventsAreCompleted() {
Why did you remove this test without replacement?
Actually, it seems to me that we shouldn't have this test here (and maybe this is why @kirktrue removed it before?). As I see it, this unit test is testing something that is not the ConsumerNetworkThread's responsibility (and that's why it ends up being complicated, having to mimic the reaper behaviour and spying). It is testing that events are completed, and that's the responsibility of reaper.reap, so it seems to me we need to:

1. test that the ConsumerNetworkThread calls the reaper with the full list of events -> already done in testCleanupInvokesReaper
2. test that CompletableEventReaper.reap(Collection<?> events) completes the events -> done in CompletableEventReaperTest (testIncompleteQueue and testIncompleteTracked)

In the end, as it is, we end up asserting a behaviour we're mocking ourselves in the doAnswer, so not much value, I would say? I agree with @cadonna that we need coverage, but I would say we have it, per my points 1 and 2, and this test should be removed. Makes sense?
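To make point 1 concrete, the existing coverage essentially boils down to something like this (a sketch only; the mock and field names in ConsumerNetworkThreadTest are assumptions):

```java
@Test
void testCleanupInvokesReaper() {
    // On cleanup, the network thread is expected to hand every outstanding
    // application event to the reaper; the completion side of the contract
    // (futures completed exceptionally) is covered in CompletableEventReaperTest.
    consumerNetworkThread.cleanup();
    verify(applicationEventReaper).reap(applicationEventQueue);
}
```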
Yes, the test was a little suspect in terms of its value-add, so I'd removed it.
I was planning to file a Jira to move several of the tests (including this one) from ConsumerNetworkThreadTest to ApplicationEventProcessorTest. Then we could fix up some of the funkiness in this test as a separate task.
That is all fine! I was not arguing that we need to keep the test, but if I see a test removed without replacement, I suspect a mistake, which apparently did not happen in this case. Next time, please comment on the PR why you removed the test.
| consumer = newConsumer();
| completeUnsubscribeApplicationEventSuccessfully();
| consumer.unsubscribe();
| verify(backgroundEventReaper).reap(any(Long.class));
You control the time here. Why do you not verify that reap() is called with the correct time?
| @Test
| void testRunOnceInvokesReaper() {
|     consumerNetworkThread.runOnce();
|     verify(applicationEventReaper).reap(any(Long.class));
You control the time here. Why do you not verify that reap() is called with the correct time?
And done here, too.
Do you still have the change locally? As it is here, it still does not verify the correct time.
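For illustration, the stricter verification could look roughly like this (assuming the test drives a controlled MockTime instance named time; the names here are assumptions):

```java
@Test
void testRunOnceInvokesReaper() {
    consumerNetworkThread.runOnce();
    // With a controlled clock, the exact expiration timestamp can be asserted
    // instead of matching any(Long.class).
    verify(applicationEventReaper).reap(time.milliseconds());
}
```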
@lianetm Thanks for the explanation!
| // Close the consumer here as we know it will cause a FencedInstanceIdException to be thrown.
| // If we get an error other than the FencedInstanceIdException, we'll raise a ruckus.
| try {
|     consumer.close();
| } catch (KafkaException e) {
|     assertNotNull(e.getCause());
|     assertInstanceOf(FencedInstanceIdException.class, e.getCause());
| } finally {
|     consumer = null;
| }
Do we expect the close to throw? If so, we should verify that (at the moment our test will just complete successfully if the close does not throw). If that's the expectation, maybe this simpler snippet would cover it all:
- // Close the consumer here as we know it will cause a FencedInstanceIdException to be thrown.
- // If we get an error other than the FencedInstanceIdException, we'll raise a ruckus.
- try {
-     consumer.close();
- } catch (KafkaException e) {
-     assertNotNull(e.getCause());
-     assertInstanceOf(FencedInstanceIdException.class, e.getCause());
- } finally {
-     consumer = null;
- }
+ Throwable e = assertThrows(KafkaException.class, () -> consumer.close());
+ assertInstanceOf(FencedInstanceIdException.class, e.getCause());
+ consumer = null;
How did we resolve this? I see the section got completely removed; is the verification not needed anymore?
Yes, it turns out that changes made elsewhere have obviated the need for this check.
| final Timer timer) {
|     if (!shouldAutoCommit)
|         return;
| void maybeAutoCommitSync(final Timer timer) {
This is not a "maybe" anymore, so what about autoCommitSyncAllConsumed?
Changed to just autoCommitSync(). Is that OK?
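Just to spell out what the rename amounts to, a minimal sketch (the commitSync helper and the subscriptions field are assumptions about the surrounding class, not the actual implementation):

```java
// The shouldAutoCommit guard now lives at the call site, so the method
// commits unconditionally and the "maybe" prefix no longer applies.
void autoCommitSync(final Timer timer) {
    // Synchronously commit all consumed offsets, bounded by the caller's timer.
    commitSync(subscriptions.allConsumed(), timer);
}
```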
| // First, complete (exceptionally) any events that have passed their deadline AND aren't already complete.
| tracked.stream()
|     .filter(e -> !e.future().isDone())
|     .filter(e -> currentTimeMs > e.deadlineMs())
Don't we want >= here when identifying expired events? I would expect so (that's the semantics applied in the Timer class's isExpired, for instance).
This is an interesting point 🤔
If a user provides a timeout of 1000 milliseconds, is it expired at 1000 milliseconds or at 1001 milliseconds?
Regardless, I will change it to >= to be consistent.
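With that change, the expiration filter reads as below (a fragment of the stream shown above; the completeExceptionally step and the exception message are illustrative of the documented expiration behaviour):

```java
// An event counts as expired once the current time reaches its deadline (>=),
// matching the semantics of Timer.isExpired().
tracked.stream()
        .filter(e -> !e.future().isDone())
        .filter(e -> currentTimeMs >= e.deadlineMs())
        .forEach(e -> e.future().completeExceptionally(
                new TimeoutException("The event was not completed before its deadline")));
```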
|  * could occur when processing the events. In such cases, the processor will take a reference to the first
|  * error, continue to process the remaining events, and then throw the first error that occurred.
|  */
| private boolean processBackgroundEvents(EventProcessor<BackgroundEvent> processor) {
This processor passed as an argument is, in the end, always a reference to the backgroundEventProcessor, so could we simplify this by removing the arg and referencing the field directly? It caught my attention when seeing how this is used: it seems a bit redundant that every call has to provide the same processBackgroundEvents(backgroundEventProcessor, ... which feels like an internal detail that processBackgroundEvents could know about.
There is a unit test that passes in a mocked event processor. Let me look at refactoring this.
Done. That's much better 😄
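For the record, the simplification roughly amounts to the following (a sketch: the drainBackgroundEvents helper and the meaning of the boolean return are assumptions; only the error-handling contract comes from the Javadoc above):

```java
// No processor argument anymore: the method references the backgroundEventProcessor field directly.
private boolean processBackgroundEvents() {
    KafkaException firstError = null;
    boolean hadEvents = false;

    for (BackgroundEvent event : drainBackgroundEvents()) {  // hypothetical drain helper
        hadEvents = true;
        try {
            backgroundEventProcessor.process(event);
        } catch (KafkaException e) {
            // Per the Javadoc: remember the first error, keep processing the remaining events.
            if (firstError == null)
                firstError = e;
        }
    }

    if (firstError != null)
        throw firstError;

    return hadEvents;
}
```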
Co-authored-by: Lianet Magrans <98415067+lianetm@users.noreply.github.com>
Thanks for your patience and great effort here @kirktrue, LGTM to merge and move on with the follow-ups. Just to recap, this is what I see should be addressed next related to timeout enforcement:
- https://issues.apache.org/jira/browse/KAFKA-16637
- https://issues.apache.org/jira/browse/KAFKA-16200
- https://issues.apache.org/jira/browse/KAFKA-16792
Also, please let's have a Jira to address this comment and remove the test we agreed brings no value.
Thanks again!
cc. @cadonna
I added KAFKA-16818 to cover the cases to refactor/migrate/remove tests.
The intention of the CompletableApplicationEvent is for a Consumer to enqueue the event and then block, waiting for it to complete. The application thread will block for up to the amount of the timeout. This change introduces a consistent manner in which events are expired by checking their timeout values.

The CompletableEventReaper is a new class that tracks CompletableEvents that are enqueued. Both the application thread and the network I/O thread maintain their own reaper instances. The application thread will track any CompletableBackgroundEvents that it receives, and the network I/O thread will do the same with any CompletableApplicationEvents it receives. The application and network I/O threads will check their tracked events, and if any are expired, the reaper will invoke each event's CompletableFuture.completeExceptionally() method with a TimeoutException.

On closing the AsyncKafkaConsumer, both threads will invoke their respective reapers to cancel any unprocessed events in their queues. In this case, the reaper will invoke each event's CompletableFuture.completeExceptionally() method with a CancellationException instead of a TimeoutException to differentiate the two cases.

The overall design for the expiration mechanism is captured on the Apache wiki, and the original issue (KAFKA-15848) has more background on the cause.

Note: this change only handles the event expiration and does not cover the network request expiration. That is handled in a follow-up Jira (KAFKA-16200) that builds atop this change.

This change also includes some minor refactoring of the EventProcessor and its implementations. This allows the event processor logic to focus on processing individual events rather than also handling batches of events.

Reviewers: Lianet Magrans <lianetmr@gmail.com>, Philip Nee <pnee@confluent.io>, Bruno Cadonna <cadonna@apache.org>

Committer Checklist (excluded from commit message)
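As a minimal illustration of the mechanism described above (a sketch only; the real class lives in the consumer internals and differs in detail, and CompletableEvent is assumed here to expose future() and deadlineMs()):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CancellationException;

import org.apache.kafka.common.errors.TimeoutException;

// Sketch only: tracks completable events and either expires or cancels them.
public class CompletableEventReaper {

    private final List<CompletableEvent<?>> tracked = new ArrayList<>();

    public void add(CompletableEvent<?> event) {
        tracked.add(event);
    }

    // Called periodically by the owning thread: expire tracked events whose deadline has passed.
    public void reap(long currentTimeMs) {
        for (CompletableEvent<?> event : tracked) {
            if (!event.future().isDone() && currentTimeMs >= event.deadlineMs())
                event.future().completeExceptionally(
                        new TimeoutException("The event was not completed before its deadline"));
        }
        tracked.removeIf(event -> event.future().isDone());
    }

    // Called on close: cancel any tracked or still-queued events that never completed.
    public void reap(Collection<?> queued) {
        List<CompletableEvent<?>> remaining = new ArrayList<>(tracked);
        for (Object o : queued) {
            if (o instanceof CompletableEvent)
                remaining.add((CompletableEvent<?>) o);
        }
        for (CompletableEvent<?> event : remaining) {
            if (!event.future().isDone())
                event.future().completeExceptionally(new CancellationException("The consumer is closing"));
        }
        tracked.clear();
    }
}
```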