GH-3276: Support async retry with @RetryableTopic #3523
Thank you, @chickenchickenlove, for such a great investigation and the crack at a possible solution!
Sounds good, @artembilan 👍 My Understanding 1
My Understanding 2
Which understanding is closer to what you described? |
The
So, I assume that your impl deals already with a I'm not sure what is
And as you see, the |
Thanks for your description. 🙇♂️ FYI, from now on the function call flow is:
@artembilan , For example, we have
In that case, Initially, only the However, once async failures occurs, a Thus, it can not be proper workaround. How about adding a I imagined adding a method like What are your thoughts on this approach? |
If so, it seems to work well. I made a new commit and pushed it to current branch. Would you consider this to be the right direction? If so, there are a few additional points we may need to consider 😀 I think that these are needed!
Hi, @artembilan ! When you have time, could you take a look? 🙇♂️ |
I see what is going on now.
Thank you for digging further!
Now I see the power of single-threaded processors and how to handle their behavior.
So, I have only one simple concern about the public API, and after that it would be great to mention this feature in the docs.
Re. error handling of the error handler.
See invokeErrorHandler():
try {
    handled = this.commonErrorHandler.handleOne(rte, cRecord, this.consumer,
            KafkaMessageListenerContainer.this.thisOrParentContainer);
}
catch (Exception ex) {
    this.logger.error(ex, "ErrorHandler threw unexpected exception");
}
So, that should be enough for now to close all the gaps we have so far.
spring-kafka/src/main/java/org/springframework/kafka/listener/MessageListener.java
@artembilan , sorry to bother you.
I can't understand your comments 😢. |
...rc/main/java/org/springframework/kafka/listener/adapter/MessagingMessageListenerAdapter.java
 */
@FunctionalInterface
public interface MessageListener<K, V> extends GenericMessageListener<ConsumerRecord<K, V>> {

    default void setCallbackForAsyncFailureQueue(
Why have you renamed the method to Queue?
What was wrong with Callback?
Fixed it to setCallbackForAsyncFail(...)!
...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java
Sorry for such a lengthy review, but I feel we are close to merging, so I'd like to clear things up.
Thank you for the incredible work you've done on this flaky use case!
...fka/src/test/java/org/springframework/kafka/retrytopic/AsyncMonoRetryTopicScenarioTests.java
...rg/springframework/kafka/retrytopic/AsyncMonoFutureRetryTopicClassLevelIntegrationTests.java
...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java
        this.genericListener instanceof KafkaBackoffAwareMessageListenerAdapter<?, ?>) {
    KafkaBackoffAwareMessageListenerAdapter<K, V> genListener =
            (KafkaBackoffAwareMessageListenerAdapter<K, V>) this.genericListener;
    if (genListener.getDelegate() instanceof RecordMessagingMessageListenerAdapter<K, V>) {
You missed extracting a pattern variable here: https://www.baeldung.com/java-pattern-matching-instanceof
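For readers unfamiliar with the remark, here is a minimal sketch of what extracting a pattern variable looks like (Java 16 or later). The types `Listener`, `RecordAdapter` and `BackoffAwareAdapter` are hypothetical stand-ins, not the actual spring-kafka adapter classes, and the method names are made up for illustration only.

```java
public class PatternVariableSketch {

    // Hypothetical stand-ins for the adapter types named in the review (assumption).
    interface Listener { }

    record RecordAdapter(String name) implements Listener { }

    record BackoffAwareAdapter(Listener delegate) implements Listener { }

    // Before: a type test followed by an explicit cast.
    static String describeWithCast(Listener listener) {
        if (listener instanceof BackoffAwareAdapter) {
            BackoffAwareAdapter adapter = (BackoffAwareAdapter) listener;
            if (adapter.delegate() instanceof RecordAdapter) {
                RecordAdapter delegate = (RecordAdapter) adapter.delegate();
                return delegate.name();
            }
        }
        return "none";
    }

    // After: the pattern variable binds the narrowed type inside the instanceof test.
    static String describeWithPattern(Listener listener) {
        if (listener instanceof BackoffAwareAdapter adapter
                && adapter.delegate() instanceof RecordAdapter delegate) {
            return delegate.name();
        }
        return "none";
    }

    public static void main(String[] args) {
        Listener listener = new BackoffAwareAdapter(new RecordAdapter("retry-listener"));
        System.out.println(describeWithCast(listener));
        System.out.println(describeWithPattern(listener));
    }
}
```

Both methods return the same value for the same input; the second simply removes the separate cast statements.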
@@ -1,5 +1,5 @@
 /*
- * Copyright 2015-2019 the original author or authors.
+ * Copyright 2015-2024 the original author or authors.
This apparently has to be reverted as well.
I mean, you revert changes but don't run check locally.
@@ -143,4 +143,5 @@ public void onMessage(ConsumerRecord<K, V> data, Acknowledgment acknowledgment)
     public void onMessage(ConsumerRecord<K, V> data, Consumer<?, ?> consumer) {
         onMessage(data, null, consumer);
     }
+
Same here: all the changes in this class have to be reverted.
            weeded.incrementAndGet();
        }
    });
    assertThat(weeded.get()).isEqualTo(2);
How is this reflection stuff related to the logic we would like to cover with these tests?
I thought we were pursuing the goal of verifying that async failures are retried.
How does the KafkaAdmin interaction affect the expected behavior?
The AsyncCompletableFutureRetryTopicClassLevelIntegrationTests demonstrates that the async retry feature works under the same conditions as RetryTopicClassLevelIntegrationTests, that is, the synchronous condition (each ConsumerRecord succeeds or fails one by one).
So, I copied RetryTopicClassLevelIntegrationTests and replaced the return type of each KafkaListener with Mono<?> or CompletableFuture<?>.
In RetryTopicClassLevelIntegrationTests, weeded seems to demonstrate that some new topics should be created.
As a result of admin.newTopics(), the weededTopics instance contains two types (TopicForRetryable, NewTopic).
I think the author of RetryTopicClassLevelIntegrationTests wanted to verify whether the expected NewTopics are created.
If there are no major issues, I would like to follow the assumptions of RetryTopicClassLevelIntegrationTests as well.
This is because I believe that AsyncRetry should also satisfy the synchronous retry condition where offsets are processed sequentially, given that AsyncRetry may or may not process offsets in sequence.
I still don't see how these weeded checks are relevant to async handling.
That is really not part of the retry topic behavior, but rather its configuration.
So, having the same verifications in two different places is redundant.
...ngframework/kafka/retrytopic/AsyncCompletableFutureRetryTopicClassLevelIntegrationTests.java
...java/org/springframework/kafka/retrytopic/AsyncCompletableFutureRetryTopicScenarioTests.java
Force-pushed from 8b2aebb to 645d7ce.
@artembilan, I have a question! Don't we need test code for this situation? 🤔
I didn't write any test code for this situation. Do you think I should write it? What are your thoughts?
Sorry, I'm not totally sure about your question.
Sorry for lack of explanation 🙇♂️ For your information, However, in the |
That's exactly what I meant about checking against |
@@ -895,6 +902,15 @@ else if (listener instanceof MessageListener) {
                 this.wantsFullRecords = false;
                 this.pollThreadStateProcessor = setUpPollProcessor(false);
                 this.observationEnabled = this.containerProperties.isObservationEnabled();
+
+                if (!AopUtils.isAopProxy(this.genericListener) &&
+                        this.genericListener instanceof AbstractDelegatingMessageListenerAdapter<?>) {
So, I think your previous solution was correct.
We have to set the callback only in the case of a KafkaBackoffAwareMessageListenerAdapter, which is indeed an indicator that we are in a retry topic scenario.
This way we won't break all the other use cases where such error handling would be unexpected.
As per your advice, I'm glad I debugged the AsyncListener test code. When you have time, please take another look! 🙇♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulling locally for final review and possible merge.
Merged as 9eafa61. Thank you for the awesome contribution, as always! So, I've accepted those new tests as well, although they are indeed kind of slow. A couple of remarks about the contribution overall:
See more info here: https://cbea.ms/git-commit/
Thanks for your time and awesome review, always! 🙇♂️ |
So, yeah, we really got some failing tests already from this story:
|
I see 🥲. |
Motivation:
Currently, when a listener endpoint method successfully returns a CompletableFuture<?> or Mono<?> and that CompletableFuture<?> or Mono<?> later fails to complete because of an unexpected error, the failed record is never retried, even if the developer has configured @RetryableTopic.
If we support async retry with @RetryableTopic, spring-kafka's handling of such failures will be improved.
Changes
Retry the ConsumerRecord when the CompletableFuture<?> or Mono<?> has failed.
Idea
Kafka's Consumer has a soft lock, so multiple threads cannot access the consumer simultaneously.
This means that the thread in the ThreadExecutor that handles the CompletableFuture or Mono cannot access the consumer.
My idea is very simple.
The KafkaMessageListenerContainer has a ConcurrentLinkedDeque as a field.
The MessagingMessageListenerAdapter adds asyncSuccess() and asyncFailure() to the CompletableFuture<?> and Mono<?> as hooks.
If the future object fails to complete, it calls MessagingMessageListenerAdapter.asyncFailure().
When MessagingMessageListenerAdapter.asyncFailure() is called, a FailedRecordTuple is put into the ConcurrentLinkedDeque.
Then, on the next loop, KafkaMessageListenerContainer calls handleAsyncFailure().
From there, KafkaMessageListenerContainer follows FailedRecordTracker's logic.
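The hand-off described in the idea above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from this PR: `AsyncFailureQueueSketch`, `FailedRecord`, `track()` and `handleAsyncFailures()` are hypothetical names, records are plain strings, and the "retry" step just collects the record where the real container would invoke its error handler. The point is only that completion threads write to the deque while the consumer thread alone drains it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedDeque;

public class AsyncFailureQueueSketch {

    // Hypothetical stand-in for the FailedRecordTuple mentioned above (assumption).
    record FailedRecord(String record, Throwable cause) { }

    // Written by async completion threads, drained only by the consumer thread.
    private final ConcurrentLinkedDeque<FailedRecord> failures = new ConcurrentLinkedDeque<>();

    // Touched only by the thread that calls handleAsyncFailures().
    private final List<String> retried = new ArrayList<>();

    // Hook attached to the listener's future; it never touches the consumer itself.
    void track(String record, CompletableFuture<?> result) {
        result.whenComplete((value, ex) -> {
            if (ex != null) {
                this.failures.addLast(new FailedRecord(record, ex));
            }
        });
    }

    // Called on the consumer thread at the start of each poll loop iteration.
    void handleAsyncFailures() {
        FailedRecord failed;
        while ((failed = this.failures.pollFirst()) != null) {
            // The real container would run its error handler here (assumption).
            this.retried.add(failed.record());
        }
    }

    List<String> retried() {
        return this.retried;
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncFailureQueueSketch container = new AsyncFailureQueueSketch();
        CompletableFuture<String> ok = new CompletableFuture<>();
        CompletableFuture<String> bad = new CompletableFuture<>();
        container.track("record-0", ok);
        container.track("record-1", bad);

        // Complete both futures on another thread, as an async listener would.
        Thread worker = new Thread(() -> {
            ok.complete("done");
            bad.completeExceptionally(new IllegalStateException("boom"));
        });
        worker.start();
        worker.join();

        // Back on the "consumer" thread: drain the queued failures.
        container.handleAsyncFailures();
        System.out.println(container.retried());
    }
}
```

Because only the draining thread would ever call into the consumer, the consumer's single-thread restriction is respected no matter which thread completes the future.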
========== Previous Context(below) ========
Background
I investigated how to support @RetryableTopic for async listeners.
I think it is a difficult task for me to solve. 😓
Anyway, challenges are a good thing, so I dug a little deeper. 😅
First of all, I want to discuss with the reviewers which direction I should take for the implementation.
I implemented an ErrorHandler that sends messages to the Retry and DLT topics when an error occurs, passing it as a callback to MessagingMessageListenerAdapter and inserting it into MessagingMessageListenerAdapter#asyncFailure() to send messages to the Retry and DLT topics. Although this method has some issues, it can achieve the intended purpose.
You can check the successful test cases below.
shouldRetryFirstFutureTopic
shouldRetryFirstMonoTopic
shouldRetrySecondFutureTopic
shouldRetrySecondMonoTopic
shouldRetryThirdFutureTopicWithTimeout
shouldRetryThirdMonoTopicWithTimeout
shouldRetryFourthFutureTopicWithNoDlt
shouldRetryFourthMonoTopicWithNoDlt
shouldGoStraightToDltInFuture
shouldGoStraightToDltInMono
However, I realized that it is hard for spring-kafka to support back-off for async Kafka listener endpoints.
The Backoff time is determined when KafkaMessageListenerContainer calls invokeErrorHandler().
At that time, the FailedRecord is computed per Thread.
However, we cannot assume that a CompletableFuture or Mono will always be executed on the same thread, and this always affects nextBackOff.
Thus, spring-kafka cannot support BackOff Retry in async mode.
IMHO, I think that we have 3 options.
1. @RetryableTopic does not support backoff features in async mode.
In this case, a new Kafka header is needed to distinguish between sync and async.
The other possibility is to always use the minimum backoff time to avoid KafkaBackoffException.
2. Introduce micrometer/context-propagation to support backoff features in async mode.
For example, https://spring.io/blog/2023/03/28/context-propagation-with-project-reactor-1-the-basics.
I believe that we can infer the proper nextBackOffTime by using context-propagation to store and restore state in each thread.
3. Big refactoring.
But I have no concrete idea yet. Maybe SyncKafkaMessageListenerContainer and AsyncKafkaMessageListenerContainer?
Thanks for reading this description.
What do you think? Please let me know your opinion.
Thanks in advance! 🙇♂️