Improving Observability in Asynchronous Processing (`CompletableFuture`, `Mono`) #3528

chickenchickenlove · 2024-10-04T23:00:10Z

Expected Behavior

When the consumer endpoint method returns CompletableFuture<?> or Mono<?> as a result, and both the CompletableFuture<?> and the Mono<?> are fail to complete due to an error thrown in the async loop, spring-kafka should report metrics indicating that there have been failures in async consumer tasks.

For example,

spring_kafka_listener_seconds_count{
error="AsyncLoopExecutionFailedException",
messaging_kafka_consumer_group="topic4",
messaging_operation="receive",
messaging_source_kind="topic",
messaging_source_name="topic4",
messaging_system="kafka",
spring_kafka_listener_id="org.springframework.kafka.KafkaListenerEndpointContainer#0-0"} 6

Current Behavior

When the consumer endpoint method returns CompletableFuture<?> or Mono<?> as result, the MessagingMessageListenerAdapater adds a callback to both the CompletableFuture<?> and the Mono<?>.

However, currently, KafkaMessageListenerContainer does not consider whether CompletableFuture and Mono fail to complete.

If both CompletableFuture and Mono instances are success to be created, KafkaMessageListenerContainer will not report error metric even if both CompletableFuture and Mono fail to complete.

Context

I discovered this issue while solving another problem(GH-3276), and I checked to see if it was actually the case.
As expected, there was an issue.

To reproduce it, you can use method below.
it describes that

The CompletableFuture<?> successes to be created.
The CompletableFuture<?> fails to complete.

@Slf4j
@Component
public class MyAsyncBean {

    @RetryableTopic(
            attempts = "5",
            topicSuffixingStrategy = TopicSuffixingStrategy.SUFFIX_WITH_DELAY_VALUE,
            autoCreateTopics = "true",
            backoff = @Backoff(delay = 1000)
    )
    @KafkaListener(topics = "topic4", groupId = "topic4")
    public CompletableFuture<�Void> receive(ConsumerRecord<Object, Object> record) {
        return CompletableFuture.supplyAsync(() -> {
                    try {
                        Thread.sleep(1);
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }

                    throw new RuntimeException("Should report metric.");
                });
    }
}

To reviewer

If you consider this to be a reasonable request,
may I take on this task and try to solve it?

The text was updated successfully, but these errors were encountered:

artembilan · 2024-10-07T14:10:23Z

I think this makes sense, but does not look that easy to fix.
We have a code like this in the KafkaMessageListenerContainer:

return observation.observe(() -> {
				try {
					invokeOnMessage(cRecord);
					successTimer(sample, cRecord);
					recordInterceptAfter(cRecord, null);
				}
				catch (RuntimeException e) {
					failureTimer(sample, cRecord);
					recordInterceptAfter(cRecord, e);
					if (this.commonErrorHandler == null) {
						throw e;
					}
					observation.error(e);

So, as long as listener returns control over here, an observation is closed.
It sounds like we must not use an observe(), but rather start() manually and deal with it in the mentioned MessagingMessageListenerAdapter. Stop it there (or error()) respectively in the async branch.
But need to dig more for technical details.
Feel free to give it a try!

chickenchickenlove · 2024-10-08T23:25:48Z

@artembilan thanks for considering this as a one of reasonable request!
As you mentioned, Observation instance will close its observe scope when each record is completed.

It means that Observation instance in KafkaMessageListenerContainer does not consider the result of CompletableFuture<?> and Mono<?>.

Therefore, i believe a new Observation instance is needed here, similar to the one in KafkaMessageListenerContainer.

// MessagingMessageListenerAdapater.java
if (result instanceof CompletableFuture<?> completable) {
         ...
	completable.whenComplete((r, t) -> {
                 // newObservation maybe?
                 observation.observer(() -> { 
                     if (t == null) {
		  	asyncSuccess(r, replyTopic, source, messageReturnType);
		  	acknowledge(acknowledgment);
		    }
		    else {
		  	asyncFailure(request, acknowledgment, consumer, t, source);
		    }
                 }
                 ...

This way can't measure processed time metrics accurately.
if accurate async processed time is needed, developers should create observation instance and include it their endpoint method.

To sum up, i think we have two options.

Add a new observation instance to MessagingMessageListenerAdapter.handleResult() and open new observation scope. (pros: readability, cons : it may report inaccurate processing time)
Declare a new Observation class for measuring async objects, and guide developers to include it in their endpoint methods. (pros : report accurate processing time, cons: readability)

	@KafkaListener(...)
	static class TopicListener {

		@KafkaHandler
		public Mono<Void> listenAgain(String message) {
                         Observation ob = new AsyncAwareObservation();
			return Mono.fromCallable(() -> {
                                 ob.observer(() -> { 
                                 try {
                                      // Developer's logic.
                                      ...
				}
				catch (Exception e) {
					ob.error(e);
				}
                                 });
				...
			});
		}
	}

What do you think?
When you have time, please let me know 🙇‍♂️

artembilan · 2024-10-09T15:01:54Z

Well, async it or not, it is still a unit of work.
For observation it does not matter if it was started in one thread and finished in the other.
So, propagating it from one thread to another when that Mono/CompletableFuture is completed is the way to go.
See more info in Micrometer docs: https://micrometer.io/docs/observation#instrumentation_of_thread_switching_components.

I don't think though that we need to instrument an async executor, but having a variable for the registry.getCurrentObservation() and propagate it down to the whenComplete() should be enough.
The KafkaMessageListenerContainer then must just start an observation and that MessagingMessageListenerAdapter should get it in its handleResult() to stop for sync branch and propagate for stop or error in the whenComplete().

sobychacko · 2024-10-09T19:19:05Z

@chickenchickenlove We will take a crack at this on our end and share the feedback.

chickenchickenlove · 2024-10-10T12:49:04Z

Thanks for giving me deep information! 🙇‍♂️
I will study it even if I can't complete the task.

Also, thanks for keeping me updated on what's going on!

…ssing (CompletableFuture, Mono) Fixes: spring-projects#3528 spring-projects#3528 - Improve spring-kafka observability for failures in async consumer tasks when listener methods return CompletableFuture<?> or Mono<?> and throw errors during async execution - Refactoring code in KafkaMessageListenerContainer and MessagingMessageListenerAdapter around observability - Adding tests to verify - Add @nullable annotations to relevant methods for better null safety

…ssing Fixes: spring-projects#3528 spring-projects#3528 - Improve spring-kafka observability for failures in async consumer tasks when listener methods return CompletableFuture<?> or Mono<?> and throw errors during async execution - Refactoring code in KafkaMessageListenerContainer and MessagingMessageListenerAdapter around observability - Adding tests to verify - Add @nullable annotations to relevant methods for better null safety

Fixes: #3528 #3528 - Improve spring-kafka observability for failures in async consumer tasks when listener methods return CompletableFuture<?> or Mono<?> and throw errors during async execution - Refactoring code in KafkaMessageListenerContainer and MessagingMessageListenerAdapter around observability - Adding tests to verify - Add @nullable annotations to relevant methods for better null safety * Addressing PR review

…ack when sync Fixes: spring-projects#3695 Even if th `@KafkaHandler` method is `void` the `DelegatingInvocableHandler` returns an empty `InvocationResult`. That triggers a `MessagingMessageListenerAdapter.handleResult()` logic. On the `completableFutureResult.whenComplete()` we call `acknowledge()` which is not expected for `void` POJO methods. * Fix `MessagingMessageListenerAdapter` to check for `isAsyncReplies()` before calling `acknowledge()` This is a regression after spring-projects#3528

…3696) Fixes: #3695 Even if th `@KafkaHandler` method is `void` the `DelegatingInvocableHandler` returns an empty `InvocationResult`. That triggers a `MessagingMessageListenerAdapter.handleResult()` logic. On the `completableFutureResult.whenComplete()` we call `acknowledge()` which is not expected for `void` POJO methods. * Fix `MessagingMessageListenerAdapter` to check for `isAsyncReplies()` before calling `acknowledge()` This is a regression after #3528

chickenchickenlove added status: waiting-for-triage type: enhancement labels Oct 4, 2024

artembilan removed the status: waiting-for-triage label Oct 7, 2024

sobychacko modified the milestone: Backlog Oct 8, 2024

sobychacko self-assigned this Oct 14, 2024

sobychacko mentioned this issue Oct 15, 2024

GH-3528: Improving Observability in Asynchronous Processing #3558

Merged

artembilan added this to the 3.3.0-RC1 milestone Oct 16, 2024

artembilan closed this as completed in #3558 Oct 16, 2024

artembilan mentioned this issue Dec 27, 2024

Acknowledge+commit always issued in AckMode.MANUAL #3695

Closed

artembilan mentioned this issue Dec 27, 2024

GH-3695: Fix MessagingMessageListenerAdapter to not ack when sync #3696

Merged

artembilan mentioned this issue Apr 16, 2025

If an Error is thrown in the application's @KafkaListener listener code #3843

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improving Observability in Asynchronous Processing (`CompletableFuture`, `Mono`) #3528

Improving Observability in Asynchronous Processing (`CompletableFuture`, `Mono`) #3528

chickenchickenlove commented Oct 4, 2024 •

edited

Loading

artembilan commented Oct 7, 2024

Uh oh!

chickenchickenlove commented Oct 8, 2024 •

edited

Loading

Uh oh!

artembilan commented Oct 9, 2024

Uh oh!

sobychacko commented Oct 9, 2024

Uh oh!

chickenchickenlove commented Oct 10, 2024 •

edited

Loading

Uh oh!

Improving Observability in Asynchronous Processing (CompletableFuture, Mono) #3528

Improving Observability in Asynchronous Processing (CompletableFuture, Mono) #3528

Comments

chickenchickenlove commented Oct 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Expected Behavior

Current Behavior

Context

To reviewer

artembilan commented Oct 7, 2024

Uh oh!

chickenchickenlove commented Oct 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

artembilan commented Oct 9, 2024

Uh oh!

sobychacko commented Oct 9, 2024

Uh oh!

chickenchickenlove commented Oct 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Improving Observability in Asynchronous Processing (`CompletableFuture`, `Mono`) #3528

Improving Observability in Asynchronous Processing (`CompletableFuture`, `Mono`) #3528

chickenchickenlove commented Oct 4, 2024 •

edited

Loading

chickenchickenlove commented Oct 8, 2024 •

edited

Loading

chickenchickenlove commented Oct 10, 2024 •

edited

Loading