KAFKA-13064: Make ListConsumerGroupOffsetsHandler unmap for COORDINATOR_NOT_AVAILABLE error #11026

showuon · 2021-07-12T12:51:14Z

Make ListConsumerGroupOffsetsHandler unmap for COORDINATOR_NOT_AVAILABLE error

For reference, this is the old handleResponse logic:

void handleResponse(AbstractResponse abstractResponse) {
      final OffsetFetchResponse response = (OffsetFetchResponse) abstractResponse;
      final Map<TopicPartition, OffsetAndMetadata> groupOffsetsListing = new HashMap<>();

      // If coordinator changed since we fetched it, retry
      // here, we'll check all errors, including partition errors, to see if we need to retry
      if (ConsumerGroupOperationContext.hasCoordinatorMoved(response)) {
          Call call = getListConsumerGroupOffsetsCall(context);
          rescheduleFindCoordinatorTask(context, () -> call, this);
          return;
      }

      if (handleGroupRequestError(response.error(), context.future()))
          return;

      for (Map.Entry<TopicPartition, OffsetFetchResponse.PartitionData> entry :
          response.responseData().entrySet()) {
          final TopicPartition topicPartition = entry.getKey();
          OffsetFetchResponse.PartitionData partitionData = entry.getValue();
          final Errors error = partitionData.error;

          if (error == Errors.NONE) {
              final Long offset = partitionData.offset;
              final String metadata = partitionData.metadata;
              final Optional<Integer> leaderEpoch = partitionData.leaderEpoch;
              // Negative offset indicates that the group has no committed offset for this partition
              if (offset < 0) {
                  groupOffsetsListing.put(topicPartition, null);
              } else {
                  groupOffsetsListing.put(topicPartition, new OffsetAndMetadata(offset, leaderEpoch, metadata));
              }
          } else {
              log.warn("Skipping return offset for {} due to error {}.", topicPartition, error);
          }
      }
      context.future().complete(groupOffsetsListing);
  }
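
For contrast with the old logic above, here is a minimal sketch of the retry / unmap / fail split that the PR title refers to. It does not use the Kafka classes: `GroupError`, `Action`, and `classify` are hypothetical names made up for illustration, and treating `NOT_COORDINATOR` the same way as `COORDINATOR_NOT_AVAILABLE` is an assumption rather than something stated in this description.

```java
public class GroupErrorSketch {
    // Hypothetical stand-in for the handful of error codes discussed in this PR.
    public enum GroupError {
        NONE,
        COORDINATOR_LOAD_IN_PROGRESS,
        COORDINATOR_NOT_AVAILABLE,
        NOT_COORDINATOR,
        GROUP_AUTHORIZATION_FAILED
    }

    // Hypothetical outcomes a handler could report for one group.
    public enum Action { COMPLETE, RETRY, UNMAP, FAIL }

    public static Action classify(GroupError error) {
        switch (error) {
            case NONE:
                return Action.COMPLETE;
            case COORDINATOR_LOAD_IN_PROGRESS:
                // The coordinator is still loading: ask the same broker again later.
                return Action.RETRY;
            case COORDINATOR_NOT_AVAILABLE:
            case NOT_COORDINATOR: // assumption: handled like COORDINATOR_NOT_AVAILABLE
                // Forget the cached coordinator so that it is looked up again.
                return Action.UNMAP;
            default:
                // Anything else is treated as a failure of the group in this sketch.
                return Action.FAIL;
        }
    }

    public static void main(String[] args) {
        for (GroupError error : GroupError.values()) {
            System.out.println(error + " -> " + classify(error));
        }
    }
}
```

The point of the split is that a retry keeps the current group-to-coordinator mapping, while an unmap discards it and goes back to the coordinator lookup step.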

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

showuon · 2021-07-12T12:53:19Z

@dajac , please take a look. Thanks.

dajac

Left a few comments. Thanks for the PR.

dajac · 2021-07-13T15:15:29Z

.../src/main/java/org/apache/kafka/clients/admin/internals/ListConsumerGroupOffsetsHandler.java

+
+        if (groupsToUnmap.isEmpty() && groupsToRetry.isEmpty()) {
+            return new ApiResult<>(
+                completed,


We could get rid of completed and use Collections.singletonMap(groupId, groupOffsetsListing), no?

No, we can't do that, because completed could be an empty map here. If we used Collections.singletonMap(groupId, groupOffsetsListing), the map would never be empty. Thanks.

@showuon I think that there is a case that we don't handle correctly.

Imagine that GROUP_AUTHORIZATION_FAILED is returned as a partition error. In this case, we ignore it in handlePartitionError and therefore don't add the failed group to failed. I think that we should also handle all the group level errors in handlePartitionError.

The second thing is that if there is a group failure, we should not add the group to completed at L131. Otherwise, this will complete the group future with an empty list.

Could you check this out and add a test for it?

Good suggestion! Will do it tomorrow (my time). Thanks.
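
The two points raised in this thread can be sketched together: a group-level error that shows up on a partition should put the group into failed, and a failed group should not also be added to completed, which is why completed can legitimately be empty. The sketch below uses plain Java maps instead of the Kafka types; `Result`, `handle`, and the string error codes are hypothetical and only illustrate the idea, not the handler's actual code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CompletedVsFailedSketch {
    // Hypothetical result holder: group id -> offsets, and group id -> error code.
    public static final class Result {
        public final Map<String, Map<String, Long>> completed = new HashMap<>();
        public final Map<String, String> failed = new HashMap<>();
    }

    // partitionErrors maps a partition name to an error code; a missing entry means "NONE".
    public static Result handle(String groupId,
                                Map<String, Long> offsets,
                                Map<String, String> partitionErrors) {
        Result result = new Result();
        Map<String, Long> listing = new HashMap<>();
        for (Map.Entry<String, Long> entry : offsets.entrySet()) {
            String error = partitionErrors.getOrDefault(entry.getKey(), "NONE");
            if (error.equals("GROUP_AUTHORIZATION_FAILED")) {
                // Group-level error reported on a partition: fail the whole group
                // and return before anything is added to completed.
                result.failed.put(groupId, error);
                return result;
            }
            if (error.equals("NONE")) {
                listing.put(entry.getKey(), entry.getValue());
            }
            // Other partition errors are skipped in this sketch.
        }
        result.completed.put(groupId, listing);
        return result;
    }

    public static void main(String[] args) {
        Result denied = handle("my-group",
            Collections.singletonMap("topic-0", 5L),
            Collections.singletonMap("topic-0", "GROUP_AUTHORIZATION_FAILED"));
        System.out.println("completed=" + denied.completed + " failed=" + denied.failed);
    }
}
```

Wrapping the listing in Collections.singletonMap unconditionally would lose this distinction, because the group would always appear in the completed map even when it had failed.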

dajac · 2021-07-13T15:17:02Z

.../src/main/java/org/apache/kafka/clients/admin/internals/ListConsumerGroupOffsetsHandler.java

    }

-}
+}


nit: Could we bring this back?

dajac

@showuon Thanks for the update. I left a few more comments.

dajac · 2021-07-15T12:57:30Z

.../src/main/java/org/apache/kafka/clients/admin/internals/ListConsumerGroupOffsetsHandler.java

+
+        if (groupsToUnmap.isEmpty() && groupsToRetry.isEmpty()) {
+            return new ApiResult<>(
+                completed,


@showuon I think that there is a case that we don't handle correctly.

Imagine that GROUP_AUTHORIZATION_FAILED is returned as a partition error. In this case, we ignore it in handlePartitionError and therefore don't add the failed group to failed. I think that we should also handle all the group level errors in handlePartitionError.

The second thing is that if there is a group failure, we should not add the group to completed at L131. Otherwise, this will complete the group future with an empty list.

Could you check this out and add a test for it?

dajac · 2021-07-15T12:58:32Z

.../src/main/java/org/apache/kafka/clients/admin/internals/ListConsumerGroupOffsetsHandler.java

+        switch (error) {
+            case COORDINATOR_LOAD_IN_PROGRESS:
+                // If the coordinator is in the middle of loading, then we just need to retry
+                log.debug("`{}` request for group {} failed because the coordinator " +


Could we also update the log messages here and below to follow what you did in handleGroupError?

Oh, sorry, I forgot the partitionError section. Will do.

dajac · 2021-07-15T20:54:05Z

.../src/main/java/org/apache/kafka/clients/admin/internals/ListConsumerGroupOffsetsHandler.java

+                final String unexpectedErrorMsg =
+                    String.format("`OffsetFetch` request for group id %s failed due to error %s", groupId.idValue, error);
+                log.error(unexpectedErrorMsg);
+                failed.put(groupId, error.exception(unexpectedErrorMsg));


Could we also remove providing the error message here like we did for the others?

Updated. Thanks.

showuon · 2021-07-16T07:50:00Z

Failed tests are unrelated, thanks.

    Build / JDK 16 and Scala 2.13 / kafka.api.TransactionsTest.testCommitTransactionTimeout()
    Build / JDK 11 and Scala 2.13 / kafka.api.ConsumerBounceTest.testCloseDuringRebalance()
    Build / JDK 11 and Scala 2.13 / kafka.api.ConsumerBounceTest.testCloseDuringRebalance()
    Build / JDK 8 and Scala 2.12 / kafka.api.ConsumerBounceTest.testCloseDuringRebalance()
    Build / JDK 8 and Scala 2.12 / kafka.api.ConsumerBounceTest.testCloseDuringRebalance()

dajac

LGTM

dajac · 2021-07-16T07:58:22Z

Failures are not related:

Build / JDK 16 and Scala 2.13 / testCommitTransactionTimeout() – kafka.api.TransactionsTest (9s)
Build / JDK 11 and Scala 2.13 / testCloseDuringRebalance() – kafka.api.ConsumerBounceTest (7s)
Build / JDK 11 and Scala 2.13 / testCloseDuringRebalance() – kafka.api.ConsumerBounceTest (10s)
Build / JDK 8 and Scala 2.12 / testCloseDuringRebalance() – kafka.api.ConsumerBounceTest (6s)
Build / JDK 8 and Scala 2.12 / testCloseDuringRebalance() – kafka.api.ConsumerBounceTest

…OR_NOT_AVAILABLE error (#11026) This patch improve the error handling in `ListConsumerGroupOffsetsHandler` and ensures that `COORDINATOR_NOT_AVAILABLE` is unmapped in order to look up the coordinator again. Reviewers: David Jacot <djacot@confluent.io>

dajac · 2021-07-16T08:00:08Z

Merged to trunk and 3.0.

dajac · 2021-07-16T08:00:49Z

@showuon Thanks for the patches. Could you update the description of this PR and the others to ensure that the description reflects the changes?

showuon · 2021-07-16T08:05:38Z

@dajac, all checked and updated. Thank you very much for patiently reviewing all these PRs! After these updates, we are more confident in these new handlers. :)

KAFKA-13064: refactor ListConsumerGroupOffsetsHandler and tests

9a82c65

showuon mentioned this pull request Jul 13, 2021

KAFKA-13033: COORDINATOR_NOT_AVAILABLE should be unmapped #10973

Closed

3 tasks

dajac reviewed Jul 13, 2021

showuon added 4 commits July 14, 2021 15:33

Merge branch 'trunk' of https://github.com/apache/kafka into KAFKA-13064

872dc09

KAFKA-13064: refactor codes

9b6f185

KAFKA-13064: refactor

237ee0d

KAFKA-13064: update the comment to V0 and V1

b685e6f

dajac reviewed Jul 15, 2021

dajac reviewed Jul 15, 2021

showuon force-pushed the KAFKA-13064 branch from b56af45 to eea7a26 Compare July 16, 2021 02:10

KAFKA-13064: remove handlePartitionError since group error contains a…

b41f92d

…ll group level errors

showuon force-pushed the KAFKA-13064 branch from eea7a26 to b41f92d Compare July 16, 2021 02:14

dajac changed the title ~~KAFKA-13064: refactor ListConsumerGroupOffsetsHandler and tests~~ KAFKA-13064: Make ListConsumerGroupOffsetsHandler unmap for COORDINATOR_NOT_AVAILABLE error Jul 16, 2021

dajac approved these changes Jul 16, 2021

dajac merged commit 4fd6d2b into apache:trunk Jul 16, 2021
