KAFKA-13064: Make ListConsumerGroupOffsetsHandler unmap for COORDINATOR_NOT_AVAILABLE error #11026
@dajac, please take a look. Thanks.
dajac left a comment
Left a few comments. Thanks for the PR.
.../src/main/java/org/apache/kafka/clients/admin/internals/ListConsumerGroupOffsetsHandler.java
.../src/main/java/org/apache/kafka/clients/admin/internals/ListConsumerGroupOffsetsHandler.java
if (groupsToUnmap.isEmpty() && groupsToRetry.isEmpty()) {
    return new ApiResult<>(
        completed,
We could get rid of completed and use Collections.singletonMap(groupId, groupOffsetsListing), no?
No, we can't do that, because `completed` here could be an empty map. If we used Collections.singletonMap(groupId, groupOffsetsListing), it would never be empty. Thanks.
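The point about `completed` possibly being empty can be shown with a small, self-contained sketch. `CompletedMapSketch` and `buildCompleted` are hypothetical, and plain `String`/`List<String>` stand in for the handler's group key and offsets listing; the sketch only contrasts a conditionally built map with `Collections.singletonMap`, not the real `ApiResult` construction.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompletedMapSketch {
    // Hypothetical helper: the group is added only if it did not fail,
    // so the returned map may legitimately be empty.
    public static Map<String, List<String>> buildCompleted(String groupId, List<String> listing,
                                                           boolean groupFailed) {
        Map<String, List<String>> completed = new HashMap<>();
        if (!groupFailed) {
            completed.put(groupId, listing);
        }
        return completed;
    }

    public static void main(String[] args) {
        List<String> listing = Collections.singletonList("topic-0");
        // Failed group: nothing to complete, prints "true".
        System.out.println(buildCompleted("group-1", listing, true).isEmpty());
        // singletonMap always holds one entry, so it would also "complete" a failed group; prints "false".
        System.out.println(Collections.singletonMap("group-1", listing).isEmpty());
    }
}
```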
@showuon I think that there is a case that we don't handle correctly.
Imagine that GROUP_AUTHORIZATION_FAILED is returned as a partition error. In this case, we ignore it in handlePartitionError and therefore don't add the failed group to failed. I think that we should also handle all the group-level errors in handlePartitionError.
The second thing is that if there is a group failure, we should not add the group to completed at L131. Otherwise, this will complete the group future with an empty list.
Could you check this out and add a test for it?
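A minimal sketch of the two fixes requested above, under simplified assumptions: a group-level error found in a partition entry is routed to the group instead of being skipped, and a group with such an error is never put into `completed`. `PartitionErrorSketch`, `ErrorCode`, `groupErrors`, and the string partition keys are hypothetical stand-ins for Kafka's `Errors`, `CoordinatorKey`, and `TopicPartition`; skipping non-group partition errors is also an assumption here.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionErrorSketch {
    // Hypothetical subset of Kafka's Errors enum, for illustration only.
    public enum ErrorCode { NONE, GROUP_AUTHORIZATION_FAILED, COORDINATOR_LOAD_IN_PROGRESS,
                            COORDINATOR_NOT_AVAILABLE, UNKNOWN_TOPIC_OR_PARTITION }

    // Errors that describe the whole group, even when reported per partition.
    static boolean isGroupLevel(ErrorCode error) {
        return error == ErrorCode.GROUP_AUTHORIZATION_FAILED
            || error == ErrorCode.COORDINATOR_LOAD_IN_PROGRESS
            || error == ErrorCode.COORDINATOR_NOT_AVAILABLE;
    }

    public final Map<String, ErrorCode> groupErrors = new HashMap<>();
    public final Map<String, Map<String, Long>> completed = new HashMap<>();

    public void handleResponse(String groupId, Map<String, ErrorCode> partitionErrors,
                               Map<String, Long> committedOffsets) {
        Map<String, Long> groupOffsets = new HashMap<>();
        for (Map.Entry<String, ErrorCode> entry : partitionErrors.entrySet()) {
            ErrorCode error = entry.getValue();
            if (error == ErrorCode.NONE) {
                groupOffsets.put(entry.getKey(), committedOffsets.get(entry.getKey()));
            } else if (isGroupLevel(error)) {
                // Route the error to the group instead of silently ignoring it.
                groupErrors.putIfAbsent(groupId, error);
            }
            // Any other partition error: skip just that partition (assumption).
        }
        // A group with a group-level error must not be completed with a partial or empty listing.
        if (!groupErrors.containsKey(groupId)) {
            completed.put(groupId, groupOffsets);
        }
    }
}
```

A matching test would feed a response whose only partition carries `GROUP_AUTHORIZATION_FAILED` and check that the group ends up in the error map, not in `completed`.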
Good suggestion! Will do it tomorrow (my time). Thanks.
}

    }
}
nit: Could we bring this back?
.../src/main/java/org/apache/kafka/clients/admin/internals/ListConsumerGroupOffsetsHandler.java
dajac left a comment
@showuon Thanks for the update. I left a few more comments.
switch (error) {
    case COORDINATOR_LOAD_IN_PROGRESS:
        // If the coordinator is in the middle of loading, then we just need to retry
        log.debug("`{}` request for group {} failed because the coordinator " +
Could we also update the log messages here and below to follow what you did in handleGroupError?
Oh, sorry, I forgot the partitionError section. Will do.
.../src/main/java/org/apache/kafka/clients/admin/internals/ListConsumerGroupOffsetsHandler.java
final String unexpectedErrorMsg =
    String.format("`OffsetFetch` request for group id %s failed due to error %s", groupId.idValue, error);
log.error(unexpectedErrorMsg);
failed.put(groupId, error.exception(unexpectedErrorMsg));
Could we also remove providing the error message here like we did for the others?
Updated. Thanks.
…ll group level errors
Failed tests are unrelated, thanks.
dajac left a comment
LGTM
Failures are not related:
…OR_NOT_AVAILABLE error (#11026) This patch improves the error handling in `ListConsumerGroupOffsetsHandler` and ensures that `COORDINATOR_NOT_AVAILABLE` is unmapped in order to look up the coordinator again.
Reviewers: David Jacot <djacot@confluent.io>
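The commit message's main point, unmapping on `COORDINATOR_NOT_AVAILABLE` so that the coordinator is looked up again, can be summarized as a mapping from group-level error to action. This is a hypothetical sketch: `ErrorCode`, `Action`, and `actionFor` are not Kafka types or methods, and only the errors discussed in this PR are modeled (retry on `COORDINATOR_LOAD_IN_PROGRESS`, fail on `GROUP_AUTHORIZATION_FAILED` and on any unexpected error).

```java
public class GroupErrorActionSketch {
    // Hypothetical subset of group-level errors discussed in this PR.
    public enum ErrorCode { COORDINATOR_NOT_AVAILABLE, COORDINATOR_LOAD_IN_PROGRESS,
                            GROUP_AUTHORIZATION_FAILED, UNKNOWN_SERVER_ERROR }

    // Hypothetical outcomes: forget the coordinator, resend to the same one, or fail the group.
    public enum Action { UNMAP, RETRY, FAIL }

    public static Action actionFor(ErrorCode error) {
        switch (error) {
            case COORDINATOR_NOT_AVAILABLE:
                // Drop the cached coordinator so the admin client looks it up again.
                return Action.UNMAP;
            case COORDINATOR_LOAD_IN_PROGRESS:
                // Same coordinator, just not ready yet: retry the request.
                return Action.RETRY;
            default:
                // Authorization failures and unexpected errors complete the group future exceptionally.
                return Action.FAIL;
        }
    }

    public static void main(String[] args) {
        for (ErrorCode error : ErrorCode.values()) {
            System.out.println(error + " -> " + actionFor(error));
        }
    }
}
```

Keeping unmap and retry as separate outcomes reflects the distinction in the PR: a coordinator that is still loading is the right broker and only needs a retry, while an unavailable coordinator has to be rediscovered.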
Merged to trunk and 3.0.
@showuon Thanks for the patches. Could you update the description of this PR and the others so that each description reflects the changes?
@dajac, all checked and updated. Thank you very much for patiently reviewing all these PRs! After these updates, we are more confident in these new handlers. :)
Make ListConsumerGroupOffsetsHandler unmap for COORDINATOR_NOT_AVAILABLE error
This is the old response-handling logic, for your reference: