KAFKA-13062: Make DeleteConsumerGroupsHandler unmap for COORDINATOR_NOT_AVAILABLE error #11021
@dajac, please take a look. Thanks.
| } | ||
| return new ApiResult<>(completed, failed, unmapped); | ||
|
|
||
| if (groupsToUnmap.isEmpty() && groupsToRetry.isEmpty()) { |
It seems incorrect to do this here. We were able to do so in the others because they expect only one group at a time. This one is different. The driver will retry any group that is neither completed nor failed. It seems to me that we could keep the existing code, no?
You are right! Updated.
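The retry behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the actual driver code: `DriverRetrySketch`, `Result`, and `keysToRetry` are hypothetical names, and group ids are plain strings. It only shows why a multi-group handler does not need to track the groups to retry itself: any requested key that the handler leaves out of `completed`, `failed`, and `unmapped` stays pending.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of a handler result and of how a driver
// could derive the keys that are still pending after one response.
final class DriverRetrySketch {

    // Stand-in for the (completed, failed, unmapped) triple returned by a handler.
    static final class Result {
        final Set<String> completed = new HashSet<>();
        final Set<String> failed = new HashSet<>();
        final Set<String> unmapped = new HashSet<>();
    }

    // Keys that were requested but are neither completed, failed, nor unmapped
    // are simply sent again to the same coordinator.
    static Set<String> keysToRetry(Set<String> requested, Result result) {
        Set<String> retry = new TreeSet<>(requested);
        retry.removeAll(result.completed);
        retry.removeAll(result.failed);
        retry.removeAll(result.unmapped);
        return retry;
    }
}
```

Under this model, unmapped keys go back through the coordinator lookup first, while the keys returned by `keysToRetry` are re-sent without a new lookup.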
| case INVALID_GROUP_ID: | ||
| case NON_EMPTY_GROUP: | ||
| case GROUP_ID_NOT_FOUND: | ||
| log.error("Received non retriable failure for group {} in `{}` response", groupId, |
I would also make the logs uniform and use debug everywhere except for unexpected errors.
| } | ||
|
|
||
| } | ||
| } |
nit: Could we revert this?
dajac
left a comment
@showuon Thanks for the update. I left a few minor comments and one question.
In the end, the handling of COORDINATOR_NOT_AVAILABLE is the only main difference in this PR. Should we reflect this in the title perhaps?
| case INVALID_GROUP_ID: | ||
| case NON_EMPTY_GROUP: | ||
| case GROUP_ID_NOT_FOUND: | ||
| log.debug("`DeleteConsumerGroups` request for group id {} failed due to error {}", groupId, error); |
nit: We should use groupId.idValue here and in the others.
| case COORDINATOR_LOAD_IN_PROGRESS: | ||
| case COORDINATOR_NOT_AVAILABLE: | ||
| // If the coordinator is in the middle of loading, then we just need to retry | ||
| log.debug("`DeleteConsumerGroups` request for group {} failed because the coordinator " + |
nit: group -> group id?
| unmapped.add(groupId); | ||
| // If the coordinator is unavailable or there was a coordinator change, then we unmap | ||
| // the key so that we retry the `FindCoordinator` request | ||
| log.debug("`DeleteConsumerGroups` request for group {} returned error {}. " + |
nit: group -> group id?
| final DeletableGroupResultCollection errorResponse1 = new DeletableGroupResultCollection(); | ||
| errorResponse1.add(new DeletableGroupResult() | ||
| .setGroupId("groupId") | ||
| .setErrorCode(Errors.COORDINATOR_NOT_AVAILABLE.code()) | ||
| ); | ||
| env.kafkaClient().prepareResponse(new DeleteGroupsResponse( | ||
| new DeleteGroupsResponseData() | ||
| .setResults(errorResponse1))); |
Why are we moving this to later?
This section tests that "retriable" errors are retried. Before this change, COORDINATOR_NOT_AVAILABLE was considered a retriable error. After this PR, it is treated as an unmapped error, so the case is moved later to test that, on receiving the error, we look up the coordinator again and then re-send the request.
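The distinction between "retriable" and "unmapped" errors can be sketched as a small classification function. This is a minimal illustration under stated assumptions, not the handler's actual code: the `Error` and `Outcome` enums and the `classify` method are hypothetical stand-ins, `NOT_COORDINATOR` is assumed to be the error behind the "coordinator change" case mentioned in the diff comment, and treating every other error as a failure is also an assumption.

```java
// Hypothetical sketch of how a per-group error could be classified after this PR.
final class DeleteGroupsErrorSketch {

    // Stand-in for the subset of error codes discussed in this review.
    enum Error {
        NONE,
        INVALID_GROUP_ID,
        NON_EMPTY_GROUP,
        GROUP_ID_NOT_FOUND,
        COORDINATOR_LOAD_IN_PROGRESS,
        COORDINATOR_NOT_AVAILABLE,
        NOT_COORDINATOR, // assumption: the "coordinator change" case
        UNKNOWN_SERVER_ERROR
    }

    enum Outcome { COMPLETED, FAILED, RETRY, UNMAPPED }

    static Outcome classify(Error error) {
        switch (error) {
            case NONE:
                return Outcome.COMPLETED;
            case INVALID_GROUP_ID:
            case NON_EMPTY_GROUP:
            case GROUP_ID_NOT_FOUND:
                // Non-retriable: the group fails with this error.
                return Outcome.FAILED;
            case COORDINATOR_LOAD_IN_PROGRESS:
                // The coordinator is still loading: re-send to the same coordinator.
                return Outcome.RETRY;
            case COORDINATOR_NOT_AVAILABLE:
            case NOT_COORDINATOR:
                // Unmap the key so the coordinator is looked up again before re-sending.
                return Outcome.UNMAPPED;
            default:
                // Assumption: any unexpected error fails the group.
                return Outcome.FAILED;
        }
    }
}
```

With this split, a test for COORDINATOR_NOT_AVAILABLE has to prepare a coordinator lookup response before the second delete response, which is why the case moves out of the plain "retriable" section.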
dajac
left a comment
LGTM
Failures are not related:
…OT_AVAILABLE error (#11021) This patch improves the error handling in `DeleteConsumerGroupsHandler` and ensures that `COORDINATOR_NOT_AVAILABLE` is unmapped in order to look up the coordinator again. Reviewers: David Jacot <djacot@confluent.io>
Merged to trunk and to 3.0. cc @kkonstantine
Make DeleteConsumerGroupsHandler unmap for COORDINATOR_NOT_AVAILABLE error
Old handleResponse logic:
Committer Checklist (excluded from commit message)