KAFKA-13419: Only reset generation ID when ILLEGAL_GENERATION error#11451
KAFKA-13419: Only reset generation ID when ILLEGAL_GENERATION error#11451guozhangwang merged 9 commits intoapache:trunkfrom
Conversation
|
@ableegoldman @guozhangwang @dajac @hachikuji , do you think this change make sense? Thanks. |
guozhangwang
left a comment
There was a problem hiding this comment.
Hello @showuon thanks for reporting this issue. Just for my understanding: in join-group request protocol, we do not set the generation id, the generation id is only set as part of the protocol user data when sticky assignor is used. Only resetting the generation at the coordinator layer would not change what user data would be serialized and sent to the brokers. Am I missing something?
|
@guozhangwang , thanks for your comment. Answer your question below.
--> The point here is, when reset state and if (generation == Generation.NO_GENERATION.generationId &&
memberId.equals(Generation.NO_GENERATION.memberId)) {
revokedPartitions = new HashSet<>(subscriptions.assignedPartitions());
if (!revokedPartitions.isEmpty()) {
log.info("Giving away all assigned partitions as lost since generation has been reset," +
"indicating that consumer is no longer part of the group");
exception = invokePartitionsLost(revokedPartitions);
subscriptions.assignFromSubscribed(Collections.emptySet());
}
} That is, when we got Does that make sense? |
|
Hi @showuon I think we were referring to different things :) There are different places where we encode the assigned partitions as well as generation:
My above comment is referring to 2) above, which is only set upon Maybe you are trying to fix 1) above only, in which case that's also fine, but do you feel 2) is also an issue that needs to be fixed separately? |
|
@guozhangwang , that's a good point. Yes, I was focusing on the fix 1) above only. For 2), yes, we should also fix that, but I need some time to think a good way to fix that issue. And then open a jira ticket for it. However, for 1) only, we can fix the issue in I'll continue to add tests for this PR. Thank you! |
|
@showuon Sounds good, let's just focus on 1) here then. The proposed fix looks reasonable to me. |
guozhangwang
left a comment
There was a problem hiding this comment.
@showuon Could you check if the failed tests are relevant? I could re-trigger it if you think they are not.
|
@guozhangwang , no, no need to retrigger it. I found some slow rebalance issue after this change, an have a proposal for that. I should be able to finish the KIP today (my time). I'll let you know. Thank you. |
|
@guozhangwang , when investigating the broken tests, I found my change will cause the "normal rebalance" slower. Here's why:
It looks great. But after my change in this PR, it'll become (the change is highlighted in bold)
That's why this change causes the rebalance slower. We can explicitly leave group when sync group with === Therefore, I think we should add an additional field So, back to your original comment about 2 places to fix:
Well, KIP-792 is still focusing on 1) above. For 2), I've thought about it for some days, and I think we can ignore it, because in stickyAssignor (not cooperative one), we put both Thank you. |
|
Hi @showuon I think I agree with you that, if we are going to encode both As for this specific case, I'm actually thinking that we could consider having a slight different version of
When we add the generation id to the join group protocol, it means the response could also include UNKNOWN_MEMBER_ID as well:
Now back to your original question:
|
|
Also cc @dajac @hachikuji who're working on improving the general rebalance protocol here. |
|
@guozhangwang , thanks for the comments and clear explanation.
I think you're trying to say I've updated the KIP to add more detailed implementation you suggested. And for Thank you. |
| if (!generationSnapshot.equals(Generation.NO_GENERATION) && stateSnapshot == MemberState.STABLE) { | ||
| if ((generationSnapshot.generationId != Generation.NO_GENERATION.generationId || | ||
| !generationSnapshot.memberId.equals(Generation.NO_GENERATION.memberId)) && | ||
| stateSnapshot == MemberState.STABLE) { |
There was a problem hiding this comment.
Before this change, we'll always reset generation object to NO_GENERATION, but now, we'll have some cases that only reset generation ID. This check is after rebalance complete, consumer should have a valid generation ID in this moment, so, no generation ID (i.e. -1) also means the consumer needs to rejoin group. Change the if condition here.
There was a problem hiding this comment.
Is the || in this condition correct? I thought that we would consider the rebalance successful only if we have a valid generation and a valid member id. Am I missing something?
| .setAssignments(Collections.emptyList()) | ||
| ); | ||
| log.debug("Sending follower SyncGroup to coordinator {} at generation {}: {}", this.coordinator, this.generation, requestBuilder); | ||
| log.debug("Sending follower SyncGroup to coordinator {}: {}", this.coordinator, requestBuilder); |
There was a problem hiding this comment.
requestBuilder already log the generation info. So, remove it.
Before this change, log is like below (with duplicated generation info output):
Sending leader SyncGroup to coordinator localhost:55644 (id: 2147483647 rack: null) at generation Generation{generationId=2, memberId='consumer-test.group-15-e79d4f58-f8cc-4f98-897d-f711fb3385d8', protocol='range'}: SyncGroupRequestData(groupId='test.group', generationId=2, memberId='consumer-test.group-15-e79d4f58-f8cc-4f98-897d-f711fb3385d8', groupInstanceId=null, protocolType='consumer', protocolName='range', assignments=[SyncGroupRequestAssignment(memberId='consumer-test.group-16-ba76c722-d177-4cad-8251-2e7ece935e7d', assignment=[0, 1, 0, 0, 0, 0, -1, -1, -1, -1]), SyncGroupRequestAssignment(memberId='consumer-test.group-15-e79d4f58-f8cc-4f98-897d-f711fb3385d8', assignment=[0, 1, 0, 0, 0, 1, 0, 3, 102, 111, 111, 0, 0, 0, 1, 0, 0, 0, 0, -1, -1, -1, -1])])
| if (!generation.equals(Generation.NO_GENERATION) && state == MemberState.COMPLETING_REBALANCE) { | ||
| // check protocol name only if the generation is not reset | ||
| if (generation.protocolName != null && state == MemberState.COMPLETING_REBALANCE) { | ||
| // check protocol name only if the generation is not reset (protocol name is not null) |
There was a problem hiding this comment.
We only care about the protocolName here, so only check protocolName not null
There was a problem hiding this comment.
I think that the intend was to validate protocolName only when the generation was not reset. It seems that we are changing this here. Why?
There was a problem hiding this comment.
You are right, updated and explained below. Thanks.
| if (generation == Generation.NO_GENERATION.generationId && | ||
| if (generation == Generation.NO_GENERATION.generationId || | ||
| memberId.equals(Generation.NO_GENERATION.memberId)) { | ||
| revokedPartitions = new HashSet<>(subscriptions.assignedPartitions()); | ||
|
|
||
| if (!revokedPartitions.isEmpty()) { | ||
| log.info("Giving away all assigned partitions as lost since generation has been reset," + | ||
| "indicating that consumer is no longer part of the group"); | ||
| log.info("Giving away all assigned partitions as lost since generation/memberID has been reset," + | ||
| "indicating that consumer is in old state or no longer part of the group"); |
There was a problem hiding this comment.
After this change, either no generation ID or no member ID, we'll clear all their ownedPartitions since they are out-of-date.
| if (subscriptions.hasAutoAssignedPartitions() && !droppedPartitions.isEmpty()) { | ||
| final Exception e; | ||
| if (generation() == Generation.NO_GENERATION || rebalanceInProgress()) { | ||
| if (currentGeneration.equals(Generation.NO_GENERATION) || rebalanceInProgress()) { |
There was a problem hiding this comment.
We should use the currentGeneration snapshot to do the check. Otherwise, user might see the unexpected callback got called when comparing the log. Also, we should compare with equals here.
There was a problem hiding this comment.
I am not sure about this one. Is it correct to compare to Generation.NO_GENERATION here or do we need to compare to the generationId?
There was a problem hiding this comment.
David, you're right! I was focusing on fixing the == error here. Yes, we should be consistent with onJoinPrepare here, to invoke PartitionsLost when
generation == Generation.NO_GENERATION.generationId ||
memberId.equals(Generation.NO_GENERATION.memberId)
Otherwise, invoke PartitionsRevoked.
I'll update it later. Thank you.
| watchers.tryCompleteWatched() | ||
| debug(s"Request key $key unblocked $numCompleted $purgatoryName operations") | ||
| if (numCompleted > 0) { | ||
| debug(s"Request key $key unblocked $numCompleted $purgatoryName operations") |
There was a problem hiding this comment.
Before this change, we'll see many useless logs like this:
DEBUG Request key GroupJoinKey(test.group) unblocked 0 Rebalance operations
We should log when numCompleted > 0.
|
@guozhangwang , tests added. Please take a look when available. Thank you. |
guozhangwang
left a comment
There was a problem hiding this comment.
Thanks @showuon . I think the changes lgtm overall.
I want to note that, when we do the threading factoring where only the hb thread would do the network work, then a lot of these logic would be much simplified. More specifically (cc @dajac again):
- We do not need to check if generation has been reset concurrently since only one thread would be doing hb along with join/sync-group requests, when the thread is doing rebalance, it would not do hb any more.
- Illegal generation would then be very similar to unknown member id in hb response, since if the former is received, if means the member missed the most recent rebalance that bumps up the rebalance, and since that member does not participate in that rebalance the member.id should have been kicked out of the group (or someone else joins with the same member.id, but in either way this member's member.id would no longer be valid), so it's actually okay to always reset the member.id as well.
| final Generation currentGeneration = generation(); | ||
| final String memberId = currentGeneration.memberId; | ||
|
|
||
| log.debug("Executing onLeavePrepare with generation {} and memberId {}", currentGeneration, memberId); |
There was a problem hiding this comment.
What's the rationale of removing the member id in logging?
There was a problem hiding this comment.
It's because the memberId info is already included in the generation info. This is the log output currently:
Executing onLeavePrepare with generation Generation{generationId=1, memberId='consumer1', protocol='range'} and memberId consumer1
Sorry, I should have mentioned it to make it clear.
| if (subscriptions.hasAutoAssignedPartitions() && !droppedPartitions.isEmpty()) { | ||
| final Exception e; | ||
| if (generation() == Generation.NO_GENERATION || rebalanceInProgress()) { | ||
| if (currentGeneration.equals(Generation.NO_GENERATION) || rebalanceInProgress()) { |
| // then retry immediately | ||
| if (generationUnchanged()) | ||
| resetGenerationOnResponseError(ApiKeys.JOIN_GROUP, error); | ||
| resetGenerationOnResponseError(ApiKeys.JOIN_GROUP, error, true); |
There was a problem hiding this comment.
nit: maybe it's now better to rename this function, to resetStateOnResponseError?
|
@dajac , thanks for your comments. Yes, you are right, I didn't make the if condition correct. I've updated it to use a private boolean hasGenerationReset(Generation gen) {
// the member ID might not be reset for ILLEGAL_GENERATION error, so only check generationID and protocol name here
return gen.generationId == Generation.NO_GENERATION.generationId && gen.protocolName == null;
}Before this change, we can just do Thank you. |
|
@dajac , please have a 2nd review when available. Thank you. |
|
I made another pass and it LGTM. @dajac do you want to make another pass? |
|
I will take another look tomorrow. Sorry for the delay. |
No problem, David! :) |
dajac
left a comment
There was a problem hiding this comment.
Overall, the PR LGTM. I left two clarification questions. @guozhangwang Could you double check them? Feel free to merge if my questions are irrelevant.
| if (subscriptions.hasAutoAssignedPartitions() && !droppedPartitions.isEmpty()) { | ||
| final Exception e; | ||
| if (generation() == Generation.NO_GENERATION || rebalanceInProgress()) { | ||
| if (currentGeneration.equals(Generation.NO_GENERATION) || rebalanceInProgress()) { |
There was a problem hiding this comment.
I am not sure about this one. Is it correct to compare to Generation.NO_GENERATION here or do we need to compare to the generationId?
| Exception exception = null; | ||
| final Set<TopicPartition> revokedPartitions; | ||
| if (generation == Generation.NO_GENERATION.generationId && | ||
| if (generation == Generation.NO_GENERATION.generationId || |
There was a problem hiding this comment.
Is || memberId.equals(Generation.NO_GENERATION.memberId) really necessary? My understanding is that a reset memberId implies that generationId was also reset. I guess that it does not hurt to have it.
There was a problem hiding this comment.
Yes, I agree that it doesn't hurt to have it. Thank you.
…pache#11451) Updated: This PR will reset generation ID when ILLEGAL_GENERATION error since the member ID is still valid. ===== resetStateAndRejoin when REBALANCE_IN_PROGRESS error in sync group, to avoid out-of-date ownedPartition == JIRA description == In KAFKA-13406, we found there's user got stuck when in rebalancing with cooperative sticky assignor. The reason is the "ownedPartition" is out-of-date, and it failed the cooperative assignment validation. Investigate deeper, I found the root cause is we didn't reset generation and state after sync group fail. In KAFKA-12983, we fixed the issue that the onJoinPrepare is not called in resetStateAndRejoin method. And it causes the ownedPartition not get cleared. But there's another case that the ownedPartition will be out-of-date. Here's the example: consumer A joined and synced group successfully with generation 1 New rebalance started with generation 2, consumer A joined successfully, but somehow, consumer A doesn't send out sync group immediately other consumer completed sync group successfully in generation 2, except consumer A. After consumer A send out sync group, the new rebalance start, with generation 3. So consumer A got REBALANCE_IN_PROGRESS error with sync group response When receiving REBALANCE_IN_PROGRESS, we re-join the group, with generation 3, with the assignment (ownedPartition) in generation 1. So, now, we have out-of-date ownedPartition sent, with unexpected results happened We might want to do resetStateAndRejoin when RebalanceInProgressException errors happend in sync group. Because when we got sync group error, it means, join group passed, and other consumers (and the leader) might already completed this round of rebalance. The assignment distribution this consumer have is already out-of-date. Reviewers: David Jacot <djacot@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
|
@showuon we are facing the same issue, and i want to ask for the original 2 places to fix:
In this change you want to fix the 1) one, but found that it will cause rebalance wait for session timeout, because memberId reset. Seems only reset geneartionId when SyncGroupReqeust got RebalanceInProcess error can be a workaround for this.( i did some test and it works) Thanks a lot~~ |
@aiquestion , seems you are right. I forgot to reset the generation ID when SyncGroupReqeust got RebalanceInProcess error. Are you interested in submitting a PR for it? But one thing to clarify, after KAFKA-12984, KAFKA-13406 got fixed, even if the consumer joined with out-of-date ownedPartition, it won't cause the rebalancing stuck issue. So, the fix is just to allow the consumer lead has a correct version of consumer ownedPartitions to do partition assignment. Thank you. |
|
Thanks for reply. Will try to submit a PR for it. Yes, with KAFKA-12984, KAFKA-13406 rebalance will not stuck.
Another round of rebalance begin and some other consumer C will not be able to syncGroup in time, so the rebalance will goes for many rounds before stable, and there will be dup consuming in the rebalance time. |
Updated: This PR will reset generation ID when
ILLEGAL_GENERATIONerror since the member ID is still valid.=====
resetStateAndRejoinwhenREBALANCE_IN_PROGRESSerror in sync group, to avoid out-of-dateownedPartition== JIRA description ==
In KAFKA-13406, we found there's user got stuck when in rebalancing with cooperative sticky assignor. The reason is the "ownedPartition" is out-of-date, and it failed the cooperative assignment validation.
Investigate deeper, I found the root cause is we didn't reset generation and state after sync group fail. In KAFKA-12983, we fixed the issue that the onJoinPrepare is not called in resetStateAndRejoin method. And it causes the ownedPartition not get cleared. But there's another case that the ownedPartition will be out-of-date. Here's the example:
We might want to do resetStateAndRejoin when RebalanceInProgressException errors happend in sync group. Because when we got sync group error, it means, join group passed, and other consumers (and the leader) might already completed this round of rebalance. The assignment distribution this consumer have is already out-of-date.
Committer Checklist (excluded from commit message)