Skip to content

KAFKA-13916; Fenced replicas should not be allowed to join the ISR in KRaft (KIP-841, Part 1)#12240

Merged
jsancio merged 19 commits intoapache:trunkfrom
dajac:KAFKA-13916-metadata-records
Jun 7, 2022
Merged

KAFKA-13916; Fenced replicas should not be allowed to join the ISR in KRaft (KIP-841, Part 1)#12240
jsancio merged 19 commits intoapache:trunkfrom
dajac:KAFKA-13916-metadata-records

Conversation

@dajac
Copy link
Member

@dajac dajac commented Jun 2, 2022

This PR implements a first part of KIP-841. Specifically, it implements the following:

  • Adds a new metadata version.
  • Adds the InControlledShutdown field to the BrokerRegistrationRecord and BrokerRegistrationChangeRecord and bump their versions. The newest versions are only used if the new metadata version is enabled.
  • Writes a BrokerRegistrationChangeRecord with InControlledShutdown set when a broker requests a controlled shutdown.
  • Ensures that fenced and in controlled shutdown replicas are not picked as leaders nor included in the ISR.
  • Adds or extends unit tests.

The second half (the AlterPartition part) is implemented in #12181.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@dajac dajac requested a review from jsancio June 2, 2022 15:47
Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @dajac. Review of src/main. only skimmed src/test.

"type": "metadata",
"name": "BrokerRegistrationChangeRecord",
"validVersions": "0",
"validVersions": "0-1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This version bump is not strictly required since we are protecting the tagged field with the metadata version.

For example, in PartitionChangeRecord we added the tagged field LeaderRecoveryState without changing the version.

@mumrah @cmccabe Do you have an opinion on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually followed @cmccabe's suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I am not sure where or how to document this decision but it would be good to be consistent across change type records.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we should not increase the record version unless there is an incompatible change. To my knowledge, that is what the record version has conveyed historically. However, I don't think there's any harm in increasing it, and I also don't think we need to solve it in this PR. Let's go with Colin's suggestion here and we can continue this discussion on the mailing list or offline

@dajac
Copy link
Member Author

dajac commented Jun 6, 2022

@jsancio Thanks for your review. I have addressed or replied to your comments.

@dajac dajac force-pushed the KAFKA-13916-metadata-records branch from b8caf3a to 2761eef Compare June 6, 2022 11:01
Comment on lines 96 to 107
changedBrokers.put(record.id(), Optional.of(broker.cloneWithFencing(true)));
changedBrokers.put(record.id(), broker.maybeCloneWith(
BrokerRegistrationFencingChange.UNFENCE.asBoolean(),
Optional.empty()
));
}

public void replay(UnfenceBrokerRecord record) {
BrokerRegistration broker = getBrokerOrThrow(record.id(), record.epoch(), "unfence");
changedBrokers.put(record.id(), Optional.of(broker.cloneWithFencing(false)));
changedBrokers.put(record.id(), broker.maybeCloneWith(
BrokerRegistrationFencingChange.FENCE.asBoolean(),
Optional.empty()
));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an inflight change that changes these enum values around. We need to be careful in the order we merge this PRs: #12236 (comment)

Comment on lines +127 to +129
if (metadataVersion.equals(MetadataVersion.UNINITIALIZED)) {
metadataVersion = MetadataVersion.IBP_3_0_IV1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a similar change for the ClusterControlManager iterator. If I am not mistaken it is possible for the FeatureControlManager to return UNINITIALIZED in both active controllers and inactive controllers.

Should we document this default version in MetadataVersion instead?

cc @mumrah

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mumrah is removing UNINITIALIZED in #12250. We can remove this logic afterwards.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, once that PR is merged, KRaft will be at metadata version IBP_3_0_IV1 implicitly until the controller finishes bootstrapping. This will be true on the controller and broker side of things

* Returns true if the broker is in fenced state; Returns false if it is
* not or if it does not exist.
*/
public boolean unfenced(int brokerId) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but I am pretty sure that we don't use this method anymore in src/main. If so, I say that we just remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. However, it is used in many places in the tests. I haven't found a good way to replace it in tests that is as convenient as this predicate. I would keep it.

Copy link
Member

@dengziming dengziming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, only one minor suggestion.

.collect(Collectors.toSet());
}

private short registerBrokerRecordVersion() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving this method to MetadataVersion? I find this logic also used in BrokerRegistration.toRecord

record,
record.id(),
record.epoch(),
BrokerRegistrationFencingChange.UNFENCE.asBoolean(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to Jose mentioned, this should be noticed if #12236 merged first.

Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Test failures seem unrelated.

@jsancio jsancio merged commit 151ca12 into apache:trunk Jun 7, 2022
@dajac dajac deleted the KAFKA-13916-metadata-records branch June 7, 2022 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants