Skip to content

KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller.#12998

Merged
cmccabe merged 41 commits intoapache:trunkfrom
akhileshchg:migration_driver_stub
Jan 9, 2023
Merged

KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller.#12998
cmccabe merged 41 commits intoapache:trunkfrom
akhileshchg:migration_driver_stub

Conversation

@akhileshchg
Copy link
Contributor

KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller.

This patch introduces a preliminary state machine that can be used by the KRaft controller to drive online migration from Zk to KRaft.

MigrationState -- Defines the states we can have during migration from Zk to KRaft.

KRaftMigrationDriver -- Defines the state transitions and events to handle actions like controller change, metadata change, and broker change and has interfaces through which it claims Zk controllership, performs zk writes, and sends RPCs to ZkBrokers.

MigrationClient -- Interface that defines the functions used to claim and relinquish Zk controllership, read to and write from Zk.

BrokersRpcClient -- Interface that defines the functions used to send RPCs to Zk brokers.

akhileshchg and others added 6 commits December 14, 2022 18:29
…Raft controller.

This patch introduces a preliminary state machine that can be used by KRaft
controller to drive online migration from Zk to KRaft.

MigrationState -- Defines the states we can have while migration from Zk to
KRaft.

KRaftMigrationDriver -- Defines the state transitions, and events to handle
actions like controller change, metadata change, broker change and have
interfaces through which it claims Zk controllership, performs zk writes and
sends RPCs to ZkBrokers.

MigrationClient -- Interface that defines the functions used to claim and
relinquish Zk controllership, read to and write from Zk.

BrokersRpcClient -- Interface that defines the functions used to send RPCs to
Zk brokers.
…ot generator.

This PR introduces the new metadata loader and snapshot generator. For the time being, they are
only used by the controller, but a PR for the broker will come soon.

The new metadata loader supports adding and removing publishers dynamically. (In contrast, the old
loader only supported adding a single publisher.) It also passes along more information about each
new image that is published. This information can be found in the LogDeltaManifest and
SnapshotManifest classes.

The new snapshot generator replaces the previous logic for generating snapshots in
QuorumController.java and associated classes. The new generator is intended to be shared between
the broker and the controller, so it is decoupled from both.

There are a few small changes to the old snapshot generator in this PR. Specifically, we move the
batch processing time and batch size metrics out of BrokerMetadataListener.scala and into
BrokerServerMetrics.scala.
Conflicts:
	core/src/main/scala/kafka/server/ControllerServer.scala
	core/src/main/scala/kafka/server/SharedServer.scala
	metadata/src/main/java/org/apache/kafka/image/loader/MetadataLoader.java
	metadata/src/main/java/org/apache/kafka/image/publisher/MetadataPublisher.java
Comment on lines 95 to 99
Set<Integer> zkRegisteredZkBrokers = zkMigrationClient.readBrokerIds();
zkRegisteredZkBrokers.removeAll(kraftRegisteredZkBrokers);
if (zkRegisteredZkBrokers.isEmpty()) {
return true;
} else {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about the approach we decided to take. Remind me again, are we not looking at topic assignments at this point? I thought that is more decisive regarding the zkBrokers which need to register with the kraft controller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tenatively adding the change. We can remove it if it is not required.

akhileshchg and others added 22 commits December 15, 2022 18:06
Still an issue with broker epochs in UMR from KRaft
Conflicts:
	clients/src/main/resources/common/message/BrokerRegistrationRequest.json
	core/src/main/scala/kafka/server/BrokerLifecycleManager.scala
	core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala
	core/src/main/scala/kafka/server/KafkaServer.scala
	metadata/src/main/java/org/apache/kafka/metadata/BrokerRegistration.java
	metadata/src/main/resources/common/metadata/RegisterBrokerRecord.json
Conflicts:
core/src/main/scala/kafka/cluster/Partition.scala
core/src/main/scala/kafka/common/RecordValidationException.scala
core/src/main/scala/kafka/log/LocalLog.scala
core/src/main/scala/kafka/log/LogLoader.scala
core/src/main/scala/kafka/log/LogSegment.scala
core/src/main/scala/kafka/log/ProducerStateManager.scala
core/src/main/scala/kafka/log/UnifiedLog.scala
core/src/main/scala/kafka/server/ReplicaManager.scala
core/src/main/scala/kafka/tools/DumpLogSegments.scala
core/src/main/scala/kafka/utils/CoreUtils.scala
core/src/test/scala/unit/kafka/coordinator/group/GroupMetadataManagerTest.scala
core/src/test/scala/unit/kafka/coordinator/transaction/TransactionStateManagerTest.scala
core/src/test/scala/unit/kafka/log/LocalLogTest.scala
core/src/test/scala/unit/kafka/log/LogLoaderTest.scala
core/src/test/scala/unit/kafka/log/ProducerStateManagerTest.scala
core/src/test/scala/unit/kafka/log/UnifiedLogTest.scala
core/src/test/scala/unit/kafka/server/IsrExpirationTest.scala
core/src/test/scala/unit/kafka/server/ReplicaManagerQuotasTest.scala
core/src/test/scala/unit/kafka/server/ReplicaManagerTest.scala
core/src/test/scala/unit/kafka/utils/CoreUtilsTest.scala
jmh-benchmarks/src/main/java/org/apache/kafka/jmh/partition/UpdateFollowerFetchStateBenchmark.java
metadata/src/main/java/org/apache/kafka/metadata/migration/ZkRecordConsumer.java
storage/src/main/java/org/apache/kafka/server/log/internals/CorruptSnapshotException.java
mumrah and others added 4 commits January 5, 2023 12:49
…gration_driver_stub

Conflicts:
	core/src/main/scala/kafka/migration/KRaftControllerToZkBrokersRpcClient.scala
Copy link
Contributor

@cmccabe cmccabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending jenkins

@cmccabe cmccabe marked this pull request as ready for review January 5, 2023 22:01
@cmccabe cmccabe merged commit db49070 into apache:trunk Jan 9, 2023
cmccabe pushed a commit that referenced this pull request Jan 9, 2023
…Raft controller. (#12998)

This patch introduces a preliminary state machine that can be used by KRaft
controller to drive online migration from Zk to KRaft.

MigrationState -- Defines the states we can have while migration from Zk to
KRaft.

KRaftMigrationDriver -- Defines the state transitions, and events to handle
actions like controller change, metadata change, broker change and have
interfaces through which it claims Zk controllership, performs zk writes and
sends RPCs to ZkBrokers.

MigrationClient -- Interface that defines the functions used to claim and
relinquish Zk controllership, read to and write from Zk.

Co-authored-by: David Arthur <mumrah@gmail.com>
Reviewers: Colin P. McCabe <cmccabe@apache.org>
@rishiraj88
Copy link

Thanks

mumrah added a commit that referenced this pull request Jan 11, 2023
With the new broker epoch validation logic introduced in #12998, we no longer need the ZK broker epoch to be sent to the KRaft controller. This patch removes that epoch and replaces it with a boolean.

Another small fix is included in this patch for controlled shutdown in migration mode. Previously, if a ZK broker was in migration mode, it would always try to do controlled shutdown via BrokerLifecycleManager. Since there is no ordering dependency between bringing up ZK brokers and the KRaft quorum during migration, a ZK broker could be running in migration mode, but talking to a ZK controller. A small check was added to see if the current controller is ZK or KRaft before decided which controlled shutdown to attempt.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
mumrah added a commit that referenced this pull request Jan 11, 2023
With the new broker epoch validation logic introduced in #12998, we no longer need the ZK broker epoch to be sent to the KRaft controller. This patch removes that epoch and replaces it with a boolean.

Another small fix is included in this patch for controlled shutdown in migration mode. Previously, if a ZK broker was in migration mode, it would always try to do controlled shutdown via BrokerLifecycleManager. Since there is no ordering dependency between bringing up ZK brokers and the KRaft quorum during migration, a ZK broker could be running in migration mode, but talking to a ZK controller. A small check was added to see if the current controller is ZK or KRaft before decided which controlled shutdown to attempt.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
ijuma added a commit to fvaleri/kafka that referenced this pull request Jan 13, 2023
* apache-github/trunk:
  KAFKA-14601: Improve exception handling in KafkaEventQueue apache#13089
  KAFKA-14367; Add `OffsetCommit` to the new `GroupCoordinator` interface (apache#12886)
  KAFKA-14530: Check state updater more often (apache#13017)
  KAFKA-14304 Use boolean for ZK migrating brokers in RPC/record (apache#13103)
  KAFKA-14003 Kafka Streams JUnit4 to JUnit5 migration part 2 (apache#12301)
  KAFKA-14607: Move Scheduler/KafkaScheduler to server-common (apache#13092)
  KAFKA-14367; Add `OffsetFetch` to the new `GroupCoordinator` interface (apache#12870)
  KAFKA-14557; Lock metadata log dir (apache#13058)
  MINOR: Implement toString method for TopicAssignment and PartitionAssignment (apache#13101)
  KAFKA-12558: Do not prematurely mutate internal partition state in Mirror Maker 2 (apache#11818)
  KAFKA-14540: Fix DataOutputStreamWritable#writeByteBuffer (apache#13032)
  KAFKA-14600: Reduce flakiness in ProducerIdExpirationTest (apache#13087)
  KAFKA-14279: Add 3.3.x streams system tests (apache#13077)
  MINOR: bump streams quickstart pom versions and add to list in gradle.properties (apache#13064)
  MINOR: Update KRaft cluster upgrade documentation for 3.4 (apache#13063)
  KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller. (apache#12998)
  KAFKA-14570: Fix parenthesis in verifyFullFetchResponsePartitions output (apache#13072)
  MINOR: Remove public mutable fields from ProducerAppendInfo (apache#13091)
guozhangwang pushed a commit to guozhangwang/kafka that referenced this pull request Jan 25, 2023
…Raft controller. (apache#12998)

This patch introduces a preliminary state machine that can be used by KRaft
controller to drive online migration from Zk to KRaft.

MigrationState -- Defines the states we can have while migration from Zk to
KRaft.

KRaftMigrationDriver -- Defines the state transitions, and events to handle
actions like controller change, metadata change, broker change and have
interfaces through which it claims Zk controllership, performs zk writes and
sends RPCs to ZkBrokers.

MigrationClient -- Interface that defines the functions used to claim and
relinquish Zk controllership, read to and write from Zk.

Co-authored-by: David Arthur <mumrah@gmail.com>
Reviewers: Colin P. McCabe <cmccabe@apache.org>
guozhangwang pushed a commit to guozhangwang/kafka that referenced this pull request Jan 25, 2023
…e#13103)

With the new broker epoch validation logic introduced in apache#12998, we no longer need the ZK broker epoch to be sent to the KRaft controller. This patch removes that epoch and replaces it with a boolean.

Another small fix is included in this patch for controlled shutdown in migration mode. Previously, if a ZK broker was in migration mode, it would always try to do controlled shutdown via BrokerLifecycleManager. Since there is no ordering dependency between bringing up ZK brokers and the KRaft quorum during migration, a ZK broker could be running in migration mode, but talking to a ZK controller. A small check was added to see if the current controller is ZK or KRaft before decided which controlled shutdown to attempt.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants