KAFKA-13114: Revert state and reregister raft listener #11116
hachikuji merged 6 commits into apache:trunk
Conversation
Force-pushed from a23e1a3 to 6b70e48
RaftClient's scheduleAppend may split the list of records into multiple batches. This means that it is possible for the active controller to see a committed offset for which it doesn't have an in-memory snapshot. If the active controller needs to renounce and it is missing an in-memory snapshot, then revert the state and register a new listener. This will cause the controller to replay the entire metadata partition.
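To make the mechanism concrete, here is a rough, self-contained sketch of that decision; the class, field, and method names (RenounceSketch, snapshotOffsets, reregisterRaftListener) are illustrative stand-ins, not the actual QuorumController code.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of the decision described above; not the real QuorumController code.
public class RenounceSketch {
    // Offsets at which an in-memory snapshot exists.
    private final NavigableSet<Long> snapshotOffsets = new TreeSet<>();

    void onSnapshotCreated(long offset) {
        snapshotOffsets.add(offset);
    }

    // Called when the active controller has to renounce leadership.
    void renounce(long lastCommittedOffset) {
        if (snapshotOffsets.contains(lastCommittedOffset)) {
            // Normal path: revert in-memory state to the snapshot at the last
            // committed offset by dropping everything newer.
            snapshotOffsets.tailSet(lastCommittedOffset, false).clear();
        } else {
            // scheduleAppend() split a write into multiple batches, so there is a
            // committed offset with no matching snapshot. Throw away all in-memory
            // state and reregister the Raft listener so the controller replays the
            // entire metadata partition.
            snapshotOffsets.clear();
            reregisterRaftListener();
        }
    }

    private void reregisterRaftListener() {
        // In the real code this corresponds to registering a new listener with the
        // RaftClient; modeled as a no-op here.
    }
}
```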
Force-pushed from 6b70e48 to dda2b3d
hachikuji left a comment:
The change seems reasonable, but should we add test cases?
metadata/src/main/java/org/apache/kafka/controller/QuorumController.java
I'm trying to think of some approach for validating this logic. It is difficult because it is handling unexpected exceptions. One thought I had is implementing a poison message of some kind which could expire after some TTL. When the controller sees the poison message, it would check if it is still active and raise an exception accordingly. Something like that could be used in an integration test, which might be simpler than trying to induce a failure by mucking with internal state.

Another idea is to corrupt the log on one of the nodes, but I'm not sure this would hit the right path. In fact, this is probably a gap at the moment. If the batch reader fails during iteration, we should probably resign and perhaps even fail. I'll file a separate JIRA for this.

In any case, I think we should try to come up with some way to exercise this path. Otherwise it's hard to say if it even works (though it looks reasonable enough).
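Purely to illustrate the poison-message idea above (PoisonRecord and maybeExplode are hypothetical names, not an existing Kafka record type or API), a sketch could look like this:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the "poison message" idea; not real Kafka code.
public class PoisonRecord {
    private final long expiryNanos;

    public PoisonRecord(long ttlMs) {
        this.expiryNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(ttlMs);
    }

    // Called by a (hypothetical) replay path: if the record has not yet expired and
    // this node still believes it is the active controller, fail on purpose so a
    // test can observe the renounce-and-replay behavior.
    public void maybeExplode(boolean isActiveController) {
        if (isActiveController && System.nanoTime() < expiryNanos) {
            throw new RuntimeException("Injected failure from poison record");
        }
    }
}
```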
jsancio left a comment:
@hachikuji I added a test for this case. I had to update LocalLogManager to better match Raft's leader election pattern. It is still not perfect but it is good enough for this test.
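Roughly, that fault injection can be sketched as follows; the class and method names here are hypothetical, but the resignAfterNonAtomicCommit flag mirrors the one visible in the LocalLogManager excerpt further below:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Self-contained sketch of the fault-injection idea; not the actual LocalLogManager.
public class NonAtomicAppendFaultInjector {
    private final AtomicBoolean resignAfterNonAtomicCommit = new AtomicBoolean(false);

    // Arm the injector so the next multi-batch append is truncated.
    public void injectOnNextAppend() {
        resignAfterNonAtomicCommit.set(true);
    }

    // Returns the batches that actually get written. When armed and the append was
    // split into multiple batches, only the first batch is written and the leader
    // resigns, emulating losing leadership in the middle of a non-atomic append.
    public <T> List<List<T>> maybeTruncate(List<List<T>> batches, Runnable resign) {
        if (batches.size() > 1 && resignAfterNonAtomicCommit.getAndSet(false)) {
            resign.run();
            return batches.subList(0, 1);
        }
        return batches;
    }
}
```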
    throw new IllegalStateException("The raft client was unable to allocate a buffer for an append");
} else if (offset == Long.MAX_VALUE) {
    throw new IllegalStateException("Unable to append records since this is not the leader");
}
    .max();

if (firstOffset.isPresent() && resignAfterNonAtomicCommit.getAndSet(false)) {
    // Emulate losing leadership in the middle of a non-atomic append by not writing
String topicName = "topic-name";

try (LocalLogManagerTestEnv logEnv = new LocalLogManagerTestEnv(1, Optional.empty())) {
    try (QuorumControllerTestEnv controlEnv =
nit: could we pull this into the first try?
Cool. I didn't know that was valid Java.
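For context, the suggestion works because Java allows several resources to be declared in a single try-with-resources statement, separated by semicolons and closed in reverse order of declaration. A generic, self-contained illustration (unrelated to the test's actual types):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Generic example: two resources in one try-with-resources statement. Both are
// closed automatically when the block exits, in reverse order of declaration.
public class MultiResourceTry {
    public static void main(String[] args) throws IOException {
        try (FileReader file = new FileReader("example.txt");
             BufferedReader reader = new BufferedReader(file)) {
            System.out.println(reader.readLine());
        }
    }
}
```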
);

// Wait for the new active controller
final QuorumController newController = controlEnv.activeController();
This confused me a little bit since we are trying to verify that the state on the original controller resets properly. That is what is happening here since there is only one controller in the test, but it is obscured a little bit by the new variable. Maybe it would be clearer to use the original reference and write this as:

assertEquals(controller, controlEnv.activeController());

Also, is there an epoch or something we can bump to ensure the transition?
// Wait for the controller to become active again
assertSame(controller, controlEnv.activeController());
assertTrue(
    oldClaimEpoch < controller.curClaimEpoch(),
    String.format("oldClaimEpoch = %s, newClaimEpoch = %s", oldClaimEpoch, controller.curClaimEpoch())
);
Only this should have changed. The rest are indentation changes from the previous commit.
hachikuji left a comment:
LGTM. One minor suggestion.
metadata/src/test/java/org/apache/kafka/metalog/LocalLogManager.java
…r.java Co-authored-by: Jason Gustafson <jason@confluent.io>
RaftClient's scheduleAppend may split the list of records into multiple batches. This means that it is possible for the active controller to see a committed offset for which it doesn't have an in-memory snapshot. If the active controller needs to renounce and it is missing an in-memory snapshot, then revert the state and reregister the Raft listener. This will cause the controller to replay the entire metadata partition. Reviewers: Jason Gustafson <jason@confluent.io>