[KAFKA-6730] Simplify State Store Recovery #5013
mjsax merged 11 commits into apache:trunk from ConcurrencyPractitioner:KAFKA-6730
Conversation
@mjsax One last push. :)
mjsax
left a comment
Thanks for the update. We are getting close :)
I think we can still improve updating the end-offsets. Let's see what Guozhang thinks.
    assertThat(callbackTwo.restored.size(), equalTo(3));
}

    @Ignore
I think we can remove the ignored tests completely. \cc @bbejeck
    final Set<TopicPartition> restoringPartitions = new HashSet<>(needsRestoring.keySet());
    try {
        final ConsumerRecords<byte[], byte[]> allRecords = poll(restoreConsumer, 10);
Line 73 above: we can remove the whole JavaDoc comment because this method does not throw TaskMigratedException any longer.
    final ConsumerRecords<byte[], byte[]> allRecords = poll(restoreConsumer, 10);
    for (final TopicPartition partition : restoringPartitions) {
        restorePartition(allRecords, partition, active.restoringTaskFor(partition));
    if (!needsRestoring.isEmpty()) {
IMHO, we should add a second check to make sure updatedEndOffsets is set only once:
if (!needsRestoring.isEmpty() && updatedEndOffsets == null) {
Additionally, we should set updatedEndOffsets = null in Line 117 below (wondering if this is good enough, or if we need to reset updatedEndOffsets = null somewhere else, too. If we are in the middle of restore, and a rebalance happens, we get new partitions and need to refresh the corresponding endOffsets... \cc @guozhangwang WDYT?
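To illustrate the fetch-once guard and the reset being discussed, here is a minimal plain-Java sketch. The class name `RestoreStateSketch`, the `String` partition names (standing in for `TopicPartition`), and the map passed to `restorePass` (standing in for `restoreConsumer.endOffsets(...)`) are all hypothetical; this models only the idea, not the actual `StoreChangelogReader` code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RestoreStateSketch {
    private Map<String, Long> updatedEndOffsets = null;   // null means "not fetched yet"
    private final Set<String> needsRestoring = new HashSet<>();

    void addRestoring(String partition) {
        needsRestoring.add(partition);
    }

    // fetchedOffsets stands in for restoreConsumer.endOffsets(...)
    void restorePass(Map<String, Long> fetchedOffsets) {
        // the second check ensures the end offsets are captured only once per cycle
        if (!needsRestoring.isEmpty() && updatedEndOffsets == null) {
            updatedEndOffsets = new HashMap<>(fetchedOffsets);
        }
    }

    // the reset discussed above: clear the captured offsets when restore finishes,
    // so a later rebalance triggers a fresh fetch
    void restoreCompleted() {
        updatedEndOffsets = null;
    }

    Long endOffsetOf(String partition) {
        return updatedEndOffsets == null ? null : updatedEndOffsets.get(partition);
    }

    public static void main(String[] args) {
        RestoreStateSketch s = new RestoreStateSketch();
        s.addRestoring("p-0");
        s.restorePass(Map.of("p-0", 10L));
        s.restorePass(Map.of("p-0", 99L));         // ignored: offsets already captured
        System.out.println(s.endOffsetOf("p-0"));  // prints 10
        s.restoreCompleted();
        System.out.println(s.endOffsetOf("p-0"));  // prints null
    }
}
```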
Hi @mjsax . When I added the check as you suggested, I got the following:
org.apache.kafka.streams.processor.internals.StoreChangelogReaderTest > shouldRestorePartitionsRegisteredPostInitialization FAILED
java.lang.NullPointerException
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:266)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:92)
at org.apache.kafka.streams.processor.internals.StoreChangelogReaderTest.shouldRestorePartitionsRegisteredPostInitialization(StoreChangelogReaderTest.java:374).
Do you know the cause of this failure? I suggested a possible cause in #4901, although that might not be the precise source of the error.
I have found that when you remove the check (
@ConcurrencyPractitioner Could you provide more detailed reasoning for why the end offsets could change after we fetch them at the beginning of the restoration? The goal of https://issues.apache.org/jira/browse/KAFKA-6730 itself is that, since now we call
@guozhangwang In the test, I found that end offsets were updated twice, for what appear to be two distinct topic partitions. In all honesty, I do not know myself why the tests are failing. Do you have any ideas?
One thing: we need to set

Not sure if this fixed the error.
@ConcurrencyPractitioner I had a look into the test, and the issue is the test setup itself. The mock-consumer setup is split into two parts:

and later

Thus, when restore is called the first time, the mock consumer does not know about the other partition -- the real consumer, however, would know about the partition -- thus, we need to change the test setup such that the mocked consumer has the information about end offsets for all partitions from the beginning.
@mjsax Correct me if I am wrong, but here is my current idea of what is happening. When the first restore is called,
Note that the solution I am proposing would allow the mock consumer to learn about partitions that it had missed.
@ConcurrencyPractitioner Your observation sounds correct to me. I think we need to update the endOffsets each time there are new partitions to be initialized:

Does this make sense?
Hi @mjsax, any comments?
mjsax
left a comment
Build fails with compile error. Can you please rebase this PR?
    if (!needsRestoring.isEmpty()) {
        final Set<TopicPartition> remainingPartitions = new HashSet<>(needsRestoring.keySet());
        remainingPartitions.removeAll(updatedEndOffsets.keySet());
        updatedEndOffsets.putAll(restoreConsumer.endOffsets(remainingPartitions));
I think we can move this into the block above line 70 to 72 and simplify:
if (!needsInitializing.isEmpty()) {
initialize();
updatedEndOffsets.putAll(restoreConsumer.endOffsets(needsInitializing.keySet()));
}
    final StateRestorer restorer = stateRestorers.get(partition);
    final long pos = processNext(records.records(partition), restorer, updatedEndOffsets.get(partition));
    restorer.setRestoredOffset(pos);
    if (restorer.hasCompleted(pos, updatedEndOffsets.get(partition))) {
we can also remove the entry in updatedEndOffsets if this is true.
Hi @mjsax, it appears that needsInitializing is distinct from needsRestoring. When I tested your approach, it didn't work because the topic partitions in needsInitializing are different from the ones that we actually need to restore.
Only needsRestoring contains the needed partitions, not needsInitializing.
I see. Guess we need to update the end-offsets before the call to
Hi @mjsax, when I updated the end offsets before

As you can see, some partitions found in needsInitializing do not have an end offset, probably because they have not even been assigned one yet.
Hi @mjsax, any other comments that we would need to address?
@mjsax If you have time, could you tell me exactly what this PR is lacking?
@ConcurrencyPractitioner The feature freeze deadline does not affect this PR, because it's an internal change. The code freeze deadline is in 2 weeks, so we still have time. Reviewing now though :)
mjsax
left a comment
Call for second review @guozhangwang @bbejeck @vvcephei
@@ -81,10 +78,25 @@ public Collection<TopicPartition> restore(final RestoringTasks active) {

    final Set<TopicPartition> restoringPartitions = new HashSet<>(needsRestoring.keySet());
I think we can remove this variable to simplify the code further.
    remainingPartitions.removeAll(updatedEndOffsets.keySet());
    updatedEndOffsets.putAll(restoreConsumer.endOffsets(remainingPartitions));
    final ConsumerRecords<byte[], byte[]> records = restoreConsumer.poll(10);
    final Iterator<TopicPartition> iterator = restoringPartitions.iterator();
replace restoringPartitions with needsRestoring.keySet() to get rid of the unnecessary variable.
    restorePartition(allRecords, partition, active.restoringTaskFor(partition));
    final Set<TopicPartition> remainingPartitions = new HashSet<>(needsRestoring.keySet());
    remainingPartitions.removeAll(updatedEndOffsets.keySet());
    updatedEndOffsets.putAll(restoreConsumer.endOffsets(remainingPartitions));
This seems to be correct. Still wondering why you did not move this code into
if (!needsInitializing.isEmpty()) {
initialize();
// put the three lines here
}
I think that remainingPartitions can only contain data if !needsInitializing.isEmpty() is true -- thus, no need to execute the code in each iteration. Or do I miss anything?
Sorry about my delay. It should be fine now.
@mjsax Actually, needsInitializing can still be empty, but remainingPartitions would still have partitions that need to be added from needsRestoring. I think the main reason is that needsInitializing and needsRestoring are completely independent of one another. If one does not contain partitions, that does not necessarily affect the other. In other words, as long as needsRestoring has partitions that need to be restored, we will probably need to check for partitions every time -- regardless of what needsInitializing contains.
I cannot follow. Not sure what I am missing. From my understanding, the flow is as follows: (1) after new partitions are assigned, they are added to needsInitializing. (2) When partitions/tasks are initialized, it is checked whether they need restoring: (2a) if they need restoring, they are removed from needsInitializing and added to needsRestoring; (2b) if they don't need restoring, they are only removed from needsInitializing.
We need to maintain updatedEndOffsets only for partitions that need restoring. And we need to get the endOffset for those partitions only once. Thus, new end-offsets only need to be added after new partitions are assigned, and thus needsInitializing.isEmpty() is false.
Actually, needsInitializing can still be empty, but remainingPartitions would still have partitions which needs to be added from needsRestoring
Why? Those partitions should have been added to updatedEndOffsets after they got assigned initially.
I think the main reason is that needsInitializing and needsRestoring are completely independent of one another.
From my understanding, they are not independent. Each partition is first in needsInitializing and might be "moved" from there to needsRestoring.
In other words, as long as needsRestoring has partitions which needs to be restored, we will probably need to check for partitions every time -- regardless of what needsInitializing contains
Why? The end-offset should not change and thus, for each partition that is moved from needsInitializing to needsRestoring we only add the end-offset once to updatedEndOffsets
Please let me know what I miss.
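The lifecycle sketched in (1), (2a), and (2b) above can be modeled with plain collections. This is a hypothetical, simplified sketch (class `PartitionFlowSketch`, `String` partition names in place of `TopicPartition`, and an `endOffset` argument standing in for `restoreConsumer.endOffsets(...)`), not the real Streams code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PartitionFlowSketch {
    final Set<String> needsInitializing = new HashSet<>();
    final Set<String> needsRestoring = new HashSet<>();
    final Map<String, Long> updatedEndOffsets = new HashMap<>();

    // (1) newly assigned partitions start in needsInitializing
    void assign(String partition) {
        needsInitializing.add(partition);
    }

    // (2) on initialization, decide whether the partition needs restoring;
    // endOffset stands in for a restoreConsumer.endOffsets(...) lookup
    void initialize(String partition, boolean needsRestore, long endOffset) {
        if (!needsInitializing.remove(partition)) {
            return;   // unknown partition: nothing to do
        }
        if (needsRestore) {
            // (2a) move to needsRestoring and record the end offset exactly once
            needsRestoring.add(partition);
            updatedEndOffsets.put(partition, endOffset);
        }
        // (2b) otherwise: only removed from needsInitializing
    }

    public static void main(String[] args) {
        PartitionFlowSketch f = new PartitionFlowSketch();
        f.assign("p-0");
        f.assign("p-1");
        f.initialize("p-0", true, 42L);    // needs restoring
        f.initialize("p-1", false, 0L);    // already caught up
        System.out.println(f.needsRestoring);      // prints [p-0]
        System.out.println(f.updatedEndOffsets);   // prints {p-0=42}
    }
}
```

Under this model, end offsets are only ever added while a partition leaves needsInitializing, which is the invariant the discussion above hinges on.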
Oh, I see what happened. Sorry, this was my bad. What happened was that I was adding the end offsets to updatedEndOffsets before initialize() in the conditional was called. Consequently, I never detected any change in needsRestoring.
Hi @mjsax, I think this should be all of it.
mjsax
left a comment
Thanks for the update. One more comment.
    if (restorer.hasCompleted(pos, updatedEndOffsets.get(partition))) {
        restorer.restoreDone();
        updatedEndOffsets.remove(partition);
        completedPartitions.add(partition);
One more thing -- missed this before: can't we remove completedPartitions and call iterator.remove() instead?
Maybe we need to change the iterator to iterate over the HashMap instead of the "key-set" of the HashMap though.
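The Iterator.remove() pattern being suggested can be sketched in plain Java. This is a hypothetical, simplified example (`String` keys in place of `TopicPartition`, a `removeCompleted` helper that does not exist in the real code) showing how iterating the map's entry set lets completed partitions be dropped in place without a separate completedPartitions set:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class IteratorRemoveSketch {
    // drop entries whose restored position has reached the recorded end offset
    static void removeCompleted(Map<String, Long> updatedEndOffsets,
                                Map<String, Long> restoredOffsets) {
        Iterator<Map.Entry<String, Long>> it = updatedEndOffsets.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (restoredOffsets.get(entry.getKey()) >= entry.getValue()) {
                it.remove();   // safe in-place removal during iteration
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Long> endOffsets = new HashMap<>();
        endOffsets.put("p-0", 5L);
        endOffsets.put("p-1", 8L);
        // pretend p-0 has caught up to its end offset, p-1 has not
        removeCompleted(endOffsets, Map.of("p-0", 5L, "p-1", 3L));
        System.out.println(endOffsets);   // prints {p-1=8}
    }
}
```

Removing through the iterator avoids the ConcurrentModificationException that calling `updatedEndOffsets.remove(...)` directly during iteration would raise, which is why iterating the map (rather than its key set copied elsewhere) matters here.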
retest this please
guozhangwang
left a comment
Made my pass; other than @mjsax's comment, it LGTM.
@mjsax Sorry about the delay. Fixed now.
retest this please
mjsax
left a comment
LGTM. Thanks for the patch and for being patient!
Reviewer: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
No description provided.