KAFKA-12964: Collect and rename snapshot files prior to async deletion. by gardnervickers · Pull Request #10896 · apache/kafka

gardnervickers · 2021-06-17T15:04:37Z

Segment and index files are currently renamed with a .deleted
suffix prior to async deletion. This serves two purposes: it lets
deletion resume after a broker failure, and it protects against
deletion of new segments during truncation (because deletion is
async).

We should do the same for snapshot files. While they are not subject
to issues around resuming deletion due to the stray snapshot
scanning which is performed on log initialization, we can end up
with situations where truncation queues snapshots for deletion, but
prior to deletion new segments with the same snapshot file name are
created. Async deletion can then delete these new snapshots.

This patch offers a two-stage snapshot deletion which first renames
and removes the segments in question from the ProducerStateManager,
allowing the Log to asynchronously delete them.

Credit to Kowshik Prakasam <kowshik@gmail.com> for finding this issue
and creating the test demonstrating the failure.

Co-authored-by: Kowshik Prakasam <kowshik@gmail.com>
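
The hazard and the two-stage approach described above can be sketched with plain java.nio calls. This is a minimal illustration under stated assumptions, not the code in this patch: the names TwoStageSnapshotDelete, markForDeletion and deleteMarked, and the snapshot file name, are made up for the example.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical names throughout; only the ".deleted" suffix comes from the description.
object TwoStageSnapshotDelete {
  val DeletedSuffix = ".deleted"

  // Stage 1 (synchronous): rename the file so its original name is free for reuse.
  def markForDeletion(snapshot: Path): Path = {
    val renamed = snapshot.resolveSibling(snapshot.getFileName.toString + DeletedSuffix)
    Files.move(snapshot, renamed, StandardCopyOption.ATOMIC_MOVE)
  }

  // Stage 2 (what an async task would run later): delete only the renamed file.
  def deleteMarked(renamed: Path): Boolean = Files.deleteIfExists(renamed)

  // Returns (newSnapshotSurvives, renamedFileStillExists).
  def demo(): (Boolean, Boolean) = {
    val dir = Files.createTempDirectory("snapshots")
    val snapshot = dir.resolve("00000000000000000005.snapshot")
    Files.write(snapshot, "old".getBytes(UTF_8))
    val renamed = markForDeletion(snapshot)      // truncation queues the old snapshot
    Files.write(snapshot, "new".getBytes(UTF_8)) // a new snapshot reuses the same name
    deleteMarked(renamed)                        // the delayed deletion finally runs
    (Files.exists(snapshot), Files.exists(renamed))
  }
}
```

Because the delayed task only ever holds the renamed path, a file created later under the original name is out of its reach.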


kowshik

@gardnervickers Thanks for the PR! LGTM. I have added few comments.

kowshik

@gardnervickers Thanks for the PR! LGTM. Just few comments below.

core/src/test/scala/unit/kafka/log/LogLoaderTest.scala

core/src/main/scala/kafka/log/Log.scala

core/src/main/scala/kafka/log/ProducerStateManager.scala

core/src/test/scala/unit/kafka/log/ProducerStateManagerTest.scala

kowshik · 2021-06-18T08:48:25Z

core/src/main/scala/kafka/log/ProducerStateManager.scala

+  private[log] def removeAndMarkSnapshotForDeletion(snapshotOffset: Long): Option[SnapshotFile] = {
+    Option(snapshots.remove(snapshotOffset)).flatMap { snapshot => {
+      // If the file cannot be renamed, it likely means that the file was deleted already.
+      // This can happen due to the way we construct an intermediate producer state manager


I'm not entirely sure I understood this comment. Looking at the LogLoader code, it doesn't appear that we use the intermediate producer state manager to issue async deletions of snapshot files.

So, is it still possible that a missing file is a valid case?

Agree with Kowshik. It's not clear if the comment is still valid.

@junrao @kowshik I accidentally included the word async here, the deletion performed by the intermediate ProducerStateManager is done synchronously.

I'm referring to the case where we go through LogLoader.recoverSegment. We construct a new "intermediate" ProducerStateManager for segment recovery which is separate from the "real" ProducerStateManager captured in LoadLogParams.

Segment recovery can use the intermediate ProducerStateManager to truncate snapshot files via ProducerStateManager.truncateAndReload in Log.rebuildProducerState. The SnapshotFile instances will be removed from the in-memory map for the "intermediate" ProducerStateManager in this case, but will remain for the "real" ProducerStateManager captured in LoadLogParams.

@gardnervickers : Good point. It seems that LogLoader.recoverSegment() can both remove and add snapshots, and neither change will be reflected in the "real" ProducerStateManager captured in LoadLogParams. This can lead to the missing file issue you pointed out and also potentially cause LogLoader.load() to do an unnecessary, expensive Log.rebuildProducerState().

@kowshik : I am wondering if we should let LogLoader reload the snapshots in the "real" ProducerStateManager before calling Log.rebuildProducerState() in LogLoader.load().

@gardnervickers : @kowshik mentioned that params.producerStateManager.removeStraySnapshots(params.segments.baseOffsets.toSeq) in LogLoader.load() actually reloads the snapshots after log recovery. So, it seems that the issue you mentioned may not be a problem?

@kowshik @junrao Right, we'll still end up with a fully in-sync ProducerStateManager after LogLoader.load(..) runs.

The problem can still occur though, because we delete snapshots using both the "intermediate" and "real" ProducerStateManager prior to removeStraySnapshots at the end of LogLoader.load.

1. recoverSegment can delete snapshots with the intermediate ProducerStateManager.

2. removeAndDeleteSegmentsAsync will use the "real" ProducerStateManager to schedule async deletion. It may have a stale view of the present snapshots on the filesystem if step 1 deleted snapshots, causing the rename to fail.

3. At the end of LogLoader.load, we will removeStraySnapshots, which will fix up any discrepancies between the contents of the log dir and the "real" ProducerStateManager.
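
The final fix-up step can be pictured roughly as follows. This sketch follows only the description given in this thread (reload snapshots from disk and drop those without a matching segment base offset); the real removeStraySnapshots may apply additional rules, and StraySnapshotReconcile, reconcile and offsetOf are hypothetical names.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical, simplified model of the reconciliation; not Kafka's implementation.
object StraySnapshotReconcile {
  private val Suffix = ".snapshot"

  private def offsetOf(f: File): Long = f.getName.stripSuffix(Suffix).toLong

  // Rebuild the in-memory view from what is on disk, deleting any snapshot
  // file whose offset is not a current segment base offset.
  def reconcile(logDir: File, segmentBaseOffsets: Set[Long]): Map[Long, File] = {
    val onDisk = Option(logDir.listFiles()).getOrElse(Array.empty[File])
      .filter(_.getName.endsWith(Suffix))
    val (keep, stray) = onDisk.partition(f => segmentBaseOffsets.contains(offsetOf(f)))
    stray.foreach(f => Files.deleteIfExists(f.toPath))
    keep.map(f => offsetOf(f) -> f).toMap
  }

  // Returns (offsets in the rebuilt view, file names left on disk).
  def demo(): (Set[Long], Set[String]) = {
    val dir = Files.createTempDirectory("reconcile").toFile
    Seq(0L, 5L, 9L).foreach(o => new File(dir, f"$o%020d.snapshot").createNewFile())
    val view = reconcile(dir, Set(0L, 9L))
    (view.keySet, dir.list().toSet)
  }
}
```

Whatever either manager did to the directory beforehand, the view returned here is derived from the files actually present, which is why a stale in-memory entry does not survive this step.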

Let's assume PSM refers to ProducerStateManager.

@junrao @gardnervickers That feels right to me, thanks for the explanation! A couple of things I wanted to ask:

1. Should we update the comment here to say:

// Reload all snapshots into the ProducerStateManager cache, the intermediate ProducerStateManager used
// during log recovery may have created or deleted some snapshots
// without the LoadLogParams.producerStateManager instance witnessing the changes.

2. PSM.removeStraySnapshots and its params could have a better name. Should we call it differently, like PSM.reloadEssentialSnapshots(essentialSegmentBaseOffsets: Seq[Long])?

=== SUMMARY OF CASES ===

I thought it would be useful to summarize. There are a few different cases that arise whenever PSM.removeAndMarkSnapshotForDeletion() is invoked on the "real" PSM instance. I believe all cases are handled with the current code as explained below:

Straightforward cases:

1. Snapshot entry is present in the real PSM instance and the snapshot file is present. This is the straightforward case where we remove the entry and rename the file.

2. Snapshot entry is absent in the real PSM instance and the snapshot file is absent. This is also a straightforward case, where we do nothing.

Corner cases:

1. Snapshot entry is present in the real PSM instance, but the snapshot file is absent. This can happen because the intermediate PSM deleted the snapshot file. In this case, we ignore the failure in the file rename.

2. Snapshot entry is absent in the real PSM instance, but the snapshot file is present. This can happen when the intermediate PSM takes a snapshot, but the real PSM doesn't have the entry (yet). This is handled by the call to PSM.removeStraySnapshots, which corrects such discrepancies by loading all snapshots from disk and eliminating those that don't match the list of segment base offsets post recovery.
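
The four cases can be walked through with a toy model. This is a sketch under stated assumptions rather than the PSM code: ToyPsm and removeAndMark are hypothetical names, a plain mutable map stands in for the PSM's snapshot map, and a missing file is detected through the NoSuchFileException that Files.move raises.

```scala
import java.nio.file.{Files, NoSuchFileException, Path}
import scala.collection.mutable

// Hypothetical toy model; not the ProducerStateManager implementation.
object SnapshotCases {
  // Stand-in for the in-memory view: snapshot offset -> file on disk.
  final class ToyPsm(val snapshots: mutable.Map[Long, Path]) {
    // Drop the entry and rename its file; a file that is already gone is tolerated.
    def removeAndMark(offset: Long): Option[Path] =
      snapshots.remove(offset).flatMap { file =>
        val renamed = file.resolveSibling(file.getFileName.toString + ".deleted")
        try Some(Files.move(file, renamed))
        catch { case _: NoSuchFileException => None }
      }
  }

  // One result per case in the order listed above, plus a final flag saying
  // whether the untracked file from the last case is still on disk.
  def demo(): List[Boolean] = {
    val dir = Files.createTempDirectory("psm-cases")
    def path(offset: Long): Path = dir.resolve(f"$offset%020d.snapshot")
    val psm = new ToyPsm(mutable.Map(1L -> path(1L), 3L -> path(3L)))
    Files.createFile(path(1L)) // offset 1: entry present, file present
    // offset 2: neither entry nor file
    // offset 3: entry present, file absent
    Files.createFile(path(4L)) // offset 4: file present, entry absent
    List(
      psm.removeAndMark(1L).isDefined,
      psm.removeAndMark(2L).isDefined,
      psm.removeAndMark(3L).isDefined,
      psm.removeAndMark(4L).isDefined,
      Files.exists(path(4L))
    )
  }
}
```

Only the first case yields a renamed file to hand to async deletion. The last case leaves the file untouched, which is the discrepancy the later stray-snapshot pass is expected to clean up.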

@gardnervickers : Thanks for the explanation. Makes sense.

@kowshik : The source of all the confusion is that we use the real PSM in some cases while using a temporary PSM in some other cases during recovery. The temporary PSM in recoverSegment() is used in 4 different places.

1. In recoverLog(). This is the case where we could just pass in the real PSM.

2. In completeSwapOperations(). We try to avoid recovering segments here in KAFKA-12520: Ensure log loading does not truncate producer state unless required #10763.

3 and 4. In loadSegmentFiles(). We probably need to clean up this part of the logic a bit. If we are missing an index file or the index file is corrupted, typically we can just rebuild the index without changing the PSM. If the segment is truncated while rebuilding the index, we actually want to follow the process in step 1, by just removing the rest of the segments. So, we could also get rid of the temporary PSM in this case.

I am wondering if we could have a separate PR to get rid of the temporary PSM completely?

@junrao That's a good point. Yes, we should get rid of the temporary PSM. I've created a jira tracking this improvement: https://issues.apache.org/jira/browse/KAFKA-12977. It is currently assigned to myself and I'll follow up on it.

junrao

@gardnervickers : Thanks for the PR. Just a couple of minor comments.

core/src/main/scala/kafka/log/Log.scala

junrao · 2021-06-21T16:01:31Z


junrao

@gardnervickers : The PR itself looks good to me. Just one more comment on the issue that you pointed out.

junrao · 2021-06-21T21:24:10Z


junrao

@gardnervickers : One further comment on the previous issue.

junrao · 2021-06-21T21:50:32Z


kowshik

@gardnervickers Thanks for the updated PR! Just few more comments.

core/src/main/scala/kafka/log/ProducerStateManager.scala

kowshik · 2021-06-21T23:54:21Z


core/src/main/scala/kafka/log/ProducerStateManager.scala

core/src/test/scala/unit/kafka/log/LogLoaderTest.scala

junrao

@gardnervickers : Thanks for the explanation. Makes sense. One more comment below.

junrao · 2021-06-22T01:04:39Z


kowshik

@gardnervickers Thanks for the PR! LGTM. Just few minor comments below.

core/src/test/scala/unit/kafka/log/LogLoaderTest.scala

kowshik · 2021-06-22T16:40:52Z

core/src/test/scala/unit/kafka/log/LogLoaderTest.scala

+      }
+    }
+    assertTrue(offsetsWithMissingSnapshotFiles.isEmpty,
+      s"Found offsets with missing producer state snapshot files: $offsetsWithMissingSnapshotFiles")


Does it make sense to check that there are no .deleted files in the log dir at the end of this test?
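
One way such a check could look is sketched below. The helper name deletedFiles is hypothetical and not part of the test file; in the test itself the result would presumably go through an assertTrue with a message, in the style of the hunk above.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical helper; not part of LogLoaderTest.
object DeletedFileCheck {
  // Names of files in logDir that still carry the .deleted suffix.
  def deletedFiles(logDir: File): Seq[String] =
    Option(logDir.list()).getOrElse(Array.empty[String]).toSeq
      .filter(_.endsWith(".deleted"))
      .sorted

  // Returns the leftover names before and after the marked file is removed.
  def demo(): (Seq[String], Seq[String]) = {
    val dir = Files.createTempDirectory("log-dir").toFile
    new File(dir, "00000000000000000000.snapshot").createNewFile()
    val leftover = new File(dir, "00000000000000000007.snapshot.deleted")
    leftover.createNewFile()
    val before = deletedFiles(dir)
    leftover.delete()
    (before, deletedFiles(dir))
  }
}
```

Since the deletion is asynchronous, a check like this only makes sense once the scheduled delete tasks have had a chance to run.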

junrao

@gardnervickers : Thanks for the updated PR. LGTM. Do you want to address the remaining comments from Kowshik?

gardnervickers · 2021-07-01T00:56:32Z

Yes thanks @junrao. @kowshik please let me know if you think any other changes are necessary.

kowshik · 2021-07-01T01:06:58Z

@gardnervickers Thanks for the updated PR! LGTM.

…n. (apache#10896)

Address PR feedback

Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>

kowshik reviewed Jun 18, 2021

Address PR feedback

871bf78

junrao reviewed Jun 21, 2021

Address PR feedback

3325369

junrao reviewed Jun 21, 2021

kowshik reviewed Jun 22, 2021

junrao reviewed Jun 22, 2021

Address PR feedback

befc370

kowshik approved these changes Jun 22, 2021

junrao approved these changes Jun 30, 2021

gardnervickers added 2 commits June 30, 2021 20:54

Address PR feedback

af5502c

Switch !snapshotFile.isDefined to snapshotFile.isEmpty

6e2df3c

junrao merged commit 789fc26 into apache:trunk Jul 1, 2021
