HBASE-28705 BackupLogCleaner cleans required WALs when using multiple backup roots #6040

Conversation

@DieterDP-ng (Contributor) commented Jul 2, 2024


@Apache9 requested a review from bbeaudreault on July 30, 2024 14:23
@Apache9 (Contributor) commented Jul 30, 2024

Ping @bbeaudreault @rmdmattingly PTAL at this? I'm not familiar with the backup code and I think this is a blocker for the 2.6.1 release?

Comment on lines 101 to 120
if (existingValue == null || existingValue > entry.getValue()) {
serversToLatestUnusedLogTimestamp.put(address, entry.getValue());
}
Contributor

If we want to store the latest, or largest, timestamp in serversToLatestUnusedLogTimestamp, then I think we'd want to flip existingValue > entry.getValue() to existingValue < entry.getValue(), given that this conditional is what decides whether entry.getValue() is added to the map.

Contributor Author

Re-reading this code after 2 months, I realize it deserves some additional documentation.

But I think the comparison is good as-is:

  • In the first loop (line 88), we track the newest/mostRecent backup per root
  • In the second loop (line 95), we check the lowest timestamp (= oldest/leastRecent) per address, using the backup per root

So we effectively check, per address, which timestamp is included in all backups. Only WALs older (= lower TS) than that boundary can be deleted.
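
As a quick illustration (a standalone sketch, not the HBase code itself; the address and timestamps are made up), keeping the smaller of the two values per address is exactly what produces that boundary:

import java.util.HashMap;
import java.util.Map;

public class BoundaryComparisonSketch {
  public static void main(String[] args) {
    // Newest-backup timestamps seen for the same server address, one per backup root.
    long[] newestPerRoot = { 10L, 20L };
    Map<String, Long> boundaries = new HashMap<>();
    String address = "server1:16020";
    for (long ts : newestPerRoot) {
      Long existing = boundaries.get(address);
      // Keep the lowest timestamp: the point up to which every root has already
      // covered this server's WALs.
      if (existing == null || existing > ts) {
        boundaries.put(address, ts);
      }
    }
    System.out.println(boundaries.get(address)); // prints 10
  }
}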

Comment on lines 163 to 170
-  protected static boolean canDeleteFile(Map<Address, Long> addressToLastBackupMap, Path path) {
+  protected static boolean canDeleteFile(Map<Address, Long> addressToLatestUnusedLogTs, Path path) {
Contributor

Total nitpick, but we unified on "newest" in that other PR, in case we'd like to use newest here as well for the sake of consistency.

Contributor Author

A "newest unused log timestamp" also feels weird. I'll just rename it to a "boundary" to avoid the newest/oldest confusion.

@ndimiduk (Member)
This one looks pretty close. Shall we try to get this in for 2.6.1?

… backuproots

Extend & refactor the test for BackupLogCleaner.
@DieterDP-ng force-pushed the HBASE-28705-BackupLogCleaner-cleans-required-WALs-when-using-multiple-backuproots branch from aa0e531 to 64a0156 on September 16, 2024 19:42
@DieterDP-ng (Contributor Author)

Rebased on master with additional javadoc and some renames for clarity.


… backuproots

The BackupLogCleaner is responsible for avoiding the deletion of WAL/logs
that still need to be included in a future backup.

The logic to decide which files can be deleted did not work correctly when
multiple backup roots are used. Each backup root has a different chain of
backups (full, incremental1, incremental2, ...). So, if any chain requires
a log, it should be preserved. This was not the case.

The result was that logs could be incorrectly deleted, resulting in data loss
in backups.
@DieterDP-ng force-pushed the HBASE-28705-BackupLogCleaner-cleans-required-WALs-when-using-multiple-backuproots branch from 64a0156 to 7d76842 on September 17, 2024 07:27

Map<TableName, Long> tableNameBackupInfoMap = new HashMap<>();
for (BackupInfo backupInfo : backups) {
// This map tracks, for every backup root, the most recent created backup (= highest timestamp)
Map<String, BackupInfo> newestBackupPerRootDir = new HashMap<>();
Member

Can there be multiple BackupInfo objects with the same root? (what is a "root" anyway? a FileSystem + Path?) Should this be a Map<String, List<BackupInfo>>?

Contributor Author

A backup root corresponds to the path you specify when doing hbase backup create <type> <backup_path>. Within a single root, you'll have a series of backups. Having multiple roots allows you to manage backups independently (e.g. if you delete one root, backups in other roots are unaffected). Incremental backups are always built on top of previous backups from the same root.

E.g. (F = full backup, I = incremental backup):

Time --------------------------------------------->
Root1: F    I     I     I     F    I    I    I
Root2: F          F  I    I   I              F

So to answer your questions:

  • There are multiple BackupInfos per root: 1 per backup created in that root. A root is a Path.
  • In this piece of logic, we only need the most recent (= newest) backup, so Map<String, BackupInfo>, where the root is represented as a String (see the sketch below).
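
For illustration, a minimal standalone sketch of that reduction (BackupStub is a hypothetical stand-in for BackupInfo; the timestamps are arbitrary):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NewestPerRootSketch {
  record BackupStub(String rootDir, long startTs) {}

  static Map<String, BackupStub> newestBackupPerRootDir(List<BackupStub> backups) {
    Map<String, BackupStub> newest = new HashMap<>();
    for (BackupStub b : backups) {
      // Keep only the backup with the highest start timestamp for each root.
      newest.merge(b.rootDir(), b, (a, c) -> a.startTs() >= c.startTs() ? a : c);
    }
    return newest;
  }

  public static void main(String[] args) {
    List<BackupStub> backups = List.of(
      new BackupStub("Root1", 0), new BackupStub("Root1", 10),
      new BackupStub("Root2", 0), new BackupStub("Root2", 20));
    // One entry per root remains: Root1 -> startTs 10, Root2 -> startTs 20.
    System.out.println(newestBackupPerRootDir(backups));
  }
}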

* I.e. WALs with a lower (= older) or equal timestamp are no longer needed for future incremental
* backups.
*/
private Map<Address, Long> serverToPreservationBoundaryTs(List<BackupInfo> backups)
Contributor

should this method have a precondition asserting that all BackupInfos in the list have an identical table set?

Otherwise I think this could cause data loss in an incremental backup. Let's say we have one-table and another-table on our cluster, and we back up each one independently. Let's say we first took a backup for one-table, then a backup for another-table. Then we passed the corresponding BackupInfos into this method, this would yield preservation boundaries which would remove WALs necessary for the next incremental backup of one-table

Contributor Author

I don't see an issue. Trying to follow your example:

  • Imagine 3 servers S1, S2, S3
  • table T1 has regions on S1 and S2, table T2 has regions on S2 and S3
  • we have 2 backup roots R1 that backs up T1, and R2 that backs up T2

At time:

  • t=0, we backup T1 in R1 => backup B1
  • t=0, we backup T2 in R2 => backup B2
  • t=10, we backup T1 in R1 => backup B3
  • t=20, we backup T2 in R2 => backup B4

Following the logic in this method:

  • newestBackupPerRootDir will contain: (R1: B3, R2: B4)
  • boundaries will contain: (S1: 10, S2: 10, S3: 20)

So for T1, all WALs up to t=10 can be deleted; for T2, WALs will be preserved from t=10 or t=20, depending on whether other tables are present.
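
To spell that out in runnable form (a plain-Java sketch of the boundary computation, not the actual BackupLogCleaner code; server and root names come from the example above):

import java.util.HashMap;
import java.util.Map;

public class MultiRootBoundaryExample {
  public static void main(String[] args) {
    // Per-server log timestamps of the newest backup in each root:
    // R1 -> B3 (T1 on S1, S2 at t=10), R2 -> B4 (T2 on S2, S3 at t=20).
    Map<String, Map<String, Long>> newestBackupPerRoot = new HashMap<>();
    newestBackupPerRoot.put("R1", Map.of("S1", 10L, "S2", 10L));
    newestBackupPerRoot.put("R2", Map.of("S2", 20L, "S3", 20L));

    // "Oldest of the newest": per server, keep the lowest timestamp across all roots.
    Map<String, Long> boundaries = new HashMap<>();
    for (Map<String, Long> serverTs : newestBackupPerRoot.values()) {
      for (Map.Entry<String, Long> e : serverTs.entrySet()) {
        boundaries.merge(e.getKey(), e.getValue(), Math::min);
      }
    }
    // Yields S1=10, S2=10, S3=20, matching the boundaries listed above.
    System.out.println(boundaries);
  }
}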

Member

Thank you for the explanation, both this and below. I think I follow you. Let me rephrase.

newestBackupPerRootDir is a "race to the top" in terms of identifying the most recent timestamp across backups, while boundaries is a "race to the bottom" in terms of identifying the least recent backup present on each server. By selecting the "oldest of the newest" you determine the minimum timestamp that must be preserved for each server. Any WALs older than this minimum timestamp can be safely discarded.

Contributor

Cool, I think the missing piece for me was that a backup root was specific to a given table

> we have 2 backup roots R1 that backs up T1, and R2 that backs up T2

Contributor Author

> Cool, I think the missing piece for me was that a backup root was specific to a given table

It isn't. Each backup root contains one or more tables, i.e. a set of tables; these table sets may or may not have tables in common.

Contributor

Hmm, okay so reframing your example in a single root:

Imagine:

  • 3 servers S1, S2, S3
  • table T1 has regions on S1 and S2, table T2 has regions on S2 and S3
  • we have 1 backup root R1 that backs up T1 and T2

At time:

  • t=0, we backup T1 in R1 => backup B1
  • t=0, we backup T2 in R1 => backup B2
  • t=10, we backup T1 in R1 => backup B3
  • t=20, we backup T2 in R1 => backup B4

Following the logic in this method:

newestBackupPerRootDir will contain: (R1: B4)

We would calculate the preservation boundaries here:

    // This map tracks, for every address, the least recent (= oldest / lowest timestamp) inclusion
    // in any backup. In other words, it is the timestamp boundary up to which all backups roots
    // have included the WAL in their backup.
    Map<Address, Long> boundaries = new HashMap<>();
    for (BackupInfo backupInfo : newestBackupPerRootDir.values()) {
      for (TableName table : backupInfo.getTables()) {
        for (Map.Entry<String, Long> entry : backupInfo.getTableSetTimestampMap().get(table)
          .entrySet()) {
          Address address = Address.fromString(entry.getKey());
          Long storedTs = boundaries.get(address);
          if (storedTs == null || entry.getValue() < storedTs) {
            boundaries.put(address, entry.getValue());
          }
        }
      }
    }

Since the newest backup in R1 is B4, serverToPreservationBoundaryTs will contain (S1: 20, S2: 20, S3: 20)

In this situation, we must not delete WALs from S1 or S2 between times 10 and 20 because a subsequent incremental backup of T1 will require those WALs.

In the BackupLogCleaner, though, we will end up calling canDeleteFile and hitting this code with that serverToPreservationBoundaryTs (S1: 20, S2: 20, S3: 20) renamed as addressToBoundaryTs:

      if (!addressToBoundaryTs.containsKey(walServerAddress)) { // for S1 & S2, this will be false
         // irrelevant ...
      }

      Long backupBoundary = addressToBoundaryTs.get(walServerAddress); // for S1 & S2, this will be 20
      if (backupBoundary >= walTimestamp) { // For WALs between times 10 and 20, this will be true
        if (LOG.isDebugEnabled()) {
          LOG.debug(
            "Backup found for server: {}. All backups from {} are newer than file, so deleting: {}",
            walServerAddress.getHostName(), backupBoundary, path);
        }
        // so we will erroneously delete WALs <20, despite the next inc backup of T1 requiring them
        return true;
      }

I guess this depends on how ancestry is defined? If we consider all backups in a root to be ancestors regardless of their table set, then maybe it is okay to delete these WALs. But, if not, then I don't see how it is okay?

Contributor Author

> newestBackupPerRootDir will contain: (R1: B4)

Agreed

> Since the newest backup in R1 is B4, serverToPreservationBoundaryTs will contain (S1: 20, S2: 20, S3: 20)

I think this is wrong.

To make things more concrete, I'm going to assume all backups were full backups, and B1 == B2 (since they have the same timestamp). I.e. the backups were:

  • B1: at timestamp 0, containing tables T1 & T2
  • B3: at timestamp 10, containing T1
  • B4: at timestamp 20, containing T2

At this point, the data in backupInfo.getTableSetTimestampMap() will be:

T1:
  S1: 10
  S2: 10
T2:
  S2: 20
  S3: 20 

and serverToPreservationBoundaryTs will be (S1: 10, S2: 10, S3: 20).

The reason that the BackupInfo of B4 contains the log timestamps of tables not included in B4 is due to how the backup client updates these:

// From: FullTableBackupClient

// Updates only the rows for the tables included in B4
backupManager.writeRegionServerLogTimestamp(backupInfo.getTables(), newTimestamps);

// Reads the rows for all tables once included in a backup in this backup root
Map<TableName, Map<String, Long>> newTableSetTimestampMap = backupManager.readLogTimestampMap();
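
A minimal in-memory sketch of that write-then-read pattern (the class below is a hypothetical stand-in for the backup system table, not the HBase API; the data reproduces the example above):

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class LogTimestampMapSketch {
  // root -> table -> server address -> log timestamp
  private final Map<String, Map<String, Map<String, Long>>> store = new HashMap<>();

  void writeRegionServerLogTimestamp(String root, Set<String> tables, Map<String, Long> newTs) {
    Map<String, Map<String, Long>> perTable = store.computeIfAbsent(root, r -> new HashMap<>());
    // Only the rows for the tables included in this backup are overwritten.
    for (String table : tables) {
      perTable.put(table, new HashMap<>(newTs));
    }
  }

  Map<String, Map<String, Long>> readLogTimestampMap(String root) {
    // The read returns the rows for every table ever backed up in this root.
    return store.getOrDefault(root, Map.of());
  }

  public static void main(String[] args) {
    LogTimestampMapSketch meta = new LogTimestampMapSketch();
    meta.writeRegionServerLogTimestamp("R1", Set.of("T1"), Map.of("S1", 0L, "S2", 0L));   // B1 at t=0
    meta.writeRegionServerLogTimestamp("R1", Set.of("T2"), Map.of("S2", 0L, "S3", 0L));   // B2 at t=0
    meta.writeRegionServerLogTimestamp("R1", Set.of("T1"), Map.of("S1", 10L, "S2", 10L)); // B3 at t=10
    meta.writeRegionServerLogTimestamp("R1", Set.of("T2"), Map.of("S2", 20L, "S3", 20L)); // B4 at t=20
    // After B4, the map still carries T1's older timestamps:
    // T1 -> {S1=10, S2=10}, T2 -> {S2=20, S3=20}
    System.out.println(meta.readLogTimestampMap("R1"));
  }
}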

@rmdmattingly (Contributor) Sep 20, 2024

Sorry for being a pain here, but I'm not sure I agree. When building serverToPreservationBoundaryTs we loop through:

Map<Address, Long> boundaries = new HashMap<>();
for (BackupInfo backupInfo : newestBackupPerRootDir.values()) {
  // build boundaries via tableSetTimestampMap, like you said
}
// logging omitted
return boundaries;

and we agree that newestBackupPerRootDir.values() will only contain B4. So our boundaries would end up only being based on the timestamps from B4? How does the second newest backup, and beyond, in the root come into play?

@rmdmattingly (Contributor) Sep 20, 2024

> How does the second newest backup, and beyond, in the root come into play?

Answering my own questions a bit here, I think I needed to look deeper into how the BackupInfo gets constructed. When the tableSetTimestampMap is populated, it will contain entries for every table in the root, not just every table in the given backup. So while B4 might only pertain to T2, its corresponding BackupInfo will be aware of the timestamps for T1. That's a confusing design imo, but if true then I think this logic is sound.

Contributor Author

It's indeed like that. Doing it that way avoids some special cases where backups in the same root do not contain the same tables, but the docs surrounding those BackupInfo concepts are indeed lacking.

@ndimiduk (Member) left a comment

@DieterDP-ng thank you for your explanations. I think I can now participate effectively in the discussion.


@@ -61,8 +65,10 @@ public class TestBackupLogCleaner extends TestBackupBase {

@Test
public void testBackupLogCleaner() throws Exception {
Member

This test is not quite what you described in your earlier comment, and I'm pretty sure that it doesn't test the criteria in the same way either. It's a valid test case, but I think it's missing some cases that we would like to see covered.

In this test, we have a single region server, which serves as S2 from your above example.

  1. At T0 you take a full backup B1 of all four tables at R1. The boundary for our server is the timestamp of the oldest of the WALs across all the tables.
  2. Data is inserted into tables 1 and 2.
  3. At T1 you take an incremental backup B2 of tables 1, 2, 3 to R1. The boundary for our server doesn't change because the oldest WAL is still from B1 and table 4.
  4. At T2 you take a full backup B3 of tables 1, 2, 3 to R2. The boundary does not change because the oldest WAL is still from B1 and table 4.

Based on my understanding, this test never sees a point at which a WAL can be discarded, because the WALs from table4 are never rolled, and so the WALs identified for table4 in B1 are always the "oldest of the newest" for this server.

I think that we'd like to see a test that does what you described in your initial comment: one that observes the "rising tide" of WAL timestamps on the server as backups across roots progress.

Contributor Author

I've majorly refactored & extended the test case, and found a bug along the way.
@rmdmattingly @ndimiduk please take a look.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 2s master passed
+1 💚 compile 0m 21s master passed
+1 💚 javadoc 0m 15s master passed
+1 💚 shadedjars 5m 16s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 59s the patch passed
+1 💚 compile 0m 21s the patch passed
+1 💚 javac 0m 21s the patch passed
+1 💚 javadoc 0m 15s the patch passed
+1 💚 shadedjars 5m 15s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 11m 22s hbase-backup in the patch passed.
30m 41s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6040/5/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6040
Optional Tests javac javadoc unit compile shadedjars
uname Linux 0e3e91358d3d 5.4.0-195-generic #215-Ubuntu SMP Fri Aug 2 18:28:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a31a19c
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6040/5/testReport/
Max. process+thread count 3368 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6040/5/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 25s master passed
+1 💚 compile 0m 31s master passed
+1 💚 checkstyle 0m 11s master passed
+1 💚 spotbugs 0m 33s master passed
+1 💚 spotless 0m 47s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 8s the patch passed
+1 💚 compile 0m 30s the patch passed
-0 ⚠️ javac 0m 30s /results-compile-javac-hbase-backup.txt hbase-backup generated 1 new + 100 unchanged - 0 fixed = 101 total (was 100)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 10s the patch passed
+1 💚 spotbugs 0m 37s the patch passed
+1 💚 hadoopcheck 13m 40s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 1m 1s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
33m 22s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6040/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6040
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 4cdc54a15b0e 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a31a19c
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6040/5/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@ndimiduk (Member) left a comment

I'll give @rmdmattingly a chance to comment, but this is good by me. Thanks a lot for this cleanup @DieterDP-ng, this is a real improvement in the system.

@rmdmattingly (Contributor) left a comment

Looks good to me as well. Thanks for the contribution, and thanks for your patience as I wrap my head around more of our backups code!

@ndimiduk merged commit c477901 into apache:master on Sep 25, 2024
1 check passed
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Sep 25, 2024
… backuproots (apache#6040)

The BackupLogCleaner is responsible for avoiding the deletion of WAL/logs
that still need to be included in a future backup.

The logic to decide which files can be deleted did not work correctly when
multiple backup roots are used. Each backup root has a different chain of
backups (full, incremental1, incremental2, ...). So, if any chain requires
a log, it should be preserved. This was not the case.

The result was that logs could be incorrectly deleted, resulting in data loss
in backups.

Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Ray Mattingly <rmattingly@apache.org>
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Sep 25, 2024
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Sep 25, 2024
ndimiduk pushed a commit that referenced this pull request Sep 26, 2024
ndimiduk added a commit that referenced this pull request Sep 26, 2024
ndimiduk added a commit that referenced this pull request Sep 26, 2024