
HBASE-28697 Don't clean bulk load system entries until backup is complete #6089

Merged
merged 1 commit into apache:master from HBASE-28697-master on Sep 2, 2024

Conversation

rmdmattingly
Contributor

https://issues.apache.org/jira/browse/HBASE-28697

I've been thinking through the incremental backup order of operations, and I think we delete rows from the bulk loads system table too early and, consequently, make it possible to produce a "successful" incremental backup that is missing bulk loads.

To summarize the steps here, starting in IncrementalTableBackupClient#execute:

  1. We take an incremental backup of the WALs generated since the last backup
  2. We ensure any bulk loads done since the last backup are appropriately represented in the new backup by going through the system table and copying the appropriate files to the backup directory
  3. We delete all of the system table rows which told us about these bulk loads
  4. We generate a backup manifest and mark the backup as complete
  5. If we begin deleting the system table rows about these bulk loads (step 3) but fail before the backup is marked complete (step 4), then we'll be in a precarious spot. A retried incremental backup may succeed, but it would not know to persist the bulk-loaded files whose system table references we already deleted.

We could consider this issue an extension or replacement of https://issues.apache.org/jira/browse/HBASE-28084 in some ways, depending on what solution we land on. I think that we could fix this specific issue by reordering the bulk load table cleanup, but there will always be gotchas like this. Maybe it is simpler to require that the next backup be a full backup after any incremental failure.
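
To make the proposed reordering concrete, here is a heavily simplified sketch of what the tail of IncrementalTableBackupClient#execute could look like with the fix. This is an illustration under assumptions, not the actual implementation: only handleBulkLoad and completeBackup are names that appear in this PR, while backupInfo, incrementalCopy, and deleteBulkLoadedRows are illustrative stand-ins.

```java
// Simplified, illustrative sketch of the reordered flow; names other than
// handleBulkLoad/completeBackup are assumptions for this example.
protected void execute() throws IOException {
  // Steps 1-2: back up the WALs written since the last backup and copy any
  // bulk-loaded files recorded in the backup system table, keeping the rowkeys
  // of the system-table entries whose files were just copied.
  incrementalCopy(backupInfo);
  List<byte[]> bulkLoadedRows = handleBulkLoad(backupInfo.getTableNames());

  // Former step 4, moved up: write the manifest and mark the backup COMPLETE.
  completeBackup(backupInfo);

  // Former step 3, moved down: only now drop the bulk-load bookkeeping rows.
  // A failure here just leaves stale rows behind, so a later incremental backup
  // may copy some files twice instead of silently missing them.
  deleteBulkLoadedRows(bulkLoadedRows);
}
```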

cc @hgromer @ndimiduk @DieterDP-ng

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 30s | | Docker mode activated. |
| _ | Prechecks | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| _ | master Compile Tests | | | |
| +1 💚 | mvninstall | 3m 45s | | master passed |
| +1 💚 | compile | 0m 34s | | master passed |
| +1 💚 | checkstyle | 0m 12s | | master passed |
| +1 💚 | spotbugs | 0m 37s | | master passed |
| +1 💚 | spotless | 0m 51s | | branch has no errors when running spotless:check. |
| _ | Patch Compile Tests | | | |
| +1 💚 | mvninstall | 3m 29s | | the patch passed |
| +1 💚 | compile | 0m 32s | | the patch passed |
| +1 💚 | javac | 0m 32s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 10s | | the patch passed |
| +1 💚 | spotbugs | 0m 43s | | the patch passed |
| +1 💚 | hadoopcheck | 12m 18s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| +1 💚 | spotless | 0m 50s | | patch has no errors when running spotless:check. |
| _ | Other Tests | | | |
| +1 💚 | asflicense | 0m 10s | | The patch does not generate ASF License warnings. |
| | | 32m 23s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6089/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6089 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 5a5e33cf3512 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 5d872aa |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 81 (vs. ulimit of 30000) |
| modules | C: hbase-backup U: hbase-backup |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6089/1/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 17s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| _ | Prechecks | | | |
| _ | master Compile Tests | | | |
| +1 💚 | mvninstall | 4m 39s | | master passed |
| +1 💚 | compile | 0m 29s | | master passed |
| +1 💚 | javadoc | 0m 20s | | master passed |
| +1 💚 | shadedjars | 6m 14s | | branch has no errors when building our shaded downstream artifacts. |
| _ | Patch Compile Tests | | | |
| +1 💚 | mvninstall | 3m 6s | | the patch passed |
| +1 💚 | compile | 0m 25s | | the patch passed |
| +1 💚 | javac | 0m 25s | | the patch passed |
| +1 💚 | javadoc | 0m 17s | | the patch passed |
| +1 💚 | shadedjars | 5m 53s | | patch has no errors when building our shaded downstream artifacts. |
| _ | Other Tests | | | |
| +1 💚 | unit | 10m 26s | | hbase-backup in the patch passed. |
| | | 33m 16s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6089/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6089 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux ec260b53bacd 5.4.0-182-generic #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 5d872aa |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6089/1/testReport/ |
| Max. process+thread count | 3174 (vs. ulimit of 30000) |
| modules | C: hbase-backup U: hbase-backup |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6089/1/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

Member

@ndimiduk left a comment


This seems alright to me, but I'd appreciate hearing from another voice familiar with this code path. Specifically, are there semantic implications in other parts of the backup system from having completeBackup called before the deletes occur?

```diff
   */
-  @SuppressWarnings("unchecked")
-  protected Map<byte[], List<Path>>[] handleBulkLoad(List<TableName> sTableList)
-    throws IOException {
+  protected List<byte[]> handleBulkLoad(List<TableName> sTableList) throws IOException {
```
Member


Why is it okay to drop the table context of the rowkeys in the returned value? A rowkey is only meaningful in the context of its table (or region).

Contributor Author


We talked about this a bit offline. We can purge these rowkeys because they are only returned by handleBulkLoad if we have backed up the corresponding bulk loads in this backup.

Right now an inopportune failure would result in us missing bulk load data on subsequent incremental backups; with this change, an inopportune failure would instead result in us backing up duplicate files, which should be slightly wasteful but otherwise innocuous.
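
As a rough illustration of why bare rowkeys suffice here: all of the bulk-load bookkeeping rows live in the single backup system table, so deleting them never needs to know which user table a given bulk load targeted. Below is a minimal sketch of such a delete, assuming a hypothetical table name and helper class; the real bookkeeping presumably stays inside HBase's backup system table code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;

public final class BulkLoadRowCleaner {
  // Assumed name for the bulk-load bookkeeping table; illustrative only.
  private static final TableName SYSTEM_TABLE = TableName.valueOf("backup:system_bulk");

  /** Delete bookkeeping rows by rowkey alone; no per-user-table context is needed. */
  static void deleteBulkLoadRows(Connection conn, List<byte[]> rowKeys) throws IOException {
    try (Table table = conn.getTable(SYSTEM_TABLE)) {
      List<Delete> deletes = new ArrayList<>(rowKeys.size());
      for (byte[] row : rowKeys) {
        deletes.add(new Delete(row));
      }
      table.delete(deletes);
    }
  }
}
```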

Member


Is it worth having some backup consistency check that can detect and purge extra files? Or do we think that backups will cycle out and the redundancy will be dropped the next time a full backup is taken?
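
To sketch the kind of consistency check being floated here, purely as an illustration: walk the backup destination and purge any file that no manifest references. Building the referenced set from the backup manifests is assumed to happen elsewhere; only generic Hadoop FileSystem calls are used, and nothing here is the actual backup code.

```java
import java.io.IOException;
import java.util.Set;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public final class BackupOrphanFileCheck {
  /**
   * Delete files under the backup root that are not referenced by any manifest.
   * How the 'referenced' set is built from the manifests is out of scope for this sketch.
   */
  static void purgeUnreferencedFiles(FileSystem fs, Path backupRoot, Set<String> referenced)
    throws IOException {
    RemoteIterator<LocatedFileStatus> files = fs.listFiles(backupRoot, true);
    while (files.hasNext()) {
      Path file = files.next().getPath();
      if (!referenced.contains(file.toUri().getPath())) {
        // Orphaned by an earlier failed cleanup; drop it.
        fs.delete(file, false);
      }
    }
  }
}
```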

@rmdmattingly
Contributor Author

@DieterDP-ng any thoughts on this PR?

@ndimiduk merged commit ed6613e into apache:master on Sep 2, 2024
1 check passed
@ndimiduk deleted the HBASE-28697-master branch on September 2, 2024 08:38
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Sep 2, 2024
HBASE-28697 Don't clean bulk load system entries until backup is complete (apache#6089)

Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
ndimiduk pushed a commit that referenced this pull request Sep 2, 2024
HBASE-28697 Don't clean bulk load system entries until backup is complete (#6089)

Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
@DieterDP-ng
Contributor

> @DieterDP-ng any thoughts on this PR?

Sorry for the late reply. These changes look OK to me.

> This seems alright to me, but I'd appreciate hearing from another voice familiar with this code path. Specifically, are there semantic implications in other parts of the backup system from having completeBackup called before the deletes occur?

I'm not aware of any.
