Merged
35 changes: 24 additions & 11 deletions src/main/asciidoc/_chapters/backup_restore.adoc
@@ -804,16 +804,28 @@ providing a comparable level of security. This is a manual step which users *mus
[[br.technical.details]]
== Technical Details of Incremental Backup and Restore

HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore
solutions, such as those that only used HBase Export and Import APIs. Incremental backups use Write Ahead Logs (WALs) to capture
the data changes since the previous backup was created. A WAL roll (create new WALs) is executed across all RegionServers to track
the WALs that need to be in the backup.

After the incremental backup image is created, the source backup files usually are on same node as the data source. A process similar
to the DistCp (distributed copy) tool is used to move the source backup files to the target file systems. When a table restore operation
starts, a two-step process is initiated. First, the full backup is restored from the full backup image. Second, all WAL files from
incremental backups between the last full backup and the incremental backup being restored are converted to HFiles, which the HBase
Bulk Load utility automatically imports as restored data in the table.
HBase incremental backups enable more efficient capture of HBase table images than previous attempts
at serial backup and restore solutions, such as those that only used HBase Export and Import APIs.
Incremental backups use Write Ahead Logs (WALs) to capture the data changes since the
previous backup was created. A WAL roll (create new WALs) is executed across all RegionServers
to track the WALs that need to be in the backup.
In addition to WALs, incremental backups also track bulk-loaded HFiles for tables under backup.

Incremental backup gathers all WAL files generated since the last backup from the source cluster,
converts them to HFiles in a `.tmp` directory under the `BACKUP_ROOT`, and then moves these
HFiles to their final location under the backup root directory to form the backup image.
It also reads bulk load records from the backup system table, forms the paths for the corresponding
bulk-loaded HFiles, and copies those files to the backup destination.
Bulk-loaded files are preserved (not deleted by cleaner chores) until they've been included in a
backup (for each backup root).
A process similar to the DistCp (distributed copy) tool is used to move the backup files to the
target file system.
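
To make the selection logic above concrete, here is a minimal sketch of how an incremental backup might decide which files belong in the image. It is illustrative only: `WalFile`, `BulkLoadRecord`, and `plan_incremental_backup` are hypothetical names, not HBase classes or APIs, and the real implementation reads this state from the backup system table rather than from in-memory lists.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class WalFile:
    """Hypothetical stand-in for a WAL file on the source cluster."""
    path: str
    closed_at: int  # assumed: timestamp at which the WAL was rolled/closed


@dataclass(frozen=True)
class BulkLoadRecord:
    """Hypothetical stand-in for a bulk load record in the backup system table."""
    table: str
    hfile_path: str
    backed_up: bool  # assumed: True once included in a backup for this backup root


def plan_incremental_backup(
    wals: Iterable[WalFile],
    bulk_loads: Iterable[BulkLoadRecord],
    last_backup_ts: int,
    tables: Set[str],
) -> Tuple[List[str], List[str]]:
    """Return (WAL paths to convert to HFiles, bulk-loaded HFile paths to copy)."""
    # Only WALs generated since the previous backup are needed, oldest first.
    new_wals = [
        w.path
        for w in sorted(wals, key=lambda w: w.closed_at)
        if w.closed_at > last_backup_ts
    ]
    # Bulk-loaded HFiles bypass the WAL, so they are tracked and copied separately.
    pending_hfiles = [
        r.hfile_path
        for r in bulk_loads
        if r.table in tables and not r.backed_up
    ]
    return new_wals, pending_hfiles
```

The two result lists mirror the two data sources described above: WALs are converted to HFiles, while bulk-loaded HFiles are copied as they are.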

When a table restore operation starts, a two-step process is initiated.
First, the full backup is restored from the full backup image.
Second, all HFiles from incremental backups between the last full backup and the incremental backup
being restored (including bulk-loaded HFiles) are bulk loaded into the table using the
HBase Bulk Load utility.
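
The two-step restore described above can be sketched as choosing a chain of backup images: the most recent full backup, followed by every incremental backup up to and including the one being restored. This is a simplified model under stated assumptions; `BackupImage` and `restore_chain` are hypothetical names for illustration and do not correspond to HBase APIs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BackupImage:
    """Hypothetical description of one backup image under a backup root."""
    backup_id: str
    kind: str        # "full" or "incremental"
    created_at: int  # assumed: creation timestamp used for ordering


def restore_chain(images: Sequence[BackupImage], target_id: str) -> List[BackupImage]:
    """Return the images to apply, in order, to restore `target_id`."""
    by_id = {img.backup_id: img for img in images}
    if target_id not in by_id:
        raise KeyError(f"unknown backup id: {target_id}")
    target = by_id[target_id]

    # Restoring a full backup needs only that image.
    if target.kind == "full":
        return [target]

    # Step 1: the last full backup taken before the target incremental backup.
    fulls = [i for i in images if i.kind == "full" and i.created_at < target.created_at]
    if not fulls:
        raise ValueError(f"no full backup precedes {target_id}")
    base = max(fulls, key=lambda i: i.created_at)

    # Step 2: every incremental backup after that full backup, up to the target.
    incrementals = sorted(
        (
            i
            for i in images
            if i.kind == "incremental"
            and base.created_at < i.created_at <= target.created_at
        ),
        key=lambda i: i.created_at,
    )
    return [base] + incrementals
```

In this model, the HFiles from each incremental image in the returned chain would then be bulk loaded in order on top of the restored full backup.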

You can only restore on a live HBase cluster because the data must be redistributed to complete the restore operation successfully.

@@ -872,8 +884,9 @@ data at the full 80MB/s and `-w` is used to limit the job from spawning 16 worke

Like we did for full backups, we have to understand the incremental backup process to approximate its runtime and cost.

* Identify new write-ahead logs since last full or incremental backup: negligible. A priori knowledge from the backup system table(s).
* Identify new write-ahead logs since the last full or incremental backup: negligible. A priori knowledge from the backup system table(s).
* Read, filter, and write "minimized" HFiles equivalent to the WALs: dominated by the speed of writing data. Relative to write speed of HDFS.
* Read bulk load records from the backup system table, form the paths for bulk-loaded HFiles, and copy them to the backup destination.

@hgromer - I think you can contribute a summary of HBASE-27659 to this part of the HBase docs.

* DistCp the HFiles to the destination: <<br.export.snapshot.cost,see above>>.
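
As a rough illustration of how the steps listed above combine, the sketch below adds the time to rewrite WAL data as HFiles to the time to copy the resulting HFiles and any bulk-loaded HFiles. It is a back-of-the-envelope model, not a measurement: the function name and parameters are hypothetical, the identification steps are treated as negligible, and the HFiles are assumed to be no larger than the WALs they were derived from, which makes the result an upper bound.

```python
def estimate_incremental_backup_seconds(
    wal_bytes: float,
    bulk_loaded_bytes: float,
    hfile_write_bytes_per_s: float,
    copy_bytes_per_s: float,
) -> float:
    """Rough upper bound on incremental backup runtime, in seconds."""
    if hfile_write_bytes_per_s <= 0 or copy_bytes_per_s <= 0:
        raise ValueError("throughput values must be positive")
    # Rewriting WAL data as HFiles is dominated by write speed.
    convert_seconds = wal_bytes / hfile_write_bytes_per_s
    # Both the converted HFiles and the bulk-loaded HFiles are copied to the destination.
    copy_seconds = (wal_bytes + bulk_loaded_bytes) / copy_bytes_per_s
    return convert_seconds + copy_seconds
```

For example, calling it with 800 MB of WAL data, no bulk-loaded files, and 80 MB/s for both throughput values gives 10 seconds for the rewrite plus 10 seconds for the copy.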

For the second step, the dominating cost of this operation would be re-writing the data (under the assumption that a majority of the