
xtrabackup: Better support for large datasets #5065

Merged: 9 commits merged into vitessio:master on Aug 11, 2019

Conversation

@enisoc (Member) commented Aug 8, 2019

Changes to improve the Vitess xtrabackup integration for use with large datasets:

  • Stream stderr in the background instead of waiting until the end. This is needed for long-running backups so that the xtrabackup process doesn't block after the write buffer fills up. It's also nice for checking in on progress during a long upload.
  • Use move-back instead of copy-back so the disk doesn't need 2x the space to restore. We download backups from remote storage on every restore, so there's no need to keep a copy of the original downloaded files on local disk.
  • Store stream mode (tar vs xbstream) in the manifest so going forward it will be possible to restore from either one, regardless of the current flag setting for creating new backups.
  • Support optional data striping to parallelize compression/decompression and file upload/download. The striping parameters are stored in the manifest so the flags for new backups don't have to match in order to restore an old one.
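The background stderr streaming in the first item can be sketched as below. This is a minimal illustration, not the actual Vitess code: the function name is made up, and `sh` stands in for the xtrabackup process. The key points are that a goroutine drains stderr while the command runs, and that the pipe is fully drained before `Wait` (which closes it).

```go
package main

import (
	"bufio"
	"log"
	"os/exec"
)

// runStreamingStderr starts a command and logs its stderr line by line in a
// background goroutine. Without this, a long-running process like xtrabackup
// can block on writes once the OS pipe buffer fills up.
func runStreamingStderr(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	stderr, err := cmd.StderrPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		scanner := bufio.NewScanner(stderr)
		for scanner.Scan() {
			log.Printf("stderr: %s", scanner.Text())
		}
	}()
	// Drain stderr fully before Wait, which closes the pipe.
	<-done
	return cmd.Wait()
}

func main() {
	// Simulate a long-running process that reports progress on stderr.
	if err := runStreamingStderr("sh", "-c", "echo 'progress 1' >&2; echo 'progress 2' >&2"); err != nil {
		log.Fatal(err)
	}
}
```

Logging each line as it arrives also gives the timestamped progress output mentioned above, instead of a single dump at the end.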

Fixes #5063

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

@enisoc enisoc requested a review from deepthi August 8, 2019 23:15
@deepthi (Member) left a comment:

looks good so far

@enisoc enisoc force-pushed the xtrabackup-stream-logs branch 4 times, most recently from e7421f0 to 904e3aa Compare August 9, 2019 02:43
@enisoc (Member, Author) commented Aug 9, 2019

I added more changes that should fix this for long restores too. I'll retry the large test DB with both this and #5066 in a custom build.

I also changed how we search for the replication position in the xtrabackup log, because in my first test it matched "" (the empty string) and treated that as a valid position (an empty GTID set). I don't think we want to allow that.
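The stricter position search could look like the sketch below. The log-line wording and function name here are illustrative assumptions, not the exact Vitess pattern; the point is that the capture group requires at least one character, so an empty GTID string is reported as "not found" instead of being accepted as a valid empty GTID set.

```go
package main

import (
	"fmt"
	"regexp"
)

// gtidRE requires at least one character between the quotes ([^']+ rather
// than [^']*), so an empty GTID in the xtrabackup log does not match.
var gtidRE = regexp.MustCompile(`GTID of the last change '([^']+)'`)

// findReplicationPosition extracts the GTID position from xtrabackup's log
// output, returning an error if no non-empty position is present.
func findReplicationPosition(xtrabackupLog string) (string, error) {
	m := gtidRE.FindStringSubmatch(xtrabackupLog)
	if m == nil {
		return "", fmt.Errorf("replication position not found in xtrabackup log")
	}
	return m[1], nil
}

func main() {
	pos, err := findReplicationPosition(
		`MySQL binlog position: GTID of the last change '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5'`)
	fmt.Println(pos, err)
}
```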

This is needed for long-running backups so that the xtrabackup process
doesn't block after the write buffer fills up.

It's also nice for checking in on progress during a long upload.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc enisoc force-pushed the xtrabackup-stream-logs branch from 904e3aa to b56856b Compare August 9, 2019 02:46
@enisoc (Member, Author) commented Aug 9, 2019

The backup side of this worked well on our 250GB/shard test DB, which takes 1.5 hours to back up. I'll report back once I've tested the restore side.

enisoc added 3 commits August 9, 2019 14:19
Direct writes didn't use Infof(), so there was no timestamp.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
To avoid requiring 2x disk space upon restore.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc enisoc changed the title xtrabackup: Stream stderr to logs. xtrabackup: Better support for large datasets Aug 10, 2019
enisoc added 3 commits August 9, 2019 23:05
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc enisoc force-pushed the xtrabackup-stream-logs branch from 09e709d to af13447 Compare August 10, 2019 17:13
@enisoc (Member, Author) commented Aug 10, 2019

This should be ready for review now. I ended up broadening the scope of this PR to generally supporting large backups/restores with xtrabackup. I've tested it on our 250GB/shard keyspace and it passed.

The data striping seems to get us back to parity with backup/restore times for the same keyspace using the built-in backup engine (which compresses and uploads each file in the data dir independently). Before adding striping, xtrabackup was between 2x and 8x slower on my test keyspace because decompression and upload/download were single-threaded.

The xbstream format could technically support parallel compression/decompression on its own without striping, but not without extra disk space to store the compressed and decompressed content of a given file at the same time. You either risk running out of disk space and failing to restore, or you run with extra disk space that's only used during restore and is wasted otherwise. Also, without striping, even parallel xbstream would still bottleneck into a single destination file upload/download. Blob stores like S3 and GCS get better throughput across multiple files than for a single file.

@enisoc enisoc marked this pull request as ready for review August 10, 2019 18:25
@enisoc enisoc requested a review from sougou as a code owner August 10, 2019 18:25
@enisoc enisoc requested a review from deepthi August 10, 2019 18:28
enisoc added 2 commits August 10, 2019 14:58
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@deepthi (Member) left a comment:
Nice work! LGTM.

@deepthi deepthi merged commit 7e99841 into vitessio:master Aug 11, 2019
@enisoc enisoc deleted the xtrabackup-stream-logs branch August 11, 2019 02:35
Successfully merging this pull request may close these issues: XtraBackup: Log start and progress of backup.