
xtrabackup: Better support for large datasets #5065

Merged: 9 commits merged into vitessio:master on Aug 11, 2019

Conversation

@enisoc (Member) commented Aug 8, 2019

Changes to improve the Vitess xtrabackup integration for use with large datasets:

  • Stream stderr in the background instead of waiting until the end. This is needed for long-running backups so that the xtrabackup process doesn't block after the write buffer fills up. It's also nice for checking in on progress during a long upload.
  • Use move-back instead of copy-back so the disk doesn't need 2x the space to restore. We download backups from remote storage on every restore, so there's no need to keep a copy of the original downloaded files on local disk.
  • Store stream mode (tar vs xbstream) in the manifest so going forward it will be possible to restore from either one, regardless of the current flag setting for creating new backups.
  • Support optional data striping to parallelize compression/decompression and file upload/download. The striping parameters are stored in the manifest so the flags for new backups don't have to match in order to restore an old one.
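The background stderr streaming in the first item can be sketched as below. This is a minimal illustration, not the actual Vitess code: the function name is made up, and `sh` stands in for the xtrabackup process. The key points are that a goroutine drains stderr while the command runs, and that the pipe is fully drained before `Wait` (which closes it).

```go
package main

import (
	"bufio"
	"log"
	"os/exec"
)

// runStreamingStderr starts a command and logs its stderr line by line in a
// background goroutine. Without this, a long-running process like xtrabackup
// can block on writes once the OS pipe buffer fills up.
func runStreamingStderr(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	stderr, err := cmd.StderrPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		scanner := bufio.NewScanner(stderr)
		for scanner.Scan() {
			log.Printf("stderr: %s", scanner.Text())
		}
	}()
	// Drain stderr fully before Wait, which closes the pipe.
	<-done
	return cmd.Wait()
}

func main() {
	// Simulate a long-running process that reports progress on stderr.
	if err := runStreamingStderr("sh", "-c", "echo 'progress 1' >&2; echo 'progress 2' >&2"); err != nil {
		log.Fatal(err)
	}
}
```

Logging each line as it arrives also gives the timestamped progress output mentioned above, instead of a single dump at the end.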

Fixes #5063

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>

@enisoc enisoc requested a review from deepthi August 8, 2019 23:15
@deepthi (Member) left a comment:

looks good so far

@enisoc enisoc force-pushed the xtrabackup-stream-logs branch 4 times, most recently from e7421f0 to 904e3aa Compare August 9, 2019 02:43
@enisoc (Member, Author) commented Aug 9, 2019

I added more changes that should fix this for long restores too. I'll retry the large test DB with both this and #5066 in a custom build.

I also changed how we search for the replication position in the xtrabackup log, because in my first test it matched "" (the empty string) and treated that as a valid position (an empty GTID set). I don't think we want to allow that.
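The stricter position search could look like the sketch below. The log-line wording and function name here are illustrative assumptions, not the exact Vitess pattern; the point is that the capture group requires at least one character, so an empty GTID string is reported as "not found" instead of being accepted as a valid empty GTID set.

```go
package main

import (
	"fmt"
	"regexp"
)

// gtidRE requires at least one character between the quotes ([^']+ rather
// than [^']*), so an empty GTID in the xtrabackup log does not match.
var gtidRE = regexp.MustCompile(`GTID of the last change '([^']+)'`)

// findReplicationPosition extracts the GTID position from xtrabackup's log
// output, returning an error if no non-empty position is present.
func findReplicationPosition(xtrabackupLog string) (string, error) {
	m := gtidRE.FindStringSubmatch(xtrabackupLog)
	if m == nil {
		return "", fmt.Errorf("replication position not found in xtrabackup log")
	}
	return m[1], nil
}

func main() {
	pos, err := findReplicationPosition(
		`MySQL binlog position: GTID of the last change '3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5'`)
	fmt.Println(pos, err)
}
```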

This is needed for long-running backups so that the xtrabackup process
doesn't block after the write buffer fills up.

It's also nice for checking in on progress during a long upload.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc enisoc force-pushed the xtrabackup-stream-logs branch from 904e3aa to b56856b Compare August 9, 2019 02:46
@enisoc (Member, Author) commented Aug 9, 2019

The backup side of this worked well on our 250GB/shard test DB, which takes 1.5 hours to back up. I'll report back once I've tested the restore side.

enisoc added 3 commits August 9, 2019 14:19
Direct writes didn't use Infof(), so there was no timestamp.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
To avoid requiring 2x disk space upon restore.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc enisoc changed the title xtrabackup: Stream stderr to logs. xtrabackup: Better support for large datasets Aug 10, 2019
enisoc added 3 commits August 9, 2019 23:05
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@enisoc enisoc force-pushed the xtrabackup-stream-logs branch from 09e709d to af13447 Compare August 10, 2019 17:13
@enisoc (Member, Author) commented Aug 10, 2019

This should be ready for review now. I ended up broadening the scope of this PR to generally supporting large backups/restores with xtrabackup. I've tested it on our 250GB/shard keyspace and it passed.

The data striping seems to get us back to parity with backup/restore times for the same keyspace using the built-in backup engine (which compresses and uploads each file in the data dir independently). Before adding striping, xtrabackup was between 2x and 8x slower on my test keyspace because decompression and upload/download were single-threaded.

The xbstream format could technically support parallel compression/decompression on its own without striping, but not without extra disk space to store the compressed and decompressed content of a given file at the same time. You either risk running out of disk space and failing to restore, or you run with extra disk space that's only used during restore and is wasted otherwise. Also, without striping, even parallel xbstream would still bottleneck into a single destination file upload/download. Blob stores like S3 and GCS get better throughput across multiple files than for a single file.

@enisoc enisoc marked this pull request as ready for review August 10, 2019 18:25
@enisoc enisoc requested a review from sougou as a code owner August 10, 2019 18:25
@enisoc enisoc requested a review from deepthi August 10, 2019 18:28
enisoc added 2 commits August 10, 2019 14:58
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
@deepthi (Member) left a comment:
Nice work! LGTM.

@deepthi deepthi merged commit 7e99841 into vitessio:master Aug 11, 2019
@enisoc enisoc deleted the xtrabackup-stream-logs branch August 11, 2019 02:35
Successfully merging this pull request may close these issues: XtraBackup: Log start and progress of backup.