This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

streaming from cloud object storage #233

Open
gregwebs opened this issue Aug 25, 2019 · 3 comments
Labels
5.0 request · difficulty/2-medium (Medium-difficulty issue) · feature-request (This issue is a feature request) · priority/P2 (Medium priority issue)
Comments

@gregwebs

Feature Request

Restoring a large amount of data without streaming means waiting for the entire backup to be downloaded from object storage before any of it can be applied. Our restore process should instead look like this, which would make it quicker and reduce disk requirements (and therefore cost):

  1. Download the metadata
  2. Download chunks of data
  3. Apply the downloaded chunks while downloading new ones. Backpressure from the apply step should throttle the downloads.
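
The steps above can be sketched with a bounded queue between a downloader and an applier. This is only an illustration of the backpressure idea, not how lightning or BR is implemented: `stream_restore`, `apply_chunk`, and `max_buffered` are hypothetical names, the iterable `chunks` stands in for real object-storage downloads, and error handling is omitted.

```python
import queue
import threading


def stream_restore(chunks, apply_chunk, max_buffered=2):
    """Apply chunks while later ones are still being fetched.

    `chunks` is any iterable whose iteration stands in for downloading
    (hypothetical; a real one would read from object storage).
    Returns the number of chunks applied.
    """
    buffer = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream

    def downloader():
        try:
            for chunk in chunks:
                # put() blocks while the buffer is full, so a slow
                # applier throttles the downloads (backpressure).
                buffer.put(chunk)
        finally:
            buffer.put(done)

    worker = threading.Thread(target=downloader, daemon=True)
    worker.start()

    applied = 0
    while True:
        item = buffer.get()
        if item is done:
            break
        apply_chunk(item)
        applied += 1
    worker.join()
    return applied
```

With this shape, the number of chunks held locally is bounded by roughly `max_buffered` plus the one being applied and the one waiting to be queued, rather than by the size of the whole backup.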
@gregwebs added the feature-request label Aug 25, 2019
@gregwebs changed the title from "streaming from cloud storage" to "streaming from cloud object storage" Aug 25, 2019
@gregwebs
Author

One could try to do this today by piping data through stream-based tools into lightning. However, lightning currently expects a directory, whereas the data may be stored as a gzip tarball.

So tidb-lightning needs to either handle the unpacking itself or keep watching the directory for new files until it receives a signal that the transfer is finished. It may also be possible to achieve this by running lightning separately for each table, using table filters.
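
As a rough illustration of the "handle the unpacking itself" option, a gzip tarball can be read member by member from a non-seekable stream, so files become available before the archive has fully arrived. The sketch below uses Python's standard `tarfile` stream mode; `iter_tar_gz_members`, `make_sample_tarball`, and the file names are hypothetical and say nothing about lightning's actual file layout. It reads each member fully into memory, which a real implementation would likely avoid.

```python
import io
import tarfile


def iter_tar_gz_members(stream):
    """Yield (name, data) for each regular file, in archive order.

    Mode "r|gz" reads sequentially and never seeks, so `stream` can be
    a pipe or an HTTP response body rather than a local file.
    """
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and so on
            # In stream mode a member must be read before advancing.
            yield member.name, tar.extractfile(member).read()


def make_sample_tarball(files):
    """Build an in-memory .tar.gz from {name: bytes} (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf
```

Each yielded file could then be handed to an import step as soon as it is read, instead of after the whole tarball has been unpacked to disk.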

@kennytm added the priority/P2 label May 28, 2020
@kennytm added the difficulty/2-medium and difficulty/3-hard labels and removed the difficulty/2-medium label May 28, 2020
@kennytm
Collaborator

kennytm commented May 28, 2020

🤔 I assume this is more than #69?

@gregwebs
Author

#69 does not require a streaming implementation to be completed.

Our new BR tool streams data to and from cloud storage nicely.

@kennytm added the 5.0 request and difficulty/2-medium labels and removed the difficulty/3-hard label May 28, 2020
@kennytm added this to the v5.0.0 milestone Jun 23, 2020