Restore from S3 compatible API? #69

morgo · 2018-09-14T14:54:30Z

A feature request for your roadmap:

Would it be possible to restore directly from a mydumper backup stored in S3? In most cloud deployments this is where user backups will be stored (the S3 API is implemented by many other object stores).

Value

Value description

Support restore to TiDB via S3.

Value score

(TBD) / 5

Workload estimation

(TBD)

Time

GanttStart: 2020-07-27
GanttDue: 2020-09-04
GanttProgress: 100%

kennytm · 2018-09-14T15:44:03Z

(Now also tracked in Jira as TOOL-362)

We plan to support importing from Zip, FTP and maybe HDFS in v1.1 by the end of 2018. I think adding S3 API support is not hard if it can be abstracted as a VFS or something similar.

gregwebs · 2018-09-14T15:49:49Z

There are several S3 Fuse projects. I don't think it should be terribly difficult to make a VFS adapter (probably depends on error handling complexity). There are already apache VFS adapters.

gregwebs · 2018-10-18T15:22:23Z

We are starting to use go-cloud for cloud support. It also supports the filesystem as a backend.

kennytm · 2018-10-18T16:13:59Z

Nice! We could use the github.com/google/go-cloud/blob/* for actual cloud storage, but we can't use the fileblob due to the restriction:

Blob names must only contain alphanumeric characters, slashes, periods, spaces, underscores, and dashes.

I've seen at least one customer give a table a Chinese name, and mydumper does not escape such names in the file name, which makes this implementation unusable.
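To illustrate the restriction, here is a minimal sketch of a name check. The regular expression only restates the rule quoted above and assumes "alphanumeric" means ASCII letters and digits; it is not taken from the go-cloud source, and `is_valid_blob_name` is a hypothetical helper.

```python
import re

# Assumption: restates the quoted rule (ASCII alphanumerics, slashes, periods,
# spaces, underscores, dashes). Not copied from go-cloud.
ALLOWED_BLOB_NAME = re.compile(r"[A-Za-z0-9/. _-]+")


def is_valid_blob_name(name: str) -> bool:
    """Return True if every character of `name` is in the allowed set."""
    return ALLOWED_BLOB_NAME.fullmatch(name) is not None


# mydumper names table data files as <schema>.<table>.sql without escaping.
for file_name in ["test.orders.sql", "test.订单.sql"]:
    print(file_name, is_valid_blob_name(file_name))
```

Under this assumption the ASCII-named file passes, while the file for a table with a Chinese name is rejected.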

gregwebs · 2018-10-18T16:29:36Z

Yeah, looks like fileblob is just meant for testing purposes anyways, so two pathways (cloud or file) would still be needed.

tennix · 2020-07-03T06:11:50Z

Is there any update on this? To be more cloud-native, we need to support restoring from S3 storage. Though we have a workaround in tidb-operator that uses rclone to download all backup files locally and then feeds them to tidb-lightning, it's time-consuming and not user friendly (users cannot determine how large a PV is required).

overvenus · 2020-07-27T07:31:44Z

For the AWS Aurora scenario, Aurora exports data in CSV format, and it is partitioned into multiple files. It's worth taking into consideration. https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html#AuroraMySQL.Integrating.SaveIntoS3.Grant

kennytm · 2020-07-27T07:36:42Z

🤔 Partitioning into multiple files isn't a problem (it is actually desired). The problem is the file name does not end in *.csv.

s3-region://bucket-name/file-prefix.part_00000

overvenus · 2020-07-27T07:44:50Z

Can Lightning choose the file format based on the file's content?

Also, Aurora can export data into TEXT format, and the name is the same as the CSV format.

kennytm · 2020-07-27T08:06:08Z

it can but i don't trust Lightning to do so 🙂.

perhaps we need RFC 5 anyway.

glorv · 2020-07-27T08:50:23Z

> Also, Aurora can export data into TEXT format, and the name is the same as the CSV format.

The TEXT format seems similar to CSV, but it uses TAB as the delimiter and writes the raw value of each field. So it's not a very good format: e.g. if a string field contains a TAB, it's hard to distinguish which TABs are delimiters and which are part of field values. So we should recommend that customers use CSV instead of TEXT.
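A small sketch of that ambiguity, assuming the TEXT format is as described in this comment (raw values joined by TAB, no quoting or escaping). The sample row is made up, and Python's `csv` module stands in for a CSV writer with quoting.

```python
import csv
import io

# Hypothetical row whose middle field contains both a TAB and a comma.
row = ["1", "x\ty, z", "2"]

# TEXT-like output: raw values joined by TAB, nothing quoted or escaped.
text_line = "\t".join(row)
text_fields = text_line.split("\t")

# CSV output: the writer quotes the field that contains the delimiter.
buf = io.StringIO()
csv.writer(buf).writerow(row)
csv_fields = next(csv.reader(io.StringIO(buf.getvalue())))

print(len(row), len(text_fields))  # the field count changes for TEXT
print(csv_fields == row)           # the CSV round trip preserves the row
```

The TEXT-like line splits into one field more than was written, while the quoted CSV line reads back unchanged.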

glorv · 2020-07-27T08:59:48Z

> it can but i don't trust Lightning to do so 🙂.
>
> perhaps we need RFC 5 anyway.

Shall we provide an option that lets users explicitly set the input file format to csv or sql when file names do not end with those extensions?
Or we could support special patterns such as S3's file partition pattern, like part_0000. In the future, if we support reading from compressed files, we should also support partitioned compressed files like schema.table.csv.001.tar.gz
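The two ideas could be combined roughly as follows. This is only a sketch: `detect_format`, `default_format`, and the `.part_<digits>` pattern are hypothetical names chosen for illustration, not Lightning options or behavior.

```python
import re
from typing import Optional

# Assumption: Aurora-style partition suffix such as "file-prefix.part_00000".
PART_SUFFIX = re.compile(r"\.part_\d+$")
KNOWN_EXTENSIONS = {".csv": "csv", ".sql": "sql"}


def detect_format(file_name: str, default_format: Optional[str] = None) -> Optional[str]:
    """Pick a format from the extension, else fall back to an explicit setting.

    Returns None when the file should be skipped.
    """
    lowered = file_name.lower()
    for extension, fmt in KNOWN_EXTENSIONS.items():
        if lowered.endswith(extension):
            return fmt
    # No recognized extension: accept partition files only if the user
    # explicitly said which format they contain.
    if PART_SUFFIX.search(lowered):
        return default_format
    return None


print(detect_format("test.orders.csv"))
print(detect_format("file-prefix.part_00000"))
print(detect_format("file-prefix.part_00000", default_format="csv"))
```

Keeping the fallback explicit avoids guessing the format from file content, which matches the concern raised earlier in the thread.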

glorv · 2020-07-27T09:10:41Z

> it can but i don't trust Lightning to do so 🙂.
>
> perhaps we need RFC 5 anyway.

It seems that if we want to support partitioned files in S3 buckets or partitioned compressed files, RFC 5 needs to be updated. And I'm afraid that if the route rule is complex, it will be hard to teach users how to use this feature.

overvenus · 2020-07-27T11:43:42Z

For Aurora partition dump, Lightning could read Aurora dump manifest directly.

kennytm · 2020-07-27T12:51:16Z

That is a large departure from the existing model (walkdir the directory to discover files), and the existing model does work (if you don't scatter the data source around multiple irrelevant places), so I regard the manifest file support as low priority.

kennytm added the feature-request label Jan 21, 2019

morgo mentioned this issue Feb 14, 2019

Require less configuration #131

Open

ericsyh mentioned this issue May 23, 2019

Restore from NFS protocol #194

Closed

DanielZhangQD mentioned this issue Feb 6, 2020

Lightning streaming support pingcap/tidb-operator#1646

Open

kennytm added the priority/P2 label May 28, 2020

kennytm added the difficulty/2-medium label May 28, 2020

kennytm mentioned this issue May 28, 2020

streaming from cloud object storage #233

Open

kennytm added priority/P1 and removed priority/P2 labels Jul 3, 2020

IANTHEREAL added feature/accepted and removed feature-request labels Jul 27, 2020

glorv self-assigned this Jul 27, 2020

overvenus mentioned this issue Jul 27, 2020

Enhance Data Migration from RDS to TiDB pingcap/tidb#18629

Closed

16 tasks

glorv mentioned this issue Jul 30, 2020

restore: support restore from s3 #361

Merged

kennytm closed this as completed in #361 Sep 4, 2020

scsldb added this to the 4.0.10 milestone Oct 20, 2020
