Use mydumper to back up a database containing many tables, very slow #11733

Closed
AlexNewTao opened this issue Aug 13, 2019 · 13 comments

@AlexNewTao

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

A: The backup target database contains a very large number of tables, more than 70,000, and monitoring shows the total data size is about 1.2 TB. When I used mydumper for the backup, it ran for 8 hours and had only backed up about 40,000 tables, amounting to only about 20 GB of data (the backup did not finish).

I want to ask:
1) Why is the backup so slow when there are many tables? What is mydumper doing during this time?

2) When backing up a database with a particularly large number of tables, are there any suggestions for speeding up the backup?

B: I have also run into TiDB OOM problems. The TiDB version is currently 2.1.4. When I back up a database with a large amount of data, TiDB goes OOM. According to an issue on the official website, the cause is version-related: during backup, TiDB loads all the data into memory, which leads to OOM. This is fixed in versions after TiDB 2.1.13.

  2. What did you expect to see?

Backing up a database with many tables should still be reasonably fast.

  3. What did you see instead?

Backing up a database with a large number of tables is very slow.

  4. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?

TiDB version 2.1.4

AlexNewTao added the type/bug (The issue is confirmed as a bug.) label on Aug 13, 2019
@csuzhangxc
Member

@AlexNewTao

What did mydumper do during this period

It fetches the database and table list, then reads the data from each table (with SELECT) and writes it to files.
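Roughly, the statements involved look like the following (a simplified sketch run through the mysql client, not mydumper's exact queries; host, port, database, and table names are placeholders):

# list the schemas and tables to decide what to dump
mysql -h 127.0.0.1 -P 4000 -u root -e "SHOW DATABASES"
mysql -h 127.0.0.1 -P 4000 -u root -e "SHOW FULL TABLES FROM mydb WHERE Table_type = 'BASE TABLE'"
# then, for each table, read its rows and write them out as INSERT statements in the dump files
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT * FROM mydb.mytable"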

Why is it so slow when backing up many tables
is there any suggestion to speed up database backups for multiple tables

How many threads are you using? (-t, --threads, Number of threads to use, default 4).

Can you provide the arguments used for mydumper?

@AlexNewTao
Author

AlexNewTao commented Aug 16, 2019

@csuzhangxc thanks for your support!

How many threads are you using? (-t, --threads, Number of threads to use, default 4).

I use 16 threads for mydumper, but it is still very slow.

The command (from my backup script) is:

./mydumper -u '%s' -p '%s' -h '%s' -P '%s'  -t 16 -F 64 -o '%s' -z '%s' " %(backupuser,password, host,port,dirname,timesnapshot)
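With the Python placeholders filled in, the invocation looks roughly like this (the host, credentials, output directory, and snapshot value below are hypothetical):

./mydumper -u backupuser -p '******' -h 10.0.0.1 -P 4000 -t 16 -F 64 -o /data/backup -z '2019-08-16 00:00:00'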

Here is another question:
I want to confirm how TiDB behaves during backup, because of the version-related OOM problem.
Before TiDB 2.1.13, TiDB would load all the data into memory during backup, which easily leads to OOM. If I now use TiDB 3.0.2 while the other services are still on versions validated against 2.1.4, will there be compatibility issues? The 3.0 release contains quite large changes compared with earlier versions.

You said:

fetch the database & table list; read the data from tables (SELECT) and write them to the file.

I would like to know what memory-usage optimizations were made for backups with mydumper between TiDB versions before and after 2.1.13.

thanks!

@csuzhangxc
Member

@AlexNewTao

What is the size of your largest table?

Can you try our latest version of mydumper? It uses TiDB's _row_id to support dumping a single TiDB table concurrently (but you need to use -r instead of -F), and it can dump a single table about 6x faster than the previous version.
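For example, an invocation using -r instead of -F might look like the following (the connection details and the row count are placeholders, not a recommendation):

./mydumper -u backupuser -p '******' -h 10.0.0.1 -P 4000 -t 16 -r 100000 -o /data/backup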

I want to know what optimizations have been made in memory usage when using the mydumper for backups before and after version 2.1.13?

Just "Fix the issue that pb memory cannot be released quickly", see #10815.

@AlexNewTao
Author

@csuzhangxc

The largest table has about 5 billion rows and is roughly 400 GB in size.

OK, I will try the latest mydumper tool and test it with the -r parameter.

Thank you for your support.

@csuzhangxc
Member

@AlexNewTao
Does the latest mydumper (with -r) solve your problem now?

@AlexNewTao
Author

It did not solve my problem.

(1) Testing with the -r parameter
Tested against TiDB 3.0.2.
When the -r parameter is used, the backup keeps running, but no backup data appears in the output directory, and there is no abnormal exit.

When you back up with the -r parameter, is it against a database with many tables?

If a single table is very large, what mydumper configuration parameters would you recommend for the backup?

(2) Testing after upgrading the TiDB version
With TiDB 3.0.2, when the amount of data to back up is relatively large (I specified neither -r nor -F), the memory overhead of the tidb-server process is still very high.

Test data: tables generated by sysbench, 32 tables with 10,000,000 rows each.

During the backup, TiDB's memory usage rose from about 200 MB to 35 GB within a short time, which is still a very large overhead.

@csuzhangxc
Member

@AlexNewTao

What version of mydumper are you using?

-r is used to back up a single large table concurrently (-r, --rows: try to split tables into chunks of this many rows; this option turns off --chunk-filesize).

While the backup is running, can you run SHOW PROCESSLIST to verify the dump threads?
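For example, something like this (host and port are placeholders) should list one connection per dump thread together with the SELECT it is currently running:

mysql -h 10.0.0.1 -P 4000 -u root -e "SHOW PROCESSLIST"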

@AlexNewTao
Author

The version is mydumper 0.9.5, built against MySQL 5.7.24.

@csuzhangxc
Member

That's not the latest release; you can download the latest via https://pingcap.com/docs-cn/v3.0/reference/tools/download/#syncer-loader-%E5%92%8C-mydumper.

@AlexNewTao
Author

Thanks, I'll test with the latest mydumper. What is your suggestion for the -r parameter value when backing up a large dataset of about 1 TB?

@csuzhangxc
Member

-r can be something like 256MB / avg-row-size.
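As a worked example: for the table mentioned above (about 5 billion rows and roughly 400 GB, so on the order of 80 bytes per row), 256 MB / 80 B ≈ 3,200,000, so something around -r 3000000 would be a starting point; for an assumed average row size of about 250 bytes, 256 MB / 250 B ≈ 1,000,000.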

@AlexNewTao
Author

Thank you. When I test with the latest tool and the -r parameter, TiDB's memory overhead drops significantly, and TiDB no longer OOMs when backing up large data sets.

I still have a question.

When using the -r parameter, if I set -r to 1000000, each output file is split at 1,000,000 rows per table chunk. If I then use loader for recovery, a single TiDB transaction has the following restrictions:

A single transaction contains no more than 5000 SQL statements (default)
No more than 6MB per key-value pair
The total number of key-value pairs does not exceed 300,000
The total size of the key-value pairs does not exceed 100MB

During recovery as described above, a single transaction reads the restored data and applies it. How much SQL does a single transaction contain? Will these limits be a problem with files split this way?

@AlexNewTao
Author

Looking at the INSERT statements in the split files, I found:

1,000,000 rows of data correspond to 205 INSERT statements, so each INSERT covers about 4,890 rows, below the 5,000 limit mentioned above.

In other words, when mydumper performs a backup it automatically splits the output so that each INSERT statement stays below roughly 5,000 rows, regardless of the value given to -r.
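A quick way to repeat this check on a chunk file (a sketch only; the file name is a placeholder, and it assumes extended-insert output where each row appears as one (...) value tuple):

# number of INSERT statements in the chunk file
grep -c "INSERT INTO" sbtest.sbtest1.sql
# rough number of rows: "),(" separates rows within an INSERT,
# so total rows ≈ this count plus the number of INSERT statements
grep -o "),(" sbtest.sbtest1.sql | wc -l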
