Use mydumper to back up a database containing many tables, very slow #11733

Closed
AlexNewTao opened this issue Aug 13, 2019 · 13 comments

@AlexNewTao

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

A: The backup target database contains a very large number of tables, more than 70,000, and monitoring shows the total data size is about 1.2 TB. When I used mydumper for the backup, it ran for 8 hours and had only backed up about 40,000 tables, amounting to only about 20 GB of data (the backup did not finish).

I want to ask:
1) Why is the backup so slow when there are many tables? What is mydumper doing during this time?

2) When backing up a database with a particularly large number of tables, are there any suggestions for speeding up the backup?

B: I have also run into TiDB OOM problems. The TiDB version is currently 2.1.4. When I back up a database with a large amount of data, TiDB goes OOM. According to an issue on the official website, the cause is version-related: during backup, TiDB loads all the data into memory, which leads to OOM. This is fixed in versions after TiDB 2.1.13.

  2. What did you expect to see?

Backing up a database with many tables should still be reasonably fast.

  3. What did you see instead?

Backing up a database with a large number of tables is very slow.

  4. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?

TiDB version 2.1.4

AlexNewTao added the type/bug (The issue is confirmed as a bug.) label on Aug 13, 2019
@csuzhangxc
Member

@AlexNewTao

What did mydumper do during this period

It fetches the database and table list, then reads the data from each table (with SELECT) and writes it to files.
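Roughly, the statements involved look like the following (a simplified sketch run through the mysql client, not mydumper's exact queries; host, port, database, and table names are placeholders):

# list the schemas and tables to decide what to dump
mysql -h 127.0.0.1 -P 4000 -u root -e "SHOW DATABASES"
mysql -h 127.0.0.1 -P 4000 -u root -e "SHOW FULL TABLES FROM mydb WHERE Table_type = 'BASE TABLE'"
# then, for each table, read its rows and write them out as INSERT statements in the dump files
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT * FROM mydb.mytable"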

Why is it so slow when backing up many tables
is there any suggestion to speed up database backups for multiple tables

How many threads are you using? (-t, --threads, Number of threads to use, default 4).

Can you provide the arguments used for mydumper?

@AlexNewTao
Author

AlexNewTao commented Aug 16, 2019

@csuzhangxc thanks for your support!

How many threads are you using? (-t, --threads, Number of threads to use, default 4).

I use 16 threads for mydumper, but it is still very slow.

The command (from my backup script) is:

./mydumper -u '%s' -p '%s' -h '%s' -P '%s'  -t 16 -F 64 -o '%s' -z '%s' " %(backupuser,password, host,port,dirname,timesnapshot)
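With the Python placeholders filled in, the invocation looks roughly like this (the host, credentials, output directory, and snapshot value below are hypothetical):

./mydumper -u backupuser -p '******' -h 10.0.0.1 -P 4000 -t 16 -F 64 -o /data/backup -z '2019-08-16 00:00:00'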

Here is another question:
I want to confirm how TiDB behaves during backup, because of the version-related OOM problem.
Before TiDB 2.1.13, TiDB would load all the data into memory during backup, which easily leads to OOM. If I now use TiDB 3.0.2 while the other services are still on versions validated against 2.1.4, will there be compatibility issues? The 3.0 release contains quite large changes compared with earlier versions.

You said:

fetch the database & table list; read the data from tables (SELECT) and write them to the file.

I would like to know what memory-usage optimizations were made for backups with mydumper between TiDB versions before and after 2.1.13.

thanks!

@csuzhangxc
Member

@AlexNewTao

What is the size of your largest table?

Can you try our latest version of mydumper? It uses TiDB's _row_id to support dumping a single TiDB table concurrently (but you need to use -r instead of -F), and it can dump a single table about 6x faster than the previous version.
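For example, an invocation using -r instead of -F might look like the following (the connection details and the row count are placeholders, not a recommendation):

./mydumper -u backupuser -p '******' -h 10.0.0.1 -P 4000 -t 16 -r 100000 -o /data/backup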

I want to know what optimizations have been made in memory usage when using the mydumper for backups before and after version 2.1.13?

Just "Fix the issue that pb memory cannot be released quickly", see #10815.

@AlexNewTao
Author

@csuzhangxc

The largest table has about 5 billion rows and is roughly 400 GB in size.

OK, I will try the latest mydumper tool and test it with the -r parameter.

Thank you for your support.

@csuzhangxc
Member

@AlexNewTao
Does the latest mydumper (with -r) solve your problem now?

@AlexNewTao
Author

It did not solve my problem.

(1) Testing with the -r parameter
Tested against TiDB 3.0.2.
When the -r parameter is used, the backup keeps running, but no backup data appears in the output directory, and there is no abnormal exit.

When you back up with the -r parameter, is it against a database with many tables?

If a single table is very large, what mydumper configuration parameters would you recommend for the backup?

(2) Testing after upgrading the TiDB version
With TiDB 3.0.2, when the amount of data to back up is relatively large (I specified neither -r nor -F), the memory overhead of the tidb-server process is still very high.

Test data: tables generated by sysbench, 32 tables with 10,000,000 rows each.

During the backup, TiDB's memory usage rose from about 200 MB to 35 GB within a short time, which is still a very large overhead.

@csuzhangxc
Member

@AlexNewTao

What version of mydumper are you using?

-r is used to back up a single large table concurrently (-r, --rows: try to split tables into chunks of this many rows; this option turns off --chunk-filesize).

While the backup is running, can you run SHOW PROCESSLIST to verify the dump threads?
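For example, something like this (host and port are placeholders) should list one connection per dump thread together with the SELECT it is currently running:

mysql -h 10.0.0.1 -P 4000 -u root -e "SHOW PROCESSLIST"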

@AlexNewTao
Author

The version is mydumper 0.9.5, built against MySQL 5.7.24.

@csuzhangxc
Member

That's not the latest release; you can download the latest via https://pingcap.com/docs-cn/v3.0/reference/tools/download/#syncer-loader-%E5%92%8C-mydumper.

@AlexNewTao
Author

Thanks, I'll test with the latest mydumper. What is your suggestion for the -r parameter value when backing up a large dataset of about 1 TB?

@csuzhangxc
Member

-r can be something like 256MB / avg-row-size.
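As a worked example: for the table mentioned above (about 5 billion rows and roughly 400 GB, so on the order of 80 bytes per row), 256 MB / 80 B ≈ 3,200,000, so something around -r 3000000 would be a starting point; for an assumed average row size of about 250 bytes, 256 MB / 250 B ≈ 1,000,000.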

@AlexNewTao
Author

Thank you. When I test with the latest tool and the -r parameter, TiDB's memory overhead drops significantly, and TiDB no longer OOMs when backing up large data sets.

I still have a question.

When using the -r parameter, if I set -r to 1000000, each output file is split at 1,000,000 rows per table chunk. If I then use loader for recovery, a single TiDB transaction has the following restrictions:

A single transaction contains no more than 5000 SQL statements (default)
No more than 6MB per key-value pair
The total number of key-value pairs does not exceed 300,000
The total size of the key-value pairs does not exceed 100MB

During recovery as described above, a single transaction reads the restored data and applies it. How much SQL does a single transaction contain? Will these limits be a problem with files split this way?

@AlexNewTao
Author

Looking at the INSERT statements in the split files, I found:

1,000,000 rows of data correspond to 205 INSERT statements, so each INSERT covers about 4,890 rows, below the 5,000 limit mentioned above.

In other words, when mydumper performs a backup it automatically splits the output so that each INSERT statement stays below roughly 5,000 rows, regardless of the value given to -r.
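A quick way to repeat this check on a chunk file (a sketch only; the file name is a placeholder, and it assumes extended-insert output where each row appears as one (...) value tuple):

# number of INSERT statements in the chunk file
grep -c "INSERT INTO" sbtest.sbtest1.sql
# rough number of rows: "),(" separates rows within an INSERT,
# so total rows ≈ this count plus the number of INSERT statements
grep -o "),(" sbtest.sbtest1.sql | wc -l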
