# Benchmark against mydumper #4
We have run several dump checks against Mydumper before, but we have never collected any concrete data. In any case, the timing is similar or faster.
## Scope

Test the performance of TiDB Dumpling and similar tools for dumping a large table from a MySQL 8.0 server.

## Setup

For this I used two AWS machines, one for running MySQL and another one for the dump process. I used the

The security group I created for this had 3 rules for inbound traffic:
## Machine setup

### MySQL host

The first thing to do is to set up the NVMe device as storage for MySQL.
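The exact commands weren't preserved here; a minimal sketch of what that setup might look like, assuming the NVMe device appears as `/dev/nvme1n1` and MySQL's default datadir is used:

```bash
# Assumptions: device name and mount point; adjust to match the instance.
mkfs.xfs /dev/nvme1n1
mkdir -p /var/lib/mysql
mount /dev/nvme1n1 /var/lib/mysql
```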
Entry to add for
Then we need to install MySQL Server:

```bash
yum install https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
yum install -y mysql-community-server mysql-shell unzip perf sysstat
systemctl enable --now mysqld.service
```

MySQL sets up a random root password during the installation. This password needs to be changed for the account to be usable. For this we need to get the generated password from the MySQL log and log in with it:

```bash
mysql -p
```

```sql
ALTER USER 'root'@'localhost' IDENTIFIED BY 'pashysVotubOcirgA-0';
```

And create a client options file with the credentials:

```ini
[client]
user=root
password="pashysVotubOcirgA-0"

[mysql]
prompt="\u@\h [\d] > "
```

Run `mysql` to verify the login works. To configure MySQL:

```sql
SELECT 6.5*1024*1024*1024;
SET PERSIST local_infile=ON;
SET PERSIST innodb_buffer_pool_size=5*1024*1024*1024;
SET PERSIST innodb_io_capacity_max=200000;
SET PERSIST innodb_io_capacity=20000;
SET PERSIST_ONLY innodb_log_file_size=4*1024*1024*1024;
RESTART;
```

Now follow https://docs.pingcap.com/tidb/stable/import-example-data to:
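That page covers downloading the Capital Bikeshare CSV files and creating the target schema; the table is roughly the following (a sketch based on that guide, column types approximate):

```sql
CREATE DATABASE bikeshare;

CREATE TABLE bikeshare.trips (
    trip_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    duration INT,
    start_date DATETIME,
    end_date DATETIME,
    start_station_number INT,
    start_station VARCHAR(255),
    end_station_number INT,
    end_station VARCHAR(255),
    bike_number VARCHAR(255),
    member_type VARCHAR(255)
);
```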
To import the data we will use MySQL Shell's `util.importTable()`:

```js
importConfig = {
"columns": [
"duration",
"start_date",
"end_date",
"start_station_number",
"start_station",
"end_station_number",
"end_station",
"bike_number",
"member_type"
],
"dialect": "csv",
"schema": "bikeshare",
"showprogress": true,
"skipRows": 1,
"table": "trips"
}
for (let f of ["2010-capitalbikeshare-tripdata.csv",
"2013Q2-capitalbikeshare-tripdata.csv",
"2015Q1-capitalbikeshare-tripdata.csv",
"2016Q4-capitalbikeshare-tripdata.csv",
"2011-capitalbikeshare-tripdata.csv",
"2013Q3-capitalbikeshare-tripdata.csv",
"2015Q2-capitalbikeshare-tripdata.csv",
"2017Q1-capitalbikeshare-tripdata.csv",
"2012Q1-capitalbikeshare-tripdata.csv",
"2013Q4-capitalbikeshare-tripdata.csv",
"2015Q3-capitalbikeshare-tripdata.csv",
"2017Q2-capitalbikeshare-tripdata.csv",
"2012Q2-capitalbikeshare-tripdata.csv",
"2014Q1-capitalbikeshare-tripdata.csv",
"2015Q4-capitalbikeshare-tripdata.csv",
"2017Q3-capitalbikeshare-tripdata.csv",
"2012Q3-capitalbikeshare-tripdata.csv",
"2014Q2-capitalbikeshare-tripdata.csv",
"2016Q1-capitalbikeshare-tripdata.csv",
"2017Q4-capitalbikeshare-tripdata.csv",
"2012Q4-capitalbikeshare-tripdata.csv",
"2014Q3-capitalbikeshare-tripdata.csv",
"2016Q2-capitalbikeshare-tripdata.csv",
"2013Q1-capitalbikeshare-tripdata.csv",
"2014Q4-capitalbikeshare-tripdata.csv",
"2016Q3-capitalbikeshare-tripdata.csv"
]) {
util.importTable(f, importConfig)
}
```

This results in a 2.52 GiB `trips` table:
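The size figure was presumably read from `information_schema`; a standard way to check it (not necessarily the exact query used here):

```sql
SELECT table_name,
       ROUND((data_length + index_length) / POW(1024, 3), 2) AS size_gib
FROM information_schema.tables
WHERE table_schema = 'bikeshare' AND table_name = 'trips';
```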
Setup dump user and restrict access to the IP of the other host:

```sql
create user 'dump'@'172.31.0.201' identified by 'lotGiSluAfvoff-0';
grant all on bikeshare.* to 'dump'@'172.31.0.201';
grant reload, replication client on *.* to 'dump'@'172.31.0.201';
grant select on mysql.* to 'dump'@'172.31.0.201'; -- needed by util.dumpTables
grant BACKUP_ADMIN on *.* to 'dump'@'172.31.0.201'; -- needed by util.dumpTables
```

### Dump host

This is the machine running Dumpling, MyDumper, mysqldump, etc.
Entry to add for
Create a
Install tiup and dumpling
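The install commands weren't captured; the standard TiUP bootstrap from the PingCAP docs is roughly:

```bash
# Official TiUP bootstrap script, then the dumpling component
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
source ~/.bash_profile
tiup install dumpling
```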
Install mydumper
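Similarly for mydumper; one way is installing a release RPM from GitHub (the version below is a placeholder, not necessarily the one benchmarked):

```bash
# Placeholder version; pick a current one from
# https://github.com/mydumper/mydumper/releases
yum install -y https://github.com/mydumper/mydumper/releases/download/v0.10.3/mydumper-0.10.3-1.el7.x86_64.rpm
```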
## Test

### mysqldump
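The exact invocation wasn't preserved; a typical single-table run might look like this sketch (the host variable and flags are assumptions):

```bash
# $MYSQL_HOST is the private IP of the MySQL host
mysqldump --single-transaction --quick \
    -h "$MYSQL_HOST" -u dump -p'lotGiSluAfvoff-0' \
    bikeshare trips > trips.sql
```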
### TiDB Dumpling
The query it is running is this:
Doesn't look like it does anything in parallel by default. After adding
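For reference, a Dumpling invocation that enables intra-table concurrency via row chunking might look like this sketch (whether this matches the flag used here is an assumption; connection details and values are placeholders):

```bash
# -r splits the table into row chunks so the threads (-t) can dump in parallel
tiup dumpling \
    -h "$MYSQL_HOST" -P 3306 -u dump -p 'lotGiSluAfvoff-0' \
    -T bikeshare.trips -t 8 -r 200000 \
    -o /data/dumpling
```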
And the time looks like this:
### MySQL Shell util.dumpTables
Here we can see multiple threads
The output is compressed with zstd by default; we can get uncompressed output by adding

Without compression:
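For reference, the corresponding call might look like this sketch (output path and thread count are assumptions; `compression: "none"` is the documented option for uncompressed output):

```js
// Dump one table with multiple threads, without compression
util.dumpTables("bikeshare", ["trips"], "/data/shell-dump", {
    threads: 8,
    compression: "none"
})
```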
### mydumper
After adding
and:
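A mydumper run along these lines (host, credentials, and row chunking are assumptions):

```bash
# -r chunks the table by rows so the worker threads (-t) can work in parallel
mydumper -h "$MYSQL_HOST" -u dump -p 'lotGiSluAfvoff-0' \
    -B bikeshare -T trips \
    -t 8 -r 200000 \
    -o /data/mydumper
```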
## Test with more data

I generated more data by running this multiple times:

```sql
select max(trip_id) from trips into @offset;
insert into trips
select @offset+trip_id, duration, start_date, end_date,start_station_number,
start_station,end_station_number,end_station,bike_number,member_type from trips
limit 100000;
```

This resulted in a 13G `trips` table.
Now Dumpling caused the disk on the dump host to go to 100% utilization. Adding

With compression (CPU bound):
Without compression (I/O bound):
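For the compressed runs, Dumpling can compress on the fly, trading CPU for I/O; a sketch assuming the `--compress` option (available in recent Dumpling versions):

```bash
# Same invocation as before, writing gzip-compressed files
tiup dumpling \
    -h "$MYSQL_HOST" -P 3306 -u dump -p 'lotGiSluAfvoff-0' \
    -T bikeshare.trips -t 8 -r 200000 \
    --compress gzip \
    -o /data/dumpling
```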
Testing mydumper:
And with
And with
And with
## Versions
## Follow-ups
- Compare the speed of Dumpling against Mydumper on a large (> 10 GB) database.