…ikv#9448)

cherry-pick tikv#9283 to release-4.0

---

### What problem does this PR solve?

Issue Number: close tikv#9144

Problem Summary: BR reads all the data of a region and fills it into an SST writer, which is held in memory. If a region is huge, TiKV may crash with OOM because all of that region's data is kept in memory at once.

### What is changed and how it works?

What's Changed: Record the size of the written txn entries. When it reaches `region_max_size`, save the data cached in RocksDB to an SST file and then switch to the next file.

### Related changes

- Need to cherry-pick to the release branch

### Check List

Tests

- Unit test
- Integration test
- Manual test (add detailed scripts or steps below)

1. Set `sst-max-size` to 15MiB.
```
mysql> select * from CLUSTER_CONFIG where `TYPE`="tikv";
+------+-----------------+---------------------+-------+
| TYPE | INSTANCE        | KEY                 | VALUE |
+------+-----------------+---------------------+-------+
| tikv | 127.0.0.1:20160 | backup.batch-size   | 8     |
| tikv | 127.0.0.1:20160 | backup.num-threads  | 9     |
| tikv | 127.0.0.1:20160 | backup.sst-max-size | 15MiB |
...
```

2. Back up around 100 MB of data (without compaction) successfully.

```
$ ./br backup full -s ./backup --pd http://127.0.0.1:2379
Full backup <--------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
Checksum <-----------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
[2020/12/31 14:39:12.534 +08:00] [INFO] [collector.go:60] ["Full backup Success summary: total backup ranges: 2, total success: 2, total failed: 0, total take(Full backup time): 4.273097395s, total take(real time): 8.133315406s, total kv: 8000000, total size(MB): 361.27, avg speed(MB/s): 84.55"] ["backup checksum"=901.754111ms] ["backup fast checksum"=6.09384ms] ["backup total regions"=10] [BackupTS=421893700168974340] [Size=48023090]
```

3. The big region is split into several files:

```
-rw-r--r-- 1 * * 1.5M Dec 31 14:39 1_60_28_74219326eeb0a4ae3a0f5190f7784132bb0e44791391547ef66862aaeb668579_1609396745730_write.sst
-rw-r--r-- 1 * * 1.2M Dec 31 14:39 1_60_28_b7a5509d9912c66a21589d614cfc8828acd4051a7eeea3f24f5a7b337b5a389e_1609396746062_write.sst
-rw-r--r-- 1 * * 1.5M Dec 31 14:39 1_60_28_cdcc2ce1c18a30a2b779b574f64de9f0e3be81c2d8720d5af0a9ef9633f8fbb7_1609396745429_write.sst
-rw-r--r-- 1 * * 2.4M Dec 31 14:39 1_62_28_4259e616a6e7b70c33ee64af60230f3e4160af9ac7aac723f033cddf6681826a_1609396747038_write.sst
-rw-r--r-- 1 * * 2.4M Dec 31 14:39 1_62_28_5d0de44b65fb805e45c93278661edd39792308c8ce90855b54118c4959ec9f16_1609396746731_write.sst
-rw-r--r-- 1 * * 2.4M Dec 31 14:39 1_62_28_ef7ab4b5471b088ee909870e316d926f31f4f6ec771754690eac61af76e8782c_1609396747374_write.sst
-rw-r--r-- 1 * * 1.5M Dec 31 14:39 1_64_29_74211aae8215fe9cde8bd7ceb8494afdcc18e5c6a8c5830292a577a9859d38e1_1609396746671_write.sst
-rw-r--r-- 1 * * 1.2M Dec 31 14:39 1_64_29_81e152c98742938c1662241fac1c841319029e800da6881d799a16723cb42888_1609396747010_write.sst
-rw-r--r-- 1 * * 1.5M Dec 31 14:39 1_64_29_ce0dde9826aee9e5ccac0a516f18b9871d3897effd559ff7450b8e56ac449bbd_1609396746349_write.sst
-rw-r--r-- 1 * *   78 Dec 31 14:39 backup.lock
-rw-r--r-- 1 * * 229K Dec 31 14:39 backupmeta
```

4. Restore the backed-up data. It works successfully and passes the manual check.
```
./br restore full -s ./backup --pd http://127.0.0.1:2379
Full restore <-------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
[2020/12/31 14:42:49.983 +08:00] [INFO] [collector.go:60] ["Full restore Success summary: total restore files: 27, total success: 27, total failed: 0, total take(Full restore time): 5.063048828s, total take(real time): 7.84620924s, total kv: 8000000, total size(MB): 361.27, avg speed(MB/s): 71.36"] ["split region"=26.217737ms] ["restore checksum"=4.10792638s] ["restore ranges"=26] [Size=48023090]
```

### Release note

- Fix the issue that TiKV may OOM when backing up a huge region.
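The core idea of the fix can be sketched in Rust as a size-tracking writer that rotates output files once the accumulated entry size reaches a limit. This is a minimal, self-contained illustration, not TiKV's actual code: the names `SstSplitter`, `write_entry`, and `flush_to_sst` are hypothetical, and real SST writing through RocksDB is replaced by a simple size counter.

```rust
/// Hypothetical sketch of size-based SST file rotation.
/// Accumulates txn entries and "switches to the next file" once the
/// written size reaches `sst_max_size`, instead of buffering the
/// whole region in memory.
struct SstSplitter {
    sst_max_size: usize,
    current_size: usize,
    /// Sizes of the finished (flushed) files, for illustration only.
    finished: Vec<usize>,
}

impl SstSplitter {
    fn new(sst_max_size: usize) -> Self {
        SstSplitter { sst_max_size, current_size: 0, finished: Vec::new() }
    }

    /// Record one txn entry; flush and switch files when the
    /// accumulated size reaches the limit.
    fn write_entry(&mut self, key: &[u8], value: &[u8]) {
        self.current_size += key.len() + value.len();
        if self.current_size >= self.sst_max_size {
            self.flush_to_sst();
        }
    }

    /// Stand-in for saving the data cached in RocksDB to an SST file.
    fn flush_to_sst(&mut self) {
        if self.current_size > 0 {
            self.finished.push(self.current_size);
            self.current_size = 0;
        }
    }
}

fn main() {
    // With a 64-byte limit, ten 20-byte entries are split across
    // several files rather than held in one in-memory buffer.
    let mut splitter = SstSplitter::new(64);
    for i in 0..10u8 {
        splitter.write_entry(&[i; 4], &[i; 16]); // 20 bytes per entry
    }
    splitter.flush_to_sst(); // flush the remaining tail
    println!("file sizes: {:?}", splitter.finished); // [80, 80, 40]
}
```

Rotating on a size threshold bounds peak memory at roughly one file's worth of data per region, which is why a huge region no longer drives TiKV to OOM during backup.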