A smarter way to scatter batch regions #33937

Open
3pointer opened this issue Apr 13, 2022 · 4 comments
Labels
component/br This issue is related to BR of TiDB. component/lightning This issue is related to Lightning of TiDB. component/pd type/enhancement The issue or PR belongs to an enhancement.

Comments

@3pointer
Contributor

Enhancement

Both Lightning and BR run a batch split & scatter regions job so that imported data is distributed evenly. However, this does not seem to work as expected when the cluster has many stores (100+).

Basically, there are three reasons:

  1. There is no guarantee that the scatter will succeed when batch_size is large.
  2. In most cases there is only one region (region_id=2) right after the cluster is bootstrapped.
  3. Especially for BR, the origin region is not scattered after a batch operation:
    _, newRegions, err := c.BatchSplitRegionsWithOrigin(ctx, regionInfo, keys)

According to the metrics, when we request a scatter of 8k regions, about half of them (4.4k) fail to scatter.
image

To solve this issue, I think we can do a pre-split/scatter job.

Suppose we have many stores (100+) and we know that the batch split/scatter size is 8k. We can do the following:

  1. Sort the 8k split keys.
  2. Choose evenly spaced keys according to the store count (suppose it is 100; then we choose 99 keys, which generate 100 parts).
  3. Split the region with the chosen 99 keys, and make sure these newly split regions scatter successfully.
  4. For each remaining part, re-scan the new region using the part's min/max keys.
  5. Finish the remaining splits and scatters.
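
Steps 1, 2 and 4 above could be sketched roughly as follows. This is only an illustration of the idea, not BR or Lightning code: `choosePivots` and `groupByPivots` are hypothetical helper names, and the sketch assumes the split keys are distinct byte slices.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// choosePivots (hypothetical helper) sorts keys in place and returns
// parts-1 evenly spaced keys that cut the sorted keys into `parts` groups.
// It returns nil when there are too few keys to form `parts` groups.
func choosePivots(keys [][]byte, parts int) [][]byte {
	sort.Slice(keys, func(i, j int) bool {
		return bytes.Compare(keys[i], keys[j]) < 0
	})
	if parts < 2 || len(keys) < parts {
		return nil
	}
	pivots := make([][]byte, 0, parts-1)
	for i := 1; i < parts; i++ {
		pivots = append(pivots, keys[i*len(keys)/parts])
	}
	return pivots
}

// groupByPivots (hypothetical helper) distributes the already sorted keys
// into len(pivots)+1 groups, one per first-level region. Keys that were
// used as pivots are skipped, because they are already split points.
// Assumes the keys are distinct.
func groupByPivots(sorted, pivots [][]byte) [][][]byte {
	groups := make([][][]byte, len(pivots)+1)
	g := 0
	for _, k := range sorted {
		isPivot := false
		for g < len(pivots) && bytes.Compare(k, pivots[g]) >= 0 {
			if bytes.Equal(k, pivots[g]) {
				isPivot = true
			}
			g++
		}
		if !isPivot {
			groups[g] = append(groups[g], k)
		}
	}
	return groups
}

func main() {
	// 8k synthetic split keys and 100 stores, as in the example above.
	keys := make([][]byte, 0, 8000)
	for i := 0; i < 8000; i++ {
		keys = append(keys, []byte(fmt.Sprintf("key-%05d", i)))
	}
	pivots := choosePivots(keys, 100)
	groups := groupByPivots(keys, pivots)
	fmt.Println(len(pivots), len(groups))
}
```

Each group would then be split inside the first-level region that covers it, which is where the re-scan in step 4 comes in.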

With this change, we can make full use of the TiKV stores and have each of them do an even share of the split and scatter work.
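
Step 3 depends on the first-level regions actually being scattered, which a single request does not guarantee. One possible way to handle that is to retry only the regions that failed, for a bounded number of rounds. The sketch below is an assumption-laden illustration: `scatterFunc` is a hypothetical stand-in for whatever call asks PD to scatter one region, `scatterWithRetry` is not an existing BR function, and context handling is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotScattered is a hypothetical error used by the fake scatter below.
var errNotScattered = errors.New("region not scattered")

// scatterFunc is a hypothetical stand-in for the call that asks PD to
// scatter a single region; it is not a real client API.
type scatterFunc func(regionID uint64) error

// scatterWithRetry tries to scatter every region, then retries only the
// ones that failed, for at most maxRounds rounds. It returns the IDs of
// the regions that still failed after the last round.
func scatterWithRetry(regionIDs []uint64, scatter scatterFunc, maxRounds int, backoff time.Duration) []uint64 {
	pending := regionIDs
	for round := 0; round < maxRounds && len(pending) > 0; round++ {
		var failed []uint64
		for _, id := range pending {
			if err := scatter(id); err != nil {
				failed = append(failed, id)
			}
		}
		pending = failed
		// Wait before the next round, but not after the final one.
		if len(pending) > 0 && round+1 < maxRounds {
			time.Sleep(backoff)
		}
	}
	return pending
}

func main() {
	// Fake scatter: even region IDs fail on their first attempt only.
	attempts := map[uint64]int{}
	fake := func(id uint64) error {
		attempts[id]++
		if id%2 == 0 && attempts[id] < 2 {
			return errNotScattered
		}
		return nil
	}
	left := scatterWithRetry([]uint64{1, 2, 3, 4}, fake, 3, 10*time.Millisecond)
	fmt.Println("still unscattered:", len(left))
}
```

Because only about 100 first-level regions need this treatment, the retry cost should stay small compared with retrying a full 8k batch.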

@3pointer 3pointer added type/enhancement The issue or PR belongs to an enhancement. component/br This issue is related to BR of TiDB. component/lightning This issue is related to Lightning of TiDB. labels Apr 13, 2022
@nolouch
Member

nolouch commented Apr 14, 2022

> 5. Finish the rest split and scatter.

Can it only split?

@3pointer
Contributor Author

3pointer commented Apr 14, 2022

> 5. Finish the rest split and scatter. Can it only split?

I think split-only is OK. It works for BR/Lightning and may save a lot of time.

@nolouch
Member

nolouch commented Apr 14, 2022

BTW, we could implement this strategy in the PD server, so that each downstream component does not need its own implementation. There is a related PR: tikv/pd#4778

@YuJuncen
Contributor

cc #27234
