
BR can reduce scatter operations via coarse-grained scattering #27234

Closed
YuJuncen opened this issue Aug 16, 2021 · 7 comments
Labels
component/br (This issue is related to BR of TiDB.) · sig/migrate · type/enhancement (The issue or PR belongs to an enhancement.)

Comments

@YuJuncen
Contributor

Enhancement

Currently, BR uses the following procedure to scatter regions:

  1. BR collects keys for splitting regions from the end keys of files and the rewrite rules.
  2. BR splits at those keys via the BatchSplit interface.
  3. After the region split completes, BR creates the scatter operators.
  4. After those operators finish or time out, or 3 minutes have passed, BR reports the scatter as done.

At step (3), the number of scatter operators equals the number of newly split regions. In practice, this can be a huge number of operators (i.e. concurrency * pd-concurrency), which is slow and timeout-prone.

In fact, given that the number of stores is usually far smaller than concurrency * pd-concurrency, and the cost of splitting is much lower than that of scattering, we can use a better scatter strategy:

  1. BR collects keys as usual.
  2. BR divides the collected keys into equal parts. The number of parts could equal the number of stores, or n times that.
  3. BR splits the key space according to these parts, then scatters the resulting coarse regions.
  4. BR executes the fine-grained split inside the scattered regions.

As an example, suppose there is a set of 129 keys [k1, k2, ..., k129] to split, in a cluster with 3 stores:

  1. We split at [k43, k86], producing 3 regions: [k1, k43), [k43, k86), [k86, k129).
  2. Given there are only 3 stores, we can scatter those 3 (relatively) bigger regions onto the 3 stores.
[3 scatter operators generated]

Store 1: [k1, k43)
Store 2: [k43, k86)
Store 3: [k86, k129)
  3. We can then split them in place at their new stores.
Store 1: [k1, k2), ..., [k42, k43)
Store 2: [k43, k44), ..., [k85, k86)
Store 3: [k86, k87), ..., [k128, k129)

Compared to the current way:

  1. We split everything at one store.
Store 1: [k1, k2), ..., [k42, k43), [k43, k44), ..., [k85, k86), [k86, k87), ..., [k128, k129)
Store 2: ∅
Store 3: ∅
  2. We scatter those regions across the stores.
[128 scatter operators generated]

Store 1: [k1, k2), [k43, k44) ...
Store 2: [k42, k43), [k85, k86), ...
Store 3: [k86, k87), [k128, k129), ...

The number of scatter operators reduces from O(files + rewrite rules) to O(stores).

@YuJuncen YuJuncen added the type/enhancement The issue or PR belongs to an enhancement. label Aug 16, 2021
@YuJuncen
Contributor Author

/component br
/sig migrate

@ti-chi-bot ti-chi-bot added component/br This issue is related to BR of TiDB. sig/migrate labels Aug 16, 2021
@YuJuncen
Contributor Author

YuJuncen commented Aug 16, 2021

Further refactoring and optimizations

The batcher was adapted to split huge tables into small batches, to avoid generating huge split & scatter batches. After this optimization, it would be better to send huge batches instead.

So when meeting huge tables, there is no need to split them into 128-sized chunks (or we may still need to split them, but into much larger chunks, because sorting the ranges of a really huge table could be hard?); we can send them directly to the Splitter. (There would be a thread pool limiting the download & ingest part, so it would be OK to send huge batches to the Importer.)

Then the BlankTablesAfterSend field in DrainResult can be removed, and even the complex batcher.drainRanges method can be removed as well.

@3pointer
Contributor

The number of scatter operators reduces from O(files + rewrite rules) to O(stores).

Changing "split 128 regions & scatter 128 regions" into "split 3 (store-count) regions & scatter 3 regions & split 128/3 regions in each" sounds like a big optimization, @kennytm WDYT 🤔?

@kennytm
Contributor

kennytm commented Aug 16, 2021

🤔 This surely reduces a lot of scatter calls. One effect, though, is that a long range of similar keys ends up clustered in the same configuration. IIUC, the scatter result may look like this:

Potential scatter result, existing (each circle represents a region, each row is a store, 🔴 = leader, 🟢 = follower, ⚪ = not exists):

  1. ⚪🟢🟢⚪🟢🟢🟢⚪🔴🟢⚪⚪🟢⚪⚪⚪🔴⚪⚪🟢🟢🔴🟢🔴⚪🟢🔴🟢⚪🔴🟢⚪⚪⚪⚪
  2. ⚪🟢⚪🟢🔴⚪⚪⚪⚪⚪🔴🟢⚪🔴🟢🔴⚪🟢⚪⚪⚪🟢⚪🟢⚪🔴⚪🔴🟢🟢🟢🟢🔴🔴🔴
  3. 🔴⚪🔴🟢⚪⚪⚪🔴🟢⚪⚪🔴🟢⚪⚪⚪⚪🟢🟢🟢⚪🟢⚪⚪🔴🟢🟢⚪🟢🟢🔴🟢⚪🟢🟢
  4. 🟢🔴🟢🔴🟢🟢🟢🟢⚪🟢🟢⚪🔴🟢🟢🟢🟢🔴🔴⚪🟢⚪🔴⚪🟢⚪🟢⚪🔴⚪⚪⚪🟢🟢🟢
  5. 🟢⚪⚪⚪⚪🔴🔴🟢🟢🔴🟢🟢⚪🟢🔴🟢🟢⚪🟢🔴🔴⚪🟢🟢🟢⚪⚪🟢⚪⚪⚪🔴🟢⚪⚪

Potential scatter result, proposed:

  1. ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴🔴
  2. 🔴🔴🔴🔴🔴🔴🔴🟢🟢🟢🟢🟢🟢🟢🔴🔴🔴🔴🔴🔴🔴⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪
  3. 🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢⚪⚪⚪⚪⚪⚪⚪🟢🟢🟢🟢🟢🟢🟢
  4. ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢
  5. 🟢🟢🟢🟢🟢🟢🟢🔴🔴🔴🔴🔴🔴🔴🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢🟢⚪⚪⚪⚪⚪⚪⚪

(We also see that stores 3 and 4 never get any leaders, due to the random nature of scatter. We would need a deterministic version of scattering, perhaps via manual migration, to ensure a uniform distribution.)
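A minimal sketch of what "a deterministic version of scattering" could mean: place leaders round-robin over the stores. The `roundRobinLeaders` helper is hypothetical; a real implementation would issue transfer-leader / move-peer operators through PD:

```go
package main

import "fmt"

// Deterministically assign each coarse region's leader to a store in
// round-robin order, instead of relying on random scatter.
func roundRobinLeaders(regionIDs, storeIDs []uint64) map[uint64]uint64 {
	leaders := make(map[uint64]uint64, len(regionIDs))
	for i, r := range regionIDs {
		leaders[r] = storeIDs[i%len(storeIDs)]
	}
	return leaders
}

func main() {
	// 5 coarse regions over 3 stores: every store gets at least one leader.
	leaders := roundRobinLeaders(
		[]uint64{101, 102, 103, 104, 105},
		[]uint64{1, 2, 3},
	)
	fmt.Println(leaders)
}
```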

@YuJuncen
Contributor Author

YuJuncen commented Aug 16, 2021

@kennytm For the unbalanced-leader problem, it seems several transfer-leader operators after splitting could help? (BTW, nearby keys sharing the same configuration (i.e. the same peers, though possibly different leaders) seems harmless?)

@kennytm
Contributor

kennytm commented Aug 16, 2021

Is transfer-leader much cheaper than scattering (the empty regions)?

@YuJuncen
Contributor Author

Close this due to:
According to friends from the PD team, it isn't good if adjoining key ranges share the same configuration, and the cost of scattering empty regions should not be much higher than that of transferring leaders.


4 participants