BR can reduce scatter operations via coarse-grained scattering #27234
Comments
/component br
Further refactor and optimizations: the batcher was adapted to split huge tables into small batches to avoid generating huge split & scatter batches. After this optimization, it would be better to send huge batches. So when meeting a huge table, there is no need to split it into 128-sized chunks (or we may still need to split it, but into much larger chunks, because sorting the ranges of a really huge table would be hard?); we can send it directly to the Splitter. (There would be a thread pool to limit the download & ingest part, so it would be OK to send a huge batch to the Importer.) And then, the …
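For illustration only, a minimal Go sketch of the kind of bounded worker pool mentioned above; `FileRange` and `downloadAndIngest` are made-up names, not BR's actual types. The point is just that a concurrency limit makes it safe to hand the Importer one huge batch.

```go
package restoresketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// FileRange is a placeholder for one unit of download & ingest work.
type FileRange struct {
	StartKey, EndKey []byte
}

// ingestBatch processes a (possibly huge) batch of ranges while keeping at
// most `workers` downloads/ingests in flight, so sending a large batch in one
// go does not mean unbounded parallel work.
func ingestBatch(
	ctx context.Context,
	ranges []FileRange,
	workers int,
	downloadAndIngest func(context.Context, FileRange) error,
) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(workers) // behaves like a fixed-size thread pool
	for _, r := range ranges {
		r := r // capture the loop variable for the goroutine
		g.Go(func() error {
			return downloadAndIngest(ctx, r)
		})
	}
	return g.Wait()
}
```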
🤔 this surely reduces a lot of scatter calls. One effect, though, is that a longer range of similar keys is all clustered in the same configuration. iiuc, the scatter result may be like this:
(we also see that stores 3 and 4 never have any leaders, due to the random nature of scatter. we need a deterministic version of scattering, perhaps by manual migration, to ensure uniform distribution)
@kennytm For the unbalanced leader problem, it seems several transfer-leader operators after splitting can help? (BTW, near keys sharing the same configuration (i.e. the same peers, but possibly different leaders) seems harmless?)
is transfer-leader much cheaper than scattering (the empty regions)? |
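To make the transfer-leader idea concrete, a rough sketch (the `PDClient` interface below is a hypothetical wrapper, not the real PD client API): after splitting, assign leaders to stores round-robin, so the distribution is deterministic instead of being left to random scatter.

```go
package restoresketch

import "context"

// PDClient is a hypothetical wrapper around whatever issues a
// transfer-leader operator; it is not the actual PD client interface.
type PDClient interface {
	TransferLeader(ctx context.Context, regionID, targetStoreID uint64) error
}

// balanceLeaders moves the leaders of newly split regions onto stores in a
// round-robin fashion, which gives a deterministic, uniform leader
// distribution instead of relying on the randomness of scatter.
func balanceLeaders(ctx context.Context, pd PDClient, regionIDs, storeIDs []uint64) error {
	if len(storeIDs) == 0 {
		return nil
	}
	for i, regionID := range regionIDs {
		target := storeIDs[i%len(storeIDs)]
		if err := pd.TransferLeader(ctx, regionID, target); err != nil {
			return err
		}
	}
	return nil
}
```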
Close this due to: |
Enhancement
Currently, BR uses the following procedure to scatter regions:

1. Collect the split keys from the files and rewrite rules to be restored.
2. Split regions at those keys via the `BatchSplit` interface of BR.
3. Scatter every newly split region.

At step (3), the number of scatter operators equals the number of newly split regions. In practice, that is a huge number (i.e. `concurrency * pd-concurrency`) of operators, which is slow and prone to timeouts.

In fact, given that the number of stores is far smaller than `concurrency * pd-concurrency`, and the cost of splitting is much lower than scattering, we can use a better, coarse-grained scatter strategy: split and scatter only `n` large regions first, where `n` is the number of stores or a few times of it, then split the remaining keys inside those regions without scattering them again (a sketch is given after the example below).

As an example, let there be a set of 129 keys `[k1, k2, ..., k129]` that need to be split, in a cluster with 3 stores. First split at the coarse keys `[k43, k86]`; then there would be 3 regions `[k1, k43)`, `[k43, k86)`, `[k86, k129)`. Scatter only these 3 regions, and split the remaining keys within them afterwards.

Comparing to the current way: the number of scatter operators reduces from `O(files + rewrite rules)` to `O(stores)`.
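A minimal sketch of the proposed coarse-grained strategy (the `SplitClient` interface is a stand-in, not BR's actual split client): scatter only a handful of coarse regions, then split everything else without scattering.

```go
package restoresketch

import (
	"bytes"
	"sort"
)

// SplitClient is a stand-in for BR's split client; only the shape of the
// calls matters here.
type SplitClient interface {
	SplitAndScatter(keys [][]byte) error // split at keys, then scatter the new regions
	Split(keys [][]byte) error           // split at keys without scattering
}

// coarseSplitAndScatter splits the given keys in two phases:
//  1. pick storeCount-1 roughly evenly spaced "coarse" keys, split there and
//     scatter only those few large regions;
//  2. split all remaining keys inside the coarse regions without scattering,
//     so the fine-grained regions inherit the placement chosen in phase 1.
func coarseSplitAndScatter(cli SplitClient, keys [][]byte, storeCount int) error {
	if len(keys) == 0 || storeCount <= 1 {
		return cli.SplitAndScatter(keys)
	}
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })

	// Phase 1: for 129 keys and 3 stores this picks 2 coarse keys,
	// producing 3 large regions as in the example above.
	step := len(keys) / storeCount
	if step == 0 {
		// Fewer keys than stores: just scatter everything.
		return cli.SplitAndScatter(keys)
	}
	coarseIdx := make(map[int]struct{}, storeCount-1)
	coarse := make([][]byte, 0, storeCount-1)
	for i := step; i < len(keys) && len(coarse) < storeCount-1; i += step {
		coarseIdx[i] = struct{}{}
		coarse = append(coarse, keys[i])
	}
	if err := cli.SplitAndScatter(coarse); err != nil {
		return err
	}

	// Phase 2: fine-grained split of everything else, no scatter needed.
	fine := make([][]byte, 0, len(keys)-len(coarse))
	for i, k := range keys {
		if _, isCoarse := coarseIdx[i]; !isCoarse {
			fine = append(fine, k)
		}
	}
	return cli.Split(fine)
}
```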