Perform a distribution of aliens across a smaller set of nodes #872

ikopylov · 2024-04-06T01:01:16Z

Right now, if we have a cluster with 100 nodes and 1 node is down, we get 99 partitions all over the cluster for the same period. After alien recovery, all these 99 partitions will be moved to the original node. Every new partition reduces performance and consumes file descriptors, so this can become a problem.
We can divide all nodes into subsets and distribute the subsets evenly among the nodes. If the target node is down, then a node to store aliens will be selected from the allocated subset. If all nodes in a subset become unavailable (this should happen very rarely), then we can use other subsets. This approach will reduce the number of partitions created in the cluster.
The size of the subset can be configured by the user in the config file.
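
A minimal sketch of the selection rule described above, in Python for illustration only. The name `pick_alien_node`, the `is_alive` callback, and the representation of subsets as lists of node ids are assumptions and not part of the existing code base. The proposal also does not say how to choose among live nodes inside a subset or in which order to try other subsets; this sketch simply takes the first live node and walks the remaining subsets cyclically.

```python
from typing import Callable, Optional, Sequence


def pick_alien_node(
    target: int,
    subsets: Sequence[Sequence[int]],
    assigned: int,
    is_alive: Callable[[int], bool],
) -> Optional[int]:
    """Pick a node to hold aliens for `target` (hypothetical helper).

    `subsets` is the list of node subsets and `assigned` is the index of the
    subset allocated to `target`. The allocated subset is tried first; other
    subsets are used only if every node in it is unavailable.
    """
    # Try the allocated subset first, then the remaining ones in cyclic order.
    order = [(assigned + i) % len(subsets) for i in range(len(subsets))]
    for idx in order:
        for node in subsets[idx]:
            # The target itself is down, so it can never hold its own aliens.
            if node != target and is_alive(node):
                return node
    return None  # no live node anywhere in the cluster
```

With this rule, aliens for one failed node land on nodes of a single subset as long as at least one of its nodes is up, instead of spreading over the whole cluster.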

Distribution example:
100 nodes, subset size: 10
Node 1: subset 1 (nodes 1-10)
Node 2: subset 2 (nodes 11-20)
...
Node 10: subset 10 (nodes 91-100)
Node 11: subset 1 (nodes 1-10)
...
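
The mapping in the example above can be sketched as a round-robin assignment of subsets to nodes. This is an illustration under assumptions: the function name `subset_for_node` is hypothetical, node ids are taken to be 1-based and contiguous, and a trailing partial subset (when the node count is not a multiple of the subset size) is kept as a smaller subset, which the proposal does not specify.

```python
def subset_for_node(node: int, total_nodes: int, subset_size: int) -> range:
    """Return the node ids of the subset allocated to `node` (1-based ids)."""
    # Number of subsets, rounding up so a trailing partial subset is kept.
    subset_count = -(-total_nodes // subset_size)
    # Subsets are handed out round-robin: node 1 -> subset 1, node 2 -> subset 2, ...
    index = (node - 1) % subset_count
    first = index * subset_size + 1
    last = min(first + subset_size - 1, total_nodes)
    return range(first, last + 1)


if __name__ == "__main__":
    for node in (1, 2, 10, 11):
        nodes = subset_for_node(node, total_nodes=100, subset_size=10)
        print(f"Node {node}: nodes {nodes.start}-{nodes.stop - 1}")
```

Because the assignment wraps around after the last subset, each subset of 10 serves 10 of the 100 nodes, so alien load is spread evenly across subsets.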

Related issue #871

ikopylov added the improvement label Apr 6, 2024
