Right now, if we have a cluster with 100 nodes and 1 node is down, we get 99 alien partitions spread all over the cluster for the same period. After alien recovery, all 99 of these partitions will be moved to the original node. Every new partition reduces performance and consumes file descriptors, so this can become a problem.
We can divide all nodes into subsets and distribute the subsets evenly among the nodes. If the target node is down, then the node that stores its aliens will be selected from the subset allocated to that target node. If all nodes in a subset become unavailable (this should happen very rarely), then we can use other subsets. This approach will reduce the number of partitions created in the cluster.
The size of the subset can be configured by the user in the config file.
Distribution example:
100 nodes, subset size: 10
Node 1: subset 1 (nodes 1-10)
Node 2: subset 2 (nodes 11-20)
...
Node 10: subset 10 (nodes 91-100)
Node 11: subset 1 (nodes 1-10)
...
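A minimal sketch of the distribution above, in Python for illustration only. The function names (`subset_for_node`, `pick_alien_node`), the `is_alive` callback, and the `key` argument are hypothetical and not part of any existing API. The round-robin mapping is inferred from the example. The order in which other subsets are tried when the allocated one is fully down, and the exclusion of the target from its own candidates, are assumptions.

```python
from typing import Callable, List, Optional


def subset_count(node_count: int, subset_size: int) -> int:
    # Number of subsets, rounding up so a trailing partial subset is kept.
    return -(-node_count // subset_size)


def subset_nodes(index: int, node_count: int, subset_size: int) -> List[int]:
    # Node ids (1-based) that belong to the subset with the given 0-based index.
    start = index * subset_size + 1
    end = min(start + subset_size - 1, node_count)
    return list(range(start, end + 1))


def subset_for_node(node_id: int, node_count: int, subset_size: int) -> List[int]:
    # Subsets are handed out round-robin: node 1 -> subset 1, node 2 -> subset 2, ...
    index = (node_id - 1) % subset_count(node_count, subset_size)
    return subset_nodes(index, node_count, subset_size)


def pick_alien_node(
    target: int,
    node_count: int,
    subset_size: int,
    is_alive: Callable[[int], bool],
    key: int,
) -> Optional[int]:
    # Try the subset allocated to the target first, then the remaining subsets
    # in cyclic order (assumed fallback order; the issue does not specify one).
    total = subset_count(node_count, subset_size)
    first = (target - 1) % total
    for offset in range(total):
        nodes = subset_nodes((first + offset) % total, node_count, subset_size)
        candidates = [n for n in nodes if n != target and is_alive(n)]
        if candidates:
            # `key` spreads the aliens across the candidates of one subset.
            return candidates[key % len(candidates)]
    return None  # no alive node left in the cluster
```

With 100 nodes and a subset size of 10, aliens for a down node 2 can land only on nodes 11-20, so at most 10 nodes create alien partitions instead of 99.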
Related issue #871