[Enhancement] Optimize the host selection algorithm for bucket scan tasks when a backend is not alive #5132

@xinghuayu007

Describe the bug

When a query uses bucket join, the function Coordinator#BucketShuffleJoinController#getExecHostPortForFragmentIDAndBucketSeq is responsible for making sure each host gets an equal share of the buckets to scan. For example, if there are 10 buckets to scan and 5 hosts, the strategy distributes 2 buckets to each host. The algorithm works like this:

a. use the data structure buckendIdToBucketCountMap to record how many buckets have been distributed to each backend;
b. traverse every backend and find the one that owns the fewest buckets. We call it mini_backend;
c. distribute the bucket to mini_backend;
d. update buckendIdToBucketCountMap for mini_backend.
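
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual Coordinator code: the class name `LeastLoadedBackendPicker`, the `pick` and `bucketCount` methods, and the idea of passing the candidate backend ids in as a list are all assumptions made for the example. Only the map name `buckendIdToBucketCountMap` comes from the description.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for the selection logic described above; not the actual Doris code.
class LeastLoadedBackendPicker {
    // Step a: backend id -> number of buckets already distributed to it.
    private final Map<Long, Integer> buckendIdToBucketCountMap = new HashMap<>();

    // Chooses a backend for one bucket from the given candidates (must be non-empty).
    long pick(List<Long> candidateBackendIds) {
        if (candidateBackendIds.isEmpty()) {
            throw new IllegalArgumentException("no candidate backends");
        }
        long miniBackend = candidateBackendIds.get(0);
        int minCount = bucketCount(miniBackend);
        // Step b: traverse the candidates and keep the one with the fewest buckets.
        for (long id : candidateBackendIds) {
            int count = bucketCount(id);
            if (count < minCount) {
                minCount = count;
                miniBackend = id;
            }
        }
        // Steps c and d: assign the bucket and record it in the map.
        buckendIdToBucketCountMap.merge(miniBackend, 1, Integer::sum);
        return miniBackend;
    }

    int bucketCount(long backendId) {
        return buckendIdToBucketCountMap.getOrDefault(backendId, 0);
    }

    public static void main(String[] args) {
        LeastLoadedBackendPicker picker = new LeastLoadedBackendPicker();
        List<Long> hosts = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        // 10 buckets over 5 hosts, as in the example above.
        for (int bucketSeq = 0; bucketSeq < 10; bucketSeq++) {
            picker.pick(hosts);
        }
        for (long id : hosts) {
            System.out.println("backend " + id + " -> " + picker.bucketCount(id) + " buckets");
        }
    }
}
```

With every host a candidate for every bucket, ties go to the first candidate in the list, so the 10 buckets end up spread evenly across the 5 hosts.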

When all backends are alive, the algorithm works as intended. But when mini_backend is not alive, a replica host is chosen at random as the final host, and buckendIdToBucketCountMap is not updated. As a result, the bucket scan tasks are not load balanced.
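
One possible direction, sketched here purely as an assumption and not as the change the project made, is to skip backends that are not alive during the traversal and to increment the counter for the host that is actually chosen. The class `AliveAwareBackendPicker` and the `isAlive` predicate are hypothetical names introduced for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;

// Hypothetical variant of the selection logic that takes liveness into account.
class AliveAwareBackendPicker {
    // Backend id -> number of buckets actually assigned to that backend.
    private final Map<Long, Integer> buckendIdToBucketCountMap = new HashMap<>();
    // Assumed liveness check; in a real system this would consult cluster state.
    private final LongPredicate isAlive;

    AliveAwareBackendPicker(LongPredicate isAlive) {
        this.isAlive = isAlive;
    }

    // Chooses the alive candidate with the fewest buckets and records the assignment.
    long pick(List<Long> candidateBackendIds) {
        Long chosen = null;
        int minCount = Integer.MAX_VALUE;
        for (long id : candidateBackendIds) {
            if (!isAlive.test(id)) {
                continue; // never consider a backend that cannot run the scan
            }
            int count = bucketCount(id);
            if (count < minCount) {
                minCount = count;
                chosen = id;
            }
        }
        if (chosen == null) {
            throw new IllegalStateException("no alive backend among the candidates");
        }
        // The counter is updated for the host that really receives the bucket.
        buckendIdToBucketCountMap.merge(chosen, 1, Integer::sum);
        return chosen;
    }

    int bucketCount(long backendId) {
        return buckendIdToBucketCountMap.getOrDefault(backendId, 0);
    }

    public static void main(String[] args) {
        // Backend 1 is treated as not alive in this example.
        AliveAwareBackendPicker picker = new AliveAwareBackendPicker(id -> id != 1L);
        List<Long> replicas = Arrays.asList(1L, 2L, 3L);
        for (int bucketSeq = 0; bucketSeq < 4; bucketSeq++) {
            System.out.println("bucket " + bucketSeq + " -> backend " + picker.pick(replicas));
        }
    }
}
```

Because the map only ever counts buckets that were really assigned, later selections see the true load on each alive host instead of a count attached to a backend that never ran the scan.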
