Describe the bug
When a query uses bucket join, the function Coordinator#BucketShuffleJoinController#getExecHostPortForFragmentIDAndBucketSeq is responsible for making sure each host gets an equal share of the buckets to scan. For example, if there are 10 buckets to scan and 5 hosts, the strategy distributes 2 buckets to each host. The algorithm works like this:
a. use the data structure buckendIdToBucketCountMap to record how many buckets have been distributed to each backend;
b. traverse every backend and find the one that owns the fewest buckets. We call it mini_backend;
c. distribute the bucket to mini_backend;
d. update buckendIdToBucketCountMap for mini_backend.
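
Steps a to d can be sketched in Java as follows. This is a minimal illustration, not the actual Coordinator code: the class name `GreedyBucketAssignment` and the helpers `pickLeastLoaded` and `assign` are hypothetical, and the sketch assumes every backend is alive and holds a replica of every bucket.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the greedy "least-loaded backend" strategy (steps a-d).
final class GreedyBucketAssignment {

    // Step b: return the backend id with the fewest buckets (ties go to the first one seen).
    static long pickLeastLoaded(Map<Long, Integer> bucketCountByBackend) {
        long best = -1;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Long, Integer> e : bucketCountByBackend.entrySet()) {
            if (e.getValue() < bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    // Assign bucketNum buckets and return the final bucket count per backend.
    static Map<Long, Integer> assign(int bucketNum, long[] backendIds) {
        // Step a: the counting map (plays the role of buckendIdToBucketCountMap).
        Map<Long, Integer> counts = new LinkedHashMap<>();
        for (long id : backendIds) {
            counts.put(id, 0);
        }
        for (int bucket = 0; bucket < bucketNum; bucket++) {
            long miniBackend = pickLeastLoaded(counts);   // step b
            // Step c would hand the bucket's scan range to miniBackend here.
            counts.merge(miniBackend, 1, Integer::sum);   // step d
        }
        return counts;
    }

    public static void main(String[] args) {
        // 10 buckets over 5 backends: prints {1=2, 2=2, 3=2, 4=2, 5=2}
        System.out.println(assign(10, new long[] {1, 2, 3, 4, 5}));
    }
}
```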
When all backends are alive, the algorithm works as intended. But when mini_backend is not alive, a replica host is chosen at random as the final host and buckendIdToBucketCountMap is not updated. As a result, the bucket scan tasks are not load balanced.
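
The effect can be reproduced with a small simulation. The sketch below is hypothetical and rests on one reading of the report: step d still charges the bucket to mini_backend even when it is dead, while the randomly chosen replica host is never counted. The class name `DeadBackendSkewSketch`, the `tracked` and `actual` maps, and the seeded `Random` are illustrative assumptions, not Doris code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical simulation: the tracking map drifts away from the real load
// once the least-loaded backend is dead.
final class DeadBackendSkewSketch {

    // Least-loaded backend according to the tracking map (ties go to the first one seen).
    static long pickLeastLoaded(Map<Long, Integer> tracked) {
        long best = -1;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<Long, Integer> e : tracked.entrySet()) {
            if (e.getValue() < bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    // tracked: what the algorithm believes; actual: where buckets really end up.
    static void assign(int bucketNum, Set<Long> dead, Map<Long, Integer> tracked,
                       Map<Long, Integer> actual, Random rnd) {
        List<Long> alive = new ArrayList<>();
        for (long id : tracked.keySet()) {
            if (!dead.contains(id)) {
                alive.add(id);
            }
        }
        for (int bucket = 0; bucket < bucketNum; bucket++) {
            long miniBackend = pickLeastLoaded(tracked);
            tracked.merge(miniBackend, 1, Integer::sum); // assumption: mini_backend is charged
            long host = miniBackend;
            if (dead.contains(miniBackend)) {
                // Fallback: random alive host; the tracking map is NOT updated for it.
                host = alive.get(rnd.nextInt(alive.size()));
            }
            actual.merge(host, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<Long, Integer> tracked = new LinkedHashMap<>();
        Map<Long, Integer> actual = new LinkedHashMap<>();
        for (long id = 1; id <= 4; id++) {
            tracked.put(id, 0);
            actual.put(id, 0);
        }
        // Backend 1 is dead; 8 buckets to scan.
        assign(8, Collections.singleton(1L), tracked, actual, new Random(42));
        System.out.println("tracked = " + tracked); // believes every backend has 2 buckets
        System.out.println("actual  = " + actual);  // backend 1 has 0; its 2 buckets landed at random
    }
}
```

In this sketch the tracking map ends up perfectly even, while the two buckets meant for the dead backend are placed on arbitrary alive hosts that the map does not know about, so later decisions are made on stale counts.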