Add a test to monitor the distribution of vnode to physical node
### What changes are proposed in this pull request?

Add a test to check that the distribution of vnodes across physical nodes is not too uneven.

This test calculates the standard deviation over the mean (the coefficient of variation) on the collection of virtual nodes assigned to physical nodes. It arbitrarily bounds this ratio at 0.25, but ideally the number should get smaller over time as we improve the hashing algorithm and use better ways to assign virtual nodes to physical nodes.
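As a rough illustration of the metric, here is a minimal, self-contained sketch that is not part of the change itself. The class `VnodeSpreadSketch`, its method names, and the `String` node labels are hypothetical stand-ins; the real test works on `BlockWorkerInfo` values from `provider.getActiveNodesMap()`. The sketch credits each ring entry with the span from the previous key, as the test does, and then divides the population standard deviation by the mean.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the Alluxio code base.
class VnodeSpreadSketch {

  // Credits each ring entry's node with the span from the previous key to its own key,
  // starting at Integer.MIN_VALUE. The span after the last key is not counted here.
  static Map<String, Long> arcLengths(NavigableMap<Integer, String> ring) {
    Map<String, Long> arcs = new HashMap<>();
    long last = Integer.MIN_VALUE;
    for (Map.Entry<Integer, String> entry : ring.entrySet()) {
      arcs.merge(entry.getValue(), entry.getKey() - last, Long::sum);
      last = entry.getKey();
    }
    return arcs;
  }

  // Population standard deviation divided by the mean (coefficient of variation).
  static double sdOverMean(Collection<Long> values) {
    long sum = 0L;
    for (long v : values) {
      sum += v;
    }
    double mean = sum * 1.0 / values.size();
    double sumSquaredDiff = 0;
    for (long v : values) {
      sumSquaredDiff += (v - mean) * (v - mean);
    }
    return Math.sqrt(sumSquaredDiff / values.size()) / mean;
  }

  public static void main(String[] args) {
    // Worked example: mean 5, population standard deviation 2, ratio 0.4.
    // A spread like this would fail a bound of 0.25.
    System.out.println(sdOverMean(Arrays.asList(2L, 4L, 4L, 4L, 5L, 5L, 7L, 9L))); // 0.4

    // A tiny ring with two nodes that end up with equal spans (20 each), so the ratio is 0.
    NavigableMap<Integer, String> ring = new TreeMap<>();
    ring.put(Integer.MIN_VALUE + 10, "a");
    ring.put(Integer.MIN_VALUE + 30, "b");
    ring.put(Integer.MIN_VALUE + 40, "a");
    System.out.println(sdOverMean(arcLengths(ring).values())); // 0.0
  }
}
```

A lower ratio means the hash range is spread more evenly across physical nodes, which is why the bound is expected to tighten as the assignment improves.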


### Why are the changes needed?

We may change the hashing algorithm and the virtual node assignment in the future; this test will provide guidance and catch errors.

### Does this PR introduce any user-facing changes?
No. 

pr-link: #18147
change-id: cid-152d8edc9b65ef59967d5985849feeb471a6650d
yuzhu authored Sep 15, 2023
1 parent f05deb3 commit 8bba797
Showing 1 changed file with 42 additions and 0 deletions.
@@ -25,7 +25,10 @@
import org.apache.commons.lang3.RandomStringUtils;
import org.junit.Test;

import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
@@ -48,6 +51,45 @@ public void uninitializedThrowsException() {
    assertThrows(IllegalStateException.class, () -> provider.get(OBJECT_KEY, 0));
  }

  /**
   * This test calculates the standard deviation over the mean of the hash range that the
   * virtual nodes assign to each physical node. It arbitrarily bounds the ratio at 0.25,
   * but ideally this number should get smaller over time as we improve the hashing
   * algorithm and use better ways to assign virtual nodes to physical nodes.
   *
   * This uses 2000 virtual nodes and 50 physical nodes. If these parameters change,
   * the bound is likely to change as well.
   */
  @Test
  public void virtualNodeDistribution() {
    ConsistentHashProvider provider = new ConsistentHashProvider(1, WORKER_LIST_TTL_MS);
    List<BlockWorkerInfo> workerList = generateRandomWorkerList(50);
    // set initial state
    provider.refresh(workerList, 2000);
    NavigableMap<Integer, BlockWorkerInfo> map = provider.getActiveNodesMap();
    Map<BlockWorkerInfo, Long> count = new HashMap<>();
    long last = Integer.MIN_VALUE;
    for (Map.Entry<Integer, BlockWorkerInfo> entry : map.entrySet()) {
      count.put(entry.getValue(), count.getOrDefault(entry.getValue(), 0L)
          + (entry.getKey() - last));
      last = entry.getKey().intValue();
    }
    assertTrue(calcSDoverMean(count.values()) < 0.25);
  }

  private double calcSDoverMean(Collection<Long> list) {
    long sum = 0L;
    double sumSquaredDiff = 0;
    for (long num : list) {
      sum += num;
    }
    double avg = sum * 1.0 / list.size();
    for (long num : list) {
      sumSquaredDiff = sumSquaredDiff + (num - avg) * (num - avg);
    }
    return Math.sqrt(sumSquaredDiff / list.size()) / avg;
  }

  @Test
  public void concurrentInitialization() {
    ConsistentHashProvider provider = new ConsistentHashProvider(1, WORKER_LIST_TTL_MS);