Commit 233729f

Cluster: some bias towards FAIL/PFAIL nodes in gossip sections.
This improves the PFAIL -> FAIL switch. It is too late at this point in the RC releases to add a proper separate PFAIL/FAIL dictionary that would do this in a less randomized way. Experiments confirm that this helps in practice: with 20 nodes and node-timeout set to 5 seconds, the PFAIL -> FAIL switch takes 2.5 seconds on average without this commit and 1 second with it.
1 parent 69b4f00 commit 233729f

1 file changed, +6 -3 lines changed

src/cluster.c

Lines changed: 6 additions & 3 deletions
@@ -2158,7 +2158,7 @@ void clusterSendPing(clusterLink *link, int type) {
     clusterBuildMessageHdr(hdr,type);
 
     /* Populate the gossip fields */
-    int maxiterations = wanted*2;
+    int maxiterations = wanted*3;
     while(freshnodes > 0 && gossipcount < wanted && maxiterations--) {
         dictEntry *de = dictGetRandomKey(server.cluster->nodes);
         clusterNode *this = dictGetVal(de);
@@ -2169,6 +2169,11 @@ void clusterSendPing(clusterLink *link, int type) {
          * already, so we just gossip about other nodes. */
         if (this == myself) continue;
 
+        /* Give a bias to FAIL/PFAIL nodes. */
+        if (maxiterations > wanted*2 &&
+            !(this->flags & (REDIS_NODE_PFAIL|REDIS_NODE_FAIL)))
+            continue;
+
         /* In the gossip section don't include:
          * 1) Nodes in HANDSHAKE state.
          * 3) Nodes with the NOADDR flag set.
@@ -2201,8 +2206,6 @@ void clusterSendPing(clusterLink *link, int type) {
         gossip->notused2 = 0;
         gossipcount++;
     }
-    redisLog(REDIS_VERBOSE,"WANTED: %d, USED_ITER: %d, GOSSIPCOUNT: %d",
-        wanted, wanted*2-maxiterations, gossipcount);
 
     /* Ready to send... fix the totlen fiend and queue the message in the
      * output buffer. */
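
The effect of the extra iteration budget can be explored outside Redis with a small stand-alone simulation. The sketch below is not code from cluster.c: `simNode`, `pickGossipNodes`, `failInclusionRate`, the `NODE_PFAIL`/`NODE_FAIL` values, and the `picked` marker used for duplicate detection are all hypothetical stand-ins. It only mirrors the loop shape visible in the diff: with `bias` set, the budget is `wanted*3` and healthy nodes are skipped while `maxiterations > wanted*2`; without it, the budget is the old `wanted*2`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag values, standing in for REDIS_NODE_PFAIL/REDIS_NODE_FAIL. */
#define NODE_PFAIL 1
#define NODE_FAIL  2

/* Hypothetical stand-in for clusterNode: flags plus an "already gossiped" mark. */
typedef struct {
    int flags;
    int picked;
} simNode;

/* Pick up to `wanted` distinct nodes at random and mark them as picked.
 * With bias != 0 the loop mirrors the new code: budget wanted*3, and while
 * maxiterations > wanted*2 only PFAIL/FAIL nodes are accepted.
 * With bias == 0 it mirrors the old code: budget wanted*2, no filter.
 * Returns the number of nodes picked. */
int pickGossipNodes(simNode *nodes, int count, int wanted, int bias) {
    assert(count > 0 && wanted > 0);
    int freshnodes = count;   /* Assumption: every node is a candidate. */
    int gossipcount = 0;
    int maxiterations = bias ? wanted*3 : wanted*2;

    while (freshnodes > 0 && gossipcount < wanted && maxiterations--) {
        simNode *node = &nodes[rand() % count];

        /* Give a bias to FAIL/PFAIL nodes during the extra iterations. */
        if (bias && maxiterations > wanted*2 &&
            !(node->flags & (NODE_PFAIL|NODE_FAIL)))
            continue;

        if (node->picked) continue;   /* Already in this gossip section. */
        node->picked = 1;
        freshnodes--;
        gossipcount++;
    }
    return gossipcount;
}

/* Fraction of trials in which the single failing node (index 0) ended up
 * in the simulated gossip section. */
double failInclusionRate(int count, int wanted, int bias, int trials) {
    int hits = 0;
    simNode *nodes = calloc((size_t)count, sizeof(*nodes));
    assert(nodes != NULL);

    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < count; i++) {
            nodes[i].flags = 0;
            nodes[i].picked = 0;
        }
        nodes[0].flags = NODE_PFAIL;
        pickGossipNodes(nodes, count, wanted, bias);
        hits += nodes[0].picked;
    }
    free(nodes);
    return (double)hits / trials;
}
```

Comparing `failInclusionRate(20, 3, 1, trials)` against `failInclusionRate(20, 3, 0, trials)` for a large `trials` should show the failing node being gossiped more often with the bias enabled. Because the post-decrement in the loop condition runs before the check, the filter applies to the first `wanted-1` iterations only, so the remaining budget for unbiased picks is slightly larger than the old `wanted*2`. This simulation says nothing about the timing figures quoted in the commit message, which depend on the full failure-report protocol.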
