This repository has been archived by the owner on Sep 25, 2020. It is now read-only.
Ringpop currently allows bootstrap to occur without a listening tchannel underneath. This was confirmed with tick-cluster by calling bootstrap without listen, first on a single node and then on all nodes (patch below).
Behaviorally, a single node failing to listen is the worst case. It has continuous one-way interactions with other nodes, which seems to create a continuous cycle of other nodes marking it suspect. This could happen in production during a rolling upgrade, or if bootstrap/listen handling is incorrect in some code path.
If all nodes fail to listen, they all simply fail to bootstrap, as expected.
Our current code calls listen() before bootstrap() pretty consistently, but given the failure modes, we ought to be more defensive: either confirm that the tchannel is already listening, or call this.channel.listen() ourselves.
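To make the proposal concrete, here is a minimal sketch of such a guard. It is not the actual ringpop or tchannel code: the `listening` flag, the helper name `bootstrapWhenListening`, and the fake channel and ringpop objects are all hypothetical stand-ins, and the real `listen()` and `bootstrap()` signatures may differ.

```javascript
// Hypothetical guard: only bootstrap once the channel is listening.
// ASSUMPTIONS: `channel.listening`, `channel.listen(port, host, cb)` and
// `ringpop.bootstrap(hosts, cb)` mirror the names used in this issue, but
// their exact shapes are not confirmed against the real libraries.
function bootstrapWhenListening(ringpop, channel, opts, callback) {
    if (channel.listening) {
        // Already listening: safe to bootstrap right away.
        ringpop.bootstrap(opts.hosts, callback);
        return;
    }

    // Not listening yet: start listening ourselves, then bootstrap.
    channel.listen(opts.port, opts.host, function onListening() {
        ringpop.bootstrap(opts.hosts, callback);
    });
}

// Minimal stand-ins so the sketch runs without tchannel or ringpop installed.
function createFakeChannel() {
    var channel = {
        listening: false,
        listenCalls: 0,
        listen: function listen(port, host, cb) {
            channel.listenCalls += 1;
            channel.listening = true;
            cb();
        }
    };
    return channel;
}

function createFakeRingpop(channel, bootstrapLog) {
    return {
        bootstrap: function bootstrap(hosts, cb) {
            // Record whether the channel was listening at bootstrap time.
            bootstrapLog.push({ hosts: hosts, listening: channel.listening });
            cb(null);
        }
    };
}

var bootstrapLog = [];
var fakeChannel = createFakeChannel();
var fakeRingpop = createFakeRingpop(fakeChannel, bootstrapLog);
var fakeOpts = { port: 3000, host: '127.0.0.1', hosts: ['127.0.0.1:3000'] };

// First call: the channel is not listening, so the guard calls listen() first.
bootstrapWhenListening(fakeRingpop, fakeChannel, fakeOpts, function noop() {});
// Second call: the channel is already listening, so listen() is not called again.
bootstrapWhenListening(fakeRingpop, fakeChannel, fakeOpts, function noop() {});
```

The alternative would be to fail fast with an error when the channel is not listening, instead of calling listen() on the caller's behalf. Which of the two is preferable depends on whether ringpop should own the channel lifecycle.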
Behavior was confirmed by watching tick-cluster logs, and running ringpop-admin dump on one of the live nodes.
diff --git a/main.js b/main.js
index dc8fd2c..bd4f101 100755
--- a/main.js
+++ b/main.js
@@ -61,10 +61,15 @@ function main(args) {
     var listenParts = listen.split(':');
     var port = Number(listenParts[1]);
     var host = listenParts[0];
-    tchannel.listen(port, host, onListening);
-    function onListening() {
+    if (port === 3000) {
         ringpop.bootstrap(program.hosts);
+    } else {
+        tchannel.listen(port, host, onListening);
+
+        function onListening() {
+            ringpop.bootstrap(program.hosts);
+        }
     }
 }
(More or less duplicate of uber/ringpop-go#146)