Skip to content

Failover Cluster

Artem Dmitriev edited this page Jun 26, 2017 · 2 revisions

How it works

Reveno as a reliable transactional system has failover abilities, providing network clustering support with master-slave architecture implied.

In order to use it, you should include reveno-cluster jar to your classpath. After that, ClusterEngine implementation should be used instead of Engine in a single-node variant, like:

ClusterEngine reveno = new ClusterEngine("/tmp/reveno");

For the basic cases, there are no additional constructor argumentes should be passed to engine. Much of cluster configurations can be done using ClusterEngine#clusterConfiguration() call, which methods and detailed descriptions are here .

Please note, that for now you should always include all nodes addresses into your configuration, regardless of whether multicast or unicast is used as a concrete commands transfer protocol. All cluster-related communications (like leadership election) are done in a unicast fashion.

Leadership election

When any new node appears or disappears from the cluster, a separate Leadership election process is triggered, if there is quorum of live nodes (among all nodes listed in the configuration). It means that during this process no commands are allowed to be executed. Internally, in the course of it new leader node is elected. Nodes also synchronize the latest data with each other. You should expect next guarantess from it:

  • The single leader is elected among other nodes.
  • All nodes share the same up-to-date state.

Then cluster continues to operate normally. In case of any error, the process repeats until it eventually succeed.

Failover process

In a normal cluster execution flow there is a single master and possible multiple slave nodes. Every command executed on the master is sent down to all slaves. Currently, failover mechanism has two reliable transport implementations: Multicast and Unicast, which can be chosen via configuration.

By no means, IP multicast is the fastest option, but it works only in a local networks and requires very high quality, and fine-tuned OS to achieve decent latency and throughput results.

For both multicast and unicast transports the wide range of configurations are presented.

Both transports use NAK approach as main source of reliability. It requires very little overhead during normal operation. But in case of node failure or network gap of duration more than resend interval, there will be full Leadership election process triggered again.

Implementation details

Inside Reveno there are two possibly distinct cluster implementation parts reside together - part for inter-cluster communications (used by Leadership election, etc) and primary failover. We decided to separate this two because there two very distinct purposes they pay and significantly different requirements.

ClusterBuffer is used for the failover part. It extends Buffer interface, which allows it be used in serializers, which in its turn allows to use Zero-Copy approach here.

Cluster is used for inter-cluster communications part. It provides ClusterConnector for messaging abilities. It also allows to listen to cluster events and also set custom marshaller for messages (though default marshaller is most efficient and this option should be used only for very sophisticated needs).

You can pass your custom Cluster and ClusterBuffer implementations by implementing factory interface ClusterProvider and passing it to the ClusterEngine via constructor.

Clone this wiki locally