Skip to content
This repository has been archived by the owner on Aug 7, 2018. It is now read-only.

etcd-io/etcd-play

Repository files navigation

Note: This project is replaced with https://github.com/coreos/etcdlabs

etcd-play

Build Status Godoc

etcd-play is a playground for exploring the etcd distributed key-value database. Try it out live at play.etcd.io.

Play with etcd in a web browser

etcd uses the Raft consensus algorithm to replicate data on distributed machines in order to gracefully handle network partitions, node failures, and even leader failures. The etcd team extensively tests failure scenarios in the etcd functional test suite. Real-time results from this testing are available at the etcd test dashboard.

In Raft, followers are passive, only responding to incoming RPCs. Clients can make requests to any node, follower or leader. Followers, in turn, forward requests to their leader. Last, the leader appends those requests (commands) to its log and sends AppendEntries RPCs to all of its followers.

Follower failures

What if followers fail?

The leader retries RPCs until they succeed. As soon as a follower recovers, it will catch up with the leader.

follower-failures

In the animation above, notice the stress on the remaining nodes while two of followers (etcd1 and etcd2) are down. Nevertheless, notice that all data is replicated across the cluster, except those two failed ones. Immediately after the nodes recover, the followers sync their data from the leader, looping on this process until all hashes match.

Leader failure

What if a leader fails?

A leader sends periodic heartbeat messages to its followers to maintain its authority. If a follower has not received heartbeats from a valid leader within the election timeout, it assumes that there is no current leader in the cluster, and becomes a candidate to start a new election. Each node includes its last term and last log index in its RequestVote RPC, so that Raft can choose the candidate that is most likely to contain all committed entries. When the old leader recovers, it will retry to commit log entries of its own. The Raft term is used to detect these stale leaders: Followers deny RPCs if the sender's term is older, and then the sender (often the old leader) reverts back to follower state and updates its term to the latest cluster term.

leader-failure

The animation above shows the Leader going down, and shortly a new leader is elected.

All nodes failure

etcd is highly available as long as a quorum of cluster members are operational and can communicate with each other and with clients. 5-node clusters can tolerate failures of any two members. Data loss is still possible in catastrophic events, like all nodes failing. etcd persists enough information on stable storage so that members can recover safely from the disk and rejoin the cluster. In particular, etcd stores new log entries onto disk before committing them to the log to prevent committed entries from being lost on an unexpected restart.

all-node-failures

The animation above shows all nodes being terminated with the Kill button. etcd recovers the data from stable storage. You can see the number of keys and hash values match, before and after. The cluster can handle client requests immediately after recovery with a new leader.