-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathpaxos-from-the-ground-up.html
211 lines (176 loc) · 30.1 KB
/
paxos-from-the-ground-up.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
<html>
<head>
<meta name="description" content="This exposition introduces the widely used Paxos protocol for achieving distributed consensus. Instead of introducing Paxos with all its subtle details from the get-go, it derives the protocol step by step allowing you to better appreciate its simplicity and beauty.">
<title>Paxos from the ground up</title>
<meta charset="UTF-8"/>
<link rel="icon" type="image/x-icon" href="favicon.ico">
<script src="d3.v4.min.js"></script>
<script src="Diagram.js"></script>
<script src="PaxosDiagrams.js"></script>
<script src="Slideshow.js"></script>
</head>
<style>
body {
background-color: #f7f7f7;
font: normal 14px/1.65 "PT Sans", "Helvetica", "Arial", sans-serif;
}
#root-container {
text-align: center;
}
#root-container > root {
display: inline-block;
}
#presentation-header {
font-weight: bold;
font-size: 2.5em;
text-align: left;
margin-top: 2em;
}
.first-slide-title {
font-size: 35px;
}
a {
text-decoration: none;
}
.smaller-slide-text {
font-size: 20px;
}
.title {
display: inline-block;
border: 1px solid white;
background-color: #3b21a2;
color: white;
margin-top: 10px;
padding-top: 10px;
padding-bottom: 10px;
padding-left: 10px;
padding-right: 50px;
white-space: nowrap;
}
.title-text {
font-size: 55px;
vertical-align: bottom;
}
.section, .subsection, .subsubsection {
display: inline-block;
font-weight: normal;
border-bottom: 1px solid gray;
white-space: nowrap;
}
.section {
font-size: 25px;
margin-top: 35px;
}
.subsection {
font-size: 20px;
margin-top: 25px;
}
.subsubsection {
font-size: 16px;
margin-top: 15px;
}
.scenario {
max-width: 700px;
display: inline-block;
margin: 30px;
padding: 20px;
border: 1px solid silver;
font-size: 14px;
}
.postmortem-header {
display: inline-block;
font-size: 20px;
}
.algorithm-detail {
margin-left: 50px;
}
.algorithm-node-section {
margin-top: 30px;
text-align: left;
}
.algorithm-node-title {
margin-bottom: 10px;
border-bottom: 1px gray solid;
display: inline-block;
}
.algorithm-node-detail {
margin-left: 20px;
}
circle.node-down {
stroke: red !important;
stroke-dasharray: 10 4;
}
.multi-paxos-example {
max-width: 700px;
display: inline-block;
margin: 30px;
padding: 20px;
border: 1px solid silver;
}
.button-container {
text-align: center;
}
.button-container > .previous {
margin-right: 20px;
}
#progress-bar {
width: 50em;
}
#bottom-control-container {
text-align: center;
width: 1000px;
padding-top: 2em;
padding-bottom: 2em;
}
#next-slide-button {
margin-left: 1em;
}
#previous-slide-button {
margin-right: 1em;
}
#slide-container {
border: 1px solid;
text-align: center;
width: 1000px;
height: 650px;
}
#slide-container:before {
content: '';
display: inline-block;
height: 100%;
vertical-align: middle;
}
.slide-section {
display: inline-block;
}
.slide {
display: none;
margin-left: 30px;
margin-right: 30px;
font-size: 25px;
vertical-align: middle;
}
.slide > p {
margin-bottom: 3em;
}
#section-title-container {
font-style: italic;
}
</style>
<body>
<svg style="width: 0px; height: 0px;">
<defs>
<marker id="arrow" markerWidth="10" markerHeight="10" refX="8" refY="3" orient="auto" markerUnits="strokeWidth">
<path d="M0,0 L0,6 L9,3 z"/>
</marker>
<pattern id="circles-1" patternUnits="userSpaceOnUse" width="10" height="10">
<image xlink:href="data:image/svg+xml;base64,PHN2ZyB4bWxucz0naHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmcnIHdpZHRoPScxMCcgaGVpZ2h0PScxMCc+CiAgPHJlY3Qgd2lkdGg9JzEwJyBoZWlnaHQ9JzEwJyBmaWxsPScjZmZmJyAvPgogIDxjaXJjbGUgY3g9IjEiIGN5PSIxIiByPSIxIiBmaWxsPSIjMDAwIi8+Cjwvc3ZnPg=="
x="0" y="0" width="10" height="10">
</image>
</pattern>
</defs>
</svg>
<div id="root-container">
<root><div id="presentation-header">Paxos from the ground up</div><div id="slide-container"><div class="slide-section" title="Introduction"><div class="slide"><div><img src="images/paxos-symbol.png"/></div><p> </p><div class="first-slide-title">Paxos from the ground up</div><p><span class="smaller-slide-text">Immad Naseer <span style="padding-left: 5px"><a href="https://www.linkedin.com/in/immad-naseer-b0235014/"><img src="images/linkedin.svg" width="12px"/></a> <a href="https://twitter.com/immadnaseer"><img src="images/twitter.svg" width="12px"/></a></span></span></p></div><div class="slide">Paxos is an algorithm to reach distributed consensus</div><div class="slide"><p>So what is distributed consensus?</p><p>It is the process of having a set of machines agree upon a common value</p></div><div class="slide"><p>Why is distributed consensus an important problem?</p><p>If we can get all machines to agree upon some value in a consistent manner, we get resiliency in the face of temporary machine failures and thus higher availability of the system</p></div><p> </p><div class="slide"><p>Why is distributed consensus a challenging problem?</p><p>The machines have to agree upon a common value while communicating with each other over an unreliable network; furthermore the system should be able to gracefully handle the failure of a few machines</p></div><p> </p><div class="slide">Instead of describing the Paxos protocol in all its subtle details from the get-go, we'll derive it starting with the simplest protocol and incrementally refining it till we reach the complete protocol</div></div><div class="slide-section" title="Set up of the example"><div class="slide">Before we begin, let's carefully study the assumptions the protocol makes and the environment it executes in</div><div class="slide"><p>What do we mean by an unreliable network?</p><p>A message sent to a machine may never reach it<br/>A message sent later might arrive before a message sent earlier<br/>A message may be delivered more than once</p></div><div class="slide"><p>Are there other factors we should consider?</p><p>Machines can be malicious and can try to intentionally mislead the system so it doesn’t achieve consensus<br/>(this is called Byzantine fault tolerance in the literature)</p><p>Paxos assumes that this is NOT the case</p></div><div class="slide"><p>How many machines are we trying to reach consensus between?</p><p>On the order of 5 - 9 machines</p></div><div class="slide">Now that we know some background and assumptions, let’s try and come up with the protocol</div><div class="slide"><p>We’ll assume three classes of machines as we derive the protocol</p><p>Proposers<br/>Acceptors<br/>Learners</p></div><div class="slide"><p>Proposers are responsible proposing values to the acceptors</p><p>Clients contact one of the proposers if they want the system to choose a certain value and the proposers in turn try to get that value accepted <span class="smaller-slide-text"><i>(we'll not be talking about clients from here on out for simplicity and assume the proposers are trying to get a value suggested by a client accepted by the system)</i></span></p><p>It’s a good idea to have more than one proposer as the system won’t be functional if there was only one and it crashed and was down for a while</p><p>Let’s pick two as we derive the protocol</p></div><div class="slide"><p>Acceptors are responsible for accepting (or rejecting) proposals from proposers while ensuring that the complete set of acceptors doesn’t accept conflicting values</p><p>It’s good to have more than one acceptor as the system won’t be functional if there was only one and it crashed and was down for a while</p><p>Let’s pick two as we derive the protocol</p></div><div class="slide"><p>Learners learn what the set of acceptors have chosen as a whole</p><p>Let’s pick just one learner as we derive the protocol for simplicity</p></div><div class="slide"><p>Let’s only deal with the unreliable network at first as we come up with the protocol and assume that no machine can go down</p><p>We’ll extend this simpler algorithm to handle machine failures later</p></div></div><div class="slide-section" title="Protocol for when no machine goes down"><div class="slide"><p>What’s the simplest algorithm we can think of?</p><p>How about each proposer sends an <i>“accept”</i> message to all acceptors and each acceptor accepts the first message it receives?</p><p>What can possibly go wrong? ;)</p></div><div class="slide"><div class="scenario"><div id="scenario-1"></div></div></div><div class="slide"><p>What went wrong?</p><p>Both the proposers sent the <i>accept</i> messages at about the same time and since we couldn’t guarantee that the <i>accept</i> message for any one proposer will reach all acceptors before the other, chaos ensued</p></div><div class="slide"><p>How can we fix this?</p><p>One option is for a proposer to first send a <i>promise</i> message to all acceptors and make them promise that they won’t accept any other acceptors <i>accept</i> message before it sends its <i>accept</i> message</p><p>This may work</p></div><div class="slide"><p>How will the proposer identify itself in its promise message?</p><p>We can use the IP address of the proposer</p><p>Or better yet, we can identify the “proposal” instead of the proposer thus allowing a proposer to send multiple proposals</p><p>This gives us more flexibility and isn’t any more complicated so let’s go with this approach</p></div><div class="slide"><p>Each proposal can be identified by a pair <i>[sequence number, proposer name]</i></p><p>So the first proposal sent by p1 can be <i>[1, p1]</i><br/>If p1 ever decides to send a second proposal, that can be named <i>[2, p1]</i> etc<br/>etc</p></div><div class="slide">Let’s run this algorithm</div><div class="slide"><div class="scenario"><div id="scenario-accept-first-promise"></div></div></div><div class="slide"><p>So we bumped into a similar problem as before</p><p>The <i>promise</i> messages reached the acceptors in an inconvenient order just like the <i>accept</i> messages had reached earlier</p><p>This was actually quite predictable and we should have seen this coming</p></div><div class="slide"><p>Since acceptors might receive <i>promise</i> messages from multiple proposers, they will need to order them somehow so they accept one or the other but not both</p><p> We can achieve this by defining the “greater than” operator on proposal numbers; an acceptor only promises to accept a proposal if its proposal number is the highest one it has seen</p><p>[2, p1] > [1, p1] <span class="smaller-slide-text"><i>(the sequence number of first is greater, proposer name doesn’t matter)</i></span><br/> [1, p2] > [1, p1] <span class="smaller-slide-text"><i>(if the sequence numbers are same, we break the tie using proposer name)</i></span><br/>etc</p></div><div class="slide"><p>Let’s run this refined algorithm where a proposer first gets its <i>promise(proposal-number)</i> message acknowledged by all acceptors and only then sends its <i>accept</i> message</p><p>An acceptor only promises a proposal if it has the highest proposal number it has yet seen</p></div><div class="slide"><div class="scenario"><div id="scenario-accept-highest-promise"></div></div></div><div class="slide"><p>Good</p><p>We got a successful run</p></div><div class="slide"><p>There is still a problem lurking here</p><p>Can you think of it?</p></div><div class="slide"><div class="scenario"><div id="scenario-3"></div></div></div><div class="slide"><p>Let’s take a pause</p><p>Things are starting to get interesting and the state can evolve in multiple ways here</p></div><div class="slide"><p>Alternative #1</p><p>p1’s <i>accept</i> message can reach both acceptors before p2’s <i>promise</i> message reaches either</p></div><div class="slide"><div class="scenario"><div id="scenario-3-alternative-1"></div></div></div><div class="slide"><p>Alternative #2</p><p>p2’s <i>promise</i> message can reach both acceptors before p1’s <i>accept</i> message reaches either</p></div><div class="slide"><div class="scenario"><div id="scenario-3-alternative-2"></div></div></div><div class="slide"><p>Alternative #3<br/>This is the most interesting alternative</p><p>p1’s <i>accept</i> message reaches one acceptor while p2’s <i>promise</i> message reaches the other</p></div><div class="slide"><div class="scenario"><div id="scenario-3-alternative-3"></div></div></div><div class="slide">Hm, how do we break out of this one?</div><div class="slide"><p>So far the proposers have been competing to get their values chosen.</p><p>What if they co-operated instead of competing?</p><p>After all, the purpose of the protocol is to get "some" value chosen and not necessarily "my" value chosen</p></div><div class="slide"><p>What if p2 changed its mind and tried to get blue accepted instead of green if it learned that some acceptor has already accepted blue?</p><p>Will this work?</p><p>Will this work in all situations?</p></div><div class="slide">Let's think about it</div><div class="slide"><p>There are three logical alternatives</p><p>p1's accept message reaches all acceptors before p2's promise reaches any<br/><span class="smaller-slide-text"><i>(we are good in this case)</i></span></p><p>p2's promise message reaches all acceptors before p1's accept reaches any<br/><span class="smaller-slide-text"><i>(we are good in this case)</i></span></p><p>p1's accept reaches some acceptors while p2's promise message reaches others<br/><span class="smaller-slide-text"><i>(let's think this through)</i></span></p></div><div class="slide"><p>If p1's <i>acccept</i> message reaches some acceptor, even if just <i>one</i> acceptor, we know that p2 is bound to learn about it as it sends the <i>promise</i> message to all acceptors</p><p>Thus p2 is bound to change its mind as long as even a single acceptor has already accepted a value</p></div><div class="slide">Once a value has been accepted by some acceptor (even if just one), the other proposers with a higher number proposal number <i>have</i> to get that value chosen instead of whichever value they had in mind earlier</div><div class="slide">Let’s see this in action</div><div class="slide"><div class="scenario"><div id="scenario-4"></div></div></div><div class="slide"><p>This worked!</p><p>We cheated.</p><p>Just a little.</p></div><div class="slide">We had earlier said that a1 could never accept another message once it had already accepted some value</div><div class="slide"><p>But a1 <i>did</i> accept the higher numbered blue value sent by p2</p><p>What if p2 had sent green instead?</p><p>a1 had no choice but to accept that as well</p></div><div class="slide">So why all this emphasis on co-operation instead of competition?</div><div class="slide"><p>If p2 hadn't changed its mind to try and get blue accepted instead of green, there is no guarantee that yet another proposer (say, p3) could later come along and get its even higher numbered orange value accepted instead</p><p>This cycle would never end</p></div><div class="slide"><p>p2 changing its mind was thus critical for the correctness of the protocol</p><p>Acceptors can still accept a new value after having already accepted some value but the proposers make sure the new value they ask them to accept is the same as the old one (just with a different and higher numbered proposal)</p></div><div class="slide"><p>Congratulations!</p><p>We have derived the Paxos protocol assuming no machine goes down</p></div></div><div class="slide-section" title="Protocol for when machines can go down"><div class="slide"><p>Machines do go down however</p><p>And what’s the point of a distributed consensus if we can’t tolerate machine failures?</p></div><div class="slide"><p>The immediate next question is:</p><p>How many machine failures can we tolerate?</p></div><div class="slide"><p>Intuitively, we should be able to function fine as long as the majority of the machines are up</p><p>What’s the smallest majority? 51%</p></div><div class="slide"><p>As long as 51% of the machines are up, the protocol should be able to make forward progress</p><p>The machines which crashed can later come up and should be able to participate again without affecting the correctness of the protocol</p></div><div class="slide"><p>If we want to tolerate <i>f</i> failures, we should have 2f+1 machine failures (always an odd number)<br/>3 machines tolerate 1 failure <span class="smaller-slide-text"><i>(as smallest majority is 2)</i></span><br/>5 machines tolerate 2 failures <span class="smaller-slide-text"><i>(as smallest majority is 3)</i></span></p><p>2f+2 machines can still only tolerate <i>f</i> number of failures so having an even number of machines wastes an extra machine<br/>4 machines still tolerate 1 failure <span class="smaller-slide-text"><i>(as smallest majority is 3)</i></span><br/>6 machines still tolerate 2 failures <span class="smaller-slide-text"><i>(as smallest majority is 4)</i></span></p></div><div class="slide">Before we move forward, it’s useful to recognize a simple but important property of majorities</div><div class="slide"><p>Any two majorities always have at least one node in common</p><p>Let’s consider three acceptors a1, a2, a3 - there are three possible smallest majorities</p><p>a1, a2<br/>a2, a3<br/>a1, a3<br/><span class="smaller-slide-text"><i>(Pick any two of the above - they always have a machine in common)</i></span></p></div><div class="slide"><p>Let’s derive the protocol for when some machines can go down</p><p>Let’s consider two proposers as before (p1 and p2) but five acceptors (a1 - a5) and one learner as before</p><p>We’ll work towards tolerating the failure of at most two acceptors</p></div><div class="slide">To warm up, let’s do a run where only one proposer proposes and two acceptors go down in the middle</div><div class="slide"><div class="scenario"><div id="scenario-5"></div></div></div><div class="slide">The protocol worked much the same way as before with the difference that the proposer waited to hear back from only a majority of acceptors instead of all acceptors</div><div class="slide">Since a proposer doesn’t have to wait to hear back from all acceptors, we can have race conditions where the two proposers are talking with a mostly disjoint set of acceptors</div><div class="slide"><div class="scenario"><div id="scenario-6"></div></div></div><div class="slide"><p>Since both the proposers have to talk to the smallest majority, they must talk to at least one common node (p3 in our example)</p><p>There are two ways the situation can evolve</p></div><div class="slide"><p>Alternative # 1</p><p>a3 receives p1’s <i>promise</i> message, acks it and then receives p2’s <i>promise</i> message leading it to discard a1’s subsequent <i>accept</i> message</p></div><div class="slide"><div class="scenario"><div id="scenario-6-5"></div></div></div><div class="slide"><p>Alternative # 2</p><p>a3 can receive p1’s <i>promise</i> message, ack it and receive p1’s <i>accept</i> message all before it receives p2’s <i>promise</i> message</p><p>We have seen this play out before - p2 will cooperate in this case and change its mind about which color to get chosen</p></div><div class="slide"><div class="scenario"><div id="scenario-7"></div></div></div><div class="slide"><p>This shouldn’t have been surprising</p><p>Turns out the rules we derived for the case when no machine goes down work pretty well in general assuming we wait to hear back from at least a majority of acceptors</p></div><div class="slide"><p>Since a proposer doesn’t have to contact every acceptor, it’s possible that it can hear back from multiple acceptors each of which has accepted a different value</p><p>In this case, it changes its mind and tries to get the accepted value with the highest proposal-number accepted by the majority</p></div><div class="slide"><div class="scenario"><div id="scenario-8"></div></div></div><div class="slide"><p>We saw that a proposer can get acks with multiple accepted values if it only waits to hear back from majority</p><p>It changes its mind and gets the highest numbered value accepted from among the set of values it receives</p></div><div class="slide"><p>This is it</p><p>This is the complete Paxos protocol for choosing a single value among a set of machines while tolerating f machine failures</p></div><div class="slide">Here is the detailed protocol each of the three types of nodes follows</div><div class="slide"><div class="algorithm-node-section"><div class="algorithm-node-title">Proposer</div><div class="algorithm-node-detail"><p>Choose a globally unique proposal number and send promise(proposal-number) messages to all acceptors.</p><p>Wait to hear an acknowledgement from smallest majority.</p><p>If none of the received acknowledgements contain an already accepted value, feel free to send accept messages with the value of your choice.</p><p>If any of the received acknowledgements contain an already accepted value, pick the one with the higher proposal number and get that accepted instead of your own value.</p></div></div></div><div class="slide"><div class="algorithm-node-section"><div class="algorithm-node-title">Acceptor</div><div class="algorithm-node-detail"><p>When you receive a promise message, always acknowledge it if has the highest proposal number which you have seen; if the proposal number is lower than the one you had earlier acknowledged, feel free to not respond or send a negative acknowledgement</p><p>Acknowledge it even if you have already accepted another value but include your accepted value and its corresponding proposal number in your acknowledgement</p><p>When you receive an accept message, accept it if you had earlier acknowledged its proposal number which came through the promise message; also notify all the learners that you just accepted a value along with its proposal number</p></div></div></div><div class="slide"><div class="algorithm-node-section"><div class="algorithm-node-title">Learner</div><div class="algorithm-node-detail"><p>When you recieve a message from an acceptor informing you they accepted a value (along with its proposal number), remember it.</p><p>When you have received messages for the same proposal number from the smallest majority, you know the value the Paxos protocol chose.</p></div></div></div></div><div class="slide-section" title="Collapsing multiple roles into a machine"><div class="slide">Typical implementations of Paxos collapse the multiple roles into a single machine</div><div class="slide"><p>There is typically only one proposer as it’s counter-productive to have more than one proposer proposing in a race</p><p> The distinguished proposer (aka leader) can be chosen from the set of machines using Paxos itself (think of the “value” being chosen as who will be the leader)</p><p>The leader also serves as the distinguished learner and conveys the values it learns to other nodes instead of all of them learning by receiving <i>accepted</i> messages</p></div><div class="slide">Let’s see an exchange of messages in such a system where node n1 is the distinguished proposer and learner and n1, n2 and n3 are acceptors</div><div class="slide"><div class="scenario"><div id="scenario-9"></div></div></div></div><div class="slide-section" title="Multi-Paxos and Replicated State Machines"><div class="slide"><p>We have seen how Paxos can be used to choose a single value (but only a single value) in a consistent manner</p><p> In any real system, we will want to choose many different values</p></div><div class="slide"><p>Before moving forward, let's take a step back and talk about replicated state machines</p><p>We'll then see how we can use Paxos (or more precisely, Multi-Paxos) to implement a replicated state machine</p></div><div class="slide">A state machine is a computer which takes a sequence of commands and transitions into new a state as it executes each command</div><div class="slide"><p>A replicated state machine is that same state machine but replicated across multiple computers for better fault tolerance and higher availability</p><p>This allows us to view a collection of machines as one logical machine which has a higher uptime than any of them combined</p></div><div class="slide"><p>Each command is now executed on each replica of the state machine instead of at just one computer</p><p><span class="smaller-slide-text">As an important side note, command execution should be deterministic as a given command is executed multiple times, once for each replica</span></p></div><div class="slide"><p>Each machine maintains a list of input commands and the system runs an instance of Paxos in order to choose which command to execute for position <i>i</i></p><p>Since we run multiple instances of Paxos, one for each slot of the command list, this technique is called Multi-Paxos</p></div><div class="slide"><p>To make things concrete, let's consider a system with three nodes (which we know can tolerate a single machine failure) which has accepted and run two commands</p><p> </p><div class="multi-paxos-example">n1: [ c1, c2 ]<br/>n2: [ c1, c2 ]<br/>n3: [ c1, c2 ]</div><p>Each node has learned through two rounds of Paxos that the command for 1st position is c1 and 2nd position is c2</p></div><div class="slide"><p>When a client wants to execute a new command (c3), it contacts the distinguished proposer (aka leader) which in turn starts a round of Paxos to get the value for the 3rd position in the list chosen</p><p>Each machine executes the command c3 as it accepts and learns that the system has chosen it as the value for the 3rd position in the list<br/> </p><div class="multi-paxos-example">n1: [ c1, c2, c3 ]<br/>n2: [ c1, c2, c3 ]<br/>n3: [ c1, c2, c3 ]</div></div><div class="slide"><p>Now assume the client issues c4, two nodes accept and learn it (while third node hasn't yet received all the messages)</p><p> </p><div class="multi-paxos-example">n1: [ c1, c2, c3, c4 ]<br/>n2: [ c1, c2, c3, c4 ]<br/>n3: [ c1, c2, c3 ]</div><p>The system can move ahead as it can tolerate the failure (or slow down) of a single node</p></div><div class="slide"><p>The client can then issue command c5 and this command can reach all the nodes (including n3)</p><p> </p><div class="multi-paxos-example">n1: [ c1, c2, c3, c4, c5 ]<br/>n2: [ c1, c2, c3, c4, c5 ]<br/>n3: [ c1, c2, c3, ??, c5 ]</div><p>n1 and n2 are free to execute c5 but n3 will have to wait to hear the value for the 4th slot before it's able to execute c5</p></div><div class="slide"><p>Since n3 knows c5 has been chosen, it knows the value for the 4th slot must have been chosen by the majority</p><p> </p><div class="multi-paxos-example">n1: [ c1, c2, c3, c4, c5 ]<br/>n2: [ c1, c2, c3, c4, c5 ]<br/>n3: [ c1, c2, c3, ??, c5 ]</div><p>If the network dropped messages to n3, it can always start a dummy proposal round for the 4th slot and one of the nodes in the majority which accepted will reply back with the command it accepted for that slot</p></div><div class="slide"><p>Once n3 learns the value for the 4th slot, it can execute both c4 and c5</p><p> </p><div class="multi-paxos-example">n1: [ c1, c2, c3, c4, c5 ]<br/>n2: [ c1, c2, c3, c4, c5 ]<br/>n3: [ c1, c2, c3, c4, c5 ]</div></div><div class="slide"><p>Let's assume n3 gets unlucky and crashes while n1 and n2 continue to make progress</p><p> </p><div class="multi-paxos-example">n1: [ c1, c2, c3, c4, c5, c6, c7, c8 ]<br/>n2: [ c1, c2, c3, c4, c5, c6, c7, c8 ]<br/>n3: [ c1, c2, c3, c4, c5 ]</div></div><div class="slide"><p>When n3 comes back up, it suddenly learns the value for the 9th slot and realizes it missed all the updates in between</p><p> </p><div class="multi-paxos-example">n1: [ c1, c2, c3, c4, c5, c6, c7, c8, c9 ]<br/>n2: [ c1, c2, c3, c4, c5, c6, c7, c8, c9 ]<br/>n3: [ c1, c2, c3, c4, c5, ??, ??, ??, c9 ]</div></div><div class="slide"><p>n3 can start dummy proposal messages for the slots which it missed and learn the value chosen for those slots from one (or more) nodes in the majority which accepted values for those slots</p><p> </p><div class="multi-paxos-example">n1: [ c1, c2, c3, c4, c5, c6, c7, c8, c9 ]<br/>n2: [ c1, c2, c3, c4, c5, c6, c7, c8, c9 ]<br/>n3: [ c1, c2, c3, c4, c5, c6, c7, c8, c9 ]</div></div><div class="slide">There are optimizations and variations possible but you now know the crux of how Multi-Paxos is used to implement replicated state machines</div></div><div class="slide-section" title="Conclusion and wrap up"><div class="slide">You now have a working knowledge of not only of how the Paxos algorithm achieves consensus but also how its used to implement a replicated state machines</div><div class="slide"><p>Original Sources</p><p> </p><ul style="text-align: left"><li><a href="https://lamport.azurewebsites.net/pubs/paxos-simple.pdf">Paxos Made Simple</a> <i>by Leslie Lamport</i></li><li><a href="https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf">The Part-Time Parliament</a> <i>by Leslie Lamport</i></li></ul></div><div class="slide">Thank you</div></div></div><div id="bottom-control-container"><quote><div id="section-title-container"></div></quote><quote><div id="progress-bar-container"><quote><input id="previous-slide-button" type="button" value="Previous" onclick="previousSlide()"/></quote><quote><progress id="progress-bar" max="100" value="0"></progress></quote><quote><input id="next-slide-button" type="button" value="Next" onclick="nextSlide()"/></quote></div></quote><quote><div id="progress-text"></div></quote></div></root>
</div>
</body></html>