The C11 memory model is fundamentally about trying to bridge the gap between the
semantics we want, the optimizations compilers want, and the inconsistent chaos
our hardware wants. *We* would like to just write programs and have them do
- exactly what we said but, you know, *fast*. Wouldn't that be great?
+ exactly what we said but, you know, fast. Wouldn't that be great?
@@ -35,20 +35,20 @@ y = 3;
x = 2;
```

- The compiler may conclude that it would *really* be best if your program did
+ The compiler may conclude that it would be best if your program did

```rust,ignore
x = 2;
y = 3;
```

- This has inverted the order of events *and* completely eliminated one event.
+ This has inverted the order of events and completely eliminated one event.
From a single-threaded perspective this is completely unobservable: after all
the statements have executed we are in exactly the same state. But if our
- program is multi-threaded, we may have been relying on `x` to *actually* be
- assigned to 1 before `y` was assigned. We would *really* like the compiler to be
+ program is multi-threaded, we may have been relying on `x` to actually be
+ assigned to 1 before `y` was assigned. We would like the compiler to be
able to make these kinds of optimizations, because they can seriously improve
- performance. On the other hand, we'd really like to be able to depend on our
+ performance. On the other hand, we'd also like to be able to depend on our
program *doing the thing we said*.
@@ -57,15 +57,15 @@ program *doing the thing we said*.
# Hardware Reordering

On the other hand, even if the compiler totally understood what we wanted and
- respected our wishes, our *hardware* might instead get us in trouble. Trouble
+ respected our wishes, our hardware might instead get us in trouble. Trouble
comes from CPUs in the form of memory hierarchies. There is indeed a global
shared memory space somewhere in your hardware, but from the perspective of each
CPU core it is *so very far away* and *so very slow*. Each CPU would rather work
- with its local cache of the data and only go through all the *anguish* of
- talking to shared memory *only* when it doesn't actually have that memory in
+ with its local cache of the data and only go through all the anguish of
+ talking to shared memory only when it doesn't actually have that memory in
cache.

- After all, that's the whole *point* of the cache, right? If every read from the
+ After all, that's the whole point of the cache, right? If every read from the
cache had to run back to shared memory to double check that it hadn't changed,
what would the point be? The end result is that the hardware doesn't guarantee
that events that occur in the same order on *one* thread, occur in the same
@@ -99,13 +99,13 @@ provides weak ordering guarantees. This has two consequences for concurrent
programming:

* Asking for stronger guarantees on strongly-ordered hardware may be cheap or
- even *free* because they already provide strong guarantees unconditionally.
+ even free because they already provide strong guarantees unconditionally.
Weaker guarantees may only yield performance wins on weakly-ordered hardware.

- * Asking for guarantees that are *too* weak on strongly-ordered hardware is
+ * Asking for guarantees that are too weak on strongly-ordered hardware is
more likely to *happen* to work, even though your program is strictly
- incorrect. If possible, concurrent algorithms should be tested on weakly-
- ordered hardware.
+ incorrect. If possible, concurrent algorithms should be tested on
+ weakly-ordered hardware.
@@ -115,10 +115,10 @@ programming:
The C11 memory model attempts to bridge the gap by allowing us to talk about the
*causality* of our program. Generally, this is by establishing a *happens
- before* relationships between parts of the program and the threads that are
+ before* relationship between parts of the program and the threads that are
running them. This gives the hardware and compiler room to optimize the program
more aggressively where a strict happens-before relationship isn't established,
- but forces them to be more careful where one *is* established. The way we
+ but forces them to be more careful where one is established. The way we
communicate these relationships are through *data accesses* and *atomic
accesses*.
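For a concrete feel for happens-before, here is a rough sketch (illustrative, not from the chapter) using nothing but `std::thread`: `spawn` makes everything the parent did before the call visible to the new thread, and `join` makes everything the child did visible to the parent afterwards.

```rust
use std::thread;

fn main() {
    let mut data = vec![1, 2, 3];
    data.push(4); // happens-before the spawn below

    let handle = thread::spawn(move || {
        // Everything sequenced before `spawn` in the parent is visible here.
        data.push(5);
        data
    });

    // `join` establishes the reverse edge: everything the child did
    // happens-before the code that runs after `join` returns.
    let data = handle.join().unwrap();
    assert_eq!(data, [1, 2, 3, 4, 5]);
}
```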
@@ -130,8 +130,10 @@ propagate the changes made in data accesses to other threads as lazily and
inconsistently as it wants. Mostly critically, data accesses are how data races
happen. Data accesses are very friendly to the hardware and compiler, but as
we've seen they offer *awful* semantics to try to write synchronized code with.
- Actually, that's too weak. *It is literally impossible to write correct
- synchronized code using only data accesses*.
+ Actually, that's too weak.
+
+ **It is literally impossible to write correct synchronized code using only data
+ accesses.**

Atomic accesses are how we tell the hardware and compiler that our program is
multi-threaded. Each atomic access can be marked with an *ordering* that
@@ -141,7 +143,10 @@ they *can't* do. For the compiler, this largely revolves around re-ordering of
instructions. For the hardware, this largely revolves around how writes are
propagated to other threads. The set of orderings Rust exposes are:

- * Sequentially Consistent (SeqCst) Release Acquire Relaxed
+ * Sequentially Consistent (SeqCst)
+ * Release
+ * Acquire
+ * Relaxed

(Note: We explicitly do not expose the C11 *consume* ordering)
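Concretely, these orderings are the variants of `std::sync::atomic::Ordering`, and you pass one to every atomic operation. A rough sketch of how they show up in the API:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let x = AtomicUsize::new(0);

    x.store(1, Ordering::SeqCst);      // sequentially consistent store
    let _ = x.load(Ordering::Acquire); // acquire load
    x.store(2, Ordering::Release);     // release store
    x.fetch_add(1, Ordering::Relaxed); // relaxed atomic read-modify-write
}
```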
@@ -154,13 +159,13 @@ synchronize"
Sequentially Consistent is the most powerful of all, implying the restrictions
of all other orderings. Intuitively, a sequentially consistent operation
- *cannot* be reordered: all accesses on one thread that happen before and after a
- SeqCst access *stay* before and after it. A data-race-free program that uses
+ cannot be reordered: all accesses on one thread that happen before and after a
+ SeqCst access stay before and after it. A data-race-free program that uses
only sequentially consistent atomics and data accesses has the very nice
property that there is a single global execution of the program's instructions
that all threads agree on. This execution is also particularly nice to reason
about: it's just an interleaving of each thread's individual executions. This
- *does not* hold if you start using the weaker atomic orderings.
+ does not hold if you start using the weaker atomic orderings.

The relative developer-friendliness of sequential consistency doesn't come for
free. Even on strongly-ordered platforms sequential consistency involves
@@ -170,8 +175,8 @@ In practice, sequential consistency is rarely necessary for program correctness.
However sequential consistency is definitely the right choice if you're not
confident about the other memory orders. Having your program run a bit slower
than it needs to is certainly better than it running incorrectly! It's also
- *mechanically* trivial to downgrade atomic operations to have a weaker
- consistency later on. Just change `SeqCst` to e.g. `Relaxed` and you're done! Of
+ mechanically trivial to downgrade atomic operations to have a weaker
+ consistency later on. Just change `SeqCst` to `Relaxed` and you're done! Of
course, proving that this transformation is *correct* is a whole other matter.
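As a rough sketch of what the single global order buys you, consider the classic store-buffering test: with `SeqCst` on every access, the two threads can never both load 0. Any weaker ordering allows that outcome, which is exactly the kind of property a "just swap in `Relaxed`" refactor has to re-prove.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

static X: AtomicUsize = AtomicUsize::new(0);
static Y: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let a = thread::spawn(|| {
        X.store(1, Ordering::SeqCst);
        Y.load(Ordering::SeqCst)
    });
    let b = thread::spawn(|| {
        Y.store(1, Ordering::SeqCst);
        X.load(Ordering::SeqCst)
    });

    let (r1, r2) = (a.join().unwrap(), b.join().unwrap());

    // Under SeqCst there is one global order containing all four operations,
    // so whichever load comes last must see the other thread's store.
    // (r1, r2) == (0, 0) is therefore impossible here; Relaxed (and even
    // Acquire/Release) would permit it.
    assert!(!(r1 == 0 && r2 == 0));
}
```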
@@ -183,15 +188,15 @@ Acquire and Release are largely intended to be paired. Their names hint at their
use case: they're perfectly suited for acquiring and releasing locks, and
ensuring that critical sections don't overlap.

- Intuitively, an acquire access ensures that every access after it *stays* after
+ Intuitively, an acquire access ensures that every access after it stays after
it. However operations that occur before an acquire are free to be reordered to
occur after it. Similarly, a release access ensures that every access before it
- *stays* before it. However operations that occur after a release are free to be
+ stays before it. However operations that occur after a release are free to be
reordered to occur before it.

When thread A releases a location in memory and then thread B subsequently
acquires *the same* location in memory, causality is established. Every write
- that happened *before* A's release will be observed by B *after* its release.
+ that happened before A's release will be observed by B after its release.
However no causality is established with any other threads. Similarly, no
causality is established if A and B access *different* locations in memory.
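For example, a minimal sketch of that hand-off (the flag and payload names are just illustrative): one thread releases a flag after writing its payload, and another acquires the same flag before reading it.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static DATA: AtomicUsize = AtomicUsize::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);    // written before the release
        READY.store(true, Ordering::Release); // "release" the payload
    });

    let consumer = thread::spawn(|| {
        // Spin until we "acquire" the flag the producer released.
        while !READY.load(Ordering::Acquire) {}
        // The acquire load that saw `true` synchronizes-with the release
        // store, so every write before that store (including DATA) is visible.
        assert_eq!(DATA.load(Ordering::Relaxed), 42);
    });

    producer.join().unwrap();
    consumer.join().unwrap();
}
```

Only the release store and the acquire load of the flag need the stronger orderings; once that edge exists, the payload itself can be `Relaxed`.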
@@ -230,7 +235,7 @@ weakly-ordered platforms.
# Relaxed

Relaxed accesses are the absolute weakest. They can be freely re-ordered and
- provide no happens-before relationship. Still, relaxed operations *are* still
+ provide no happens-before relationship. Still, relaxed operations are still
atomic. That is, they don't count as data accesses and any read-modify-write
operations done to them occur atomically. Relaxed operations are appropriate for
things that you definitely want to happen, but don't particularly otherwise care