-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathimplementation.tex
183 lines (142 loc) · 9.11 KB
/
implementation.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
\abschnitt{possible implementation strategies}\label{implementations}
\zs{This proposal does \so{NOT} seek to standardize any particular implementation or
impose any specific calling convention!}
Modern \bfs{micro-processors} are \bfs{register machines}; the content of
processor registers represents the execution context of the program at a given
point in time.
\bfs{Operating systems} maintain for each process all relevant data (execution
context, other hardware registers etc.) in the process table. The operating system's
\bfs{CPU scheduler} periodically suspends and resumes processes in order to
share CPU time between multiple processes. When a process is suspended, its
execution context (processor registers, instruction pointer, stack pointer, ...)
is stored in the associated process table entry. On resumption, the CPU
scheduler loads the execution context into the CPU and the process continues
execution.
The CPU scheduler does a \bfs{full context switch}. Besides preserving
the execution context (complete CPU state), the cache must be invalidated and
the memory map modified.
A kernel-level context switch is several orders of magnitude slower than a
context switch at user-level\cite{Tanenbaum2009}.
\uabschnitt{hypothetical fiber preserving complete CPU state} This strategy tries to
preserve the complete CPU state, e.g. all CPU registers. This requires that the
implementation identifies the concrete micro-processor type and supported processor
features. For instance the x86-architecture has several flavours of extensions
such as MMX, SSE1-4, AVX1-2, AVX-512.
Depending on the detected processor features, implementations of certain
functionality must be switched on or off. The CPU scheduler in the operating system
uses such information for context switching between processes.
A fiber implementation using this strategy requires such a detection mechanism
too (equivalent to swapper/\cpp{system_32()} in the Linux kernel).
Aside from the complexity of such detection mechanisms, preserving the complete
CPU state for each fiber switch is expensive.
\zs{A context switch facility that preserves the complete CPU state like an
operating system is possible but impractical for user-land.}
\uabschnitt{fiber switch using the calling convention}\label{callingconvention}
For \fiber, not all registers need be preserved because the context
switch is effected by a visible function call. It need not be completely transparent like
an operating-system context switch; it only needs to be as transparent as a call
to any other function. The calling convention -- the part of the ABI that
specifies how a function's arguments and return values are passed -- determines
which subset of micro-processor registers must be preserved by the called
subroutine.
The \bfs{calling convention}\cite{SYSVABI} of \bfs{SYSV ABI} for \bfs{x86\_64}
architecture determines that general purpose registers R12, R13, R14, R15, RBX
and RBP must be preserved by the sub-routine - the first arguments are passed
to functions via RDI, RSI, RDX, RCX, R8 and R9 and return values are stored in
RAX, RDX.
So on that platform, the \resume implementation preserves the \bfs{general
purpose registers} (R12-R15, RBX and RBP) specified by the calling convention.
In addition, the \bfs{stack pointer} and \bfs{instruction pointer} are
preserved and exchanged too -- thus, from the point of view of calling
code, \resume behaves like an ordinary function call.
In other words, \resume acts on the level of a simple function invocation --
with the same performance characteristics (in terms of CPU cycles).
This technique is used in \bcontext\cite{bcontext} which acts as building block
for (e.g.) \fbfibers\xspace and \bbquantum; see section \nameref{low_level}.
\uabschnitt{in-place substitution at compile time} During code generation,
a compiler-based implementation could inject the assembler code responsible
for the fiber switch directly into each function that calls \resume. That would save
an extra indirection (JMP + PUSH/MOV of certain registers used to
invoke \resume).
\uabschnitt{CPU state on the stack} Because each fiber must preserve CPU
registers at suspension and load those registers at resumption, some storage
is required.
Instead of allocating extra memory for each fiber, an implementation can use
the stack by simply advancing the stack pointer at suspension and pushing the
CPU registers (CPU state) onto the stack owned by the suspending fiber. When
the fiber is resumed, the values are popped from the stack and loaded into the
appropriate registers.
This strategy works because only a running fiber creates new stack frames
(moving the stack pointer). While a fiber is suspended, it is safe to keep the
CPU state on its stack.
Using the stack as storage for the CPU state has the additional advantage
that \fiber need not itself contain the stored CPU state: it need only contain
a pointer to the stack location.
Section \nameref{synthesizing} describes how global variables are avoided
by synthesizing a \fiber from the active fiber (execution context) and passing
this synthesized \fiber (representing the now-suspended fiber) into the resumed
fiber. Using the stack as storage makes this mechanism very easy to
implement.\footnote{The implementation of \bcontext\cite{bcontext} utilizes this
technique.}
Inside \resume the code pushes the relevant CPU registers onto the stack, and
from the resulting stack address creates a new \fiber. This instance is then
passed (or returned) into the resumed fiber (see \nameref{synthesizing}).
\zs{Using the active fiber's stack as storage for the CPU state is efficient because no
additional allocations or deallocations are required.}
\abschnitt{\exfns}\label{exc_scope}
Both \exfns should report exceptions solely on the current fiber.
Reporting exceptions thrown on any other fiber would make them
unreliable in practice.
A straightforward implementation could make \allresume save and restore the
data underlying \exfns as part of saving and restoring the rest of the fiber
state. Since \exfns data is necessarily thread-local, the likely cost would be
a TLS access on every \anyresume call.
Alternatively, \fiber's constructor could update an internal associative
container whose key is the high end of the new fiber stack area. \exfns could
call \cpp{upper\_bound()}, passing the current stack pointer, to discover
which stack is current. This would shift the cost from every context switch
to \exfns calls.
As of 2023, though, the zeitgeist appears to be that \exfns are not sufficiently
important to warrant altering exception-handling implementations to fix their
behavior. This is why, in the \nameref{api} section, it is
implementation-defined whether \exfns are specific to the current fiber
or the current \thread.
The following examples have been floated to illustrate problems that can arise
when \exfns are not local to the current fiber.
In these small examples, the problematic code is obvious. But the power of
fibers is that a function need not know whether some function it calls (or
some indirect callee thereof) will resume another fiber. It's not practical
simply to forbid coders from switching fibers within a catch block.
\uabschnitt{Premature destruction of exception object}
In \stdclause{except.throw} paragraph 4, the destruction of an exception
object is specified to potentially occur when an active handler for the
exception exits, not when a handler exits while the exception is still the
currently handled exception. It is possible to observe, with the Boost
implementation in an Itanium C++ ABI environment, cases where an exception is
destroyed at a different point than specified (and, in particular, when a
handler for the exception is still active in a fiber).
\cppf{early_exc_destroy}
\uabschnitt{Throw-expression with no operand}
Both \stdclause{expr.throw} paragraph 3 and \cpp{current\_exception()}
(\stdclause{propagation} paragraph 8) reference the ``currently handled
exception'' (\stdclause{except.handle} paragraph 8). Thus, the
construct \cpp{throw;} is by definition equivalent to
\cpp{std::rethrow\_exception(std::current\_exception());}
(\stdclause{propagation} paragraph 9).
The existing definition of currently handled exception:
``The exception with the most recently activated handler that is still active
is called the \emph{currently handled exception.}''
does not clearly constrain the scope to the current thread. This constraint
must be inferred from \stdclause{except.throw} paragraph 2:
``When an exception is thrown, control is transferred to the nearest handler
with a matching type \xref{except.handle}; ``nearest'' means the handler for
which the \nt{stmt.block}{compound-statement} or
\nt{class.base.init}{ctor-initializer} following the \cpp{try} keyword was
most recently entered by the thread of control and not yet exited.''
In any case, if ``currently handled exception'' means the exception with the
most recently activated handler within any fiber on the current thread, we can
get the following result.
\cppfl{nullary_throw}
Worse still, the exceptions in question aren't necessarily related to each
other, and line 36 is more likely to read \cpp{catch (const Bad& caught)} --
in which case the \cpp{throw;} on line 34 would \emph{not} be caught.