\label{sec:discussion} All DMT systems must impose an order on updates to shared memory and synchronization operations. The
mechanism used to isolate updates affects the limitations and performance of the
system. \dthreads{} represents a new point in the design space for DMT systems, with inherent advantages and limitations that we discuss below.
\subsection{Design Tradeoffs}
CoreDet and \dthreads{} both use a combination of
parallel and serial phases to execute programs deterministically. The two systems take different approaches during parallel phases, as
well as in the transitions between phases:
\textbf{Memory isolation:}
CoreDet orders updates to the shared memory by
instrumenting all memory accesses that could reference shared data. Synchronization operations and updates to shared memory must be performed in serial phases, unless those updates are performed by the owner of a block, in which case they can be issued in parallel phases. This approach results in high instrumentation overhead during parallel phases, but incurs no additional overhead when exposing updates to the shared state, since they are already shared.
\dthreads{} takes an alternate approach: updates proceed at full speed, but are isolated using hardware-supported virtual memory. When a serial phase is reached, these updates are committed to the shared state in a deterministic order, with
the help of the twinning-and-diffing mechanism described in Section~\ref{sec:twinning-and-diffing}.
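
The commit step can be illustrated with a minimal sketch. It assumes byte-granularity diffing over a fixed-size page; the names \texttt{PAGE\_BYTES}, \texttt{make\_twin}, and \texttt{commit\_diff} are hypothetical and are not taken from the \dthreads{} implementation. Each thread snapshots a page (the twin) before modifying its private working copy, and at commit time writes back only the bytes that differ from the twin.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical page size for this sketch; a real system would query the OS. */
enum { PAGE_BYTES = 4096 };

/* Snapshot the private working page before the thread modifies it. */
void make_twin(unsigned char *twin, const unsigned char *working)
{
    memcpy(twin, working, PAGE_BYTES);
}

/*
 * Commit one page: copy into the shared page only those bytes the thread
 * changed, i.e. bytes where the working copy differs from the twin.
 * Returns the number of bytes written to the shared page.
 */
size_t commit_diff(unsigned char *shared,
                   const unsigned char *working,
                   const unsigned char *twin)
{
    size_t changed = 0;
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        if (working[i] != twin[i]) {
            shared[i] = working[i];
            changed++;
        }
    }
    return changed;
}
```

Because only modified bytes are written back, two threads that touch disjoint bytes of the same page can commit one after the other without overwriting each other's updates. If both modify the same byte, the thread that commits later wins, so a fixed commit order yields a fixed result.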
A pleasant side-effect of \dthreads{} is the elimination of false sharing. Because threads work in separate address spaces, there is no need to keep caches coherent between threads during parallel phases. For some programs, this results in a performance improvement as large as $7\times$ when compared to
\pthreads{}.
\textbf{Phases:}
CoreDet employs a quantum-based scheduler: after a specified number of instructions has executed in a parallel phase, the scheduler transitions to a serial phase. This approach bounds the waiting time of any blocked thread to one quantum, reducing the load imbalance problem. One drawback of this approach is that transitions to a serial phase do not correspond to static program points. Any change to the code or input will result in a new, previously untested schedule.
Transitions between phases are static in \dthreads{}. Any synchronization operation will result in a transition to a serial phase, and a parallel phase
will resume once all threads have finished their critical sections. This makes \dthreads{} susceptible to delays due to load imbalance between threads, but results in more robust determinism. With \dthreads{}, only the order of synchronization operations affects the schedule. For most programs, this means that different inputs, and even many code changes, will not change the schedule produced by \dthreads{}, as long as those changes do not affect the order of synchronization operations.
\subsection{Limitations}
\label{sec:limitations}
This section analyzes some key limitations of \dthreads{} that restrict its ability to run certain programs, limit the extent of determinism it can guarantee, or potentially affect performance.
\textbf{Unsupported programs: }
\dthreads{} supports programs that
use the pthreads library, but does not support programs that
bypass it by using their own ad hoc synchronization operations, such as those that use atomic operations. However, the upcoming C++0X standard includes a library interface for atomic operations~\cite[pp. 1107--1128]{c++0xstandarddraft}, and a future version of \dthreads{} could intercept these library calls and treat them as synchronization points. While ad hoc synchronization is a common practice, it is also a notorious source of bugs; Xiong et al.\ show that 22--67\% of the uses of ad hoc
synchronization lead to bugs or severe performance issues~\cite{ad-hoc-considered-harmful}.
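
As an illustration of the kind of ad hoc synchronization meant here, the following sketch uses a spin flag built on C11 atomics instead of a pthreads primitive. The names \texttt{ready}, \texttt{payload}, and \texttt{run\_spin\_flag} are hypothetical. The code behaves as expected under \pthreads{}. Under an isolation scheme like the one described above, the store to the flag would presumably stay private until the next synchronization point, so the spinning thread might never observe it.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int ready;   /* ad hoc flag, invisible to the pthreads library */
static int payload;        /* data published through the flag */

static void *producer(void *arg)
{
    (void)arg;
    payload = 7;
    /* Publish the payload without calling any pthreads function. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Returns the payload the consumer observed, or -1 if the thread could not start. */
int run_spin_flag(void)
{
    pthread_t t;
    atomic_store(&ready, 0);
    payload = 0;
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    /* Spin until the flag is set: no lock, no condition variable. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* busy-wait */
    }
    int seen = payload;
    pthread_join(t, NULL);
    return seen;
}
```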
Currently, \dthreads{} also does not share the stack
across threads, so any updates to stack variables are only locally visible, which could cause a program to fail. However, communicating across different threads using stack variables is extremely error-prone and generally deprecated, making this a rare coding practice.
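
The pattern in question looks roughly like the following sketch, in which a child thread writes through a pointer to a variable on its parent's stack. The function names are hypothetical. Under \pthreads{} the parent sees the update after the join; with a stack that is not shared across threads, the write would remain local to the child.

```c
#include <pthread.h>
#include <stddef.h>

/* Child thread: writes through a pointer into the parent's stack frame. */
static void *write_to_parent_stack(void *arg)
{
    int *slot = arg;
    *slot = 42;
    return NULL;
}

/* Returns the value the parent reads back, or -1 if the thread could not start. */
int run_stack_handoff(void)
{
    int result = 0;            /* lives on the calling thread's stack */
    pthread_t t;
    if (pthread_create(&t, NULL, write_to_parent_stack, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;             /* 42 only if the stack update is visible here */
}
```

Passing a heap-allocated or global result slot instead of \texttt{\&result} avoids the dependence on stack visibility.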
\textbf{External determinism: }
While \dthreads{} provides internal determinism, it does not guarantee determinism when a program's behavior depends on external sources of non-determinism, such as system time or I/O events. Incorporation of \dthreads{} in the dOS framework, an OS proposal that enforces system-level determinism, would provide full deterministic execution, although this remains future work~\cite{deterministic-process-groups}.
\textbf{Runtime performance: }
Section~\ref{sec:evaluation} shows that \dthreads{} can provide high performance for a number of applications; in fact, for the majority of the benchmarks examined, \dthreads{} matches or even exceeds the performance of \pthreads{}. However, \dthreads{} can occasionally degrade performance, sometimes substantially. One such case is a program that uses locks intensively (that is, acquires and releases them at very high frequency), since lock operations are much more expensive
in \dthreads{} than in \pthreads{}. However, because of its determinism guarantees, \dthreads{} could allow programmers to greatly reduce their use of locks, and thus improve performance. Other application characteristics, explored in Section~\ref{sec:performance}, can also impair performance
with \dthreads{}.
% Since Surprise locking inside libraries. Not a limitation \emph{per
% se} but definitely an issue that could surprise programmers.
% Draft can be downloaded from http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf.
%Fine once they are library calls, as they are in gcc and in the upcoming C++0X standard (cite!), since then we can intercept them.
\textbf{Memory consumption: }
Because \dthreads{} creates private, per-process copies of modified pages between commits, it can increase a program's memory footprint by the number of modified pages between synchronization points. This increased footprint does not seem to be a problem in practice, both because the number of modified pages is generally far smaller than the number of pages read, and because it is transitory: all private pages are relinquished to the operating system (via \texttt{madvise()}) at the end of every commit operation.
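
The transitory nature of the private copies can be sketched as follows. The sketch makes several assumptions that the text above does not state: the private view is a file-backed \texttt{MAP\_PRIVATE} mapping, the pages are released with \texttt{MADV\_DONTNEED}, and the system follows Linux semantics for that flag. The function name \texttt{private\_copy\_roundtrip} is hypothetical. A write to the private view creates a copy-on-write page; after \texttt{madvise()}, that page is dropped and the view is repopulated from the shared backing store.

```c
#define _DEFAULT_SOURCE   /* exposes madvise() and ftruncate() on glibc */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Writes 'A' through a shared view and 'B' through a private view of the
 * same page, then discards the private copy. Reports the byte seen through
 * the shared view while the private copy existed, and the byte seen through
 * the private view after the discard. Returns 0 on success, -1 on error.
 */
int private_copy_roundtrip(int *shared_seen, int *private_after)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    FILE *backing = tmpfile();               /* stand-in for the shared store */
    if (backing == NULL)
        return -1;
    int fd = fileno(backing);
    if (ftruncate(fd, page) != 0) {
        fclose(backing);
        return -1;
    }

    size_t len = (size_t)page;
    unsigned char *shared = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    unsigned char *priv = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (shared == MAP_FAILED || priv == MAP_FAILED) {
        if (shared != MAP_FAILED) munmap(shared, len);
        if (priv != MAP_FAILED) munmap(priv, len);
        fclose(backing);
        return -1;
    }

    shared[0] = 'A';            /* visible through the backing file */
    priv[0] = 'B';              /* triggers copy-on-write: a private page now exists */
    *shared_seen = shared[0];   /* the private write did not leak into the shared view */

    /* Relinquish the private page; the next access re-reads the shared contents. */
    int rc = madvise(priv, len, MADV_DONTNEED);
    *private_after = priv[0];

    munmap(priv, len);
    munmap(shared, len);
    fclose(backing);
    return rc == 0 ? 0 : -1;
}
```

In this sketch the extra memory exists only between the private write and the \texttt{madvise()} call, which mirrors the claim that the increased footprint lasts only from one commit to the next.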
\textbf{Memory consistency:}
\dthreads{} provides a form of release consistency for parallel programs, where updates are exposed at static program points.
CoreDet's DMP-B mode also uses release consistency, but the update points depend on when the quantum counter reaches zero. To the best of our knowledge, \dthreads{} cannot produce an output that is not possible with \pthreads{}, although in some cases it may produce unexpected output. However, the same unexpected output will be produced on every run
with \dthreads{}, making it easier for developers to track down the source of the problem than with \pthreads{}.